[wip] herd runtime parameter support for npu #585
// FIXME: setting the insertion point to the end is a hack for
// RTP POC, so that the sync is after the rtp
// writes and the herd lock acquire.
// builder.setInsertionPoint(dma->getBlock()->getTerminator());
builder.setInsertionPointAfter(dma);
Note: to enable RTP support, comment out the last line and uncomment the previous one. Otherwise you end up with a runtime sequence like this:
aiex.runtime_sequence @mul(%arg0: memref<1024xi32>, %arg1: memref<1024xi32>, %arg2: memref<1024xi32>) {
  aiex.npu.dma_memcpy_nd(0, 0, %arg0[0, 0, 0, 0][1, 1, 2, 512][0, 0, 512, 1]) {id = 0 : i64, metadata = @airMemcpyId4} : memref<1024xi32>
  aiex.npu.dma_memcpy_nd(0, 0, %arg1[0, 0, 0, 0][1, 1, 2, 512][0, 0, 512, 1]) {id = 1 : i64, metadata = @airMemcpyId5} : memref<1024xi32>
  aiex.npu.dma_memcpy_nd(0, 0, %arg2[0, 0, 0, 0][1, 1, 2, 512][0, 0, 512, 1]) {id = 2 : i64, metadata = @airMemcpyId6} : memref<1024xi32>
  aiex.npu.sync {channel = 0 : i32, column = 0 : i32, column_num = 1 : i32, direction = 0 : i32, row = 0 : i32, row_num = 1 : i32}
  aiex.npu.rtp_write(@__air_herd_rtp_0_2, 0, 32)
  aiex.npu.write32 {address = 126976 : ui32, column = 0 : i32, row = 2 : i32, value = 1 : ui32}
}
Here the sync happens before the herd runs (i.e. before the write32 that releases the herd lock), so it waits on output the herd has not yet been allowed to produce.
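The ordering problem can be illustrated with a small standalone C++ model. This is only a sketch: it treats the runtime sequence as a list of plain op-name strings and does not use the MLIR builder API. The names `Sequence`, `syncAfterLockRelease`, and `moveSyncsToEnd` are hypothetical helpers introduced for illustration. Moving every sync to the end roughly mimics what setting the insertion point at the block terminator achieves.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model (assumption): a runtime sequence is an ordered list of op names.
using Sequence = std::vector<std::string>;

// Returns true if the first "sync" appears after the last "write32",
// i.e. after the write that releases the herd lock.
bool syncAfterLockRelease(const Sequence &seq) {
  auto lastRelease = std::find(seq.rbegin(), seq.rend(), "write32");
  if (lastRelease == seq.rend())
    return true; // no lock release in the sequence, nothing to violate
  auto firstSync = std::find(seq.begin(), seq.end(), "sync");
  // lastRelease.base() points one past the last write32 in forward order.
  return firstSync == seq.end() || firstSync >= lastRelease.base();
}

// Moves every "sync" to the end of the sequence while keeping the
// relative order of all other ops unchanged.
void moveSyncsToEnd(Sequence &seq) {
  std::stable_partition(seq.begin(), seq.end(),
                        [](const std::string &op) { return op != "sync"; });
  assert(seq.empty() || syncAfterLockRelease(seq));
}
```

Applied to the sequence above (three DMA copies, sync, rtp_write, write32), the check fails before the move and passes after it, with the sync as the final op.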
A better solution is to require the source program to be correct w.r.t. synchronization. Unfortunately mlir-air discards that synchronization information before this point in the lowering, and tries to enforce a synchronization on writes to external memory here instead.
Add lowering of air.herd_load to npu.rtp_write
Work in progress to lower air.herd integer operands to mlir-aie runtime parameters, i.e. aie.buffer + npu.rtp_write + memref.load.
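As a rough mental model of the buffer + rtp_write + load combination, the following standalone C++ sketch treats each runtime-parameter buffer as a named array that the host writes and the core reads. Everything here is an assumption made for illustration: the `RtpStore` type, its methods, and the reading of the `rtp_write` operands as (buffer symbol, index, value) are not taken from the PR or from mlir-aie itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model (assumption): each RTP buffer is a named array of 32-bit values,
// standing in for an aie.buffer allocated for a herd's runtime parameters.
class RtpStore {
public:
  // Stands in for declaring a buffer with a fixed number of slots.
  void declareBuffer(const std::string &symbol, std::size_t slots) {
    buffers_[symbol] = std::vector<int32_t>(slots, 0);
  }

  // Host side: stands in for rtp_write(@symbol, index, value).
  void rtpWrite(const std::string &symbol, std::size_t index, int32_t value) {
    buffers_.at(symbol).at(index) = value; // throws on unknown symbol or index
  }

  // Core side: stands in for the load that reads the herd operand.
  int32_t load(const std::string &symbol, std::size_t index) const {
    return buffers_.at(symbol).at(index);
  }

private:
  std::map<std::string, std::vector<int32_t>> buffers_;
};
```

In this model the core only observes the intended operand value if the host write happens before the core is released to run, which is why the write has to precede the herd lock release in the runtime sequence.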