Intrinsics support for Zihintntl extension #30
Comments
The specification requires the memory access to be the "immediately subsequent instruction". I see two possible solutions.
Proposal one: provide NTL loads and stores (similar to the atomics builtins). However, this will probably only work reasonably well for single GPR/FPR memory transfers; vector memory accesses, for example, would probably need an _ntl variant of the vector load/store intrinsics.
Proposal two: no intrinsics, but an NTL function attribute that does not block inlining.
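Proposal one could look roughly like the sketch below. The names ntl_load_u32 and ntl_store_u32 are hypothetical and not part of any existing API. The sketch assumes GCC/Clang-style inline asm and spells the hint as add x0, x0, x5 (the encoding of ntl.all), so it does not depend on assembler support for the mnemonic. On non-RISC-V targets it falls back to a plain access.

```c
#include <stdint.h>

/* Hypothetical helper names; not an existing or ratified RISC-V C API. */

/* Non-temporal 32-bit store: the hint and the store are emitted as one unit. */
static inline void ntl_store_u32(uint32_t *p, uint32_t v)
{
#if defined(__riscv)
    /* "add x0, x0, x5" is the encoding of ntl.all. Keeping both
       instructions in one asm statement prevents the compiler from
       scheduling anything between the hint and the store. */
    __asm__ volatile("add x0, x0, x5\n\t"
                     "sw %1, %0"
                     : "=m"(*p)
                     : "r"(v));
#else
    *p = v; /* other targets: plain store, no hint */
#endif
}

/* Non-temporal 32-bit load, built the same way. */
static inline uint32_t ntl_load_u32(const uint32_t *p)
{
#if defined(__riscv)
    uint32_t v;
    __asm__ volatile("add x0, x0, x5\n\t"
                     "lw %0, %1"
                     : "=r"(v)
                     : "m"(*p));
    return v;
#else
    return *p; /* other targets: plain load, no hint */
#endif
}
```

Because the access itself lives inside the helper, the "immediately subsequent instruction" requirement is met by construction, at the cost of needing one helper per access width.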
I was also envisioning that the intrinsic would emit the load or store in addition to the HINT. This matches how the non-temporal store intrinsics work on x86: the intrinsic actually performs the store, rather than annotating a separate assignment.
That sounds good to me.
I don't like the function attribute approach, but it inspired another possible solution: a variable attribute, where any load or store through a pointer carrying this attribute gets a hint instruction added.
I like the pointer attribute approach, if it's feasible to implement. And of course the x86-style intrinsic can be implemented using the pointer attribute approach with a simple wrapper function.
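To make the pointer attribute idea concrete, here is a minimal sketch. RISCV_NTL_PTR is a hypothetical spelling: no compiler is assumed to implement such an attribute, so the macro expands to nothing and the code builds as ordinary C. The helper names fill_stream and stream_store_u32 are also made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute spelling. It expands to nothing here because no
   compiler is assumed to support it; a real implementation might map it
   to some __attribute__ on the pointee type. */
#define RISCV_NTL_PTR

/* If the attribute existed, every store through dst would get an NTL hint
   placed immediately before it by the compiler. */
static inline void fill_stream(uint32_t RISCV_NTL_PTR *dst, uint32_t v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = v;
}

/* x86-style wrapper built on top of the attribute: the "intrinsic"
   performs the store itself through an attributed pointer. */
static inline void stream_store_u32(uint32_t *p, uint32_t v)
{
    uint32_t RISCV_NTL_PTR *q = p;
    *q = v;
}
```

The attraction of this design is that existing loops need only a type annotation, while the wrapper keeps the intrinsic-style interface available for code that prefers explicit calls.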
x86 has the MOVNTI instruction, part of SSE2, for non-temporal stores from a GPR.
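For comparison, a small sketch of the x86 side: the SSE2 intrinsic _mm_stream_si32 from emmintrin.h performs the store itself and typically compiles to MOVNTI. The wrapper name stream_store_i32 is hypothetical, and the plain-store fallback is only there so the snippet builds on targets without SSE2.

```c
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_stream_si32 */
#endif

/* Hypothetical wrapper name, for illustration only. */
static inline void stream_store_i32(int *p, int v)
{
#if defined(__SSE2__)
    /* The intrinsic is the store; there is no separate assignment to
       annotate. Code that publishes the data to other threads usually
       follows a batch of such stores with _mm_sfence(). */
    _mm_stream_si32(p, v);
#else
    *p = v; /* targets without SSE2: plain store */
#endif
}
```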
Yes, that's a better idea than a function attribute!
We need to assess the implementation effort for a variable attribute in both compilers. I saw that load/store in LLVM IR can already encode non-temporal, but we might need to extend that to be able to express the different domains, so I think we need to introduce new intrinsics for NTL load/store as a first stage. https://llvm.org/docs/LangRef.html#load-instruction
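As a point of reference for the LLVM side, Clang already exposes the IR-level non-temporal flag through __builtin_nontemporal_store and __builtin_nontemporal_load. The sketch below wraps them under hypothetical names (nt_store_u64, nt_load_u64) and falls back to plain accesses on compilers that lack the builtins. Note that these builtins carry no domain information, which is the limitation raised above.

```c
#include <stdint.h>

/* Older compilers may not provide __has_builtin at all. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

/* Hypothetical wrapper names, for illustration only. */
static inline void nt_store_u64(uint64_t *p, uint64_t v)
{
#if __has_builtin(__builtin_nontemporal_store)
    /* Clang: emits a store tagged with !nontemporal metadata. */
    __builtin_nontemporal_store(v, p);
#else
    *p = v; /* no builtin available: plain store */
#endif
}

static inline uint64_t nt_load_u64(uint64_t *p)
{
#if __has_builtin(__builtin_nontemporal_load)
    /* Clang: emits a load tagged with !nontemporal metadata. */
    return __builtin_nontemporal_load(p);
#else
    return *p; /* no builtin available: plain load */
#endif
}
```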
In any case, it seems we have a path to some solution.
SiFive folks are implementing the builtin now.
Looks like we should also wire this up to the Here's the equivalent patterns for x86:
Proposal for the intrinsic: #47
The Zihintntl extension has recently passed AR. The spec is here: https://github.com/riscv/riscv-isa-manual/blob/10eea63205f371ed649355f4cf7a80716335958f/src/zihintntl.tex
During the AR, we wanted to raise the issue of whether and how the extension would be exposed in the RISC-V C API. Can y'all ponder the following and opine?
In x86, for example, _mm_stream_pi (https://github.com/gcc-mirror/gcc/blob/e75da2ace6b6f634237259ef62cfb2d3d34adb10/gcc/config/i386/xmmintrin.h#L1279-L1291) is roughly equivalent to c.ntl.all; sd in RISC-V. (ARMv8 has LDNP/STNP instructions, but I couldn't find an intrinsics mapping for them.)
Zihintntl is more general than x86's solution in a few dimensions:
ntl.all, AFAIK)
With this in mind, the questions for the RISC-V C API folks are: how do we expose this facility in the RISC-V C API? How much of its generality do we expose? Do you foresee any impediments?
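One way to picture "how much generality to expose" is a store helper that takes the locality domain as a parameter. The sketch below is only an illustration: the enum ntl_domain_t, the macro NTL_SW, and the function ntl_store_u32_domain are hypothetical names. It assumes the four Zihintntl hints are encoded as add x0, x0, x2 through add x0, x0, x5, and falls back to a plain store on non-RISC-V targets.

```c
#include <stdint.h>

/* Hypothetical enum; each value mirrors the source register number in
   the corresponding hint's "add x0, x0, xN" encoding. */
typedef enum {
    NTL_P1   = 2, /* ntl.p1:   innermost level of private cache */
    NTL_PALL = 3, /* ntl.pall: all levels of private cache */
    NTL_S1   = 4, /* ntl.s1:   innermost level of shared cache */
    NTL_ALL  = 5  /* ntl.all:  all levels of cache */
} ntl_domain_t;

#if defined(__riscv)
/* Hint and store share one asm statement so nothing is scheduled between. */
#define NTL_SW(hint_reg, p, v)                            \
    __asm__ volatile("add x0, x0, " hint_reg "\n\t"       \
                     "sw %1, %0"                          \
                     : "=m"(*(p))                         \
                     : "r"(v))
#endif

static inline void ntl_store_u32_domain(uint32_t *p, uint32_t v, ntl_domain_t d)
{
#if defined(__riscv)
    /* The hint is a fixed instruction, so each domain needs its own asm. */
    switch (d) {
    case NTL_P1:   NTL_SW("x2", p, v); break;
    case NTL_PALL: NTL_SW("x3", p, v); break;
    case NTL_S1:   NTL_SW("x4", p, v); break;
    default:       NTL_SW("x5", p, v); break; /* NTL_ALL */
    }
#else
    (void)d;  /* other targets: the domain is ignored */
    *p = v;
#endif
}
```

With a compile-time constant domain and inlining, the switch should fold to a single hint plus store; passing only NTL_ALL would give the x86-like subset.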
cc @kito-cheng @ptomsich