This example shows how to load and execute a HIP module in runtime without linking it to the rest of the code during compilation.
- Set up the name of the compiled module code object file
(*.co)
, located in the same directory. - Define kernel launch parameters.
- Initialize input and output vectors in host memory.
- Allocate arrays and copy the input and output vectors to the device memory.
- Get the module path from the module file name.
- Load module by
hipModuleLoad()
. - Fetch a reference to the kernel by
hipModuleGetFunction()
. - Create and fill the array with kernel arguments.
- Launch the kernel on the default stream by
hipModuleLaunchKernel()
. - Copy the result back to the host.
- Free input and output arrays on device memory.
- Compare input and output vectors. The result of the comparison is printed to standard output.
The kernel module needs to be compiled as a non-linked device code object file (*.co
), in one of the following ways:
- hipcc --genco --offload-arch=[TARGET GPU] [INPUT FILE] -o [OUTPUT FILE]
- clang++ --cuda-device-only --offload-arch=[TARGET GPU] [INPUT FILE] -o [OUTPUT FILE]
where the parameters are:
- [TARGET GPU]
: GPU architecture (e.g. gfx908
or gfx90a:xnack-
).
- [INPUT FILE]
: Name of the file containing kernels (e.g. module.hip
).
- [OUTPUT FILE]
: Name of the generated code object file (e.g. module.co
).
The main.hip
example file is compiled similarly as in the other examples.
-
The
hipModuleLoad(hipModule_t *module, const char *file_name)
will load a HIP module in execution time from the path that is given as an input parameter or return an error. -
The
hipModuleGetFunction(hipFunction_t *kernel_function, hipModule_t module, const char *kernel_name)
will fetch a reference to the__global__
kernel function in the HIP module. -
hipModuleLaunchKernel
will launch kernel function on the device. The input parameters are:hipFunction_t kernel_function
Kernel function.unsigned int gridDimX
: Number of blocks in the dimension X.unsigned int gridDimY
: Number of blocks in the dimension Y.unsigned int gridDimZ
: Number of blocks in the dimension Z.unsigned int blockDimX
: Number of threads in the dimension X in a block.unsigned int blockDimY
: Number of threads in the dimension Y in a block.unsigned int blockDimZ
: Number of threads in the dimension Z in a block.unsigned int sharedMemBytes
: Amount of dynamic shared memory that will be available to each workgroup, in bytes. (Not used in this example.)hipStream_t stream
: The device stream, on which the kernel should be dispatched. (hipStreamDefault
int this example.)void **kernelParams
: Pointer to the arguments needed by the kernel. Note that this parameter is not yet implemented, and thus the extra parameter (the last one described in this list) should be used to pass arguments to the kernel. (Therebynullptr
is used in the example.)void **extra
: Pointer to all extra arguments passed to the kernel. They must be in the memory layout and alignment expected by the kernel. The list of arguments must end withHIP_LAUNCH_PARAM_END
.
__global__
threadIdx
hipGetLastError
hipGetSymbolAddress
hipGetSymbolSize
hipMalloc
hipMemcpy
hipMemcpyHostToDevice
hipMemcpyDeviceToHost
hipFree
hipModuleLoad
hipModuleGetFunction
hipModuleLaunchKernel