Memory transfer time and kernel execution time are among the most important performance characteristics in parallel computing, especially in high-performance computing (HPC) and machine learning. Memory bottlenecks are often what prevents an application from reaching peak performance, so measuring memory transfer and kernel execution times plays a key role in application optimization.
This example showcases measuring kernel and memory transfer timing using HIP events. The kernel under measurement is a trivial one that performs square matrix transposition.
- A number of parameters are defined that control the problem details and the kernel launch.
- Input data is set up in host memory.
- The necessary amount of device memory is allocated.
- A pair of `hipEvent_t` objects is defined and initialized.
- Time measurement is started on the `start` event.
- Memory transfer from host to device of the input data is performed.
- The time measurement is stopped using the `stop` event. The execution time is calculated from the `start` and `stop` events and printed to the standard output.
- The kernel is launched, and its runtime is measured similarly using the `start` and `stop` events.
- The result data is copied back to the host, and the execution time of the copy is measured similarly.
- The allocated device memory is freed and the event objects are released.
- The result data is validated by comparing it to the output of the reference (host) implementation. The result of the validation is printed to the standard output.
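
The validation step at the end of the list can be sketched in plain C++ as follows. This is a minimal illustration, not the example's actual source: the function names `transpose_reference` and `count_mismatches` are hypothetical, and a row-major `float` matrix is assumed. Because transposition only moves data, the sketch compares values exactly rather than with a tolerance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference: transposes a square width x width matrix
// stored in row-major order.
std::vector<float> transpose_reference(const std::vector<float>& in, std::size_t width)
{
    assert(in.size() == width * width);
    std::vector<float> out(in.size());
    for (std::size_t row = 0; row < width; ++row)
    {
        for (std::size_t col = 0; col < width; ++col)
        {
            // Element (row, col) of the input becomes element (col, row) of the output.
            out[col * width + row] = in[row * width + col];
        }
    }
    return out;
}

// Hypothetical validation helper: counts the elements in which the result
// copied back from the device differs from the host reference.
std::size_t count_mismatches(const std::vector<float>& result,
                             const std::vector<float>& expected)
{
    assert(result.size() == expected.size());
    std::size_t errors = 0;
    for (std::size_t i = 0; i < result.size(); ++i)
    {
        if (result[i] != expected[i])
        {
            ++errors;
        }
    }
    return errors;
}
```

A caller would pass the data copied back from the device as `result` and treat a mismatch count of zero as a successful validation.
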
- The `hipEvent_t` type defines HIP events that can be used for synchronization and time measurement. Events must be initialized using `hipEventCreate` before use and destroyed using `hipEventDestroy` once they are no longer needed.
- An event has to be queued on a device stream in order to be useful; this is done via the `hipEventRecord` function. The stream itself is a list of jobs (memory transfers, kernel executions and events) that execute sequentially. When the event is processed by the stream, the current machine time is recorded to the event. This can be used to measure execution times on the stream. In this example, the default stream is used.
- The time difference between two recorded events can be accessed using the function `hipEventElapsedTime`, which reports it in milliseconds.
- An event can be used to synchronize the execution of the jobs on a stream with the execution of the host. A call to `hipEventSynchronize` blocks the host until the provided event has completed on its stream.

Device symbols and host API functions used in this example:

- `threadIdx`, `blockIdx`, `blockDim`
- `hipMalloc`
- `hipFree`
- `hipMemcpy`
- `hipEventCreate`
- `hipEventRecord`
- `hipEventElapsedTime`
- `hipEventSynchronize`