Memory Allocation Tracking (#1142)
* Initial commit: Need to implement wrapper function to collect data and test that wrapper function is correctly replacing core HSA functions

* Attempted to implement wrapper implementation for hsa memory allocation functions. Need to modify generate record files and test if implementation is working as expected

* Debugging and implementing generateCSV function

* Memory allocation size and starting address outputted to csv and json file formats

* Formatting

* Initial setup for OTF2 and Perfetto generation

* Collecting agent id for memory_allocation and formatting

* Modified memory_allocation.cpp to set up code for AMD_EXT commands

* Support for memory_pool_allocate added

* Removed accidentally added file

* Made flag optional and added more OTF2 and Perfetto code. Needs testing to ensure perfetto and OTF2 works

* Formatting

* Fixed perfetto and otf2 output

* Fixed flag issue due to incorrect buffer use

* Updated documentation

* Small cleaning and comments

* Added test for HSA memory allocation tracing

* Fixed summary test validation errors due to allocation tracing. Added type to location_base to create unique event ids for allocation due to OTF2 trace error

* Decreased lower limit of hip calls for test

* Modified summary tests to vary number of allocate requests

* Minor fixes to address comments. Still need to address OTF2 comments

* Fix docs and changed OTF2 to use enum for type specified in location_base construction

* Fixed schema error

* Added vmem command tracking. Need to add test

* Updated test to work with vmem command and updated generateCSV to output int instead of hex string.

* OTF2 enum update and misspelling fix

* CI does not support Virtual Memory API. Removed vmem test. Will add back if CI is modified to support vmem API

* Update CMakeLists.txt for memory allocation test

* Updated summary test

* Minor fixes to address comments

* Moved domain_type.hpp enum to before LAST

* Fixed compile errors and formatting

* Fixed stats summary domain name error

* Added rocprofv3 test

* Page migration test fix

* Undo page migration test changes. Failures do not appear to have to do with memory allocation
itrowbri authored Nov 19, 2024
1 parent 0d764eb commit 3bd7773
Showing 53 changed files with 2,388 additions and 135 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -108,6 +108,7 @@ Full documentation for ROCprofiler-SDK is available at [rocm.docs.amd.com/projec
- Start and end timestamp columns to the counter collection csv output
- Check to force tools to initialize context id with zero
- Support to specify hardware counters for collection using rocprofv3 as `rocprofv3 --pmc [COUNTER [COUNTER ...]]`
- Memory Allocation Tracing

### Changed

12 changes: 10 additions & 2 deletions source/bin/rocprofv3.py
@@ -141,13 +141,13 @@ def add_parser_bool_argument(gparser, *args, **kwargs):
aggregate_tracing_options,
"-r",
"--runtime-trace",
- help="Collect tracing data for HIP runtime API, Marker (ROCTx) API, RCCL API, Memory operations (copies and scratch), and Kernel dispatches. Similar to --sys-trace but without tracing HIP compiler API and the underlying HSA API.",
+ help="Collect tracing data for HIP runtime API, Marker (ROCTx) API, RCCL API, Memory operations (copies, scratch, and allocation), and Kernel dispatches. Similar to --sys-trace but without tracing HIP compiler API and the underlying HSA API.",
)
add_parser_bool_argument(
aggregate_tracing_options,
"-s",
"--sys-trace",
- help="Collect tracing data for HIP API, HSA API, Marker (ROCTx) API, RCCL API, Memory operations (copies and scratch), and Kernel dispatches.",
+ help="Collect tracing data for HIP API, HSA API, Marker (ROCTx) API, RCCL API, Memory operations (copies, scratch, and allocations), and Kernel dispatches.",
)

basic_tracing_options = parser.add_argument_group("Basic tracing options")
@@ -173,6 +173,11 @@ def add_parser_bool_argument(gparser, *args, **kwargs):
"--memory-copy-trace",
help="For collecting Memory Copy Traces. This was part of HIP and HSA traces in previous rocprof versions but is now a separate option",
)
add_parser_bool_argument(
basic_tracing_options,
"--memory-allocation-trace",
help="For collecting Memory Allocation Traces. Displays starting address, allocation size, and agent where allocation occurred.",
)
add_parser_bool_argument(
basic_tracing_options,
"--scratch-memory-trace",
@@ -686,6 +691,7 @@ def _write_env_value():
"marker_trace",
"kernel_trace",
"memory_copy_trace",
"memory_allocation_trace",
"scratch_memory_trace",
"rccl_trace",
):
@@ -697,6 +703,7 @@ def _write_env_value():
"marker_trace",
"kernel_trace",
"memory_copy_trace",
"memory_allocation_trace",
"scratch_memory_trace",
"rccl_trace",
):
@@ -724,6 +731,7 @@ def _write_env_value():
["rccl_trace", "RCCL_API_TRACE"],
["kernel_trace", "KERNEL_TRACE"],
["memory_copy_trace", "MEMORY_COPY_TRACE"],
["memory_allocation_trace", "MEMORY_ALLOCATION_TRACE"],
["scratch_memory_trace", "SCRATCH_MEMORY_TRACE"],
]
).items():
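
The option-to-environment mapping in this hunk can be sketched in isolation as follows. This is a minimal illustration, not the code in `rocprofv3.py`: the `ROCPROF_` prefix, the `"1"`/`"0"` encoding, and the helper name `build_trace_env` are assumptions, since the hunk only shows the list of pairs.

```python
import argparse

# Option-name -> environment-suffix pairs copied from the hunk above.
TRACE_OPTIONS = [
    ["rccl_trace", "RCCL_API_TRACE"],
    ["kernel_trace", "KERNEL_TRACE"],
    ["memory_copy_trace", "MEMORY_COPY_TRACE"],
    ["memory_allocation_trace", "MEMORY_ALLOCATION_TRACE"],
    ["scratch_memory_trace", "SCRATCH_MEMORY_TRACE"],
]

# ASSUMPTION: the prefix and the "1"/"0" encoding are illustrative only;
# the hunk does not show how rocprofv3.py actually writes the values.
ENV_PREFIX = "ROCPROF_"


def build_trace_env(args, prefix=ENV_PREFIX):
    """Return {env_name: "1" or "0"} for every trace option that was set."""
    env = {}
    for opt, suffix in dict(TRACE_OPTIONS).items():
        value = getattr(args, opt, None)
        if value is None:  # option not given on the command line
            continue
        env[prefix + suffix] = "1" if value else "0"
    return env


# Hypothetical parsed arguments: one option enabled, one disabled, rest unset.
args = argparse.Namespace(memory_allocation_trace=True, kernel_trace=False)
print(build_trace_env(args))
```

Keeping the pairs in one list means a new trace kind, such as `memory_allocation_trace` here, needs only one added entry.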
4 changes: 4 additions & 0 deletions source/docs/data/memory_allocation_trace.csv
@@ -0,0 +1,4 @@
"Kind","Operation","Agent_Id","Allocation_Size","Starting_Address","Correlation_Id","Start_Timestamp","End_Timestamp"
"MEMORY_ALLOCATION","MEMORY_ALLOCATION_ALLOCATE",0,1024,140341497356288,1,65788054621500,65788055678893
"MEMORY_ALLOCATION","MEMORY_ALLOCATION_ALLOCATE",0,1024,140341497348096,1,65788055691832,65788056666844
"MEMORY_ALLOCATION","MEMORY_ALLOCATION_ALLOCATE",0,1024,140341497339904,1,65788056672061,65788057643457
57 changes: 51 additions & 6 deletions source/docs/how-to/using-rocprofv3.rst
@@ -55,11 +55,11 @@ Here is the sample of commonly used ``rocprofv3`` command-line options. Some opt
- Output control

* - ``-r`` \| ``--runtime-trace``
- - Collects HIP (runtime), memory copy, marker, scratch memory, and kernel dispatch traces.
+ - Collects HIP (runtime), memory copy, memory allocation, marker, scratch memory, and kernel dispatch traces.
- Application Tracing

* - ``-s`` \| ``--sys-trace``
- - Collects HIP, HSA, memory copy, marker, scratch memory, and kernel dispatch traces.
+ - Collects HIP, HSA, memory copy, memory allocation, marker, scratch memory, and kernel dispatch traces.
- Application Tracing

* - ``--hip-trace``
@@ -78,6 +78,10 @@ Here is the sample of commonly used ``rocprofv3`` command-line options. Some opt
- Collects memory copy traces.
- Application tracing

* - ``--memory-allocation-trace``
- Collects memory allocation traces.
- Application tracing

* - ``--scratch-memory-trace``
- Collects scratch memory operations traces.
- Application tracing
@@ -356,6 +360,30 @@ Here are the contents of ``memory_copy_trace.csv`` file:

For the description of the fields in the output file, see :ref:`output-file-fields`.

Memory allocation trace
+++++++++++++++++++++++++

To trace memory allocations during the application run, use:

.. code-block:: shell

    rocprofv3 --memory-allocation-trace -- < app_path >

The above command generates a ``memory_allocation_trace.csv`` file prefixed with the process ID.

.. code-block:: shell

    $ cat 6489_memory_allocation_trace.csv

Here are the contents of the ``memory_allocation_trace.csv`` file:

.. csv-table:: Memory allocation trace
   :file: /data/memory_allocation_trace.csv
   :widths: 10,10,10,10,10,10,20,20
   :header-rows: 1

For the description of the fields in the output file, see :ref:`output-file-fields`.
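
As a rough illustration of how the columns of ``memory_allocation_trace.csv`` can be consumed, the sketch below totals allocation count, bytes, and elapsed time per agent. The rows are the three sample rows added in this commit; the helper name ``summarize_allocations`` is hypothetical, and the timestamps are taken to be nanoseconds as the header comments in this commit state.

```python
import csv
import io
from collections import defaultdict

# The three sample rows from source/docs/data/memory_allocation_trace.csv.
SAMPLE = '''"Kind","Operation","Agent_Id","Allocation_Size","Starting_Address","Correlation_Id","Start_Timestamp","End_Timestamp"
"MEMORY_ALLOCATION","MEMORY_ALLOCATION_ALLOCATE",0,1024,140341497356288,1,65788054621500,65788055678893
"MEMORY_ALLOCATION","MEMORY_ALLOCATION_ALLOCATE",0,1024,140341497348096,1,65788055691832,65788056666844
"MEMORY_ALLOCATION","MEMORY_ALLOCATION_ALLOCATE",0,1024,140341497339904,1,65788056672061,65788057643457
'''


def summarize_allocations(lines):
    """Aggregate allocation count, bytes, and elapsed time per Agent_Id."""
    totals = defaultdict(lambda: {"count": 0, "bytes": 0, "time_ns": 0})
    for row in csv.DictReader(lines):
        entry = totals[int(row["Agent_Id"])]
        entry["count"] += 1
        entry["bytes"] += int(row["Allocation_Size"])
        entry["time_ns"] += int(row["End_Timestamp"]) - int(row["Start_Timestamp"])
    return dict(totals)


summary = summarize_allocations(io.StringIO(SAMPLE))
print(summary)
```

For a real run, the same function can be given an open file handle for the PID-prefixed CSV instead of the in-memory sample.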

Runtime trace
+++++++++++++++

@@ -374,7 +402,7 @@ memory operations (copies and scratch).
rocprofv3 --runtime-trace -- < app_relative_path >
- Running the above command generates ``hip_api_trace.csv``, ``kernel_trace.csv``, ``memory_copy_trace.csv``, ``scratch_memory_trace.csv``,and ``marker_api_trace.csv`` (if ``ROCTx`` APIs are specified in the application) files prefixed with the process ID.
+ Running the above command generates ``hip_api_trace.csv``, ``kernel_trace.csv``, ``memory_copy_trace.csv``, ``scratch_memory_trace.csv``, ``memory_allocation_trace.csv``, and ``marker_api_trace.csv`` (if ``ROCTx`` APIs are specified in the application) files prefixed with the process ID.

System trace
++++++++++++++
@@ -385,7 +413,7 @@ This is an all-inclusive option to collect all the above-mentioned traces.
rocprofv3 --sys-trace -- < app_relative_path >
- Running the above command generates ``hip_api_trace.csv``, ``hsa_api_trace.csv``, ``kernel_trace.csv``, ``memory_copy_trace.csv``, and ``marker_api_trace.csv`` (if ``ROCTx`` APIs are specified in the application) files prefixed with the process ID.
+ Running the above command generates ``hip_api_trace.csv``, ``hsa_api_trace.csv``, ``kernel_trace.csv``, ``memory_copy_trace.csv``, ``memory_allocation_trace.csv``, and ``marker_api_trace.csv`` (if ``ROCTx`` APIs are specified in the application) files prefixed with the process ID.
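
Because every output file is prefixed with the process ID, the files from one run can be grouped by PID. The sketch below is one way to do that, assuming all CSV files sit directly in a single output directory; the directory layout and the helper name ``group_outputs_by_pid`` are assumptions, not something this commit specifies.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Matches names such as "6489_memory_allocation_trace.csv".
OUTPUT_NAME = re.compile(r"^(\d+)_([a-z_]+_trace)\.csv$")


def group_outputs_by_pid(output_dir):
    """Return {pid: {trace_name: path}} for the trace CSVs in output_dir."""
    grouped = defaultdict(dict)
    for path in Path(output_dir).iterdir():
        match = OUTPUT_NAME.match(path.name)
        if match:
            pid, trace_name = int(match.group(1)), match.group(2)
            grouped[pid][trace_name] = path
    return dict(grouped)


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("6489_memory_allocation_trace.csv", "6489_kernel_trace.csv", "notes.txt"):
        (Path(tmp) / name).touch()
    found = group_outputs_by_pid(tmp)
    print(sorted(found[6489]))
```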

Scratch memory trace
++++++++++++++++++++++
@@ -464,6 +492,8 @@ Properties
Dispatch Traces.
- **``memory_copy_trace``** *(boolean)*: For Collecting Memory
Copy Traces.
- **``memory_allocation_trace``** *(boolean)*: For Collecting Memory
Allocation Traces.
- **``scratch_memory_trace``** *(boolean)*: For Collecting
Scratch Memory operations Traces.
- **``stats``** *(boolean)*: For Collecting statistics of enabled
@@ -479,8 +509,8 @@ Properties
- **``hsa_image_trace``** *(boolean)*: For Collecting HSA API
Traces (Image-extension API).
- **``sys_trace``** *(boolean)*: For Collecting HIP, HSA, Marker
- (ROCTx), Memory copy, Scratch memory, and Kernel dispatch
- traces.
+ (ROCTx), Memory copy, Memory allocation, Scratch memory, and
+ Kernel dispatch traces.
- **``mangled_kernels``** *(boolean)*: Do not demangle the kernel
names.
- **``truncate_kernels``** *(boolean)*: Truncate the demangled
@@ -990,3 +1020,18 @@ Properties
- **`src_agent_id`** *(object, required)*: Source Agent ID.
- **`handle`** *(integer, required)*: Handle of the agent.
- **`bytes`** *(integer, required)*: Bytes copied.
- **`memory_allocation`** *(array)*: Memory allocation records.
- **Items** *(object)*
- **`size`** *(integer, required)*: Size of the memory allocation record.
- **`kind`** *(integer, required)*: Kind of the memory allocation record.
- **`operation`** *(integer, required)*: Operation of the memory allocation.
- **`correlation_id`** *(object, required)*: Correlation ID information.
- **`internal`** *(integer, required)*: Internal correlation ID.
- **`external`** *(integer, required)*: External correlation ID.
- **`start_timestamp`** *(integer, required)*: Start timestamp.
- **`end_timestamp`** *(integer, required)*: End timestamp.
- **`thread_id`** *(integer, required)*: Thread ID.
- **`agent_id`** *(object, required)*: Agent ID.
- **`handle`** *(integer, required)*: Handle of the agent.
- **`starting_address`** *(integer, required)*: Starting address of allocation.
- **`allocation_size`** *(integer, required)*: Size of allocation.
84 changes: 84 additions & 0 deletions source/docs/rocprofv3-schema.json
@@ -1374,6 +1374,90 @@
"bytes"
]
}
},
"memory_allocation": {
"type": "array",
"description": "Memory allocation records.",
"items": {
"type": "object",
"properties": {
"size": {
"type": "integer",
"description": "Size of the memory allocation record."
},
"kind": {
"type": "integer",
"description": "Kind of the memory allocation record."
},
"operation": {
"type": "integer",
"description": "Operation of the memory allocation."
},
"correlation_id": {
"type": "object",
"description": "Correlation ID information.",
"properties": {
"internal": {
"type": "integer",
"description": "Internal correlation ID."
},
"external": {
"type": "integer",
"description": "External correlation ID."
}
},
"required": [
"internal",
"external"
]
},
"start_timestamp": {
"type": "integer",
"description": "Start timestamp."
},
"end_timestamp": {
"type": "integer",
"description": "End timestamp."
},
"thread_id": {
"type": "integer",
"description": "Thread ID."
},
"agent_id": {
"type": "object",
"description": "Agent ID.",
"properties": {
"handle": {
"type": "integer",
"description": "Handle of the agent."
}
},
"required": [
"handle"
]
},
"starting_address": {
"type": "integer",
"description": "Starting address of allocation."
},
"allocation_size": {
"type": "integer",
"description": "Size of allocation."
}
},
"required": [
"size",
"kind",
"operation",
"correlation_id",
"start_timestamp",
"end_timestamp",
"thread_id",
"agent_id",
"starting_address",
"allocation_size"
]
}
}
}
}
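
The `required` list and field types in the schema fragment above can be checked with a few lines of plain Python. This is a hand-written sketch that mirrors only the constraints visible in this hunk; it is not a general JSON Schema validator, and the record values in the demonstration are made up.

```python
# Field names and types transcribed from the "memory_allocation" schema above.
REQUIRED_INT_FIELDS = (
    "size", "kind", "operation", "start_timestamp", "end_timestamp",
    "thread_id", "starting_address", "allocation_size",
)
REQUIRED_OBJECT_FIELDS = {
    "correlation_id": ("internal", "external"),
    "agent_id": ("handle",),
}


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validation_errors(record):
    """Return a list of human-readable problems; empty means the record passes."""
    errors = []
    for name in REQUIRED_INT_FIELDS:
        if not _is_int(record.get(name)):
            errors.append(f"{name}: expected integer")
    for name, members in REQUIRED_OBJECT_FIELDS.items():
        obj = record.get(name)
        if not isinstance(obj, dict):
            errors.append(f"{name}: expected object")
            continue
        for member in members:
            if not _is_int(obj.get(member)):
                errors.append(f"{name}.{member}: expected integer")
    return errors


# Hypothetical record; the numeric values are illustrative only.
good = {
    "size": 72, "kind": 1, "operation": 1,
    "correlation_id": {"internal": 1, "external": 0},
    "start_timestamp": 100, "end_timestamp": 250, "thread_id": 4242,
    "agent_id": {"handle": 0},
    "starting_address": 4096, "allocation_size": 1024,
}
bad = dict(good, starting_address="0x1000")  # hex string instead of integer

print(validation_errors(good))
print(validation_errors(bad))
```

The `bad` record shows the kind of mismatch the commit message mentions: an address emitted as a hex string where the schema declares an integer.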
7 changes: 6 additions & 1 deletion source/docs/rocprofv3_input_schema.json
@@ -61,6 +61,11 @@
"description": "For Collecting Memory Copy Traces"
},

"memory_allocation_trace": {
"type": "boolean",
"description": "For Collecting Memory Allocation Traces"
},

"scratch_memory_trace": {
"type": "boolean",
"description": "For Collecting Scratch Memory operations Traces"
@@ -98,7 +103,7 @@

"sys_trace" : {
"type": "boolean",
- "description": "For Collecting HIP, HSA, Marker (ROCTx), Memory copy, Scratch memory, and Kernel dispatch traces"
+ "description": "For Collecting HIP, HSA, Marker (ROCTx), Memory copy, Memory allocation, Scratch memory, and Kernel dispatch traces"
},

"mangled_kernels": {
22 changes: 22 additions & 0 deletions source/include/rocprofiler-sdk/buffer_tracing.h
@@ -203,6 +203,28 @@ typedef struct
/// ::rocprofiler_memory_copy_operation_t)
} rocprofiler_buffer_tracing_memory_copy_record_t;

/**
* @brief ROCProfiler Buffer Memory Allocation Tracer Record.
*/
typedef struct
{
uint64_t size; ///< size of this struct
rocprofiler_buffer_tracing_kind_t kind;
rocprofiler_memory_allocation_operation_t operation;
rocprofiler_correlation_id_t correlation_id; ///< correlation ids for record
rocprofiler_thread_id_t thread_id; ///< id for thread that triggered allocation
rocprofiler_timestamp_t start_timestamp; ///< start time in nanoseconds
rocprofiler_timestamp_t end_timestamp; ///< end time in nanoseconds
rocprofiler_agent_id_t agent_id; ///< agent information for memory allocation
uint64_t starting_address; ///< starting address for memory allocation
uint64_t allocation_size; ///< size for memory allocation
/// @var kind
/// @brief ::ROCPROFILER_BUFFER_TRACING_MEMORY_ALLOCATION
/// @var operation
/// @brief Specification of the memory allocation function (@see
/// ::rocprofiler_memory_allocation_operation_t)
} rocprofiler_buffer_tracing_memory_allocation_record_t;

/**
* @brief ROCProfiler Buffer Kernel Dispatch Tracer Record.
*/
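
For tools that post-process these records outside C, a small Python-side mirror of the memory allocation record can make derived values explicit. The sketch below is hypothetical: the class `MemoryAllocationRecord` and its helpers are not part of the SDK, and it carries only a subset of the struct's fields. The sample values come from the first row of the CSV added in this commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryAllocationRecord:
    """Hypothetical mirror of a subset of the buffer record's fields."""

    agent_handle: int
    starting_address: int
    allocation_size: int
    start_timestamp: int  # nanoseconds
    end_timestamp: int    # nanoseconds

    @property
    def duration_ns(self):
        """Time spent inside the allocation call."""
        return self.end_timestamp - self.start_timestamp

    @property
    def end_address(self):
        """First address past the allocation (exclusive upper bound)."""
        return self.starting_address + self.allocation_size

    def contains(self, address):
        """True if address falls inside [starting_address, end_address)."""
        return self.starting_address <= address < self.end_address


record = MemoryAllocationRecord(
    agent_handle=0,
    starting_address=140341497356288,
    allocation_size=1024,
    start_timestamp=65788054621500,
    end_timestamp=65788055678893,
)
print(record.duration_ns, record.end_address)
```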
13 changes: 13 additions & 0 deletions source/include/rocprofiler-sdk/callback_tracing.h
@@ -210,6 +210,19 @@ typedef struct
uint64_t bytes; ///< bytes copied
} rocprofiler_callback_tracing_memory_copy_data_t;

/**
* @brief ROCProfiler Memory Allocation Callback Data.
*/
typedef struct
{
uint64_t size; ///< size of this struct
rocprofiler_timestamp_t start_timestamp; ///< start time in nanoseconds
rocprofiler_timestamp_t end_timestamp; ///< end time in nanoseconds
rocprofiler_agent_id_t agent_id; ///< agent id for memory allocation
uint64_t starting_address; ///< starting address for memory allocation
uint64_t allocation_size; ///< size of memory allocation
} rocprofiler_callback_tracing_memory_allocation_data_t;

/**
* @brief ROCProfiler Scratch Memory Callback Data.
*/
2 changes: 2 additions & 0 deletions source/include/rocprofiler-sdk/cxx/hash.hpp
@@ -66,6 +66,8 @@ ROCPROFILER_CXX_SPECIALIZE_HANDLE_HASHER(rocprofiler_callback_thread_t)
ROCPROFILER_CXX_SPECIALIZE_HANDLE_HASHER(hsa_agent_t)
ROCPROFILER_CXX_SPECIALIZE_HANDLE_HASHER(hsa_signal_t)
ROCPROFILER_CXX_SPECIALIZE_HANDLE_HASHER(hsa_executable_t)
ROCPROFILER_CXX_SPECIALIZE_HANDLE_HASHER(hsa_region_t)
ROCPROFILER_CXX_SPECIALIZE_HANDLE_HASHER(hsa_amd_memory_pool_t)

#undef ROCPROFILER_CXX_SPECIALIZE_HANDLE_HASHER
} // namespace std