From cfbac1964010c3f46f8bafcd55e694ae1d63926c Mon Sep 17 00:00:00 2001 From: "Jonathan R. Madsen" Date: Thu, 1 Aug 2024 02:59:35 -0500 Subject: [PATCH] Tracing Documentation (#997) * Update callback_services.md * Callback tracing services * Intercept table * Buffer tracing --- source/docs/buffered_services.md | 248 ++++++++++++++++++++++- source/docs/callback_services.md | 332 ++++++++++++++++++++++++++++++- source/docs/intercept_table.md | 95 ++++++++- 3 files changed, 668 insertions(+), 7 deletions(-) diff --git a/source/docs/buffered_services.md b/source/docs/buffered_services.md index dffea541..77d09027 100644 --- a/source/docs/buffered_services.md +++ b/source/docs/buffered_services.md @@ -2,12 +2,250 @@ For the buffered approach, supported buffer record categories are enumerated in `rocprofiler_buffer_category_t` category field. -## Buffered Tracing Services - ## Overview -In buffered approach, callbacks are receieved for batches of records from an internal (background) thread. Supported buffered tracing services are enumerated in `rocprofiler_buffer_tracing_kind_t`. +In the buffered approach, callbacks are received for batches of records from an internal (background) thread. +Supported buffered tracing services are enumerated in `rocprofiler_buffer_tracing_kind_t`. Configuring +a buffer tracing service requires the creation of a buffer. When the buffer is "flushed", either implicitly +or explicitly, a callback to the tool will be invoked which provides an array of one or more buffer records. +A buffer can be explicitly flushed via the `rocprofiler_flush_buffer` function. + +## Subscribing to Buffer Tracing Services + +During tool initialization, tools configure buffer tracing via the `rocprofiler_configure_buffer_tracing_service` +function. However, before invoking `rocprofiler_configure_buffer_tracing_service`, the tool must create a buffer +for the tracing records. 
+ +### Creating a Buffer + +```cpp +rocprofiler_status_t +rocprofiler_create_buffer(rocprofiler_context_id_t context, + size_t size, + size_t watermark, + rocprofiler_buffer_policy_t policy, + rocprofiler_buffer_tracing_cb_t callback, + void* callback_data, + rocprofiler_buffer_id_t* buffer_id); +``` + +The `size` parameter is the size of the buffer in bytes and will be rounded up to the nearest +memory page size (defined by `sysconf(_SC_PAGESIZE)`); the default memory page size on Linux +is 4096 bytes (4 KB). + +The `watermark` parameter specifies the number of bytes at which +the buffer should be "flushed", i.e. when the `callback` should be invoked to deliver the +buffered records to the tool. For example, if a buffer has a size +of 4096 bytes and the watermark is set to 48 bytes, six 8-byte records can be placed in the +buffer before `callback` is invoked. However, every 64-byte record that is placed in the +buffer will trigger a flush. It is safe to set the `watermark` to any value between +zero and the buffer size. + +The `policy` parameter specifies the behavior for when a record is larger than the +amount of free space in the current buffer. For example, if a buffer has a size of +4000 bytes with a watermark set to 4000 bytes and 3998 of the bytes in the buffer +have been populated with records, the `policy` dictates how to handle an incoming record larger than +2 bytes. The `ROCPROFILER_BUFFER_POLICY_DISCARD` policy dictates that all records greater +than 2 bytes should be dropped until the tool _explicitly_ flushes the buffer via +a `rocprofiler_flush_buffer` function call, whereas the `ROCPROFILER_BUFFER_POLICY_LOSSLESS` +policy dictates that the current buffer should be swapped out for an empty buffer, the record should be placed +in that new buffer, and the former (full) buffer should be _implicitly_ flushed. 
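The sizing, watermark, and policy rules above can be sketched with a small, self-contained model. This is not rocprofiler-sdk code: `round_up_to_page`, `policy_t`, and `model_buffer` are hypothetical names used only for illustration, and the model assumes that a flush is triggered as soon as the number of buffered bytes reaches or exceeds the watermark.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round a requested size up to a multiple of the page size.
constexpr std::size_t
round_up_to_page(std::size_t size, std::size_t page_size = 4096)
{
    return ((size + page_size - 1) / page_size) * page_size;
}

// Hypothetical stand-ins for the two buffer policies described above.
enum class policy_t
{
    discard,
    lossless
};

// Toy model of a single buffer: it only counts bytes, flushes, and drops.
struct model_buffer
{
    std::size_t size      = 0;
    std::size_t watermark = 0;
    policy_t    policy    = policy_t::lossless;
    std::size_t used      = 0;
    std::size_t flushes   = 0;
    std::size_t dropped   = 0;

    // models both an explicit flush and an implicit one
    void flush()
    {
        used = 0;
        ++flushes;
    }

    void emplace(std::size_t record_size)
    {
        // precondition of this model: a record never exceeds the whole buffer
        assert(record_size <= size);

        if(record_size > size - used)
        {
            // record does not fit into the remaining free space
            if(policy == policy_t::discard)
            {
                ++dropped;  // dropped until the buffer is flushed explicitly
                return;
            }
            flush();  // lossless: swap in an empty buffer, flush the full one
        }

        used += record_size;

        // assumption: reaching the watermark triggers delivery to the tool
        if(used >= watermark) flush();
    }
};
```

With `size = 4096` and `watermark = 48`, the sixth 8-byte record triggers the first flush in this model, and a single 64-byte record always triggers one, which matches the example in the text.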
+ +The `callback` parameter is the function that rocprofiler-sdk should invoke when flushing +the buffer; the value of the `callback_data` parameter will be passed as one of the arguments +to the `callback` function. + +The `buffer_id` parameter is an output parameter for the function call and will have a +non-zero handle field after successful buffer creation. + +### Creating a Dedicated Thread for Buffer Callbacks + +By default, all buffers will use the same (default) background thread created by rocprofiler-sdk to +invoke their callback. However, rocprofiler-sdk provides an interface for tools to specify the +creation of an additional background thread for one or more of their buffers. + +Callback threads for buffers are created via the `rocprofiler_create_callback_thread` function: + +```cpp +rocprofiler_status_t +rocprofiler_create_callback_thread(rocprofiler_callback_thread_t* cb_thread_id); +``` + +Buffers are assigned to that callback thread via the `rocprofiler_assign_callback_thread` function: + +```cpp +rocprofiler_status_t +rocprofiler_assign_callback_thread(rocprofiler_buffer_id_t buffer_id, + rocprofiler_callback_thread_t cb_thread_id); +``` + +#### Buffer Callback Thread Creation and Assignment Example + +```cpp +{ + // create a context + auto context_id = rocprofiler_context_id_t{}; + rocprofiler_create_context(&context_id); + + // create a buffer associated with the context + auto buffer_id = rocprofiler_buffer_id_t{}; + rocprofiler_create_buffer(context_id, ..., &buffer_id); + + // specify that a new callback thread should be created and provide + // and assign the identifier for it to the "thr_id" variable + auto thr_id = rocprofiler_callback_thread_t{}; + rocprofiler_create_callback_thread(&thr_id); + + // assign the buffer callback to be delivered on this thread + rocprofiler_assign_callback_thread(buffer_id, thr_id); +} +``` + +### Configuring Buffer Tracing Services + +```cpp +rocprofiler_status_t 
+rocprofiler_configure_buffer_tracing_service(rocprofiler_context_id_t context_id, + rocprofiler_buffer_tracing_kind_t kind, + rocprofiler_tracing_operation_t* operations, + size_t operations_count, + rocprofiler_buffer_id_t buffer_id); +``` + +The `kind` parameter is a high-level specifier of which service to trace (also known as a "domain"). +Domain examples include, but are not limited to, the HIP API, the HSA API, and kernel dispatches. +For each domain, there are (often) various "operations", which can be used to restrict the callbacks +to a subset within the domain. For domains which correspond to APIs, the "operations" are the functions +which compose the API. If all operations in a domain should be traced, the `operations` and `operations_count` +parameters can be set to `nullptr` and `0`, respectively. If the tracing domain should be restricted to a subset +of operations, the tool library should specify a C-array of type `rocprofiler_tracing_operation_t` and the +size of the array for the `operations` and `operations_count` parameters, respectively. + +Similar to `rocprofiler_configure_callback_tracing_service`, +`rocprofiler_configure_buffer_tracing_service` will return an error if a buffer service for a given context +and domain is configured more than once. + +#### Example + +```cpp +{ + auto ctx = rocprofiler_context_id_t{}; + // ... creation of context, etc. ... 
+ + // buffer parameters + constexpr auto KB = 1024; // 1024 bytes + constexpr auto buffer_size = 16 * KB; + constexpr auto watermark = 15 * KB; + constexpr auto policy = ROCPROFILER_BUFFER_POLICY_LOSSLESS; + + // buffer handle + auto buffer_id = rocprofiler_buffer_id_t{}; + + // create a buffer associated with the context + rocprofiler_create_buffer( + ctx, buffer_size, watermark, policy, callback_func, nullptr, &buffer_id); + + // configure HIP runtime API function records to be placed in buffer + rocprofiler_configure_buffer_tracing_service( + ctx, ROCPROFILER_BUFFER_TRACING_HIP_RUNTIME_API, nullptr, 0, buffer_id); + + // configure kernel dispatch records to be placed in buffer + // (more than one service can use the same buffer) + rocprofiler_configure_buffer_tracing_service( + ctx, ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH, nullptr, 0, buffer_id); + + // ... etc. ... +} +``` + +## Buffer Tracing Callback Function + +Rocprofiler-sdk buffer tracing callback functions have the signature: + +```cpp +typedef void (*rocprofiler_buffer_tracing_cb_t)(rocprofiler_context_id_t context, + rocprofiler_buffer_id_t buffer_id, + rocprofiler_record_header_t** headers, + size_t num_headers, + void* data, + uint64_t drop_count); +``` + +The `rocprofiler_record_header_t` data type provides three pieces of information: + +1. Category (`rocprofiler_buffer_category_t`) +2. Kind +3. Payload + +The category is used to distinguish the classification of the buffer record. For all +services configured via `rocprofiler_configure_buffer_tracing_service`, the category will +be equal to the value of `ROCPROFILER_BUFFER_CATEGORY_TRACING`. The meaning of the kind +field is dependent on the category but when the category is `ROCPROFILER_BUFFER_CATEGORY_TRACING`, +the kind value will be equivalent to the +`rocprofiler_buffer_tracing_kind_t` value passed to +`rocprofiler_configure_buffer_tracing_service`, e.g. `ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH`. 
+Once the category and kind have been determined, the payload can be cast to the matching record type: + +```cpp +{ + if(header->category == ROCPROFILER_BUFFER_CATEGORY_TRACING && + header->kind == ROCPROFILER_BUFFER_TRACING_HIP_RUNTIME_API) + { + auto* record = + static_cast<rocprofiler_buffer_tracing_hip_api_record_t*>(header->payload); + + // ... etc. ... + } +} +``` + +### Buffer Tracing Callback Function Example + +```cpp +void +buffer_callback_func(rocprofiler_context_id_t context, + rocprofiler_buffer_id_t buffer_id, + rocprofiler_record_header_t** headers, + size_t num_headers, + void* user_data, + uint64_t drop_count) +{ + for(size_t i = 0; i < num_headers; ++i) + { + auto* header = headers[i]; + + if(header->category == ROCPROFILER_BUFFER_CATEGORY_TRACING && + header->kind == ROCPROFILER_BUFFER_TRACING_HIP_RUNTIME_API) + { + auto* record = + static_cast<rocprofiler_buffer_tracing_hip_api_record_t*>(header->payload); + + // ... etc. ... + } + else if(header->category == ROCPROFILER_BUFFER_CATEGORY_TRACING && + header->kind == ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH) + { + auto* record = + static_cast<rocprofiler_buffer_tracing_kernel_dispatch_record_t*>(header->payload); + + // ... etc. ... + } + else + { + throw std::runtime_error{"unhandled record header category + kind"}; + } + } +} +``` + +## Buffer Tracing Record -## HSA API Tracing +Unlike callback tracing records, there is no common set of data for each buffer tracing record. However, +many buffer tracing records contain a `kind` field and an `operation` field. +The name of a tracing kind can be obtained via the `rocprofiler_query_buffer_tracing_kind_name` function. +The name of an operation specific to a tracing kind can be obtained via the `rocprofiler_query_buffer_tracing_kind_operation_name` +function. One can also iterate over all the buffer tracing kinds and operations for each tracing kind via the +`rocprofiler_iterate_buffer_tracing_kinds` and `rocprofiler_iterate_buffer_tracing_kind_operations` functions. 
-## Kernel Tracing +The buffer tracing record data types can be found in the `rocprofiler-sdk/buffer_tracing.h` header +(`source/include/rocprofiler-sdk/buffer_tracing.h` in the [rocprofiler-sdk GitHub repository](https://github.com/ROCm/rocprofiler-sdk)). diff --git a/source/docs/callback_services.md b/source/docs/callback_services.md index 4cb2c43a..6744d9d4 100644 --- a/source/docs/callback_services.md +++ b/source/docs/callback_services.md @@ -2,6 +2,336 @@ ## Overview +Callback tracing services provide immediate callbacks to a tool on the current CPU thread when a given event occurs. +For example, when tracing an API function, e.g. `hipSetDevice`, callback tracing invokes a user-specified callback +before and after the traced function executes on the thread which is invoking the API function. + +## Subscribing to Callback Tracing Services + +During tool initialization, tools configure callback tracing via the `rocprofiler_configure_callback_tracing_service` +function: + +```cpp +rocprofiler_status_t +rocprofiler_configure_callback_tracing_service(rocprofiler_context_id_t context_id, + rocprofiler_callback_tracing_kind_t kind, + rocprofiler_tracing_operation_t* operations, + size_t operations_count, + rocprofiler_callback_tracing_cb_t callback, + void* callback_args); +``` + +The `kind` parameter is a high-level specifier of which service to trace (also known as a "domain"). +Domain examples include, but are not limited to, the HIP API, the HSA API, and kernel dispatches. +For each domain, there are (often) various "operations", which can be used to restrict the callbacks +to a subset within the domain. For domains which correspond to APIs, the "operations" are the functions +which compose the API. If all operations in a domain should be traced, the `operations` and `operations_count` +parameters can be set to `nullptr` and `0`, respectively. 
If the tracing domain should be restricted to a subset +of operations, the tool library should specify a C-array of type `rocprofiler_tracing_operation_t` and the +size of the array for the `operations` and `operations_count` parameters, respectively. + +`rocprofiler_configure_callback_tracing_service` will return an error if a callback service for a given context +and domain is configured more than once. For example, if one only wanted to trace two functions within +the HIP runtime API, `hipGetDevice` and `hipSetDevice`, the following code would accomplish this objective: + +```cpp +{ + auto ctx = rocprofiler_context_id_t{}; + // ... creation of context, etc. ... + + // array of operations (i.e. API functions) + auto operations = std::array<rocprofiler_tracing_operation_t, 2>{ + ROCPROFILER_HIP_RUNTIME_API_ID_hipSetDevice, + ROCPROFILER_HIP_RUNTIME_API_ID_hipGetDevice + }; + + rocprofiler_configure_callback_tracing_service(ctx, + ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API, + operations.data(), + operations.size(), + callback_func, + nullptr); + // ... etc. ... +} +``` + +But the following code would be invalid: + +```cpp +{ + auto ctx = rocprofiler_context_id_t{}; + // ... creation of context, etc. ... + + // array of operations (i.e. API functions) + auto operations = std::array<rocprofiler_tracing_operation_t, 2>{ + ROCPROFILER_HIP_RUNTIME_API_ID_hipSetDevice, + ROCPROFILER_HIP_RUNTIME_API_ID_hipGetDevice + }; + + for(auto op : operations) + { + // after the first iteration, will return ROCPROFILER_STATUS_ERROR_SERVICE_ALREADY_CONFIGURED + rocprofiler_configure_callback_tracing_service(ctx, + ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API, + &op, + 1, + callback_func, + nullptr); + } + + // ... etc. ... 
+} +``` + +## Callback Tracing Callback Function + +Rocprofiler-sdk callback tracing callback functions have the signature: + +```cpp +typedef void (*rocprofiler_callback_tracing_cb_t)(rocprofiler_callback_tracing_record_t record, + rocprofiler_user_data_t* user_data, + void* callback_data); +``` + +The `record` parameter contains the information to uniquely identify a tracing record type and has the +following definition: + +```cpp +typedef struct rocprofiler_callback_tracing_record_t +{ + rocprofiler_context_id_t context_id; + rocprofiler_thread_id_t thread_id; + rocprofiler_correlation_id_t correlation_id; + rocprofiler_callback_tracing_kind_t kind; + uint32_t operation; + rocprofiler_callback_phase_t phase; + void* payload; +} rocprofiler_callback_tracing_record_t; +``` + +The underlying type of the `payload` field above is typically unique to a domain and, less frequently, an operation. +For example, for the `ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API` and `ROCPROFILER_CALLBACK_TRACING_HIP_COMPILER_API` kinds, +the payload should be cast to `rocprofiler_callback_tracing_hip_api_data_t*` -- which will contain the arguments +to the function and (in the exit phase) the return value of the function. The payload field will only be a valid +pointer during the invocation of the callback function(s). + +The `user_data` parameter can be used to store data in between callback phases. It is unique for every +instance of an operation. 
For example, if the tool library wishes to store the timestamp of the +`ROCPROFILER_CALLBACK_PHASE_ENTER` phase for the ensuing `ROCPROFILER_CALLBACK_PHASE_EXIT` callback, +this data can be stored in a method similar to below: + +```cpp +void +callback_func(rocprofiler_callback_tracing_record_t record, + rocprofiler_user_data_t* user_data, + void* cb_data) +{ + auto ts = rocprofiler_timestamp_t{}; + rocprofiler_get_timestamp(&ts); + + if(record.phase == ROCPROFILER_CALLBACK_PHASE_ENTER) + { + user_data->value = ts; + } + else if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT) + { + auto delta_ts = (ts - user_data->value); + // ... etc. ... + } + else + { + // ... etc. ... + } +} +``` + +The `callback_data` argument will be the value of `callback_args` passed to `rocprofiler_configure_callback_tracing_service` +in [the previous section](#subscribing-to-callback-tracing-services). + +## Callback Tracing Record + +The name of a tracing kind can be obtained via the `rocprofiler_query_callback_tracing_kind_name` function. +The name of an operation specific to a tracing kind can be obtained via the `rocprofiler_query_callback_tracing_kind_operation_name` +function. One can also iterate over all the callback tracing kinds and operations for each tracing kind via the +`rocprofiler_iterate_callback_tracing_kinds` and `rocprofiler_iterate_callback_tracing_kind_operations` functions. +Lastly, for a given `rocprofiler_callback_tracing_record_t` object, rocprofiler-sdk supports generically iterating over +the arguments of the payload field for many domains. + +As mentioned above, within the `rocprofiler_callback_tracing_record_t` object, +an opaque `void* payload` is provided for accessing domain specific information. 
+The data types generally follow the naming convention of `rocprofiler_callback_tracing_<domain>_data_t`, +e.g., for the tracing kinds `ROCPROFILER_CALLBACK_TRACING_HSA_{CORE,AMD_EXT,IMAGE_EXT,FINALIZE_EXT}_API`, +the payload should be cast to `rocprofiler_callback_tracing_hsa_api_data_t*`: + +```cpp +void +callback_func(rocprofiler_callback_tracing_record_t record, + rocprofiler_user_data_t* user_data, + void* cb_data) +{ + static auto hsa_domains = std::unordered_set<rocprofiler_callback_tracing_kind_t>{ + ROCPROFILER_CALLBACK_TRACING_HSA_CORE_API, + ROCPROFILER_CALLBACK_TRACING_HSA_AMD_EXT_API, + ROCPROFILER_CALLBACK_TRACING_HSA_IMAGE_EXT_API, + ROCPROFILER_CALLBACK_TRACING_HSA_FINALIZE_EXT_API}; + + if(hsa_domains.count(record.kind) > 0) + { + auto* payload = static_cast<rocprofiler_callback_tracing_hsa_api_data_t*>(record.payload); + + hsa_status_t status = payload->retval.hsa_status_t_retval; + if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT && status != HSA_STATUS_SUCCESS) + { + const char* _kind = nullptr; + const char* _operation = nullptr; + + rocprofiler_query_callback_tracing_kind_name(record.kind, &_kind, nullptr); + rocprofiler_query_callback_tracing_kind_operation_name( + record.kind, record.operation, &_operation, nullptr); + + // report that the HSA function did not return HSA_STATUS_SUCCESS + fprintf(stderr, "[domain=%s] %s returned a non-zero exit code: %i\n", _kind, _operation, status); + } + } + else if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT) + { + // (timestamp handling omitted; see the earlier callback_func example) + // ... etc. ... + } + else + { + // ... etc. ... 
+ } +} +``` + +### Sample `rocprofiler_iterate_callback_tracing_kind_operation_args` + +```cpp +int +print_args(rocprofiler_callback_tracing_kind_t domain_idx, + uint32_t op_idx, + uint32_t arg_num, + const void* const arg_value_addr, + int32_t arg_indirection_count, + const char* arg_type, + const char* arg_name, + const char* arg_value_str, + int32_t arg_dereference_count, + void* data) +{ + if(arg_num == 0) + { + const char* _kind = nullptr; + const char* _operation = nullptr; + + rocprofiler_query_callback_tracing_kind_name(domain_idx, &_kind, nullptr); + rocprofiler_query_callback_tracing_kind_operation_name( + domain_idx, op_idx, &_operation, nullptr); + + fprintf(stderr, "\n[%s] %s\n", _kind, _operation); + } + + // abi::__cxa_demangle is declared in <cxxabi.h>; the result must be released with free() + char* _arg_type = abi::__cxa_demangle(arg_type, nullptr, nullptr, nullptr); + + fprintf(stderr, " %u: %-18s %-16s = %s\n", arg_num, _arg_type, arg_name, arg_value_str); + + free(_arg_type); + + // unused in example + (void) arg_value_addr; + (void) arg_indirection_count; + (void) arg_dereference_count; + (void) data; + + return 0; +} + +void +callback_func(rocprofiler_callback_tracing_record_t record, + rocprofiler_user_data_t* user_data, + void* cb_data) +{ + if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT && + record.kind == ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API && + (record.operation == ROCPROFILER_HIP_RUNTIME_API_ID_hipLaunchKernel || + record.operation == ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpyAsync)) + { + rocprofiler_iterate_callback_tracing_kind_operation_args( + record, print_args, record.phase, nullptr); + } +} +``` + +Sample Output: + +```console + +[HIP_RUNTIME_API] hipLaunchKernel + 0: void const* function_address = 0x219308 + 1: rocprofiler_dim3_t numBlocks = {z=1, y=310, x=310} + 2: rocprofiler_dim3_t dimBlocks = {z=1, y=32, x=32} + 3: void** args = 0x7ffe6d8dd3c0 + 4: unsigned long sharedMemBytes = 0 + 5: ihipStream_t* stream = 0x17b40c0 + 
+[HIP_RUNTIME_API] hipMemcpyAsync + 0: void* dst = 0x7f06c7bbb010 + 1: void const* 
src = 0x7f0698800000 + 2: unsigned long sizeBytes = 393625600 + 3: hipMemcpyKind kind = DeviceToHost + 4: ihipStream_t* stream = 0x25dfcf0 +``` + ## Code Object Tracing -## HSA API Tracing +The code object tracing service is a critical component for obtaining information regarding +asynchronous activity on the GPU. The `rocprofiler_callback_tracing_code_object_load_data_t` +payload (kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`, operation=`ROCPROFILER_CODE_OBJECT_LOAD`) +provides a unique identifier for a bundle of one or more GPU kernel symbols which have been loaded +for a specific GPU agent. For example, if your application is leveraging a multi-GPU system +containing 4 Vega20 GPUs and 4 MI100 GPUs, there will be at least 8 code objects loaded: one code +object for each GPU. Each code object will be associated with a set of kernel symbols: +the `rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t` payload +(kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`, operation=`ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER`) +provides a globally unique identifier for the specific kernel symbol along with the kernel name and +several other static properties of the kernel (e.g. scratch size, scalar general purpose register count, etc.). +Note: two otherwise identical kernel symbols (same kernel name, scratch size, etc.) which are part of +otherwise identical code objects that are loaded for different GPU agents ***will*** have unique +kernel identifiers. Furthermore, if the same code object (and its kernel symbols) are unloaded and then +re-loaded, that code object and all of its kernel symbols ***will*** be given new unique identifiers. + +In general, when a code object is loaded and unloaded, here is the sequence of events: + +1. Callback: code object load + - kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT` + - operation=`ROCPROFILER_CODE_OBJECT_LOAD` + - phase=`ROCPROFILER_CALLBACK_PHASE_LOAD` +2. 
Callback: kernel symbol load + - kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT` + - operation=`ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER` + - phase=`ROCPROFILER_CALLBACK_PHASE_LOAD` + - Repeats for each kernel symbol in code object +3. Application Execution +4. Callback: kernel symbol unload + - kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT` + - operation=`ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER` + - phase=`ROCPROFILER_CALLBACK_PHASE_UNLOAD` + - Repeats for each kernel symbol in code object +5. Callback: code object unload + - kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT` + - operation=`ROCPROFILER_CODE_OBJECT_LOAD` + - phase=`ROCPROFILER_CALLBACK_PHASE_UNLOAD` + +Note: rocprofiler-sdk does not provide an interface to query this information outside of the +code object tracing service. To associate kernel names with kernel tracing records, +a tool is responsible for making a copy of the relevant information when the code objects and +kernel symbols are loaded (however, constant string fields such as the `const char* kernel_name` field +need not be copied; they are guaranteed to be valid pointers until after rocprofiler-sdk finalization). +If a tool decides to delete its copy of the data associated with a given code object or kernel symbol +identifier when the code object and kernel symbols are unloaded, it is highly recommended to flush +any/all buffers which might contain references to that code object or kernel symbol identifier before +deleting the associated data. + +For a sample of code object tracing, please see the `samples/code_object_tracing` example in the +[rocprofiler-sdk GitHub repository](https://github.com/ROCm/rocprofiler-sdk). 
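The bookkeeping recommended in the note above can be sketched as a small tool-side registry. This is a minimal illustration and not part of rocprofiler-sdk: `kernel_symbol_info`, `kernel_symbol_registry`, and the `flush_buffers` callable are hypothetical names, and the struct only mirrors a few of the fields a real kernel symbol payload carries. In a real tool, `on_load` and `on_unload` would be called from the code object tracing callback, and `flush_buffers` would presumably wrap calls to `rocprofiler_flush_buffer`.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical, simplified copy of the data a tool keeps per kernel symbol.
// The name is copied into a std::string for simplicity, although the text
// above notes that the original const char* remains valid until finalization.
struct kernel_symbol_info
{
    uint64_t    kernel_id;
    uint64_t    code_object_id;
    std::string kernel_name;
};

class kernel_symbol_registry
{
public:
    // call when a kernel symbol is registered (load phase)
    void on_load(const kernel_symbol_info& info)
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        m_symbols[info.kernel_id] = info;
    }

    // call when a kernel symbol is unregistered (unload phase);
    // buffers are flushed first so pending records can still be resolved
    template <typename FlushFn>
    void on_unload(uint64_t kernel_id, FlushFn&& flush_buffers)
    {
        flush_buffers();
        std::lock_guard<std::mutex> lk{m_mutex};
        m_symbols.erase(kernel_id);
    }

    // resolve a kernel identifier found in a kernel tracing record
    std::string name_of(uint64_t kernel_id) const
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        auto itr = m_symbols.find(kernel_id);
        return (itr != m_symbols.end()) ? itr->second.kernel_name : std::string{"<unknown>"};
    }

private:
    mutable std::mutex                               m_mutex;
    std::unordered_map<uint64_t, kernel_symbol_info> m_symbols;
};

// Usage sketch with made-up identifiers.
inline void
registry_example()
{
    kernel_symbol_registry reg;
    bool flushed = false;

    reg.on_load({1, 10, "vector_add"});
    assert(reg.name_of(1) == "vector_add");

    reg.on_unload(1, [&flushed]() { flushed = true; });
    assert(flushed);
    assert(reg.name_of(1) == "<unknown>");
}
```

The mutex is there because load and unload callbacks arrive on application threads while buffer callbacks run on a background thread, so lookups and updates may overlap.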
diff --git a/source/docs/intercept_table.md b/source/docs/intercept_table.md index 7a7c4049..54a95093 100644 --- a/source/docs/intercept_table.md +++ b/source/docs/intercept_table.md @@ -1,3 +1,96 @@ # Runtime Intercept Tables -Discussion on how access the raw runtime intercept tables of HSA and HIP (i.e. ExaTracer requirements by LTTng). +Although most tools will want to leverage the callback or buffer tracing services for tracing the HIP, HSA, and ROCTx +APIs, rocprofiler-sdk does provide access to the raw API dispatch tables. Each of the aforementioned APIs is +designed similarly to the following sample. + +## Dispatch Table Overview + +### Forward Declaration of public C API function + +```cpp +extern "C" +{ +// forward declaration of public C API function +int +foo(int) __attribute__((visibility("default"))); +} +``` + +### Internal Implementation of API function + +```cpp +namespace impl +{ +int +foo(int val) +{ + // real implementation + return (2 * val); +} +} +``` + +### Dispatch Table Implementation + +```cpp +namespace impl +{ +struct dispatch_table +{ + int (*foo_fn)(int) = nullptr; +}; + +// invoked once: populates the dispatch_table with function pointers to implementation +dispatch_table*& +construct_dispatch_table() +{ + static dispatch_table* tbl = new dispatch_table{}; + tbl->foo_fn = impl::foo; + + // in between above and below, rocprofiler-sdk gets passed the pointer + // to the dispatch table and has the opportunity to wrap the function + // pointers for interception + + return tbl; +} + +// constructs dispatch table and stores it in static variable +dispatch_table* +get_dispatch_table() +{ + static dispatch_table*& tbl = construct_dispatch_table(); + return tbl; +} +} // namespace impl +``` + +### Implementation of public C API function + +```cpp +extern "C" +{ +// implementation of public C API function +int +foo(int val) +{ + return impl::get_dispatch_table()->foo_fn(val); +} +} +``` + +### Dispatch Table Chaining + +rocprofiler-sdk is given 
an opportunity within `impl::construct_dispatch_table()` to +save the original value(s) of the function pointers such as `foo_fn` and install +its own function pointers in their place -- this results in the public C API function `foo` +calling into the rocprofiler-sdk function pointer, which then, in turn, calls the original +function pointer to `impl::foo` (this is called "chaining"). Once rocprofiler-sdk +has made any necessary modifications to the dispatch table, tools which indicated +they also want access to the raw dispatch table via `rocprofiler_at_intercept_table_registration` +will be passed the pointer to the dispatch table. + +## Sample + +For a demo of dispatch table chaining, please see the `samples/intercept_table` example in the +[rocprofiler-sdk GitHub repository](https://github.com/ROCm/rocprofiler-sdk).
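The chaining described above can be put together into one compilable sketch that reuses the `foo` / `impl::dispatch_table` sample. The `interceptor` namespace is a hypothetical stand-in for rocprofiler-sdk (or a tool holding the raw table); its names are illustrative only and do not correspond to any real rocprofiler-sdk symbol.

```cpp
#include <cassert>

namespace impl
{
// real implementation behind the public API function
int
foo(int val)
{
    return (2 * val);
}

struct dispatch_table
{
    int (*foo_fn)(int) = nullptr;
};
}  // namespace impl

namespace interceptor  // hypothetical stand-in for rocprofiler-sdk or a tool
{
int (*original_foo)(int) = nullptr;
int call_count           = 0;

// wrapper installed into the table: records the call, then chains to the original
int
foo_wrapper(int val)
{
    assert(original_foo != nullptr);
    ++call_count;
    return original_foo(val);
}

// save the original function pointer and install the wrapper in its place
void
install(impl::dispatch_table* tbl)
{
    original_foo = tbl->foo_fn;
    tbl->foo_fn  = foo_wrapper;
}
}  // namespace interceptor

namespace impl
{
// constructs the table once and gives the interceptor a chance to wrap it
dispatch_table*
get_dispatch_table()
{
    static dispatch_table* tbl = [] {
        auto* t   = new dispatch_table{};
        t->foo_fn = impl::foo;
        interceptor::install(t);
        return t;
    }();
    return tbl;
}
}  // namespace impl

// public C API function: always goes through the dispatch table
extern "C" int
foo(int val)
{
    return impl::get_dispatch_table()->foo_fn(val);
}
```

Calling `foo(21)` in this sketch goes through `interceptor::foo_wrapper`, which increments `call_count` and then forwards to `impl::foo`, so the caller still receives the result of the real implementation.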