rocprofv3 doc updates (#982)
* updating rocprofv3

* using rocprofv3

* review updates

* naming standardization

* Update source/docs/how-to/using-rocprofv3.rst

Co-authored-by: Leo Paoletti <[email protected]>

* review comments

* adding API references

* kernel filtering

* Remove Sphinx warn as error

To bypass false warning for linking between rst and md

* remove unused (duplicate) refs in _toc.yml.in

---------

Co-authored-by: Gopesh Bhardwaj <[email protected]>
Co-authored-by: Leo Paoletti <[email protected]>
Co-authored-by: Sam Wu <[email protected]>
Co-authored-by: Peter Jun Park <[email protected]>
5 people authored Aug 2, 2024
1 parent cfbac19 commit 69caa62
Showing 15 changed files with 193 additions and 245 deletions.
17 changes: 9 additions & 8 deletions source/docs/_toc.yml.in
@@ -6,14 +6,6 @@ defaults:

 root: index
 subtrees:
-- entries:
-- file: what-is-rocprof-sdk
-- file: buffered_services.md
-- file: callback_services.md
-- file: counter_collection_services.md
-- file: intercept_table.md
-- file: pc_sampling.md
-- file: tool_library_overview.md
 - caption: Install
 entries:
 - file: install/installation
@@ -23,8 +15,17 @@ subtrees:
 - file: how-to/samples
 - caption: API reference
 entries:
+- file: api-reference/buffered_services
+- file: api-reference/callback_services
+- file: api-reference/counter_collection_services
+- file: api-reference/intercept_table
+- file: api-reference/pc_sampling
+- file: api-reference/tool_library
 - file: _doxygen/html/index
 title: API library
+- caption: Conceptual
+entries:
+- file: conceptual/comparing-with-legacy-tools
 - caption: License
 entries:
 - file: license
@@ -1,4 +1,4 @@
-# Buffered Services
+# Buffered services

 For the buffered approach, supported buffer record categories are enumerated in `rocprofiler_buffer_category_t` category field.

@@ -1,4 +1,4 @@
-# Callback Tracing Services
+# Callback tracing services

 ## Overview

@@ -1,4 +1,4 @@
-# Counter Collection Services
+# Counter collection services

 ## Definitions

@@ -1,4 +1,4 @@
-# Runtime Intercept Tables
+# Runtime intercept tables

 Although most tools will want to leverage the callback or buffer tracing services for tracing the HIP, HSA, and ROCTx
 APIs, rocprofiler-sdk does provide access to the raw API dispatch tables. Each of the aforementioned APIs are
@@ -1,4 +1,4 @@
-# PC Sampling Method
+# PC sampling method

 PC Sampling is a profiling method that uses statistical approximation of the kernel execution by sampling GPU program counters. Furthermore, the method periodically chooses an active wave (in a round robin manner) and snapshot it's program counter (PC). The process takes place on every compute unit simultaneously which makes it device-wide PC sampling. The outcome is the histogram of samples that says how many times each kernel instruction was sampled.

@@ -143,18 +143,6 @@ tool_init(rocprofiler_client_finalize_t fini_func,
 Otherwise, ROCprofiler-SDK invokes the `finalize` callback via an `atexit` handler.
-## Agent Information
-## Contexts
-## Configuring Services
-## Synchronous Callbacks
-## Asynchronous Callbacks for Buffers
-## Recommendations
 ## Full `rocprofiler_configure` Sample
 All of the snippets from the previous sections have been combined here for convenience.
@@ -1,22 +1,15 @@
 .. meta::
-:description: Documentation of the installation, configuration, use of the ROCProfiler SDK, and rocprofv3 command-line tool
-:keywords: ROCProfiler SDK tool, ROCProfiler SDK library, rocprofv3, ROCm, API, reference
+:description: Documentation of the installation, configuration, use of the ROCprofiler-SDK, and rocprofv3 command-line tool
+:keywords: ROCprofiler-SDK tool, ROCprofiler-SDK library, rocprofv3, ROCm, API, reference

-.. _what-is-rocprof-sdk:
+.. _comparing-with-legacy-tools:

-==========================
-What is ROCprofiler-SDK?
-==========================
+========================================================
+Comparing ROCprofiler-SDK to other ROCm profiling tools
+========================================================

-ROCprofiler-SDK is a tooling infrastructure for profiling general-purpose GPU compute applications running on the ROCm software.
-It supports application tracing to provide a big picture of the GPU application execution and kernel profiling to provide low-level hardware details from the performance counters.
-The ROCprofiler-SDK library provides runtime-independent APIs for tracing runtime calls and asynchronous activities such as GPU kernel dispatches and memory moves. The tracing includes callback APIs for runtime API tracing and activity APIs for asynchronous activity records logging.
-
-In summary, ROCprofiler-SDK combines `ROCProfiler <https://rocm.docs.amd.com/projects/rocprofiler/en/latest/index.html>`_ and `ROCTracer <https://rocm.docs.amd.com/projects/roctracer/en/latest/index.html>`_.
-You can utilize the ROCprofiler-SDK to develop a tool for profiling and tracing HIP applications on ROCm software.
-
-ROCprofiler-SDK is an improved version that enables more efficient implementations and better thread safety while avoiding problems that plague the former implementations of ROCProfiler and ROCTracer.
-Here are the distinct ROCprofiler-SDK features:
+ROCprofiler-SDK is an improved version of ROCm profiling tools that enables more efficient implementations and better thread safety while avoiding problems that plague the former implementations of ROCProfiler and ROCTracer.
+Here are the distinct ROCprofiler-SDK features, which also highlight the improvements over ROCProfiler and ROCTracer:

 - Improved tool initialization
 - Support for simultaneous use of the same services by multiple tools
@@ -25,10 +18,7 @@ Here are the distinct ROCprofiler-SDK features:
 - Backward ABI compatibility
 - PC sampling (beta implementation)

-Improvements over ROCProfiler and ROCTracer
-----------------------------------------------------
-
-The former implementations allow a tool to access any of the services provided by ROCProfiler or ROCTracer such as API tracing, kernel tracing, etc., by calling ``roctracer_init()`` when a ROCm runtime is initially loaded.
+The former implementations allow a tool to access any of the services provided by ROCProfiler or ROCTracer, such as API tracing and kernel tracing, by calling ``roctracer_init()`` when an ROCm runtime is initially loaded.
 As the calling tool is not required to specify during initialization, the services it needs to use, the libraries must be effectively prepared for any service to be available anytime.
 This behavior introduces unnecessary overhead and makes thread-safe data management difficult, as tools generally don't use all the available services.
 For example, ROCTracer always installs wrappers around every runtime API and adds indirection overhead through the ROCTracer library to check for the current service configuration in a thread-safe manner.
2 changes: 2 additions & 0 deletions source/docs/data/counter_collection.csv
@@ -0,0 +1,2 @@
+"Correlation_Id","Dispatch_Id","Agent_Id","Queue_Id","Process_Id","Thread_Id","Grid_Size","Kernel_Name","Workgroup_Size","LDS_Block_Size","Scratch_Size","VGPR_Count","SGPR_Count","Counter_Name","Counter_Value"
+0,1,1,139892123975680,5619,5619,1048576,"matrixTranspose(float*, float*, int)",16,0,0,8,16,"SQ_WAVES",65536
5 changes: 5 additions & 0 deletions source/docs/data/kernel_names.csv
@@ -0,0 +1,5 @@
+"Correlation_Id","Dispatch_Id","Agent_Id","Queue_Id","Process_Id","Thread_Id","Grid_Size","Kernel_Name","Workgroup_Size","LDS_Block_Size","Scratch_Size","VGPR_Count","SGPR_Count","Counter_Name","Counter_Value"
+4,4,1,1,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
+8,8,1,2,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
+12,12,1,3,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
+16,16,1,4,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
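
Reviewer note: both CSV files added in this commit share one schema, with one row per dispatch and counter, the kernel identified by `Kernel_Name`, and the sample stored in `Counter_Name` and `Counter_Value`. As a rough illustration of how such output might be consumed, the sketch below sums counter values per kernel using only the Python standard library. The helper name `total_counter_per_kernel` is hypothetical, the sample rows are copied from `kernel_names.csv` above, and parsing `Counter_Value` as a float is an assumption, not documented rocprofv3 behavior.

```python
import csv
import io
from collections import defaultdict

# Rows copied verbatim from kernel_names.csv in this commit.
SAMPLE = '''\
"Correlation_Id","Dispatch_Id","Agent_Id","Queue_Id","Process_Id","Thread_Id","Grid_Size","Kernel_Name","Workgroup_Size","LDS_Block_Size","Scratch_Size","VGPR_Count","SGPR_Count","Counter_Name","Counter_Value"
4,4,1,1,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
8,8,1,2,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
12,12,1,3,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
16,16,1,4,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
'''


def total_counter_per_kernel(csv_text):
    """Sum Counter_Value for each (Kernel_Name, Counter_Name) pair.

    Hypothetical helper for illustration; it is not part of ROCprofiler-SDK.
    """
    totals = defaultdict(float)
    # The csv module handles the quoted kernel signatures, which contain commas.
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["Kernel_Name"], row["Counter_Name"])
        # Assumption: counter values are numeric and safe to parse as float.
        totals[key] += float(row["Counter_Value"])
    return dict(totals)


totals = total_counter_per_kernel(SAMPLE)
for (kernel, counter), value in totals.items():
    print(f"{kernel}: {counter} = {value}")
```

For the four rows shown, the four `SQ_WAVES` values of 16384 add up to 65536 for `divide_kernel`. Keying on the kernel and counter names together keeps the totals separate if a file carries more than one counter.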
4 changes: 2 additions & 2 deletions source/docs/how-to/samples.md
@@ -4,7 +4,7 @@ The samples are provided to help you see the profiler in action.

 ## Finding samples

-After the ROCm build is installed:
+The ROCm installation provides sample programs and `rocprofv3` tool.

 - Sample programs are installed here:

@@ -35,7 +35,7 @@ ctest -V
 ```

 :::{note}
-Running a few of these tests require you to install Pandas and pytest first.
+Running a few of these tests require you to install [pandas](https://pandas.pydata.org/) and [pytest](https://docs.pytest.org/en/stable/) first.
 :::

 ```bash