Cherry pick documentation content for 6.3 (#34)
* SDK doc updates (#1183)

* correcting usage example

* rccl trace

* Adding Navi power state limitation

* Addressed feedback

* kernel-rename

* kokkos trace

* more information on kokkos tracing

* Correcting tool library hardcoding

* summary domains

* Updating domain stats file

* updating images

* rocprofv3 default behavior update

* Removing README from API documentation

* Added missing description in Topics

* Fixed wrong rendering of README in API document

* Fixing Topics in API docs

* Removing API doc for details/rccl.h

* Addressed review comments

(cherry picked from commit 7ea9ced)

* updating roctx documentation for functions (#30)

updating roctx documentation for functions

(cherry picked from commit 6d2e70d)

---------

Co-authored-by: Gopesh Bhardwaj <[email protected]>
alexxu-amd and bgopesh authored Dec 5, 2024
1 parent ac7ef96 commit 7a5bdac
Showing 13 changed files with 295 additions and 43 deletions.
19 changes: 15 additions & 4 deletions README.md
@@ -73,13 +73,24 @@ Please report in the Github Issues.
- **Need for Cold Restart**: In the event of a hardware freeze, you may need to perform a cold restart (turning the hardware off and on) to restore normal operations.
Please use this beta feature cautiously. It may affect your system's stability and performance. Proceed at your own risk.

- At this point, we do not recommend stress-testing the beta implementation.

- Correlation IDs provided by the PC sampling service are verified only for HIP API calls.

- Timestamps in PC sampling records might not be 100% accurate.

- Using PC sampling on multi-threaded applications might fail with `HSA_STATUS_ERROR_EXCEPTION`. Furthermore, if three or more threads launch operations to the same agent while PC sampling is enabled, `HSA_STATUS_ERROR_EXCEPTION` might appear.

- Navi3x requires a stable power state for counter collection.
  Currently, the user must set this state manually.
  To do so, make `power_dpm_force_performance_level` writable for non-root users, then set the performance level to `profile_standard`:

```bash
sudo chmod 777 /sys/class/drm/card0/device/power_dpm_force_performance_level
echo profile_standard >> /sys/class/drm/card0/device/power_dpm_force_performance_level
```

Recommended: `profile_standard` for counter collection and `auto` for all other profiling. Use rocm-smi to verify the current power state. On multi-GPU systems (including those with integrated graphics), replace `card0` with the desired card.
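
The manual steps above could be wrapped in a small helper so that the level is easy to switch before and after counter collection. This is only a sketch under stated assumptions: the function name `set_perf_level` and the `SYSFS_DRM_ROOT` override are hypothetical, introduced here for illustration and for dry runs against a scratch directory. Neither is part of ROCprofiler-SDK. The helper assumes the node has already been made writable as described above.

```bash
# Hypothetical helper (not part of ROCprofiler-SDK): write a performance
# level to the sysfs node named in the text above, then print what the
# node reports afterwards.
# SYSFS_DRM_ROOT is an assumed override, useful only for dry runs.
set_perf_level() {
    local level card root node
    level="$1"                  # e.g. profile_standard or auto
    card="${2:-card0}"          # pick another card on multi-GPU systems
    root="${SYSFS_DRM_ROOT:-/sys/class/drm}"
    node="$root/$card/device/power_dpm_force_performance_level"

    # Fail early if the node is missing or not writable by this user.
    if [ ! -w "$node" ]; then
        echo "error: $node is not writable by this user" >&2
        return 1
    fi

    printf '%s\n' "$level" > "$node"
    cat "$node"
}
```

A typical session might call `set_perf_level profile_standard` before counter collection and `set_perf_level auto` afterwards, passing a second argument such as `card1` to target a different card.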

> [!WARNING]
> The latest mainline version of AQLprofile can be found at [https://repo.radeon.com/rocm/misc/aqlprofile/](https://repo.radeon.com/rocm/misc/aqlprofile/). However, updates to the public AQLprofile may not occur as frequently as updates to rocprofiler-sdk. This discrepancy could lead to a mismatch between the AQLprofile binary and the rocprofiler-sdk source.
2 changes: 1 addition & 1 deletion source/docs/api-reference/tool_library.md
@@ -7,7 +7,7 @@ myst:

# ROCprofiler-SDK tool library

The tool library utilizes APIs from `rocprofiler-sdk` and `rocprofiler-register` libraries for profiling and tracing HIP applications. This document provides information to help you design a tool by utilizing the `rocprofiler-sdk` and `rocprofiler-register` libraries efficiently. The command-line tool `rocprofv3` is also built on `librocprofiler-sdk-tool.so.0.4.0`, which uses these libraries.
The tool library utilizes APIs from `rocprofiler-sdk` and `rocprofiler-register` libraries for profiling and tracing HIP applications. This document provides information to help you design a tool by utilizing the `rocprofiler-sdk` and `rocprofiler-register` libraries efficiently. The command-line tool `rocprofv3` is also built on `librocprofiler-sdk-tool.so.X.Y.Z`, which uses these libraries.

## ROCm runtimes design

8 changes: 7 additions & 1 deletion source/docs/conceptual/comparing-with-legacy-tools.rst
@@ -383,4 +383,10 @@ ROCprofiler-SDK introduces a new command-line tool, `rocprofv3`, which is a more
Timing Difference Between rocprofv3 and rocprofv1/v2
========================================================

Rocprofv3 has improved the accuracy of timing information by reducing the tool overhead required to collect data and reducing the interference to the timing of the kernel being measured. The result of this work is a reduction in variance of kernel times received for the same kernel execution and more accurate timing in general. These changes have not been backported (and will not be backported) to rocprofv1/v2, so there can be substantial (20%) differences in execution time reported by v1/v2 vs v3 for a single kernel execution. Over a large number of samples of the same kernel, the difference in average execution time is in the low single digit percentage time with a much tighter variance of results on rocprofv3. We have included testing in the test suite to verify the timing information outputted by rocprofv3 to ensure that the values we are returning are accurate.
``rocprofv3`` improves the accuracy of timing information by reducing the tool overhead required to collect data and by reducing interference with the timing of the kernel being measured. The result is lower variance in the times reported for the same kernel execution and more accurate timing in general. These changes have not been backported (and will not be backported) to rocprofv1/v2, so the execution time reported for a single kernel execution can differ substantially (20%) between v1/v2 and v3. Over a large number of samples of the same kernel, the difference in average execution time is in the low single-digit percentage range, with a much tighter variance of results on rocprofv3. The test suite includes tests that verify the timing information output by rocprofv3 to ensure that the reported values are accurate.

========================================================
Default run of rocprofv3 and rocprofv1/v2
========================================================

``rocprofv3`` behaves differently from rocprofv1/v2 when run without any options. By default, rocprofv3 collects information on all available agents on the system and outputs it in ``csv`` format, whereas rocprofv1/v2 output the kernel traces in CSV format by default. In rocprofv3, kernel traces can be obtained with the ``--kernel-trace`` option.
2 changes: 2 additions & 0 deletions source/docs/data/hip_domain_stats.csv
@@ -0,0 +1,2 @@
"Name","Calls","TotalDurationNs","AverageNs","Percentage","MinNs","MaxNs","StdDev"
"HIP_API",13,458514859,35270373.769231,100.00,2300,352276613,99315857.546240
22 changes: 22 additions & 0 deletions source/docs/data/rccl_trace.csv
@@ -0,0 +1,22 @@
"Domain","Function","Process_Id","Thread_Id","Correlation_Id","Start_Timestamp","End_Timestamp"
"RCCL_API","ncclGetVersion",1834151,1834151,416,18413845573432,18413845577374
"RCCL_API","ncclGetUniqueId",1834151,1834151,1116,18413961300878,18413963267869
"RCCL_API","ncclGetUniqueId",1834151,1834151,1481,18414166449182,18414166720831
"RCCL_API","ncclGroupStart",1834151,1834151,1482,18414166723772,18414166726834
"RCCL_API","ncclGroupEnd",1834151,1834151,1490,18414166823575,18414380520973
"RCCL_API","ncclCommInitAll",1834151,1834151,1477,18414166402665,18414380522536
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89098,18414380660695,18414380661652
"RCCL_API","ncclAllReduce",1834151,1834151,89097,18414380653860,18414380693574
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89108,18414380694631,18414380694659
"RCCL_API","ncclAllReduce",1834151,1834151,89107,18414380694212,18414380704722
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89117,18414380706650,18414380706677
"RCCL_API","ncclAllReduce",1834151,1834151,89116,18414380705574,18414380715055
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89126,18414380715749,18414380715774
"RCCL_API","ncclAllReduce",1834151,1834151,89125,18414380715463,18414380723944
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89135,18414380724688,18414380724715
"RCCL_API","ncclAllReduce",1834151,1834151,89134,18414380724395,18414380732209
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89154,18414380746383,18414380746411
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89157,18414380749863,18414380749889
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89160,18414380751671,18414380751696
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89163,18414380753326,18414380753353
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89166,18414380755128,18414380755154
Binary file added source/docs/data/rocprofv3_hip_memcpy_summary.png
Binary file added source/docs/data/rocprofv3_memcpy_summary.png
Binary file added source/docs/data/rocprofv3_summary.png