Accumulation metrics support and update counter collection API to aqlprofile_v2 (#915)

* Updating to v3 API

* General fixes

* Extending dimension bits to 54

* Disabling agent profiling tests

* Fixed unit test

* Adding accumulate metric support for parsing counters (#609)

* Adding accumulate metric support for parsing counters

* Adding metric flag

* Updating tests

* source formatting (clang-format v11) (#610)

Co-authored-by: Manjunath-Jakaraddi <[email protected]>

* source formatting (clang-format v11) (#614)

Co-authored-by: jrmadsen <[email protected]>

* Adding evaluate ast test

* source formatting (clang-format v11) (#633)

Co-authored-by: Manjunath-Jakaraddi <[email protected]>

* Update scanner generated file

* Adding flags to events for aqlprofile

* Fix Mi200 failing test

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Manjunath-Jakaraddi <[email protected]>
Co-authored-by: jrmadsen <[email protected]>

* Revert "Extending dimension bits to 54"

This reverts commit 3cd6628452484044a93e129f27974f996a0e4c08.

* Removing CU dimension

* Fixing merge conflicts

* Revert "Disabling agent profiling tests"

This reverts commit 7e01518ed8c51fbb0c3b2575e1e0b8f9ddfa8237.

* Fixing merge conflicts

* Fix parser tests

* Adding accumulate metric documentation

* Update counter_collection_services.md

* Update index.md

* fix nested expression use

* Update source/lib/rocprofiler-sdk/counters/evaluate_ast.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Doc update

---------

Co-authored-by: Benjamin Welton <[email protected]>
Co-authored-by: Manjunath P Jakaraddi <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Manjunath-Jakaraddi <[email protected]>
Co-authored-by: jrmadsen <[email protected]>
Co-authored-by: Manjunath-Jakaraddi <[email protected]>
7 people authored Jul 2, 2024
1 parent 29d8b14 commit a78753d
Showing 31 changed files with 778 additions and 471 deletions.
1 change: 1 addition & 0 deletions source/docs/_toc.yml.in
@@ -15,6 +15,7 @@ subtrees:
- file: buffered_services
- file: pc_sampling
- file: intercept_table
- file: counter_collection_services
- file: _doxygen/html/index
- file: samples
- file: rocprofv3
14 changes: 14 additions & 0 deletions source/docs/counter_collection_services.md
@@ -0,0 +1,14 @@
# Derived Metrics

## Accumulate metric
### Expression
expr=accumulate(<basic_level_counter>, <resolution>)
### Description
- The accumulate metric sums the values of a basic level counter over a specified number of cycles. The resolution parameter controls how often the summing operation runs:
  - HIGH_RES: Sums the basic counter every clock cycle. This gives the highest accuracy and suits fine-grained analysis.
  - LOW_RES: Sums the basic counter every four clock cycles. This yields fewer data points and a coarser sum, which helps reduce data volume.
  - NONE: Performs no summing and is equivalent to collecting basic_level_counter directly.
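
The difference between HIGH_RES and LOW_RES can be illustrated with a small software model. This is only a sketch: the real summation happens in the GPU hardware, and `Resolution`, `accumulate_model`, and the per-cycle trace are hypothetical names introduced for illustration, not part of the rocprofiler-sdk API. It assumes LOW_RES samples the counter on every fourth cycle, as described above; NONE is not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical software model; the real summation is done by the hardware.
enum class Resolution
{
    HIGH_RES,  // sample every clock cycle
    LOW_RES    // sample every fourth clock cycle
};

// Sum a per-cycle trace of a level counter at the requested resolution.
std::uint64_t
accumulate_model(const std::vector<std::uint64_t>& level_per_cycle, Resolution res)
{
    const std::size_t stride = (res == Resolution::HIGH_RES) ? 1 : 4;
    std::uint64_t     sum    = 0;
    for(std::size_t cycle = 0; cycle < level_per_cycle.size(); cycle += stride)
        sum += level_per_cycle[cycle];
    return sum;
}
```

Under this model, HIGH_RES visits every entry of the trace, while LOW_RES visits roughly a quarter of them, which is why it produces a coarser sum.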

### Usage (derived_counters.xml)
<metric name="MeanOccupancyPerCU" expr="accumulate(SQ_LEVEL_WAVES,HIGH_RES)/reduce(GRBM_GUI_ACTIVE,max)/CU_NUM" descr="Mean occupancy per compute unit."></metric>
- MeanOccupancyPerCU: This metric calculates the mean occupancy per compute unit. It uses the accumulate function with HIGH_RES to sum the SQ_LEVEL_WAVES counter at every clock cycle. The sum is then divided by reduce(GRBM_GUI_ACTIVE,max) and by the number of compute units (CU_NUM) to derive the mean occupancy.
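
The arithmetic behind this expression can be sketched as follows. The helper `mean_occupancy_per_cu` and its parameter names are hypothetical, chosen only to mirror the three terms of the expression; the zero-divisor guard is an assumption of this sketch, not documented behavior of the expression evaluator.

```cpp
#include <cstdint>

// Hypothetical helper mirroring
//   accumulate(SQ_LEVEL_WAVES,HIGH_RES) / reduce(GRBM_GUI_ACTIVE,max) / CU_NUM
double
mean_occupancy_per_cu(std::uint64_t accumulated_waves,
                      std::uint64_t max_gui_active_cycles,
                      std::uint32_t cu_num)
{
    // Guard against division by zero (an assumption made for this sketch).
    if(max_gui_active_cycles == 0 || cu_num == 0) return 0.0;
    return static_cast<double>(accumulated_waves) /
           static_cast<double>(max_gui_active_cycles) / static_cast<double>(cu_num);
}
```

For instance, an accumulated wave count of 8000 over 1000 active cycles on a device with 4 compute units would give 8000 / 1000 / 4 = 2 waves per compute unit on average.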
2 changes: 2 additions & 0 deletions source/lib/rocprofiler-sdk/agent.cpp
@@ -795,6 +795,8 @@ construct_agent_cache(::HsaApiTable* table)
"{}",
fmt::join(rocp_hsa_agent_node_ids.begin(), rocp_hsa_agent_node_ids.end(), ", "));

get_agent_caches().clear();
get_agent_mapping().clear();
get_agent_mapping().reserve(get_agent_mapping().size() + rocp_agents.size());

auto hsa_agent_node_map = std::unordered_map<uint32_t, hsa_agent_t>{};
10 changes: 5 additions & 5 deletions source/lib/rocprofiler-sdk/aql/helpers.cpp
@@ -66,9 +66,9 @@ get_block_counters(rocprofiler_agent_id_t agent, const aqlprofile_pmc_event_t& e

rocprofiler_status_t
set_dim_id_from_sample(rocprofiler_counter_instance_id_t& id,
hsa_agent_t agent,
hsa_ven_amd_aqlprofile_event_t event,
uint32_t sample_id)
aqlprofile_agent_handle_t agent,
aqlprofile_pmc_event_t event,
size_t sample_id)
{
auto callback =
[](int, int sid, int, int coordinate, const char*, void* userdata) -> hsa_status_t {
@@ -82,8 +82,8 @@ set_dim_id_from_sample(rocprofiler_counter_instance_id_t& id,
return HSA_STATUS_SUCCESS;
};

if(hsa_ven_amd_aqlprofile_iterate_event_coord(
agent, event, sample_id, callback, static_cast<void*>(&id)) != HSA_STATUS_SUCCESS)
if(aqlprofile_iterate_event_coord(agent, event, sample_id, callback, static_cast<void*>(&id)) !=
HSA_STATUS_SUCCESS)
{
return ROCPROFILER_STATUS_ERROR_AQL_NO_EVENT_COORD;
}
6 changes: 3 additions & 3 deletions source/lib/rocprofiler-sdk/aql/helpers.hpp
@@ -57,9 +57,9 @@ get_dim_info(rocprofiler_agent_id_t agent,
// Set dimension ids into id for sample
rocprofiler_status_t
set_dim_id_from_sample(rocprofiler_counter_instance_id_t& id,
hsa_agent_t agent,
hsa_ven_amd_aqlprofile_event_t event,
uint32_t sample_id);
aqlprofile_agent_handle_t agent,
aqlprofile_pmc_event_t event,
size_t sample_id);

rocprofiler_status_t
set_profiler_active_on_queue(const AmdExtTable& api,
136 changes: 33 additions & 103 deletions source/lib/rocprofiler-sdk/aql/packet_construct.cpp
@@ -66,14 +66,15 @@ CounterPacketConstruct::CounterPacketConstruct(rocprofiler_agent_id_t
for(unsigned block_index = 0; block_index < query_info.instance_count; ++block_index)
{
_metrics.back().instances.push_back(
{static_cast<hsa_ven_amd_aqlprofile_block_name_t>(query_info.id),
block_index,
event_id});
{.block_index = block_index,
.event_id = event_id,
.flags = aqlprofile_pmc_event_flags_t{x.flags()},
.block_name = static_cast<hsa_ven_amd_aqlprofile_block_name_t>(query_info.id)});

_metrics.back().events.push_back(
{.block_index = block_index,
.event_id = event_id,
.flags = aqlprofile_pmc_event_flags_t{0},
.flags = aqlprofile_pmc_event_flags_t{x.flags()},
.block_name = static_cast<hsa_ven_amd_aqlprofile_block_name_t>(query_info.id)});

bool validate_event_result;
@@ -86,114 +87,45 @@ CounterPacketConstruct::CounterPacketConstruct(rocprofiler_agent_id_t
&validate_event_result) != HSA_STATUS_SUCCESS);
ROCP_FATAL_IF(!validate_event_result)
<< "Invalid Metric: " << block_index << " " << event_id;
_event_to_metric[std::make_tuple(
static_cast<hsa_ven_amd_aqlprofile_block_name_t>(query_info.id),
block_index,
event_id)] = x;
_event_to_metric[_metrics.back().events.back()] = x;
}
}
_events = get_all_events();
}

std::unique_ptr<hsa::CounterAQLPacket>
CounterPacketConstruct::construct_packet(const AmdExtTable& ext)
CounterPacketConstruct::construct_packet(const CoreApiTable& coreapi, const AmdExtTable& ext)
{
auto pkt_ptr = std::make_unique<hsa::CounterAQLPacket>(ext.hsa_amd_memory_pool_free_fn);
auto& pkt = *pkt_ptr;
if(_events.empty())
{
ROCP_TRACE << "No events for pkt";
return pkt_ptr;
}
pkt.empty = false;

const auto* agent_cache =
const auto* agent =
rocprofiler::agent::get_agent_cache(CHECK_NOTNULL(rocprofiler::agent::get_agent(_agent)));
if(!agent_cache)
{
ROCP_FATAL << "No agent cache for agent id: " << _agent.handle;
}

pkt.profile = hsa_ven_amd_aqlprofile_profile_t{
agent_cache->get_hsa_agent(),
HSA_VEN_AMD_AQLPROFILE_EVENT_TYPE_PMC, // SPM?
_events.data(),
static_cast<uint32_t>(_events.size()),
nullptr,
0u,
hsa_ven_amd_aqlprofile_descriptor_t{.ptr = nullptr, .size = 0},
hsa_ven_amd_aqlprofile_descriptor_t{.ptr = nullptr, .size = 0}};
auto& profile = pkt.profile;
if(!agent) ROCP_FATAL << "No agent cache for agent id: " << _agent.handle;

hsa_amd_memory_pool_access_t _access = HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED;
ext.hsa_amd_agent_memory_pool_get_info_fn(agent_cache->get_hsa_agent(),
agent_cache->kernarg_pool(),
ext.hsa_amd_agent_memory_pool_get_info_fn(agent->get_hsa_agent(),
agent->kernarg_pool(),
HSA_AMD_AGENT_MEMORY_POOL_INFO_ACCESS,
static_cast<void*>(&_access));
// Memory is accessable by both the GPU and CPU, unlock the command buffer for
// sharing.
if(_access == HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED)
{
throw std::runtime_error(
fmt::format("Agent {} does not allow memory pool access for counter collection",
agent_cache->get_hsa_agent().handle));
}

CHECK_HSA(hsa_ven_amd_aqlprofile_start(&profile, nullptr), "could not generate packet sizes");
hsa::CounterAQLPacket::CounterMemoryPool pool;

if(profile.command_buffer.size == 0 || profile.output_buffer.size == 0)
{
throw std::runtime_error(
fmt::format("No command or output buffer size set. CMD_BUF={} PROFILE_BUF={}",
profile.command_buffer.size,
profile.output_buffer.size));
}
if(_access == HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED) pool.bIgnoreKernArg = true;

// Allocate buffers and check the results
auto alloc_and_check = [&](auto& pool, auto** mem_loc, auto size) -> bool {
bool malloced = false;
size_t page_aligned = getPageAligned(size);
if(ext.hsa_amd_memory_pool_allocate_fn(
pool, page_aligned, 0, static_cast<void**>(mem_loc)) != HSA_STATUS_SUCCESS)
{
*mem_loc = malloc(page_aligned);
malloced = true;
}
else
{
CHECK(*mem_loc);
hsa_agent_t agent = agent_cache->get_hsa_agent();
// Memory is accessable by both the GPU and CPU, unlock the command buffer for
// sharing.
LOG_IF(FATAL,
ext.hsa_amd_agents_allow_access_fn(1, &agent, nullptr, *mem_loc) !=
HSA_STATUS_SUCCESS)
<< "Error: Allowing access to Command Buffer";
}
return malloced;
};

// Build command and output buffers
pkt.command_buf_mallocd = alloc_and_check(
agent_cache->cpu_pool(), &profile.command_buffer.ptr, profile.command_buffer.size);
pkt.output_buffer_malloced = alloc_and_check(
agent_cache->kernarg_pool(), &profile.output_buffer.ptr, profile.output_buffer.size);
memset(profile.output_buffer.ptr, 0x0, profile.output_buffer.size);

CHECK_HSA(hsa_ven_amd_aqlprofile_start(&profile, &pkt.start), "failed to create start packet");
CHECK_HSA(hsa_ven_amd_aqlprofile_stop(&profile, &pkt.stop), "failed to create stop packet");
CHECK_HSA(hsa_ven_amd_aqlprofile_read(&profile, &pkt.read), "failed to create read packet");
pkt.start.header = HSA_PACKET_TYPE_VENDOR_SPECIFIC << HSA_PACKET_HEADER_TYPE;
pkt.stop.header = HSA_PACKET_TYPE_VENDOR_SPECIFIC << HSA_PACKET_HEADER_TYPE;
pkt.read.header = HSA_PACKET_TYPE_VENDOR_SPECIFIC << HSA_PACKET_HEADER_TYPE;
ROCP_TRACE << fmt::format("Following Packets Generated (output_buffer={}, output_size={}). "
"Start Pkt: {}, Read Pkt: {}, Stop Pkt: {}",
profile.output_buffer.ptr,
profile.output_buffer.size,
pkt.start,
pkt.read,
pkt.stop);
return pkt_ptr;
pool.allocate_fn = ext.hsa_amd_memory_pool_allocate_fn;
pool.allow_access_fn = ext.hsa_amd_agents_allow_access_fn;
pool.free_fn = ext.hsa_amd_memory_pool_free_fn;
pool.api_copy_fn = coreapi.hsa_memory_copy_fn;
pool.fill_fn = ext.hsa_amd_memory_fill_fn;

pool.gpu_agent = agent->get_hsa_agent();
pool.cpu_pool_ = agent->cpu_pool();
pool.kernarg_pool_ = agent->kernarg_pool();

const auto* aql_agent = rocprofiler::agent::get_aql_agent(agent->get_rocp_agent()->id);
if(aql_agent == nullptr) throw std::runtime_error("Could not get AQL agent!");

if(_events.empty()) ROCP_TRACE << "No events for pkt";

return std::make_unique<hsa::CounterAQLPacket>(*aql_agent, pool, _events);
}

ThreadTraceAQLPacketFactory::ThreadTraceAQLPacketFactory(const hsa::AgentCache& agent,
@@ -255,10 +187,10 @@ ThreadTraceAQLPacketFactory::construct_unload_marker_packet(uint64_t id)
return std::make_unique<hsa::CodeobjMarkerAQLPacket>(tracepool, id, 0, 0, false, true);
}

std::vector<hsa_ven_amd_aqlprofile_event_t>
std::vector<aqlprofile_pmc_event_t>
CounterPacketConstruct::get_all_events() const
{
std::vector<hsa_ven_amd_aqlprofile_event_t> ret;
std::vector<aqlprofile_pmc_event_t> ret;
for(const auto& metric : _metrics)
{
ret.insert(ret.end(), metric.instances.begin(), metric.instances.end());
@@ -267,11 +199,9 @@ CounterPacketConstruct::get_all_events() const
}

const counters::Metric*
CounterPacketConstruct::event_to_metric(const hsa_ven_amd_aqlprofile_event_t& event) const
CounterPacketConstruct::event_to_metric(const aqlprofile_pmc_event_t& event) const
{
if(const auto* ptr = rocprofiler::common::get_val(
_event_to_metric,
std::make_tuple(event.block_name, event.block_index, event.counter_id)))
if(const auto* ptr = rocprofiler::common::get_val(_event_to_metric, event))
{
return ptr;
}
42 changes: 30 additions & 12 deletions source/lib/rocprofiler-sdk/aql/packet_construct.hpp
@@ -38,6 +38,24 @@
#include "lib/rocprofiler-sdk/thread_trace/att_core.hpp"
#include "rocprofiler-sdk/fwd.h"

inline bool
operator==(aqlprofile_pmc_event_t lhs, aqlprofile_pmc_event_t rhs)
{
if(lhs.block_name != rhs.block_name) return false;
if(lhs.block_index != rhs.block_index) return false;
if(lhs.event_id != rhs.event_id) return false;
return lhs.flags.raw == rhs.flags.raw;
}

inline bool
operator<(aqlprofile_pmc_event_t lhs, aqlprofile_pmc_event_t rhs)
{
if(lhs.block_name != rhs.block_name) return lhs.block_name < rhs.block_name;
if(lhs.block_index != rhs.block_index) return lhs.block_index < rhs.block_index;
if(lhs.event_id != rhs.event_id) return lhs.event_id < rhs.event_id;
return lhs.flags.raw < rhs.flags.raw;
}

namespace rocprofiler
{
namespace aql
@@ -55,11 +73,12 @@ class CounterPacketConstruct
public:
CounterPacketConstruct(rocprofiler_agent_id_t agent,
const std::vector<counters::Metric>& metrics);
std::unique_ptr<hsa::CounterAQLPacket> construct_packet(const AmdExtTable&);
std::unique_ptr<hsa::CounterAQLPacket> construct_packet(const CoreApiTable&,
const AmdExtTable&);

const counters::Metric* event_to_metric(const hsa_ven_amd_aqlprofile_event_t& event) const;
std::vector<hsa_ven_amd_aqlprofile_event_t> get_all_events() const;
const std::vector<aqlprofile_pmc_event_t>& get_counter_events(const counters::Metric&) const;
const counters::Metric* event_to_metric(const aqlprofile_pmc_event_t& event) const;
std::vector<aqlprofile_pmc_event_t> get_all_events() const;
const std::vector<aqlprofile_pmc_event_t>& get_counter_events(const counters::Metric&) const;

rocprofiler_agent_id_t agent() const { return _agent; }

@@ -73,16 +92,15 @@ class CounterPacketConstruct
protected:
struct AQLProfileMetric
{
counters::Metric metric;
std::vector<hsa_ven_amd_aqlprofile_event_t> instances;
std::vector<aqlprofile_pmc_event_t> events;
counters::Metric metric;
std::vector<aqlprofile_pmc_event_t> instances;
std::vector<aqlprofile_pmc_event_t> events;
};

rocprofiler_agent_id_t _agent;
std::vector<AQLProfileMetric> _metrics;
std::vector<hsa_ven_amd_aqlprofile_event_t> _events;
std::map<std::tuple<hsa_ven_amd_aqlprofile_block_name_t, uint32_t, uint32_t>, counters::Metric>
_event_to_metric;
rocprofiler_agent_id_t _agent;
std::vector<AQLProfileMetric> _metrics;
std::vector<aqlprofile_pmc_event_t> _events;
std::map<aqlprofile_pmc_event_t, counters::Metric> _event_to_metric;
};

class ThreadTraceAQLPacketFactory
41 changes: 39 additions & 2 deletions source/lib/rocprofiler-sdk/aql/tests/aql_test.cpp
@@ -39,6 +39,38 @@ using namespace rocprofiler::counters::test_constants;

namespace rocprofiler
{
AmdExtTable&
get_ext_table()
{
static auto _v = []() {
auto val = AmdExtTable{};
val.hsa_amd_memory_pool_get_info_fn = hsa_amd_memory_pool_get_info;
val.hsa_amd_agent_iterate_memory_pools_fn = hsa_amd_agent_iterate_memory_pools;
val.hsa_amd_memory_pool_allocate_fn = hsa_amd_memory_pool_allocate;
val.hsa_amd_memory_pool_free_fn = hsa_amd_memory_pool_free;
val.hsa_amd_agent_memory_pool_get_info_fn = hsa_amd_agent_memory_pool_get_info;
val.hsa_amd_agents_allow_access_fn = hsa_amd_agents_allow_access;
val.hsa_amd_memory_fill_fn = hsa_amd_memory_fill;
return val;
}();
return _v;
}

CoreApiTable&
get_api_table()
{
static auto _v = []() {
auto val = CoreApiTable{};
val.hsa_iterate_agents_fn = hsa_iterate_agents;
val.hsa_agent_get_info_fn = hsa_agent_get_info;
val.hsa_queue_create_fn = hsa_queue_create;
val.hsa_queue_destroy_fn = hsa_queue_destroy;
val.hsa_signal_wait_relaxed_fn = hsa_signal_wait_relaxed;
return val;
}();
return _v;
}

auto
findDeviceMetrics(const hsa::AgentCache& agent, const std::unordered_set<std::string>& metrics)
{
@@ -122,7 +154,9 @@ TEST(aql_profile, packet_generation_single)
{
auto metrics = rocprofiler::findDeviceMetrics(agent, {"SQ_WAVES"});
CounterPacketConstruct pkt(agent.get_rocp_agent()->id, metrics);
auto test_pkt = pkt.construct_packet(get_ext_table());
auto test_pkt =
pkt.construct_packet(rocprofiler::get_api_table(), rocprofiler::get_ext_table());

EXPECT_TRUE(test_pkt);
}

@@ -141,13 +175,15 @@ TEST(aql_profile, packet_generation_multi)
auto metrics =
rocprofiler::findDeviceMetrics(agent, {"SQ_WAVES", "TA_FLAT_READ_WAVEFRONTS"});
CounterPacketConstruct pkt(agent.get_rocp_agent()->id, metrics);
auto test_pkt = pkt.construct_packet(get_ext_table());
auto test_pkt =
pkt.construct_packet(rocprofiler::get_api_table(), rocprofiler::get_ext_table());
EXPECT_TRUE(test_pkt);
}

hsa_shut_down();
}

/*
class TestAqlPacket : public rocprofiler::hsa::CounterAQLPacket
{
public:
@@ -183,3 +219,4 @@ TEST(aql_profile, test_aql_packet)
// Why is this valid?
TestAqlPacket test_pkt2(false);
}
*/