test: Make streamer test to use concurrent metric groups #123

Open. Wants to merge 2 commits into base: master
3 changes: 3 additions & 0 deletions utils/test_harness/tools/include/test_harness_metric.hpp
@@ -169,6 +169,9 @@ void metric_validate_streamer_marker_data(
std::vector<uint32_t> &streamerMarkerValues,
uint32_t &streamer_marker_values_index);

std::vector<zet_metric_group_handle_t> get_concurrent_metric_group(
ze_device_handle_t device,
std::vector<zet_metric_group_handle_t> &metricGroupHandleList);
}; // namespace level_zero_tests

#endif /* TEST_HARNESS_SYSMAN_METRIC_HPP */
41 changes: 40 additions & 1 deletion utils/test_harness/tools/src/test_harness_metric.cpp
@@ -196,6 +196,31 @@ bool check_metric_type_ip(ze_device_handle_t device, std::string groupName,
return check_metric_type_ip(groupHandle, includeExpFeature);
}

std::vector<zet_metric_group_handle_t> get_concurrent_metric_group(
ze_device_handle_t device,
std::vector<zet_metric_group_handle_t> &metricGroupHandleList) {

uint32_t concurrentGroupCount = 0;
EXPECT_EQ(ZE_RESULT_SUCCESS,
zetDeviceGetConcurrentMetricGroupsExp(
device, metricGroupHandleList.size(),
metricGroupHandleList.data(), nullptr, &concurrentGroupCount));
Contributor

How do you know what size to allocate for metricGroupHandleList?

Contributor Author

It is passed by reference into get_concurrent_metric_group, so the caller has already allocated it appropriately.

Contributor

OK, so the size is defined by metricGroupHandleList.size().
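
For readers following the exchange above, the two-call pattern can be sketched in isolation. This is a minimal sketch, not the Level Zero implementation: `query_groups` is a hypothetical stand-in that only mimics the calling shape of `zetDeviceGetConcurrentMetricGroupsExp` (null output pointer to query the count, non-null pointer to fill it), the device and handle array are omitted, handles are plain integers, and the "two handles per group" rule is invented purely for illustration. The point is that the handle list is an input sized by the caller, while only the per-group count array is sized from the first call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in that mimics only the calling shape of the real API.
// For illustration it pretends handles can be collected two at a time.
//   countPerGroup == nullptr : report how many groups exist in *groupCount.
//   countPerGroup != nullptr : fill at most *groupCount entries.
void query_groups(uint32_t handleCount, uint32_t *countPerGroup,
                  uint32_t *groupCount) {
  assert(groupCount != nullptr);
  const uint32_t total = (handleCount + 1) / 2;
  if (countPerGroup == nullptr) {
    *groupCount = total;
    return;
  }
  const uint32_t toFill = (*groupCount < total) ? *groupCount : total;
  uint32_t remaining = handleCount;
  for (uint32_t g = 0; g < toFill; ++g) {
    countPerGroup[g] = (remaining >= 2) ? 2u : remaining;
    remaining -= countPerGroup[g];
  }
  *groupCount = toFill;
}

// Two-call pattern: the first call sizes the output array, the second fills it.
// The handle list itself is never resized here; its size is an input.
std::vector<uint32_t> group_sizes(const std::vector<int> &handles) {
  const uint32_t handleCount = static_cast<uint32_t>(handles.size());
  uint32_t groupCount = 0;
  query_groups(handleCount, nullptr, &groupCount);       // query the count
  std::vector<uint32_t> counts(groupCount);
  query_groups(handleCount, counts.data(), &groupCount); // fill the counts
  counts.resize(groupCount);
  return counts;
}
```

With this stand-in, three handles yield group sizes of 2 and 1, and an empty handle list yields no groups.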

std::vector<uint32_t> countPerConcurrentGroup(concurrentGroupCount);
EXPECT_EQ(ZE_RESULT_SUCCESS,
zetDeviceGetConcurrentMetricGroupsExp(
device, metricGroupHandleList.size(),
metricGroupHandleList.data(), countPerConcurrentGroup.data(),
&concurrentGroupCount));

std::vector<zet_metric_group_handle_t> concurrentMetricGroupList;
uint32_t metricGroupCountInConcurrentGroup = countPerConcurrentGroup[0];

for (uint32_t i = 0; i < metricGroupCountInConcurrentGroup; i++) {
concurrentMetricGroupList.push_back(metricGroupHandleList[i]);
}
return concurrentMetricGroupList;
}

std::vector<metricGroupInfo_t> optimize_metric_group_info_list(
std::vector<metricGroupInfo_t> &metricGroupInfoList,
uint32_t percentOfMetricGroupForTest, const char *metricGroupName) {
@@ -270,6 +295,7 @@ get_metric_group_info(ze_device_handle_t device,
get_metric_group_handles(device);

std::vector<metricGroupInfo_t> matchedGroupsInfo;
std::vector<zet_metric_group_handle_t> concurrentMetricGroupHandles;

for (auto metricGroupHandle : metricGroupHandles) {
zet_metric_group_properties_t metricGroupProp = {};
@@ -288,12 +314,25 @@ get_metric_group_info(ze_device_handle_t device,
continue;
}

concurrentMetricGroupHandles.push_back(metricGroupHandle);
matchedGroupsInfo.emplace_back(
metricGroupHandle, metricGroupProp.name, metricGroupProp.description,
metricGroupProp.domain, metricGroupProp.metricCount);
}

return matchedGroupsInfo;
concurrentMetricGroupHandles =
Contributor

@matcabral Dec 19, 2024

What's the rationale for changing existing tests to use getConcurrentMetricGroups() instead of creating a new test? I think the value of this API will show when we have platforms with multiple metric domains (or metric sources), where we can confirm that the groups can be collected concurrently by opening different streamers (one per group) and/or tracers (with many groups).

Contributor Author

The idea is not to test the functionality of getConcurrentMetricGroups. Currently, all streamer tests iterate over every metric group, which was causing timeouts on Elmo runs. The goal here is to reduce run time by reducing the number of metric groups the tests run on. Santhosh suggested running just one metric group from each source, which should be enough for all the tests. This new implementation does exactly that: it takes the first concurrent group returned by the getConcurrentMetricGroups API and uses only that group for the tests. Today that group is expected to contain one metric group from OA and one from IP sampling.

  uint32_t metricGroupCountInConcurrentGroup = countPerConcurrentGroup[0];

  for (uint32_t i = 0; i < metricGroupCountInConcurrentGroup; i++) {
    concurrentMetricGroupList.push_back(metricGroupHandleList[i]);
  }
  return concurrentMetricGroupList;
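
The selection step described above, keeping only the first concurrent group, can be sketched independently of the driver. This assumes, as the PR code does, that the handle list has already been reordered so that its first `countPerGroup[0]` entries form the first concurrent group. `first_concurrent_group` and the integer `Handle` alias are hypothetical names used only for illustration. Unlike the snippet above, this sketch returns an empty list when no concurrent group was reported and clamps the count to the list size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Handle = int; // illustrative stand-in for zet_metric_group_handle_t

// Returns the handles belonging to the first concurrent group, assuming
// sortedHandles is already ordered group by group.
std::vector<Handle>
first_concurrent_group(const std::vector<Handle> &sortedHandles,
                       const std::vector<uint32_t> &countPerGroup) {
  if (countPerGroup.empty()) {
    return {}; // no concurrent group reported, nothing to select
  }
  // Clamp so an oversized count can never read past the handle list.
  const std::size_t n =
      std::min<std::size_t>(countPerGroup[0], sortedHandles.size());
  std::vector<Handle> result(
      sortedHandles.begin(),
      sortedHandles.begin() + static_cast<std::ptrdiff_t>(n));
  assert(result.size() <= sortedHandles.size());
  return result;
}
```

For handles `{10, 20, 30}` with per-group counts `{2, 1}`, the function returns `{10, 20}`.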

Contributor

Perfect, makes sense. Just a few comments:

  • Leave some comments in the code so your explanation is easy to find. No need to mention Elmo, just the idea.
  • There is an env var to reduce the percentage of tests used. How does it affect this change?

get_concurrent_metric_group(device, concurrentMetricGroupHandles);
std::vector<metricGroupInfo_t> concurrentMatchedGroupsInfo;

for (auto groupsInfo : matchedGroupsInfo) {
if (count(concurrentMetricGroupHandles.begin(),
concurrentMetricGroupHandles.end(),
groupsInfo.metricGroupHandle)) {
concurrentMatchedGroupsInfo.push_back(groupsInfo);
}
}

return concurrentMatchedGroupsInfo;
}

std::vector<metricGroupInfo_t> get_metric_type_ip_group_info(