Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove extension defines #148

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

bashbaug
Copy link
Contributor

@bashbaug bashbaug commented Sep 8, 2021

Fixes #146

This PR removes many of the extension defines from the C++ bindings and enables support for extensions based on defines in the OpenCL headers instead.

Specifically, the following extensions defines are removed:

  • CL_HPP_USE_CL_DEVICE_FISSION (for cl_ext_device_fission)
  • CL_HPP_USE_CL_SUB_GROUPS_KHR (for cl_khr_subgroups)
  • CL_HPP_USE_IL_KHR (for cl_khr_il_program)
  • CL_HPP_USE_CL_IMAGE2D_FROM_BUFFER_KHR (for cl_khr_image2d_from_buffer)

The extension define for CL_HPP_USE_DX_INTEROP (for cl_khr_d3d10_sharing) has not been removed since the DX interop APIs require including the DX-specific header and hence should be enabled conditionally.

Some implementation notes:

  • We can differentiate between the core OpenCL 1.2 sub-device creation and the cl_ext_device_fission sub-device creation based on the function overload, since cl_device_partition_property is a different type than cl_device_partition_property_ext (intptr_t vs. cl_ulong).
  • For subgroups, we'll try the core OpenCL 2.1 APIs first if we're compiling for OpenCL 2.1 or newer and the platform is an OpenCL 2.1 platform, and we'll fall back to the extension APIs if the core APIs are unsupported or return an error.
  • Similarly, for IL programs, we'll try the core OpenCL 2.1 APIs first and only fall back to the extension APIs if the core APIs are unsupported or return an error.
  • We do not currently have a runtime check for the extension. I did add a check whether the extension API function pointer was successfully retrieved, though.
  • Because cl_khr_image2d_from_buffer uses the same clCreateImage API for both the core and extension version there is no need to try both. Additionally, there is no runtime check, and the API call will either succeed or return an error.

I think all of this is reasonble and correct but please double-check my reasoning!

A note about testing:

  • I added tests for the different codepaths for cl_khr_subgroups and cl_khr_il_program, but this admittedly isn't perfect: because the C++ bindings store the extension API function pointers in static variables they are "stateful" and therefore must run in a specific order. We should rethink how extensions are handled to improve robustness, especially because the C++ bindings currently will not work for many extension APIs from multiple platforms.

@bashbaug bashbaug force-pushed the no-extension-defines branch from 898d7db to 46775f8 Compare September 15, 2021 18:04
@bashbaug bashbaug marked this pull request as draft October 3, 2021 23:23
@bashbaug
Copy link
Contributor Author

bashbaug commented Oct 3, 2021

I've found a few issues with this PR that I need to fix:

  1. CL_HPP_INIT_CL_EXT_FCN_PTR_ uses clGetExtensionFunctionAddress to get the function pointer, but this was declared deprecated, leading to a deprecation warning when compiling for some combinations of the minimum and target version of OpenCL.
  2. I'm using getDevicePlatformVersion to check whether the core API or extension API should be called, but this is only defined for some combinations of the minimum and target version of OpenCL.

I'll mark this PR as ready for review when I have solved these issues.

The helper functions to get the platform version are used in so
many places they cannot be protected by ifdefs for specific target
OpenCL versions or minimum OpenCL versions.
@bashbaug bashbaug marked this pull request as ready for review October 14, 2021 05:41
@bashbaug
Copy link
Contributor Author

i think this is ready for review again. I've been using this version for a week or so and it seems to be working OK, though I admittedly haven't tested the DX interop very well.

One note: We have some functions to get the platform version that used to be protected by ifdefs in some cases. They're used a lot more now, though, so much that it's not practical to protect them by ifdefs anymore. I think this is OK, but it's one area to pay attention to, if anyone remember why they were protected by ifdefs to begin with. Thanks!

Copy link
Contributor

@MathiasMagnus MathiasMagnus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me. Minor stylistic suggestion (can be downvoted) and a suggestion to defer a fix which may cover multiple functions, not just the changed one.

Comment on lines +2405 to 2415
if (devices) {
devices->resize(ids.size());

// Assign to param, constructing with retain behaviour
// to correctly capture each underlying CL object
for (size_type i = 0; i < ids.size(); i++) {
// We do not need to retain because this device is being created
// by the runtime
(*devices)[i] = Device(ids[i], false);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO this function is unfortunate in multiple ways, some of which are breaking so I won't list. However:

  • if the user doesn't supply a vector, then the entire function is meaningless and we should shortcircuit the entire thing and not leak subdevice handles.
  • the SDK utility library introduced new error codes for cases when the wrapper itself is misused. This function really has no meaning when a nullptr is provided by the user and we should likely signal that. Whether we repurpose an error code from clCreateSubDevices or start making our own, it's up to us. Silently leaking resources seems wrong.

I'm fine deferring this into a separate issue.

Comment on lines +2406 to +2414
devices->resize(ids.size());

// Assign to param, constructing with retain behaviour
// to correctly capture each underlying CL object
for (size_type i = 0; i < ids.size(); i++) {
// We do not need to retain because this device is being created
// by the runtime
(*devices)[i] = Device(ids[i], false);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the spirit of no raw loops. Mild bikeshedding, feel free to discard if more trouble than use.

Suggested change
devices->resize(ids.size());
// Assign to param, constructing with retain behaviour
// to correctly capture each underlying CL object
for (size_type i = 0; i < ids.size(); i++) {
// We do not need to retain because this device is being created
// by the runtime
(*devices)[i] = Device(ids[i], false);
}
devices->reserve(ids.size());
// Assign to param, constructing with retain behaviour
// to correctly capture each underlying CL object
std::transform(ids.cbegin(), ids.cend(), std::back_inserter(devices), [](const cl_device_id& id) {
// We do not need to retain because this device is being created
// by the runtime
return Device(id, false);
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind you, this requires algorithm. iterator is already included.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

revisit handling for extensions
2 participants