Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Primary node clarification for VK_EXT_physical_device_drm #2447

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

enunes
Copy link

@enunes enunes commented Oct 21, 2024

The extension already hinted that the primary node represents display capabilities as its intended purpose, but the current text let it ambiguous enough that there has been confusion and divergence in proposed implementations.

Add clarification notes to further define the scope of primary node in the context of this extension.

There has been a long discussion about this issue mainly at https://gitlab.freedesktop.org/mesa/mesa/-/issues/9920 .
That discussion has evolved into solving many problems of split display/render systems, but here I would like to avoid doing that again and focusing only on defining the scope of this extension.

I also discussed with the involved people again recently about proposing this note to the extension.

cc @emersion @cubanismo @fooishbar

Other references:
https://people.freedesktop.org/~cbrill/dri-log/?channel=dri-devel&highlight_names=&date=2023-07-25&show_html=true
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25431
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31635

@CLAassistant
Copy link

CLAassistant commented Oct 21, 2024

CLA assistant check
All committers have signed the CLA.

Physical devices may have no primary node (e.g. if the device does not have
a display subsystem), may have no render node (e.g. if it is a software
rendering engine), or may have neither (e.g. if it is a software rendering
engine without a display subsystem).
Physical devices which don't have a display subsystem implemented as part of
the same underlying DRM device report that as hasPrimary=false.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not what our implementation is doing right now. Is this what other drivers are doing currently?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not what drivers are currently doing but what I meant to put explicitly from what was hinted in the next paragraph (Physical devices may have no primary node (e.g. if the device does not have a display subsystem)).

To my knowledge what drivers in Mesa are doing is:

  • v3dv: uses the other display DRM device as the primary. This is what we would like to fix here and prevent other drivers from following and applications to rely on.
  • panvk: uses the DRM primary (cardN) node of the same render device as the primary, because it technically exists and from the DRM terminology it is in fact the "DRM primary node" (but is not a "display subsystem").
  • pvr: same as panvk (but still pending to upstream Mesa).

I guess what panvk and pvr do is also acceptable to standardize on while similarly achieving the same goal, if we want to settle on something based on existing implementations. If so we also need to change the text to explicitly say that and probably remove parts of the text which associate primary devices as being display subsystems.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither panvk nor pvr are (yet) conformant, and require scary environment variables to unmask them, so I'm comfortable with changing their behaviour to match new wording.

Each physical device is associated with a single underlying DRM device, which
may have one primary node and one render node.
The primary node, in the context of this extension, refers to the capability of
that device node to be used as a display subsystem.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this saying you can return primary node and render node from separate DRM devices on the same Vulkan physical device or not? It's not clear to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this snippet I just meant to clarify the terminology for the context of the extension. As in "VK_EXT_physical_device_drm primary node" != "DRM primary node". This is also due to the existing sentence which says Physical devices may have no primary node (e.g. if the device does not have a display subsystem).

For the overall extension I want to say that it should not mix and return primary node and render node from separate DRM devices on the same Vulkan physical device.

I'm open to suggestions for phrasing this the best way possible.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the overall extension I want to say that it should not mix and return primary node and render node from separate DRM devices on the same Vulkan physical device.

This seems fine to me. I'm less clear whether it would be useful to hide the primary node for DRM devices without a display engine. I don't recall whether there was a leaning or consensus on what was preferred by compositor authors from the FDO issue, but I'd probably be fine with whatever others prefer with the caveat that if we alter the language to prohibit advertising a primary node on displayless GPUs, we should probably bump the extension version so apps can easily detect older drivers that might do something different.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm thinking now that it might be simpler to just drop the "hide the primary node for DRM devices without a display engine" part from this proposal. That is not a strictly needed change, so this keeps some implementations compliant and is still enough to achieve the goal. I guess applications can still check with the DRM API when they get a primary node whether it is a "display subsystem" or not, so this information does not need to be handled by this extension.

So I would modify this MR to propose: 1. focus on clarifying the language and decoupling "primary node" from "display subsystem" so that non-display-subsystem primary nodes of the same DRM render device are unambiguously allowed to go on the primary node if they exist; and 2. still explicitly mention that it should not mix and return primary node and render node from separate DRM devices on the same Vulkan physical device.

I'm fine bumping the extension version in any case. Is that just adding an entry in the changelog for VK_EXT_physical_device_drm.adoc or is the version tracked somewhere else?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The version is tracked in the XML file, which generates content in vulkan_core.h.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm pretty comfortable with the suggested language; users can use libdrm to map between render nodes and primary nodes of the same device.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, the important bit is to forbid advertising two separate DRM devices with this extension. I don't have a strong preference regarding the primary device being exposed only if the device is KMS-capable. Some notes:

  • If the device is not KMS-capable, the primary node is fairly useless.
  • In general, applications can't expect to have the permission to open the primary node.

The primary node, in the context of this extension, refers to the capability of that device node to be used as a display subsystem.

That's a bit confusing IMHO. Maybe instead of trying to override what having a primary node means, we could add "If the DRM device is KMS-capable" elsewhere in the extension?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I have a slightly stronger opinion than I thought here. Right now our logic to deduce the primary node and advertise it here doesn't have to know whether there's a display subsystem on the GPU in question. If we have to go and toggle that logic based on whether there's a display subsystem associated with the GPU, which is slightly more than a trivial amount of work in the driver.

So in other words, I'm fine with not advertising two separate DRM devices in primary and render, but I'd have a preference for not requiring drivers to hide primary nodes for displayless GPUs, because it's not what our driver does now and because it's non-trivial to switch to that behavior. Looking at it separately from the principle of least surprise POV, I agree the primary node isn't necessarily that useful on displayless GPUs, and apps can't always access it, but it's there. It seems a little weird to hide it in Vulkan when it isn't hidden on the filesystem.

That said, if everyone is set on the idea of hiding it, I could be persuaded. I'd probably want some time to go make sure there aren't any larger hurdles than anticipated finding the "has display" boolean early enough to set this field up properly though before giving my final thumbs up.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the last (current as of now) update to this PR, I dropped the part about hiding the primary device. After the first round of comments I also realized that explicitly hiding it only makes things more complicated for little benefit.

The current proposal is to just expose the primary node of the render device anyway since it is there. So the focus of the change now is to just expliticly write to not advertise nodes from two different DRM devices.

The extension hinted that the primary node represents display
capabilities as its intended purpose, but the current text let it
ambiguous enough that there has been confusion and proposed
implementations which expose different devices for the primary node.

Add clarification notes to further define the scope of primary node
in the context of this extension.

Signed-off-by: Erico Nunes <[email protected]>
@enunes enunes force-pushed the note-VK_EXT_physical_device_drm branch from ed873ba to 662f8b1 Compare November 7, 2024 16:33
@enunes
Copy link
Author

enunes commented Nov 7, 2024

I pushed a new version considering the points I mentioned in comment #2447 (comment)

Since there is no strong need to hide the primary node (assuming that it belongs to the same DRM device) I think now that it is more intuitive to just include it there, since the primary node does exist and likely is not going away soon. So we don't introduce an entirely new rule here and have to go change every other implementation.

This new revision is less intrusive and just removes the term "display subsystem" and its association with "primary node". It also now explicitly forbids other DRM devices being used, so the only possible option is to expose the primary node of the same device. If it is not useful applications can ignore it, but I guess it is less harm it being there than having to introduce more potentially breaking changes for this extension.

Each physical device can only be associated with a single underlying DRM
device, which may have one primary node and one render node.

Physical devices may have no primary node (e.g. it is not associated with a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that case is possible, actually. A render node always has a primary node.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I think the only case is the same as a software implementation which does not use DRM. Is it better to switch this text to something like "it always has the primary node if it is associated with a DRM device"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants