Numerous segfaults/asserts on mesa 23.2 with ANV_VIDEO_DECODE #41

kkartaltepe · 2023-05-17T06:20:37Z

Intel hardware has some initial support for vulkan video decoding but this application appears to fail in numerous ways when attempting to run it on linux+mesa. Most of these appear to be bugs in the sample and not

Configure the program for ANV's queue families and queue count
Attempt to run the sample with a 10s clip of Big Buck Bunny
Crash.

Trying to get the sample working it seems the fixes/workarounds required were

1. Only call MapMemory when using generateColorPatternRgba888 in vk_video_decoder/libs/VkCodecUtils/VulkanVideoUtils.cpp ImageObject::FillImageWithPattern, because vkFillYuv.fillVkImage also attempts to map. Mapping twice is invalid by spec and fails on mesa.
2. Fix the dependency on undefined behavior in Shell::AcquireBackBuffer, where assert(acquireBuf != nullptr); attempts to check whether the backbuffer queue was empty, so you get more reasonable crashes.
3. Disable non-FIFO present modes in vk_video_decoder/libs/VkShell/Shell.cpp Shell::ResizeSwapchain. The code appears to depend on AcquireNextImage blocking, so if other modes are present it will overrun the backbuffer queue and hang or crash. (There is a vsync option in the config, but this doesn't appear to affect decode presentation.)
4. Reduce DPB size checking in vk_video_decoder/libs/NvVideoParser/include/VulkanH264Decoder.h dpb_full(). Since the newly allocated reference frame has state == 0, it is not counted here despite being counted in the DPB's accounting. The next decode step will then assert that the DPB is out of slots and crash when it attempts to allocate another reference frame.

That should cover everything I needed to get the sample decoder running successfully. It should be noted that incorrect handling of ycbcr sampler attachment, synchronization of cmdbufs, image ownership transitions (due to separate video and present queues), and more result in a large amount of validator noise but the frames presented appear more or less correct.
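
The first fix in the list above comes down to mapping an allocation at most once and checking the result before writing through it. A minimal sketch of that idea follows. `FakeDeviceMemory` and `MappedMemory` are hypothetical names that stand in for the real Vulkan calls; they are not part of the sample, and the failing second map is modeled on the behavior reported for mesa.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a device memory allocation; hypothetical, for illustration only.
// Modeled on the report above: mapping an already mapped allocation fails.
struct FakeDeviceMemory {
    std::vector<uint8_t> storage;
    bool mapped = false;
    int mapCalls = 0;

    explicit FakeDeviceMemory(std::size_t size) : storage(size) {}

    // Returns nullptr on failure, standing in for a non-success result code.
    uint8_t* Map() {
        ++mapCalls;
        if (mapped) return nullptr;  // double map: invalid usage
        mapped = true;
        return storage.data();
    }
    void Unmap() { mapped = false; }
};

// Hypothetical owner of the mapping: maps lazily, at most once, and hands the
// same pointer to every caller, so nested helpers never map a second time.
class MappedMemory {
public:
    explicit MappedMemory(FakeDeviceMemory& mem) : m_mem(mem) {}
    ~MappedMemory() { if (m_ptr) m_mem.Unmap(); }
    MappedMemory(const MappedMemory&) = delete;
    MappedMemory& operator=(const MappedMemory&) = delete;

    // Callers must check for nullptr before writing through the pointer.
    uint8_t* Data() {
        if (!m_ptr) m_ptr = m_mem.Map();
        return m_ptr;
    }

private:
    FakeDeviceMemory& m_mem;
    uint8_t* m_ptr = nullptr;
};
```

With a single owner like this, a pattern generator and a YUV fill helper can both ask for `Data()` without either of them issuing its own map call.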

Thanks for writing up the sample as well it was convenient to have something to test the new functionality with.

zlatinski · 2023-05-17T17:22:15Z

Thank you for your feedback and suggestions for app improvements, Kurt!
Would you mind submitting a pull request with all of the changes you'd like to see in the sample?

kkartaltepe · 2023-05-18T00:43:58Z

Sure, I'm happy to submit fixes for 1 and 2, which are correct. But 3 and 4 seem like they require more understanding of the pipeline than I have (why is dpb_full() called when it is, why are states not set at that time, and what is the synchronization supposed to be between present and decode in the non-FIFO case). The fixes of disabling non-FIFO modes and reducing the DPB size to 15 are definitely not correct, at least.

zlatinski · 2023-05-20T00:16:15Z

Hi Kurt, what is the MESA driver that you are using - what version/distro and HW (Intel, AMD)?

About 4.

> The next decode step will then assert that the DPB is out of slots and crash when it attempts to allocate another reference frame.

How many DPB slots are allocated for your video content? Also, if the decode queue keeps running and the display does not consume the frames, the decoder will eventually run out of images.

kkartaltepe · 2023-05-20T00:32:25Z

> Hi Kurt, what is the MESA driver that you are using - what version/distro and HW (Intel, AMD)?

The ANV_VIDEO_DECODE=1 environment variable can be used to enable the Vulkan video decode queues on Intel's anv driver. This is with the as-yet-unreleased 23.2; testing was done on the master branch from about a week back. The rest of the system was standard Arch Linux.

> How many DPB slots are allocated for your video content?

16, it's H264 content.

> Also, if the decode queue keeps running and the display does not consume the frames, the decoder will eventually run out of images.

I don't think the synchronization is at fault for this particular issue, since it occurs without presentation running (and presentation is on a 120 Hz display versus 30 Hz content). The dpb_full() check is used to reclaim slots during VulkanH264Decoder::dpb_picture_end, which runs after the allocation of slots in VulkanVideoParser::FillDpbH264State, but it doesn't appear to take the newly allocated slots into account, so it won't free any slots despite the DPB having 16 reference frames allocated. So when VulkanVideoParser::FillDpbH264State gets called and expects to be able to allocate, it asserts instead.
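
The accounting mismatch described here can be illustrated with a small model. This is only a sketch of the reporter's description, using hypothetical types and names (`DpbSlot`, `CountMarked`, `CountAllocated`); it is not the sample's data structure, and the thread does not confirm that dpb_full() behaves this way.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Simplified, hypothetical model of a 16-entry DPB; not the sample's data structure.
struct DpbSlot {
    bool allocated = false;  // a picture resource has been assigned to this slot
    int state = 0;           // 0 = not (yet) marked as used for reference
};

constexpr std::size_t kDpbSize = 16;
using Dpb = std::array<DpbSlot, kDpbSize>;

// Counts only slots whose state has been set, as the report describes.
std::size_t CountMarked(const Dpb& dpb) {
    std::size_t n = 0;
    for (const DpbSlot& s : dpb) if (s.state != 0) ++n;
    return n;
}

// Counts every slot that holds a picture, including the one just allocated.
std::size_t CountAllocated(const Dpb& dpb) {
    std::size_t n = 0;
    for (const DpbSlot& s : dpb) if (s.allocated) ++n;
    return n;
}

// Builds the situation from the report: 15 marked reference frames plus one
// freshly allocated slot whose state is still 0.
Dpb MakeReportedState() {
    Dpb dpb{};
    for (std::size_t i = 0; i + 1 < kDpbSize; ++i) {
        dpb[i].allocated = true;
        dpb[i].state = 3;  // arbitrary non-zero "in use" value
    }
    dpb[kDpbSize - 1].allocated = true;  // new frame, state not set yet
    return dpb;
}
```

In this model a "full" check based on `CountMarked` sees 15 of 16 entries and frees nothing, while every slot is in fact occupied, which is the situation in which the next allocation would fail.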

zlatinski · 2023-05-23T03:31:41Z

Hi Kurt,
I have some fixes submitted based on your feedback. I'll have a look at dpb_full() again, but I doubt there is an issue with that. 16 total may not be enough, based on the content - please try 24 - just to see if it fixes your issues. Can you please provide the value of pnvsi->nMinNumDpbSlots from VulkanVideoParser::BeginSequence() after you are running your content?

kkartaltepe · 2023-05-23T04:04:32Z

This is the reproducing encoding: https://test-videos.co.uk/vids/bigbuckbunny/mp4/h264/1080/Big_Buck_Bunny_1080_10s_1MB.mp4

Sorry, but on master it seems the code still crashes:

in vk_video_decoder/libs/VkCodecUtils/pattern.cpp:392, with sampleYPtr pointing to 0x1. This seems to go away if I add a check on the return value of MapMemory in vk_video_decoder/libs/VkCodecUtils/pattern.cpp:686
in Shell::AcquireBackBuffer(), now on the assertion !"Swapchain queue is empty!".

With noPresent = true, it seems the code runs to completion without crashing though.

zlatinski · 2023-05-23T16:46:12Z

> in vk_video_decoder/libs/VkCodecUtils/pattern.cpp:392, with sampleYPtr pointing to 0x1. This seems to go away if I add check on the return value of MapMemory in vk_video_decoder/libs/VkCodecUtils/pattern.cpp:686

Sorry, I've missed one YUV conversion mapping location. It is fixed now.

zlatinski · 2023-05-23T16:51:28Z

> in Shell::AcquireBackBuffer(), now on Assertion !"Swapchain queue is empty!"'

I don't have a setup to try Intel's anv Mesa driver. I'll make an enquiry to Intel to see if I can find somebody who can look at these two issues.

airlied · 2023-05-24T02:38:45Z

@zlatinski I think the image count calcs are wrong.

caps.maxImageCount can legally be 0, so you shouldn't limit things to max if it's 0.
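
For illustration, a clamp that treats a maxImageCount of 0 as "no upper limit" could look like the sketch below. `ChooseImageCount` is a hypothetical helper, not the sample's code; its parameters correspond to the minImageCount and maxImageCount fields of the surface capabilities.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Picks a swapchain image count from the surface capabilities.
// maxImageCount == 0 means "no upper limit", so it must not be used as a clamp.
uint32_t ChooseImageCount(uint32_t desired, uint32_t minImageCount, uint32_t maxImageCount) {
    uint32_t count = std::max(desired, minImageCount);
    if (maxImageCount != 0) {
        count = std::min(count, maxImageCount);
    }
    return count;
}
```

Clamping unconditionally with `std::min(count, maxImageCount)` would return 0 images on a driver that reports no limit, which is the failure mode being pointed out.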

airlied · 2023-05-24T05:12:41Z

I also think the mailbox present might be wrong, but I'm not sure how best to fix it, I'm a bit rusty here.

It looks like you acquire all the images here in mailbox mode, which is fine, since only one image will be waiting for the display, but the code has no throttling. I'd suggest just using FIFO for vsync mode for now
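
A sketch of that suggestion, assuming a hypothetical `ChoosePresentMode` helper: only pick MAILBOX when the application throttles acquisition itself, and otherwise fall back to FIFO, which every Vulkan implementation is required to support. The enum is a local mirror of the VkPresentModeKHR values so the example builds without Vulkan headers.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Local mirror of the VkPresentModeKHR values, so the sketch builds without Vulkan headers.
enum PresentMode {
    kImmediate = 0,    // VK_PRESENT_MODE_IMMEDIATE_KHR
    kMailbox = 1,      // VK_PRESENT_MODE_MAILBOX_KHR
    kFifo = 2,         // VK_PRESENT_MODE_FIFO_KHR
    kFifoRelaxed = 3,  // VK_PRESENT_MODE_FIFO_RELAXED_KHR
};

// Hypothetical helper: only hand out MAILBOX when the caller has its own
// throttling; otherwise fall back to FIFO, which is always available.
PresentMode ChoosePresentMode(const std::vector<PresentMode>& supported, bool appThrottlesAcquire) {
    if (appThrottlesAcquire &&
        std::find(supported.begin(), supported.end(), kMailbox) != supported.end()) {
        return kMailbox;
    }
    return kFifo;
}
```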

zlatinski · 2023-05-24T20:28:14Z

> maxImageCount

OK, thanks, Dave. Fixed that.

zlatinski · 2023-05-24T20:31:16Z

> I also think the mailbox present might be wrong, but I'm not sure how best to fix it, I'm a bit rusty here.
>
> It looks like you acquire all the images here in mailbox mode, which is fine, since only one image will be waiting for the display, but the code has no throttling. I'd suggest just using FIFO for vsync mode for now

The code actually waits on the display images to be flipped in Shell::AcquireBackBuffer() after obtaining the next image.

    // wait until acquire and render semaphores are waited/unsignaled
    AssertSuccess(m_ctx.devCtx->WaitForFences(*m_ctx.devCtx, 1, &acquireBuf->m_fence, true, UINT64_MAX));

airlied · 2023-05-24T20:42:27Z

You are relying on acquireNextImage blocking, though, once you've allocated 3 images but not presented any of them.

I'm not sure what the exact semantics of MAILBOX are here, but we aren't blocking on acquireNextImage at all, so you acquire a bunch of images before ever getting a chance to present them, then run out and hit the assert.

zlatinski · 2023-05-25T16:24:48Z

> but we aren't blocking on acquireNextImage at all, so you acquires a bunch of images before ever getting a chance to present them, so run out and hit the assert

Oho, Now I see where the issue is. Thanks, @airlied! I can add code to monitor the queue level on the display swap side and wait on the way in when it becomes close to empty. The alternative would be to increase the number of swapchain images, but I don't think we want to do that for resource usage and presentation latency reasons.
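
One way the "wait on the way in" idea could be sketched is a small queue that blocks the acquiring side until a buffer has been handed back, instead of asserting when the queue is empty. `BackBufferQueue` is a hypothetical class and plain indices stand in for the sample's back-buffer objects; this is an assumption about the shape of a fix, not the change that was eventually made.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical back-buffer queue; indices stand in for the sample's buffer objects.
class BackBufferQueue {
public:
    explicit BackBufferQueue(std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) m_free.push_back(i);
    }

    // Blocks until a buffer is free instead of asserting on an empty queue.
    std::size_t Acquire() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return !m_free.empty(); });
        std::size_t index = m_free.front();
        m_free.pop_front();
        return index;
    }

    // Called once a presented buffer is reusable again.
    void Release(std::size_t index) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_free.push_back(index);
        }
        m_cv.notify_one();
    }

    std::size_t FreeCount() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_free.size();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::deque<std::size_t> m_free;
};
```

This keeps the number of swapchain images unchanged, in line with the resource and latency concern above, at the cost of stalling the decode side when presentation falls behind.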

charlie-ht · 2023-06-21T10:19:00Z

I tried this with my CoffeeLake-H GT2 [UHD Graphics 630] but I am getting various errors, the first one in noPresent mode is a crash in anv on the VkVideoDecoder::CopyOptimalToLinearImage path (getting the validation layers clean first might better explain this one, but it seems an issue in the queue mgmt of the sample app),

    static void
    anv_blorp_batch_init(struct anv_cmd_buffer *cmd_buffer,
                         struct blorp_batch *batch, enum blorp_batch_flags flags)
    {
       if (!(cmd_buffer->queue_family->queueFlags & VK_QUEUE_GRAPHICS_BIT)) {
          assert(cmd_buffer->queue_family->queueFlags & VK_QUEUE_COMPUTE_BIT);  /* <- debugger stopped here */
          flags |= BLORP_BATCH_USE_COMPUTE;
       }

       blorp_batch_init(&cmd_buffer->device->blorp, batch, cmd_buffer, flags);
    }

The copy command was recorded inside a pool created with this queue family,

        queueProperties[1]:
        -------------------
                minImageTransferGranularity = (1,1,1)
                queueCount                  = 1
                queueFlags                  = QUEUE_VIDEO_DECODE_BIT_KHR
                timestampValidBits          = 36
                present support             = true
                VkQueueFamilyGlobalPriorityPropertiesKHR:
                -----------------------------------------
                        priorityCount  = 2
                        priorities: count = 2
                                QUEUE_GLOBAL_PRIORITY_LOW_KHR
                                QUEUE_GLOBAL_PRIORITY_MEDIUM_KHR

                VkQueueFamilyQueryResultStatusProperties2KHR:
                ---------------------------------------------
                        queryResultStatusSupport = true

                VkVideoQueueFamilyProperties2KHR:
                ---------------------------------
                        videoCodecOperations: count = 2
                                VIDEO_CODEC_OPERATION_DECODE_H264_BIT_EXT
                                VIDEO_CODEC_OPERATION_DECODE_H265_BIT_EXT

Note it doesn't advertise transfer support (I thought this might have been an implicit requirement of the decode queue, but was missing it in the spec). Anyway, the anv driver is expecting compute capability on a queue that is not graphics. Should the sample app check for queue requirements better here, and maybe use a separate queue for copies on this hardware?
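
A check along those lines might look like the following sketch. `SupportsCopies` and `FindCopyQueueFamily` are hypothetical helpers, and the flag constants are local mirrors of the VkQueueFlagBits values so the example builds without Vulkan headers. Only the decode-only family comes from the output above; the flags of any other family are an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local mirrors of the VkQueueFlagBits values, so the sketch builds without Vulkan headers.
constexpr uint32_t kQueueGraphics = 0x00000001;     // VK_QUEUE_GRAPHICS_BIT
constexpr uint32_t kQueueCompute = 0x00000002;      // VK_QUEUE_COMPUTE_BIT
constexpr uint32_t kQueueTransfer = 0x00000004;     // VK_QUEUE_TRANSFER_BIT
constexpr uint32_t kQueueVideoDecode = 0x00000020;  // VK_QUEUE_VIDEO_DECODE_BIT_KHR

// Graphics and compute queues can always run transfer commands, even when
// the transfer bit is not reported explicitly.
bool SupportsCopies(uint32_t queueFlags) {
    return (queueFlags & (kQueueGraphics | kQueueCompute | kQueueTransfer)) != 0;
}

// Returns the index of the first family that can record copy commands, or -1.
// `preferred` is tried first so the decode family is reused when it qualifies.
int FindCopyQueueFamily(const std::vector<uint32_t>& familyFlags, int preferred) {
    if (preferred >= 0 && preferred < static_cast<int>(familyFlags.size()) &&
        SupportsCopies(familyFlags[preferred])) {
        return preferred;
    }
    for (int i = 0; i < static_cast<int>(familyFlags.size()); ++i) {
        if (SupportsCopies(familyFlags[i])) return i;
    }
    return -1;
}
```

With a decode-only family like queueProperties[1] above, the helper would skip it and return another family, and the copy would then need its own command pool plus a queue family ownership transfer for the image.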

BattleAxeVR · 2023-06-21T20:28:43Z

+1 on fixing all the validation errors. There's no way a v1.0 shipped version of anything should have any vulkan warnings, let alone errors. I've fixed a few myself (missing pnext struct pointer in one case), but I actually have to disable the VK validation layer when I enable Vulkan Video decoding in my game engine to make sense of anything.

I still have a way more fundamental problem, namely that I'm using HLSL (via DXC -> SPV compilation) for all my ray tracing shaders and I still can't get the immutable 420 / ycbcr decoding sampler to actually decode it to RGB inline. All I see is red = one channel. Instead I have to use a single GLSL fullscreen shader to decode it to an extra copy surface, then use its texture from my RTX (HLSL-based) shaders. That extra copy is irritating and unfortunate, although OTOH computing the mips is actually useful (I'm using Vulkan video to decode 120 FPS HDR torch flame videos from Embergen for my RPG, and using those to light up the scene with full GI / multiple bounces etc.). I'd love to know if anyone else out there has managed to compile an HLSL shader that can decode 420 video properly from Vulkan Video sources. Every one I've come across, including this sample, uses a GLSL -> glslangValidator.exe -> SPV pipeline.

charlie-ht · 2023-06-22T13:41:57Z

> +1 on fixing all the validation errors.

Since I'm stuck on something else related to the samples app, I made a start on this here: #48

WIP.

Appears to be missed in f289815. The memory should be mapped and unmapped by VulkanDeviceMemoryImpl. Related to nvpro-samples#41. Signed-off-by: Benjamin Cheng <[email protected]>

lolzballs · 2023-06-29T22:36:22Z

I can also reproduce the assertion !"Swapchain queue is empty!" on both radv and amdvlk. It seems like the same MAILBOX present mode issue that is seen on Intel.

zlatinski · 2023-12-14T01:04:41Z

Fixes for the display swapchain MAILBOX issue from someone who has the HW setup are welcome.

zlatinski · 2023-12-14T02:58:16Z

Can you please try the display again at 1d9eb74?

kkartaltepe mentioned this issue May 18, 2023

Fixes for running on mesa #42

Closed

lolzballs mentioned this issue Jun 29, 2023

Fix double vkUnmap calls #50

Merged

zlatinski pushed a commit that referenced this issue Apr 5, 2024

Fix double vkUnmap calls

37b2fa9

Appears to be missed in f289815. The memory should be mapped and unmapped by VulkanDeviceMemoryImpl. Related to #41. Signed-off-by: Benjamin Cheng <[email protected]>
