[WIP] Use vt for manually decoding frames. Fixes #533 #535
Conversation
Force-pushed from df3e5c1 to a8af6ad.

Commits:
- use pts from moonlight server to schedule frame display
- use decompression callback unused frameRef field to propagate frameType information
- Use obj-c cb for decode session
- Revert to direct decode, use PTS correctly

Force-pushed from a8af6ad to 31560a0.
These changes are looking pretty good.
I think this has the potential to improve the frame pacing option too. Now that we have access to the decoded samples, we can keep a queue of those and submit those in our display link callback. Since we were queuing compressed video samples before, there was the potential that we could miss a frame deadline if the decoder couldn't finish decoding that frame by the time it was due to be displayed.
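A rough sketch of that queued approach, assuming a hypothetical frameQueue array, a lock, and a CADisplayLink callback (none of these names are from this PR):

// Display link callback: dequeue one already-decoded frame per vsync and
// hand it to the display layer. Decoding happened earlier, so no decode
// deadline can be missed here; we only pay the cost of the enqueue itself.
- (void)displayLinkFired:(CADisplayLink *)link {
    CMSampleBufferRef frame = NULL;

    [self->frameQueueLock lock];
    if (self->frameQueue.count > 0) {
        // Transfer ownership of the sample buffer out of the NSMutableArray
        frame = (__bridge_retained CMSampleBufferRef)self->frameQueue.firstObject;
        [self->frameQueue removeObjectAtIndex:0];
    }
    [self->frameQueueLock unlock];

    if (frame != NULL) {
        [self->displayLayer enqueueSampleBuffer:frame];
        CFRelease(frame);
    }
}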
@@ -378,4 +373,65 @@ - (int)submitDecodeBuffer:(unsigned char *)data length:(int)length bufferType:(i
    return DR_OK;
}

- (OSStatus) decodeFrameWithSampleBuffer:(CMSampleBufferRef)sampleBuffer frameType:(int)frameType {
    VTDecodeFrameFlags flags = kVTDecodeFrame_EnableAsynchronousDecompression;
Does async decompression result in improved performance vs synchronous?
Honestly, I didn't compare sync/async here because I couldn't find a reliable way to measure performance other than my gut feeling. Any ideas?
Not every frame is decompressed asynchronously, but this is the correct setting to increase speed.
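For reference, this is roughly how the flag interacts with the decode call: with no flags set, VTDecompressionSessionDecodeFrame blocks until the output callback has run, and the returned info flags report whether a given frame actually took the asynchronous path (variable names here are assumptions):

VTDecodeFrameFlags flags = kVTDecodeFrame_EnableAsynchronousDecompression;
VTDecodeInfoFlags infoFlags = 0;

OSStatus status = VTDecompressionSessionDecodeFrame(decompressionSession,
                                                    sampleBuffer,
                                                    flags,
                                                    NULL,       // sourceFrameRefCon
                                                    &infoFlags);

// kVTDecodeInfo_Asynchronous is set only if this frame was actually
// decoded asynchronously; the flag above merely permits it.
BOOL wasAsync = (infoFlags & kVTDecodeInfo_Asynchronous) != 0;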
OSStatus res = CMVideoFormatDescriptionCreateForImageBuffer(kCFAllocatorDefault, imageBuffer, &formatDescriptionRef);
if (res != noErr) {
    NSLog(@"Failed to create video format description from imageBuffer");
Please change the NSLog() calls to our Log() function instead.
Should we also call LiRequestIdrFrame() here?
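A minimal sketch of that suggestion, assuming the project's Log(LOG_E, ...) helper and that the surrounding method can bail out with the OSStatus; LiRequestIdrFrame() asks the host for a fresh IDR frame so the stream can recover:

if (res != noErr) {
    Log(LOG_E, @"Failed to create video format description: %d", (int)res);
    // Hypothetical recovery path: request a new IDR frame from the host
    LiRequestIdrFrame();
    return res;
}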
} else {
    // I-frame
    CFDictionarySetValue(dict, kCMSampleAttachmentKey_NotSync, kCFBooleanFalse);
    CFDictionarySetValue(dict, kCMSampleAttachmentKey_DependsOnOthers, kCFBooleanFalse);
These attributes should be set on the H.264/HEVC CMSampleBuffer that we pass to the VTDecompressionSession, rather than the CMSampleBuffer we pass to AVSampleBufferDisplayLayer (which is now just raw YUV data).
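A sketch of what that would look like on the compressed buffer, assuming frameType uses moonlight-common-c's FRAME_TYPE_IDR constant and compressedSampleBuffer is the H.264/HEVC buffer submitted to the decoder:

CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(compressedSampleBuffer, true);
CFMutableDictionaryRef dict = (CFMutableDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);

if (frameType == FRAME_TYPE_IDR) {
    // I-frame: a sync point that depends on no other frames
    CFDictionarySetValue(dict, kCMSampleAttachmentKey_NotSync, kCFBooleanFalse);
    CFDictionarySetValue(dict, kCMSampleAttachmentKey_DependsOnOthers, kCFBooleanFalse);
} else {
    // P-frame: not a sync point, depends on earlier frames
    CFDictionarySetValue(dict, kCMSampleAttachmentKey_NotSync, kCFBooleanTrue);
    CFDictionarySetValue(dict, kCMSampleAttachmentKey_DependsOnOthers, kCFBooleanTrue);
}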
makes total sense! will change it
Plot Twist: As part of tackling the improvements you suggested @cgutman, I found the trigger for the original issue #533. I'm still unsure about the root cause, but it's these lines: I ran some tests, and we don't even need to set any value in the dict; merely getting it will cause the decoder to go nuts:
That alone is enough to make the decoder fail when the sample buffers contain HDR data. I imagine it must be due to some OS bug. If I remove the whole block, the decoder works just fine. Given that:
Removing those lines should be fine.
Yep, let's do that to fix #533 ASAP, and we can see if using a VTDecompressionSession improves things further vs the current pure AVSampleBufferDisplayLayer solution. Do you see a frame pacing regression using the PTS info with the pacing option enabled? If so, we can just use this solution for HDR streaming only for now.
@cgutman as part of the changes, I wanted to test different ways of decoding and rendering: using VT to manually decode and update a CALayer with the resulting image, versus continuing to pass the encoded buffer directly to AVSampleBufferDisplayLayer, and then measuring the latency of each approach.
I suppose you could use a phone in slow motion mode. For now though, let's try to get HDR on the new Apple TV, then we can fine-tune things later. Can you send your basic PR with just the HDR fix?
Going to build and test the lowest latency option on my Apple TV 4K 2021 with MoCA setup and report back! |
@felipejfc do the latency improvements only apply to the 2022 Apple TV 4K? |
@Starlank the changes here and in the other PR should not improve latency, but they should improve stream "smoothness" when using the low latency pacing mode. I'm currently studying latency improvements locally.
- (void) setupDecompressionSession {
    if (decompressionSession != NULL) {
        VTDecompressionSessionInvalidate(decompressionSession);
It is also necessary to do the following:
VTDecompressionSessionWaitForAsynchronousFrames(decompressionSession);
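Putting the pieces together, teardown would look something like this sketch: drain any in-flight asynchronous frames before invalidating, then release the CF object:

if (decompressionSession != NULL) {
    // Block until all pending asynchronous frames have been emitted
    VTDecompressionSessionWaitForAsynchronousFrames(decompressionSession);
    VTDecompressionSessionInvalidate(decompressionSession);
    CFRelease(decompressionSession);
    decompressionSession = NULL;
}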
}

// Enqueue the next frame
[self->displayLayer enqueueSampleBuffer:sampleBuffer];
I would add a flush before the [self->displayLayer enqueueSampleBuffer:sampleBuffer] call:
if (![self->displayLayer isReadyForMoreMediaData]) {
[self->displayLayer flush];
}
Sometimes not all of the data gets played and the buffer fills up, which stops playback.
}

CMSampleBufferRef sampleBuffer;
CMSampleTimingInfo sampleTiming = {kCMTimeInvalid, presentationTimestamp, presentationDuration};
I recommend using CACurrentMediaTime() for the timing info. That way, the frames will be displayed immediately.
I have the impression that if I set frames to display immediately, more jitter is generated, maybe because presentationDuration gets messed up?
Set the duration if you know the FPS.
But try it without setting the duration; I don't notice the jittering in another project.
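The two timing strategies being discussed, as a sketch (the 60 fps value is an assumed example):

// (a) Host-clock timestamp so the layer displays the frame right away
CMTime now = CMTimeMakeWithSeconds(CACurrentMediaTime(), 1000000);

// (b) Explicit duration derived from a known stream frame rate
CMTime duration = CMTimeMake(1, 60);   // 60 fps -> 1/60 s per frame

CMSampleTimingInfo timing = {
    .duration = duration,              // or kCMTimeInvalid to leave it unset
    .presentationTimeStamp = now,
    .decodeTimeStamp = kCMTimeInvalid
};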
decompressionSession = nil;
}

int status = VTDecompressionSessionCreate(kCFAllocatorDefault,
To initialize the VTDecompressionSession, you need to set its parameters. If I understand correctly, the sender can send data from various sources, so you have to be prepared for data of various types. For example:
let imageBufferAttributes = [
//kCVPixelBufferPixelFormatTypeKey: NSNumber(value: kCVPixelFormatType_......), /// if needed
kCVPixelBufferIOSurfacePropertiesKey: [:] as AnyObject,
kCVPixelBufferOpenGLESCompatibilityKey: NSNumber(booleanLiteral: true),
kCVPixelBufferMetalCompatibilityKey: NSNumber(booleanLiteral: true),
kCVPixelBufferOpenGLCompatibilityKey: NSNumber(booleanLiteral: true)
]
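An Objective-C equivalent of those attributes as they might be passed to VTDecompressionSessionCreate; the callback name and the exact key set are assumptions for illustration:

NSDictionary *destinationImageBufferAttributes = @{
    // (id)kCVPixelBufferPixelFormatTypeKey: @(kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange), // if needed
    (id)kCVPixelBufferIOSurfacePropertiesKey: @{},
    (id)kCVPixelBufferMetalCompatibilityKey: @YES
};

// decompressionOutputCallback is a hypothetical C function of type
// VTDecompressionOutputCallback defined elsewhere.
VTDecompressionOutputCallbackRecord callbackRecord = {
    decompressionOutputCallback,
    (__bridge void *)self
};

OSStatus status = VTDecompressionSessionCreate(kCFAllocatorDefault,
                                               formatDescriptionRef,
                                               NULL,  // let VT pick the decoder
                                               (__bridge CFDictionaryRef)destinationImageBufferAttributes,
                                               &callbackRecord,
                                               &decompressionSession);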
If you would like to change SDR to HDR:
Do not forget that on tvOS it is necessary to switch the TV to HDR mode.
Thanks for the review @Alanko5. I have doubts about the manual decompression approach though, as I wasn't able to reduce video latency.
Is this SDR->HDR mapping?
Yes, Apple mentions it somewhere in the documentation.
I don't think it's caused by HW.
What latency are we talking about?
There's a streaming delay with the ATV4K compared to streaming on an iPhone/iPad or Nvidia Shield. I compared them by running a stopwatch application and using a slow-mo iPhone camera to compare the PC screen time with the streaming screen time.
I understand. In my opinion, the timing will help you solve the problem. You're not using WiFi when measuring, are you? :-)
For 4K HEVC, I think I was getting about 8 ms to decode each frame, and 10~11 ms total to receive the whole frame, pack it together, and decode it.
Did you measure the decompression time of the key and non-key frames? What is the total delay of the image that you measured with the camera? What version of Apple TV do you have? How do you create the VTDecompressionSession?
Code is in this branch: https://github.com/felipejfc/moonlight-ios/tree/ds_queue_surface

I have the latest 4K Apple TV (2022) with the iPhone 12 Pro processor.

> Can you set the server to send fewer keyframes?

Pretty sure GameStream won't allow me to do that.
According to what you write, the problem is not in decoding. Well, you can try the following: it is necessary that you set this value (as I wrote above) in your destinationImageBufferAttributes:
I think it would also help if the layer could use Metal during rendering.
Sorry, I misspelled it. It's actually a 25~50 ms delay! I will try your changes anyway when I get home; I'm travelling right now, so that will only be next week.
Two main changes:
1 - Use VideoToolbox to manually decode each frame instead of submitting it directly to AVSampleBufferDisplayLayer. I'm not proud of this change, but it was needed to fix #533. There may be a way to fix the issue without it, but I haven't managed to find one yet.
2 - Latency and smoothness changes
2.1 - Use Direct Submit in VideoDecodeRenderer (reduces latency)
2.2 - Use the PTS information correctly per frame instead of setting the DisplayImmediately flag on each sampleBuffer (see the sketch below). Together with the change above, I was able to replicate the smooth, low-latency stream I get on the Nvidia Shield. I think using the flag messed with the frame timing and caused jittering.
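A sketch of what 2.2 means in code; presentationTimeMs coming from the decode unit is an assumption about the moonlight-common-c side:

// Old approach: force every buffer to display as soon as it is enqueued
// CFDictionarySetValue(dict, kCMSampleAttachmentKey_DisplayImmediately, kCFBooleanTrue);

// New approach: carry the server's PTS into the sample buffer so
// AVSampleBufferDisplayLayer can schedule it on its own timebase.
CMTime pts = CMTimeMake(presentationTimeMs, 1000);
CMSampleTimingInfo timing = { kCMTimeInvalid, pts, kCMTimeInvalid };

CMSampleBufferRef sampleBuffer;
OSStatus status = CMSampleBufferCreateReadyWithImageBuffer(kCFAllocatorDefault,
                                                           imageBuffer,
                                                           formatDescriptionRef,
                                                           &timing,
                                                           &sampleBuffer);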
Right now this breaks the "Smooth Stream" option that we added some months ago, but I wanted to create the PR anyway so we can discuss the options @cgutman