Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use nanosleep for non-Windows platforms #6392

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Conversation

hwsmm
Copy link
Contributor

@hwsmm hwsmm commented Oct 18, 2024

This PR lets Unix platforms use nanosleep instead of Thread.Sleep to improve on frame pacing. I will test on Android after some time, but I don't have any Apple devices to test this on.

I put UnixNativeSleep in Platform/Linux/Native, but I don't think it's the best fit... Do I need to create a new folder, or is it good as is?

Testing:

  • Windows
  • macOS
  • Linux
  • iOS
  • Android

Copy link
Member

@Susko3 Susko3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, there are a lot of unknowns/variables when trying to implement this in C#: the native library name, definition of struct timespec and EINTR.

Maybe it's better to call SDL3.SDL_DelayNS() and have a comment explaining that it's just a wrapper for nanosleep() on Unix platforms.

Comment on lines 14 to 15
public long Seconds;
public long NanoSeconds;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these need to be IntPtr to work properly on 32-bit Unix platforms. Since time_t and long are the underlying types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had also thought about this, but do we have any plans to support 32bit systems in future? I've seen many discussions which suggested 32bit won't be supported.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nint also works, but I find this pedantry. 32-bit unices are no longer relevant in $CURRENT_YEAR. We don't even ship all binaries for linux-x86 (ffmpeg is missing).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Android arm32 (armeabi-v7a) might be relevant. If you don't want to use IntPtr/nint here, make this code only work on 64-bit systems by checking Environment.Is64BitProcess.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.NET 9+ is only built for distros with 64bit time_t on arm32. Using long is more future-proof with the configuration used by .NET.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't matter what configuration .NET is built on, the only thing that matters is which function from libc we import. There's nanosleep and _nanosleep64 (or similar).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If nanosleep is used, the bitness of timespec matters, which is decided by time_t, which is provided by libc. It can help with keeping consistent with the libc that .NET builds with. That's the point.

[DllImport("libc.so.6", EntryPoint = "nanosleep", CallingConvention = CallingConvention.Cdecl, SetLastError = true)]
private static extern int nanosleep_libc6(in TimeSpec duration, out TimeSpec rem);

private const int interrupt_error = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://man7.org/linux/man-pages/man3/errno.3.html

The error numbers that correspond to each symbolic name vary
across UNIX systems, and even across different architectures on
Linux. Therefore, numeric values are not included as part of the
list of error names below. The perror(3) and strerror(3)
functions can be used to convert these names to corresponding
textual error messages.

We can't really define a numerical constant for this...

Copy link
Contributor Author

@hwsmm hwsmm Oct 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EINTR seems to be 4 on all platforms we support. I need confirmation on Apple devices, though.

I also agree that this is not reliable, but anyway..

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, this is pedantry. As long as it works it's fine to hardcode. If it doesn't work somewhere, then we'll find out eventually.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the decompiled SDL3 code in SDL3-CS/native and it's always 4.

@hwsmm
Copy link
Contributor Author

hwsmm commented Oct 19, 2024

So do we just try using SDL_DelayNS again? I would prefer using SDL_DelayNS for Unix platforms as targeting different libc/arch in C# doesn't sound good for me, but I also agree that using an SDL function in Clock doesn't match well either (might be good enough with the comment as Susko3 mentioned).

@peppy
Copy link
Member

peppy commented Oct 19, 2024

Where's the testing/benchmarking at? I'd want to see some kind of proof this is beneficial before blindly implementing.

I'd hope this is done by the implementor as to not take time away from the core team.

@hwsmm
Copy link
Contributor Author

hwsmm commented Oct 19, 2024

You can also see that update thread has (almost) constant 1.2ms frame time on current release, while this implementation keeps it at 1ms. These were taken on Linux, and YMMV as not every distro has same configuration, but it works pretty well on my environment.

Current release This PR
osu_2024-10-19_00-05-04 osu_2024-10-19_00-02-45

@bdach
Copy link
Collaborator

bdach commented Oct 21, 2024

So do we just try using SDL_DelayNS again?

If this diff can't work then I'd rather have nothing than sdl in threading code paths.

Comment on lines 22 to 27
// Android and some platforms don't have version in lib name.
[DllImport("c", EntryPoint = "nanosleep", CallingConvention = CallingConvention.Cdecl, SetLastError = true)]
private static extern int nanosleep_c(in TimeSpec duration, out TimeSpec rem);

[DllImport("libc.so.6", EntryPoint = "nanosleep", CallingConvention = CallingConvention.Cdecl, SetLastError = true)]
private static extern int nanosleep_libc6(in TimeSpec duration, out TimeSpec rem);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should just import "libc" and let .NET deal with it.

This avoids having two native functions and the delegate. We can probably assume nanosleep() is always available, so there's no need for the static constructor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full proposal implementing the 64-bit check:

diff --git a/osu.Framework/Platform/Linux/Native/UnixNativeSleep.cs b/osu.Framework/Platform/Linux/Native/UnixNativeSleep.cs
index d0bf152b3..8418fbff2 100644
--- a/osu.Framework/Platform/Linux/Native/UnixNativeSleep.cs
+++ b/osu.Framework/Platform/Linux/Native/UnixNativeSleep.cs
@@ -8,6 +8,8 @@ namespace osu.Framework.Platform.Linux.Native
 {
     internal class UnixNativeSleep : INativeSleep
     {
+        public static bool Available => Environment.Is64BitProcess;
+
         [StructLayout(LayoutKind.Sequential)]
         public struct TimeSpec
         {
@@ -15,56 +17,13 @@ public struct TimeSpec
             public long NanoSeconds;
         }
 
-        private delegate int NanoSleepDelegate(in TimeSpec duration, out TimeSpec rem);
-
-        private static readonly NanoSleepDelegate? nanosleep;
-
-        // Android and some platforms don't have version in lib name.
-        [DllImport("c", EntryPoint = "nanosleep", CallingConvention = CallingConvention.Cdecl, SetLastError = true)]
-        private static extern int nanosleep_c(in TimeSpec duration, out TimeSpec rem);
-
-        [DllImport("libc.so.6", EntryPoint = "nanosleep", CallingConvention = CallingConvention.Cdecl, SetLastError = true)]
-        private static extern int nanosleep_libc6(in TimeSpec duration, out TimeSpec rem);
+        [DllImport("libc", CallingConvention = CallingConvention.Cdecl, SetLastError = true)]
+        private static extern int nanosleep(in TimeSpec duration, out TimeSpec rem);
 
         private const int interrupt_error = 4;
 
-        static UnixNativeSleep()
-        {
-            TimeSpec test = new TimeSpec
-            {
-                Seconds = 0,
-                NanoSeconds = 1,
-            };
-
-            try
-            {
-                nanosleep_c(in test, out _);
-                nanosleep = nanosleep_c;
-            }
-            catch
-            {
-            }
-
-            if (nanosleep == null)
-            {
-                try
-                {
-                    nanosleep_libc6(in test, out _);
-                    nanosleep = nanosleep_libc6;
-                }
-                catch
-                {
-                }
-            }
-
-            // if nanosleep is null at this point, Thread.Sleep should be used.
-        }
-
         public bool Sleep(TimeSpan duration)
         {
-            if (nanosleep == null)
-                return false;
-
             const int ns_per_second = 1000 * 1000 * 1000;
 
             long ns = (long)duration.TotalNanoseconds;
diff --git a/osu.Framework/Timing/ThrottledFrameClock.cs b/osu.Framework/Timing/ThrottledFrameClock.cs
index 4f827d68c..e0355c33f 100644
--- a/osu.Framework/Timing/ThrottledFrameClock.cs
+++ b/osu.Framework/Timing/ThrottledFrameClock.cs
@@ -33,13 +33,13 @@ public class ThrottledFrameClock : FramedClock, IDisposable
         /// </summary>
         public double TimeSlept { get; private set; }
 
-        private readonly INativeSleep nativeSleep;
+        private readonly INativeSleep? nativeSleep;
 
         internal ThrottledFrameClock()
         {
             if (RuntimeInfo.OS == RuntimeInfo.Platform.Windows)
                 nativeSleep = new WindowsNativeSleep();
-            else
+            else if (RuntimeInfo.IsUnix && UnixNativeSleep.Available)
                 nativeSleep = new UnixNativeSleep();
         }
 
@@ -95,7 +95,7 @@ private double sleepAndUpdateCurrent(double milliseconds)
 
             TimeSpan timeSpan = TimeSpan.FromMilliseconds(milliseconds);
 
-            if (!nativeSleep.Sleep(timeSpan))
+            if (nativeSleep?.Sleep(timeSpan) != true)
                 Thread.Sleep(timeSpan);
 
             return (CurrentTime = SourceTime) - before;
@@ -103,7 +103,7 @@ private double sleepAndUpdateCurrent(double milliseconds)
 
         public void Dispose()
         {
-            nativeSleep.Dispose();
+            nativeSleep?.Dispose();
         }
     }
 }

Copy link
Contributor Author

@hwsmm hwsmm Oct 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I noticed your comments after pushing... I think it isn't that bad to use nint if we want to support 32bit Android.

I will revert if anyone thinks otherwise.

@sr229
Copy link
Contributor

sr229 commented Oct 29, 2024

You can also see that update thread has (almost) constant 1.2ms frame time on current release, while this implementation keeps it at 1ms. These were taken on Linux, and YMMV as not every distro has same configuration, but it works pretty well on my environment.

Current release This PR
osu_2024-10-19_00-05-04 osu_2024-10-19_00-02-45

I would prefer if this was benchmarked properly using BenchmarkDotNet to provide reproducible empirical results if there's any benefits. These screenshots won't do.

@hwsmm
Copy link
Contributor Author

hwsmm commented Oct 29, 2024

0.5ms duration:

Method Mean Error StdDev Allocated
TestThreadSleep 629.9 ns 7.74 ns 6.86 ns -
TestNativeSleep 503,293.3 ns 26.55 ns 23.54 ns -

1ms duration:

Method Mean Error StdDev Allocated
TestThreadSleep 1.054 ms 0.0001 ms 0.0001 ms -
TestNativeSleep 1.004 ms 0.0000 ms 0.0000 ms -

1.5ms duration:

Method Mean Error StdDev Allocated
TestThreadSleep 1.054 ms 0.0002 ms 0.0002 ms -
TestNativeSleep 1.503 ms 0.0001 ms 0.0001 ms -

I think Thread.Sleep is limited to millisecond precision even if TimeSpan is given.

Also, I assumed screenshots were enough to show near 0.00ms frame time stddev, but I agree that this is more clear.

@harrislegobrick
Copy link

harrislegobrick commented Nov 19, 2024

Additional frametime graphs

Windows 11
Current release This PR
image image

Test refactored Windows logic. Change is within margin of error.

Linux (Fedora 41)
Current release This PR
image image

The threads are sleeping for a more accurate amount of time leading to less gaps in the frametime graph and a smoother graph. The Thread.Sleep() method currently used is limited to integer millisecond level sleeping (as shown in the benchmark above) and a 1kHz loop requires sub millisecond timing. I observed no CPU utilization difference for this PR.

macOS
Current release This PR
image image

I had a limited amount of time with a MacBook so that's why these graphs are more rough looking than the other two. Even though the graphs look different, the graph for this PR looks smoother overall.

iOS
Current release This PR
ma pr

Seems to work as expected in lowering the jitter.

@peppy
Copy link
Member

peppy commented Nov 20, 2024

Is there a reason why there's (what looks to be) gaps in the old graphs?

@harrislegobrick
Copy link

Is there a reason why there's (what looks to be) gaps in the old graphs?

The gaps are an artifact of trying to sleep for less than a millisecond returning a tiny value. Applying this patch makes very small sleep times yellow and increases the height to be more visible.

diff --git a/osu.Framework/Graphics/Performance/FrameStatisticsDisplay.cs b/osu.Framework/Graphics/Performance/FrameStatisticsDisplay.cs
index f752c88c5..403138d70 100644
--- a/osu.Framework/Graphics/Performance/FrameStatisticsDisplay.cs
+++ b/osu.Framework/Graphics/Performance/FrameStatisticsDisplay.cs
@@ -38,7 +38,7 @@ internal partial class FrameStatisticsDisplay : Container, IStateful<FrameStatis
         private const int amount_count_steps = 5;
 
         private const int amount_ms_steps = 5;
-        private const float visible_ms_range = 20;
+        private const float visible_ms_range = 0.1f;
         private const float scale = HEIGHT / visible_ms_range;
 
         private const float alpha_when_active = 0.75f;
@@ -490,6 +490,12 @@ private int addArea(FrameStatistics frame, PerformanceCollectionType? frameTimeT
 
             Color4 col = frameTimeType.HasValue ? getColour(frameTimeType.Value) : new Color4(0.1f, 0.1f, 0.1f, 1);
 
+            if (frameTimeType.GetValueOrDefault() == PerformanceCollectionType.Sleep && drawHeight < 10)
+            {
+                col = Color4.Yellow;
+                drawHeight += HEIGHT / 2;
+            }
+
             for (int i = currentHeight - 1; i >= 0; --i)
             {
                 if (drawHeight-- == 0) break;

Linux

master (with above patch applied) This PR (with above patch applied)
image image

@peppy
Copy link
Member

peppy commented Nov 21, 2024

Right, so it's showing precisely what it should be based on how it was sleeping. All good.

@huoyaoyuan
Copy link
Contributor

Interesting. Consider reporting to https://github.com/dotnet/runtime/issues to improve the performance of Thread.Sleep. You may also get feedback around reliability/compatibility there.

@hwsmm
Copy link
Contributor Author

hwsmm commented Dec 19, 2024

I finally got around to writing the issue in dotnet/runtime, but it seems that there was already a similar discussion going on there. I guess we can keep our implementation for now?

Due to all of that, this isn't something I can see getting prioritized for .NET 10. However, there is nothing preventing a 3rd party library from being created that provides the functionality in the meantime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants