Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a profiler and instrumentation #30

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

TheCurle
Copy link
Member

@TheCurle TheCurle commented Apr 5, 2024

This PR adds three bits of infrastructure surrounding the usage of a Profiler:

Precise Timer

Currently, the Timer is a set of static ints, updated every frame by the ShadowApplication.
For the purposes of profiling, this is not useful (as we may want to profile many things that happen within a frame, with high precision).

SH::Timer is now a struct with a bunch of static methods, so that you can use it both as an RAII timer (create an instance at the start of a method to start the timer, and it stops when it goes out of scope. Call elapsed() to get the amount of nanoseconds that have elapsed since the timer started.

The timer struct itself uses std::chrono to perform timing, but the profiler may wish to get the exact current timestamp.
Thus,

static size_t getTimestamp();

is provided. To get maximum precision on every system, there is a platform source that implements this function on both Windows and Linux using platform-specific instrumentation calls.

Threads

The profiler needs to be able to handle multiple parallel threads - we'll have a job system. multithreaded rendering, a UI thread, stuff like that.
However, if we simply use std::thread for these, it's not possible to attach a profiler to this thread (as the thread handle is buried deep inside the stdlib), which is no good.
Thus, we have our own way to manage threads - the Thread class.
Simply implement Thread and provide a Run method, everything else is handled automatically.
A Join method is provided to block main thread waiting for the given thread, and the thread can also Wait() and Notify() to allow for complex parallel dependencies.

Threads are currently only implemented fully on Windows, but the infrastructure I've created should also work on Linux (as pthread is very simple).

The Profiler itself

The profiler has a simple interface; instantiate a Profiler with a name to start a block with that name, and when it goes out of scope, that block ends.

A block is to be represented as an actual rectangle on a graph, with width proportional to the time spent in the block.

There are blocks for all kinds of events; see the EventType enum:

    enum class EventType {
      Begin,
      Color,
      End,
      Frame,
      String,
      Int,
      FiberWait,
      FiberWake,
      ContextSwitch,
      Job,
      GPUBegin,
      GPUEnd,
      Link,
      Pause,
      GPUStats,
      Continue,
      Signal,
      Counter
    };

The Profiler is capable of linking to and profiling GPU processes, via the GPUBegin and Link functions.
Other functions are provided for utility, such as color, string, int, frame.

Profilers are per-thread; any calls to a Profiler function will only apply to the profiler for the thread that called the function, even if you call across threads; always the calling thread is updated.

This allows multiple parallel block diagrams to represent what any number of threads are doing at any given time.

The profiler is currently only implemented for Windows, due to extensively using the Windows thread trace functions.

Closes #19.

Copy link
Member

@dpeter99 dpeter99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice, the thread support does add a big pile of complexity for sure.
Currently on Linux so can't test it.
I would prefer if the core had no imports from windows, so it would be nice if the profiler stuff was abstracted out to the platforms folder as well.

I have a vision where everything you need to run shadow on a new platform was in that single folder and if something was missing it was clear.

virtual int Run() = 0;

// Start the thread and start it running.
bool Start(const char* name, bool extended);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add proper comments on what the name and extended are used for.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is the name of the thread. I'm not sure how that can be explained.

Extended exists for something I'm planning later, currently unused.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add those details as proper doc comments?
Would remove the guess work

return elapsed;
}

static size_t getTimestamp();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this size_t while the others are double?
Guessing this is the raw unix timestamp or similar, but a comment wouldn't hurt.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a raw metric.

The others are in seconds/milliseconds, with high enough precision to have fractional parts (milliseconds/nanoseconds), but this one is directly from the OS, and both Linux and Windows return a size_t here. One is a Unix timestamp, one is a processor tick counter.

It's platform specific, so it's difficult to explain in the common section.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is a platform depended thing and not even the same time scale than it needs a proper comment even more

@TheCurle TheCurle mentioned this pull request Apr 6, 2024
@TheCurle
Copy link
Member Author

TheCurle commented Apr 6, 2024

An example usage of the Profiler counter, for tracking process memory usage:

        const float memory = Platform::GetProcessMemory() / (1024 * 1024);
        static size_t processMemoryCounter = SH::Profiler::MakeCounter("Process Memory (MB) ", 0);
        SH::Profiler::PushCounter(processMemoryCounter, memory);

This must be done on the main thread, and from that point on, the main thread will have a Process Memory usage counter attached. If this block of code is called every frame, the memory usage will be updated every frame, and a graph chart will be produced showing how the memory changes over time.

Copy link
Member

@dpeter99 dpeter99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Mutex and condition classes are missing.
The Profiler's platform specific stuff should be moved so we can support many targets. Adding increasing amounts of ifdefs is gonna be unmanagable.

projs/shadow/shadow-engine/core/src/profile/Profiler.cpp Outdated Show resolved Hide resolved
projs/shadow/shadow-engine/core/src/profile/Profiler.cpp Outdated Show resolved Hide resolved
size_t threadID;
};

// This ifdef contains a bunch of the infrastructure necessary for performance monitoring under Windows.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section should be extracted to the platforms folder so I can even try to add Linux support and later of course for the console support

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, though if i do that i'll have to duplicate the entire code between Linux and Windows, because these are static internal classes, not exposed in the header.

It's here for the DRY principle. A lot would be repeated if these were in their own file.

projs/shadow/shadow-engine/core/src/profile/Profiler.cpp Outdated Show resolved Hide resolved
@TheCurle
Copy link
Member Author

TheCurle commented Apr 7, 2024

I've pushed the files from the other branch that this PR depends on.

@dpeter99 dpeter99 marked this pull request as draft April 8, 2024 21:59
Copy link
Member

@dpeter99 dpeter99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only a first look-over, will do another focused on the actual meat of the PR.
Feels like a good direction. Maybe we should
a) make the file stuff a separate PR and delay the development more
b) add those files to the PR description and title

namespace ShadowEngine {

// An input stream that can read a file on disk.
struct FileInput final : InputStream {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the file system needed here?

//The indices of the mesh
std::vector<uint32_t> indices;
};
#pragma once
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you figure out what is up with this file?
maybe line endings or something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why changes here? 🫨

virtual int Run() = 0;

// Start the thread and start it running.
bool Start(const char* name, bool extended);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you add those details as proper doc comments?
Would remove the guess work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance monitoring tools
2 participants