-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a profiler and instrumentation #30
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice, the thread support does add a big pile of complexity for sure.
Currently on Linux so can't test it.
I would prefer if the core had no imports from windows, so it would be nice if the profiler stuff was abstracted out to the platforms folder as well.
I have a vision where everything you need to run shadow on a new platform was in that single folder and if something was missing it was clear.
virtual int Run() = 0; | ||
|
||
// Start the thread and start it running. | ||
bool Start(const char* name, bool extended); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add proper comments on what the name and extended are used for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is the name of the thread. I'm not sure how that can be explained.
Extended exists for something I'm planning later, currently unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add those details as proper doc comments?
Would remove the guess work
return elapsed; | ||
} | ||
|
||
static size_t getTimestamp(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this size_t
while the others are double?
Guessing this is the raw unix timestamp or similar, but a comment wouldn't hurt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a raw metric.
The others are in seconds/milliseconds, with high enough precision to have fractional parts (milliseconds/nanoseconds), but this one is directly from the OS, and both Linux and Windows return a size_t here. One is a Unix timestamp, one is a processor tick counter.
It's platform specific, so it's difficult to explain in the common section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is a platform depended thing and not even the same time scale than it needs a proper comment even more
An example usage of the Profiler counter, for tracking process memory usage: const float memory = Platform::GetProcessMemory() / (1024 * 1024);
static size_t processMemoryCounter = SH::Profiler::MakeCounter("Process Memory (MB) ", 0);
SH::Profiler::PushCounter(processMemoryCounter, memory); This must be done on the main thread, and from that point on, the main thread will have a Process Memory usage counter attached. If this block of code is called every frame, the memory usage will be updated every frame, and a graph chart will be produced showing how the memory changes over time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Mutex and condition classes are missing.
The Profiler's platform specific stuff should be moved so we can support many targets. Adding increasing amounts of ifdef
s is gonna be unmanagable.
size_t threadID; | ||
}; | ||
|
||
// This ifdef contains a bunch of the infrastructure necessary for performance monitoring under Windows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section should be extracted to the platforms folder so I can even try to add Linux support and later of course for the console support
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, though if i do that i'll have to duplicate the entire code between Linux and Windows, because these are static internal classes, not exposed in the header.
It's here for the DRY principle. A lot would be repeated if these were in their own file.
I've pushed the files from the other branch that this PR depends on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only a first look-over, will do another focused on the actual meat of the PR.
Feels like a good direction. Maybe we should
a) make the file stuff a separate PR and delay the development more
b) add those files to the PR description and title
namespace ShadowEngine { | ||
|
||
// An input stream that can read a file on disk. | ||
struct FileInput final : InputStream { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the file system needed here?
//The indices of the mesh | ||
std::vector<uint32_t> indices; | ||
}; | ||
#pragma once |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you figure out what is up with this file?
maybe line endings or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why changes here? 🫨
virtual int Run() = 0; | ||
|
||
// Start the thread and start it running. | ||
bool Start(const char* name, bool extended); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add those details as proper doc comments?
Would remove the guess work
This PR adds three bits of infrastructure surrounding the usage of a Profiler:
Precise Timer
Currently, the Timer is a set of static ints, updated every frame by the ShadowApplication.
For the purposes of profiling, this is not useful (as we may want to profile many things that happen within a frame, with high precision).
SH::Timer is now a struct with a bunch of static methods, so that you can use it both as an RAII timer (create an instance at the start of a method to start the timer, and it stops when it goes out of scope. Call elapsed() to get the amount of nanoseconds that have elapsed since the timer started.
The timer struct itself uses std::chrono to perform timing, but the profiler may wish to get the exact current timestamp.
Thus,
is provided. To get maximum precision on every system, there is a platform source that implements this function on both Windows and Linux using platform-specific instrumentation calls.
Threads
The profiler needs to be able to handle multiple parallel threads - we'll have a job system. multithreaded rendering, a UI thread, stuff like that.
However, if we simply use std::thread for these, it's not possible to attach a profiler to this thread (as the thread handle is buried deep inside the stdlib), which is no good.
Thus, we have our own way to manage threads - the Thread class.
Simply implement Thread and provide a Run method, everything else is handled automatically.
A Join method is provided to block main thread waiting for the given thread, and the thread can also Wait() and Notify() to allow for complex parallel dependencies.
Threads are currently only implemented fully on Windows, but the infrastructure I've created should also work on Linux (as pthread is very simple).
The Profiler itself
The profiler has a simple interface; instantiate a Profiler with a name to start a block with that name, and when it goes out of scope, that block ends.
A block is to be represented as an actual rectangle on a graph, with width proportional to the time spent in the block.
There are blocks for all kinds of events; see the EventType enum:
The Profiler is capable of linking to and profiling GPU processes, via the GPUBegin and Link functions.
Other functions are provided for utility, such as color, string, int, frame.
Profilers are per-thread; any calls to a Profiler function will only apply to the profiler for the thread that called the function, even if you call across threads; always the calling thread is updated.
This allows multiple parallel block diagrams to represent what any number of threads are doing at any given time.
The profiler is currently only implemented for Windows, due to extensively using the Windows thread trace functions.
Closes #19.