-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Show mean time in @btime
#258
base: main
Are you sure you want to change the base?
Conversation
One thing I really like about this format is that most users don’t realize when they use |
It could also say With everything now, the line is usually about 82 characters. Do we care about getting it under 80? It's not super-confusing if line wrapped. (The zero-allocation case is more like 45 characters, perfect for slack...) |
src/execution.jl
Outdated
$BenchmarkTools.prettymemory($BenchmarkTools.memory($trialmin)), ")") | ||
$print(" min ", $BenchmarkTools.prettytime($BenchmarkTools.time($trialmin)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to interpolate $BenchmarkTools.prettymemory
not just $prettymemory
? I see I left some and later dropped them.
trialmin, trialallocs = gensym(), gensym() | ||
trialmin, trialmean, trialallocs = gensym(), gensym(), gensym() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to write @gensym trialmin trialallocs
etc. here?
I think there is great value in having |
This is a fair point. And there's something to be said for matching Can I persuade anyone to try out this branch in real life for a bit, to see if they like or hate it in actual use? |
The issue is that for cases where you would use |
Isn't the mean what's relevant for throughput? I guess this PR takes the position that it's the most useful second number. For example, with the allocating The second number The mode seems tricky, on what are essentially real numbers. Should they be binned first, or can we trust the clock to do that well enough?
vs another machine
This and the median might be contenders for the best one number. But, as you say, these will often approach the minimum... which |
The problem is that it's noisy and unreliable; if something else happens to run on your system that temporarily interrupts the Basically, what source of timing variance are you interested in averaging? In many cases the variance is due to things outside of Julia, which makes it of questionable utility in benchmarking the performance of Julia code per se. I agree that there are some cases where timing variance arises from Julia itself— for example, due to garbage collection, or perhaps your algorithm itself is stochastic. In the latter cases, however, you need to pay much closer attention to statistics and probably |
Nice, somehow I never looked at the paper. I guess garbage collection is the major thing I had in mind, since it's very common to have different implementations with different allocations. The reported GC percentage is quite a large underestimate of how much time this is going to cost. This is a rare event, so something like mean(5-to-95th) will miss it. Agree this measure will be noisier than Is there another number which might better serve this purpose? |
Ref #37 (comment). |
A problem with benchmarking this is that it is strongly context dependent — how often gc occurs depends not just on the function you are benchmarking, but whether it is called in a context that also does allocation. (gc times aren't additive!) This is especially problematic here because That's not to say that benchmarking gc is unimportant! But, in combination with the difficulty of eliminating external noise, my sense is that the mean will mislead more that it informs in the |
It is because you are not running a representative benchmark. You are doing the equivalent of
and then saying that the minimum is a bad statistic for giving the mean value when running that code multiple times. Mean is bad. Every single noise of your computer, every latency based on the OS scheduler, etc contributes to the mean. Don't take it from me, take it from this dude https://youtu.be/vrfYLlR8X8k?t=1487 (the whole talk is worth looking at). Again, make a |
I want this. I'm perfectly fine with naming it |
Yes, call it something else. |
Are you happy to rename it? Seems easy enough to make everyone happy this way. |
@mcabbott doing some cleanup before a 2.0 release, would you wanna rename this |
There is room to show a bit more information with
@btime
, and I think the next number to show after the minimum is the mean. So this PR does this:Before:
Notice that it also doesn't bother writing "0 bytes" if there are no allocations, in the name of reducing clutter. I think that's safe but if anyone has a counter-example speak up.
I'm less sure this is a good idea, but it also prints a the mean GC time, which I hope is likely to be the most useful one. (Often the fastest run will not trigger this.) I think it's less confusing to show the time than just the percentage -- I hope it is clear that this is the mean of the GC times, as a percentage of the mean time shown. Not the mean GC percentage.
Xref #217 for ongoing discussion about
@benchmark
's histograms.Edit: for comparison, here's
Base.@time
's minimal and maximal display:This also skips
0 allocations
when zero, and prints GC time + compilation time when these are nonzero. It's longer than the longest@btime
above. I suppose this is also an argument for showing GC time as a percentage, which could also save some space.Edit': perhaps this wants a screenshot. A variant with bold times & only GC percentage:
Notice that the minimum times are in the wrong order, compared to the mean. And also that the GC percentages greatly understate the average cost of allocations.