[MM-59980] Upgrade to whisper.cpp v1.7.1 #33
Nice investigation, thank you!
Very wide data widths are highly sensitive to data layout and L1/L2 cache invalidations. If the code is not laid out ideally by the compiler, it can have the opposite effect: the CPU ends up with more cache misses. I'd be really curious if you can run a
@agnivade I'll do it, as I share your spirit of curiosity, but please don't make me mess with "how the code is laid out by the compiler" on this one :P The background discussion at ggerganov/whisper.cpp#2099 has some pointers as well.
lol, I'm not a monster :)
You know, none of those events exist on EC2 since it's virtualized, so it's going to be harder to get this data without going bare metal, but then it wouldn't be a fair (or very useful) comparison.
Well, scratch that: none of it works either, so I think we're stuck as far as benchmarking this on EC2 goes.
Ah no, I was suggesting you benchmark on your laptop. You can use perflock (https://github.com/aclements/perflock) to run benchmarks reliably on a laptop. It would also help to shut down your browser and code editor.
Anyway, please feel free to ignore this if it's too much work. It was just curiosity on my side.
@agnivade Here you go :)
AVX512=0
AVX512=1
Verified that the built image works on an instance without AVX512 support.
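One quick way to confirm what a given (possibly virtualized) instance actually exposes is to look at the CPU feature flags the Linux kernel reports. The sketch below is a hypothetical helper, not something from this PR: it assumes a Linux host with `/proc/cpuinfo` and checks only for `avx512f`, the AVX-512 Foundation subset that the other AVX-512 extensions build on. The function names `cpu_flags` and `has_avx512f` are illustrative.

```python
import platform
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text.

    Hypothetical helper: x86 Linux kernels list features on a 'flags : ...' line;
    other architectures (e.g. ARM 'Features') yield an empty set here.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512f(cpuinfo_text: str) -> bool:
    # avx512f is the baseline AVX-512 subset; if it is missing, no AVX-512 code can run.
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if platform.system() == "Linux" and path.exists():
        # On a VM this reflects what the hypervisor exposes to the guest.
        print("avx512f supported:", has_avx512f(path.read_text()))
    else:
        print("/proc/cpuinfo not available on this platform")
```

Running it on the target instance before and after deploying the image is a cheap sanity check that the host really lacks AVX-512, so a successful run actually exercises the non-AVX512 build.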
Sorry, just getting to this now. So, as expected, the IPC decreases from 2.33 to 1.61, which is theoretically a good thing. But we can see where the problem is:
Thanks, that makes sense. The perf output was actually color-coded to highlight the increase in cache misses. Good stuff to keep in mind if we ever need to think about this in the future.
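For reference, IPC (instructions per cycle) is the ratio of the two hardware counters perf reports, and wall time depends on both IPC and the total instruction count. This is the standard relation, with $N$ the instructions retired, $C$ the CPU cycles and $f$ the clock frequency; it is included here only to make the quoted drop concrete, assuming a roughly constant $f$ between runs:

$$
\mathrm{IPC} = \frac{N}{C}, \qquad T = \frac{C}{f} = \frac{N}{\mathrm{IPC} \cdot f}
$$

$$
\frac{1.61 - 2.33}{2.33} \approx -0.309
$$

So the quoted IPC is about 31% lower. Since each 512-bit instruction can do more work, $N$ should shrink, which is why a lower IPC is not bad in itself; $T$ only improves if $N$ drops by more than IPC does, and extra cache misses push in the opposite direction.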
Summary
Interestingly, benchmarks revealed that AVX512 wasn't adding much value. In fact, binaries compiled with that instruction set performed worse on average.
While we're at it, we also bump whisper.cpp to the latest version (v1.7.1), which shows a reasonable performance boost (~5% on the tiny model, ~3.5% on base).
This also means we'll be able to get rid of https://github.com/mattermost/mattermost-plugin-calls/blob/1720eb5ab348bb358869369045b194f8c461a84b/.github/workflows/e2e.yml#L55-L80 soon enough.
Ticket Link
https://mattermost.atlassian.net/browse/MM-59980