Be faster #11
Comments
The way I see it, there are two main ways to improve the blurring speed: either choose a different algorithm, or try to optimize the current one further. The original algorithm for the Gaussian blur came from libjxl, so maybe they ran tests and already have a justification for using that particular implementation. I couldn't find it quickly, but it probably exists somewhere. As for optimizing the current algorithm further, the focus should probably not be CPU time but memory accesses. That is going to be pretty hard IMO, since the compiler is already doing a good job. I don't think there are any easy big gains left here, at least for x86.
I had some free time recently and really looked at it again, and I noticed something: all of the SIMD (AVX, to be specific) instructions are the scalar variants, not the packed ones. We should be able to get at least the horizontal pass to use packed instructions. I'm sure it'd be fairly difficult, though, because of the way the algorithm accesses memory (bounds "checking", 2 accesses per pixel). Maybe libjxl can serve as inspiration for how to do it.
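To make the bounds "checking" point concrete, here is a rough sketch in Rust. It is not the blur from this project or from libjxl, just a simplified stand-in with the same access shape: two reads per pixel, one `radius` to the left and one `radius` to the right, with out-of-range reads treated as zero. The names `two_tap_checked` and `two_tap_split` are made up for illustration. The idea is to split the row into edges and interior so that the interior loop has no branches, which usually gives the compiler a better chance of emitting packed instructions. Whether it actually does depends on the target and flags.

```rust
/// Reference version: every pixel does two bounds-"checked" reads.
/// Out-of-range taps contribute 0.0 (an assumption made for this sketch).
fn two_tap_checked(input: &[f32], radius: usize) -> Vec<f32> {
    let n = input.len();
    let mut out = vec![0.0f32; n];
    for i in 0..n {
        let left = if i >= radius { input[i - radius] } else { 0.0 };
        let right = if i + radius < n { input[i + radius] } else { 0.0 };
        out[i] = left + right;
    }
    out
}

/// Same result, but the row is split into left edge, interior, right edge.
/// The interior loop runs over plain slices without per-pixel branches.
fn two_tap_split(input: &[f32], radius: usize) -> Vec<f32> {
    let n = input.len();
    // Rows shorter than two radii have no clean interior; use the slow path.
    if n < 2 * radius {
        return two_tap_checked(input, radius);
    }
    let mut out = vec![0.0f32; n];

    // Left edge: only the right tap is in range.
    for i in 0..radius {
        out[i] = input[i + radius];
    }

    // Interior: both taps are in range, so no checks are needed.
    let interior = &mut out[radius..n - radius];
    let lefts = &input[..n - 2 * radius];
    let rights = &input[2 * radius..];
    for ((o, &l), &r) in interior.iter_mut().zip(lefts).zip(rights) {
        *o = l + r;
    }

    // Right edge: only the left tap is in range.
    for i in (n - radius)..n {
        out[i] = input[i - radius];
    }
    out
}
```

Calling `two_tap_split(&[1.0, 2.0, 3.0, 4.0], 1)` gives `[2.0, 4.0, 6.0, 3.0]`, the same as the checked version. Looking at the generated assembly for the interior loop (for example with `--emit asm`) would show whether packed adds are actually produced on a given target.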
The blur in particular is still rather slow, despite our many efforts to optimize it.