Be faster #11

Open
shssoichiro opened this issue Dec 3, 2022 · 2 comments
Comments

@shssoichiro
Contributor

The blur in particular is still rather slow, despite our many efforts to optimize it.

@FreezyLemon
Contributor

The way I see it, there are two main ways to improve the blurring speed: either choose a different algorithm or try to optimize the current one further.

The original algorithm for the Gaussian blur came from libjxl; maybe they did tests and already have a justification for why they use that particular implementation. I couldn't find it quickly, but it probably exists somewhere.

With regard to optimizing the algorithm further, the focus should probably not be CPU time but memory accesses. That is going to be pretty hard IMO, since the compiler is already doing a good job. I don't think there are any easy big gains left here, at least on x86.

@FreezyLemon
Contributor

I had some free time recently and took another close look at it, and I noticed something: All [1] of the SIMD (AVX, to be specific) instructions are the ss (scalar single) variant, and not the ps (packed single) one. This means that we're effectively using each xmm register to hold ONE float, and are only using the registers to get access to the FMA instructions.

We should be able to get at least the horizontal pass to do packed instructions. Though I'm sure it'd be fairly difficult, because of the way the algorithm accesses memory (bounds "checking", 2 accesses per pixel). Maybe libjxl can serve as an inspiration on how to do it.
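To make the scalar-versus-packed distinction concrete, here is a minimal sketch in Rust. It is not the project's blur code: the one-pole recurrence, the function names `filter_row_scalar` and `filter_rows_packed`, the `LANES` constant, and the row-major layout are all assumptions chosen for illustration. The idea is that a recursive filter has a serial dependency along a row, so one way to fill a packed register is to run the same recurrence over several independent rows in lockstep.

```rust
/// Hypothetical lane count: 8 f32 values fit one 256-bit AVX register.
const LANES: usize = 8;

/// Scalar reference: a one-pole recursive filter along a single row.
/// Each output depends on the previous one, so this loop can only use
/// one float per register (the "ss" style of instruction).
fn filter_row_scalar(row: &[f32], out: &mut [f32], a: f32, b: f32) {
    let mut state = 0.0f32;
    for (o, &x) in out.iter_mut().zip(row) {
        state = a * x + b * state;
        *o = state;
    }
}

/// Lane-parallel variant: the same recurrence applied to LANES rows at once.
/// `input` and `output` hold LANES rows of `width` pixels each, row-major.
/// The inner loop runs over independent lanes, which gives the compiler
/// (or hand-written intrinsics) the chance to use packed arithmetic.
fn filter_rows_packed(input: &[f32], output: &mut [f32], width: usize, a: f32, b: f32) {
    assert_eq!(input.len(), output.len());
    assert_eq!(input.len(), width * LANES);
    let mut state = [0.0f32; LANES];
    for x in 0..width {
        for lane in 0..LANES {
            let idx = lane * width + x;
            state[lane] = a * input[idx] + b * state[lane];
            output[idx] = state[lane];
        }
    }
}
```

Note that the loads in this layout are strided, so whether packed instructions are actually emitted depends on the compiler and the memory layout; a real implementation would likely need a transposed or tiled buffer, which ties back to the memory-access concern above.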

Footnotes

  1. with some relatively unimportant exceptions like mov and xor
