Be faster #11

shssoichiro · 2022-12-03T11:35:11Z

The blur in particular is still rather slow, despite our many efforts to optimize it.

FreezyLemon · 2022-12-21T13:16:47Z

The way I see it, there are two main ways to improve the blurring speed: either choose a different algorithm or optimize the current one further.

The original algo for the Gaussian blur came from libjxl. Maybe they ran tests and already have a justification for why they use that particular implementation. I couldn't find it quickly, but it probably exists somewhere.

With regard to optimizing the algo further, the focus should probably be memory accesses rather than CPU time. That is going to be pretty hard IMO, since the compiler is already doing a good job. I don't think there are any easy big gains left here, at least for x86.

FreezyLemon · 2023-05-18T09:16:36Z

I had some free time recently and took a closer look at it again, and I noticed something: All¹ of the SIMD (AVX, to be specific) instructions are the ss (scalar single-precision) variant, not the ps (packed single-precision) one. This means that we're effectively using each xmm register to hold ONE float, and only touching those registers to get access to the FMA instructions.

We should be able to get at least the horizontal pass to do packed instructions. Though I'm sure it'd be fairly difficult, because of the way the algorithm accesses memory (bounds "checking", 2 accesses per pixel). Maybe libjxl can serve as an inspiration on how to do it.
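To illustrate the general idea (this is a simplified stand-in, not the actual blur code): the sketch below assumes a recursive filter where each output depends on the previous output in the same row, which makes a single row serial along x. One possible way to get packed operations anyway is to process several rows at once, with one row per SIMD lane. The names `filter_row_scalar`, `filter_rows_interleaved`, `LANES`, the coefficients `a` and `b`, and the interleaved layout are all hypothetical and chosen only for this example. Whether the compiler actually emits ps instructions for the inner loop depends on target features and optimization level.

```rust
/// Rows processed together (hypothetical choice: 8 f32 lanes fill one 256-bit register).
const LANES: usize = 8;

/// Scalar reference: a first-order recursive filter over a single row.
/// Every output depends on the previous output, so this loop is serial along x.
fn filter_row_scalar(input: &[f32], output: &mut [f32], a: f32, b: f32) {
    let mut prev = 0.0f32;
    for (o, &x) in output.iter_mut().zip(input) {
        prev = a * x + b * prev;
        *o = prev;
    }
}

/// The same filter applied to LANES rows at once.
/// `input` and `output` hold the rows interleaved: the element at column `x`
/// of row `lane` lives at index `x * LANES + lane`.
/// The inner loop has no dependency between lanes, so the compiler is free
/// to turn it into packed multiplies and adds.
fn filter_rows_interleaved(input: &[f32], output: &mut [f32], a: f32, b: f32) {
    let mut prev = [0.0f32; LANES];
    let columns = output
        .chunks_exact_mut(LANES)
        .zip(input.chunks_exact(LANES));
    for (out_col, in_col) in columns {
        for lane in 0..LANES {
            prev[lane] = a * in_col[lane] + b * prev[lane];
            out_col[lane] = prev[lane];
        }
    }
}
```

The trade-off in this sketch is that the rows have to be interleaved (or gathered) first, which adds memory traffic of its own, so it would need benchmarking before drawing any conclusions.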

¹ With some relatively unimportant exceptions like mov and xor.

FreezyLemon mentioned this issue Feb 12, 2024

Investigate performance impact of flat image buffers (Vec<f32>) vs planar buffers ([Vec<f32>; 3]) #19

Open
