Performance improvements of `Find` recipe #4758

nielsdebruin · 2024-12-09T14:45:20Z

What's changed?

The `Find` recipe has been modified to improve its performance. Nearly all of these optimizations relate to how Strings are handled.

Main changes

Replace basic String concatenation with a StringBuilder whose capacity is set to the exact length of the final String. Apart from the exact capacity, this will mostly benefit OpenRewrite users on older JVMs, since on most modern JVMs plain concatenation is already compiled to an optimized form.
Significantly reduce the number of generated substrings, e.g., when looking up line endings.
Avoid recomputing common values by precomputing them outside the loop.
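The three techniques above can be illustrated with a small, self-contained Java sketch. This is not the actual `Find` implementation: `markMatches` and its `marker` parameter are hypothetical, and the sketch only handles a literal search string rather than a regex. It precomputes loop-invariant lengths, locates matches with `String.indexOf(String, int)` instead of creating substrings, and sizes the `StringBuilder` to the exact final length so it never has to grow.

```java
import java.util.ArrayList;
import java.util.List;

public class MarkMatchesSketch {

    // Hypothetical helper, not the actual Find recipe code: inserts `marker`
    // before every occurrence of the literal `needle` in `text`.
    static String markMatches(String text, String needle, String marker) {
        if (needle.isEmpty()) {
            throw new IllegalArgumentException("needle must not be empty");
        }
        // Precompute values that do not change inside the loops.
        int needleLength = needle.length();
        int markerLength = marker.length();

        // First pass: collect match offsets without creating any substrings.
        List<Integer> matches = new ArrayList<>();
        int from = 0;
        int idx;
        while ((idx = text.indexOf(needle, from)) >= 0) {
            matches.add(idx);
            from = idx + needleLength;
        }
        if (matches.isEmpty()) {
            return text;
        }

        // Second pass: the final length is known, so size the builder exactly.
        StringBuilder sb = new StringBuilder(text.length() + matches.size() * markerLength);
        int last = 0;
        for (int start : matches) {
            sb.append(text, last, start); // copies a range of `text`, no substring
            sb.append(marker);
            last = start;
        }
        sb.append(text, last, text.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(markMatches("This is text.\nMore text here.\n", "text", "~~>"));
    }
}
```

Appending ranges with `StringBuilder.append(CharSequence, int, int)` copies characters straight from the source text, which is what avoids the intermediate substrings.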

What's your motivation?

Make it faster.

Anyone you would like to review specifically?

@kmccarp @timtebeek

Checklist

I've added unit tests to cover both positive and negative cases
I've read and applied the recipe conventions and best practices
I've used the IntelliJ IDEA auto-formatter on affected files

kmccarp · 2024-12-09T14:52:28Z

@lkerford I guess @nielsdebruin beat you to it :) please take a look

lkerford

I wanted to see how much of a difference this change would make to performance, and I wasn't disappointed. The test I've shared below ran in 7753ms on the main branch vs. 450ms on this branch. This is fantastic!

@Test
void findPerformanceTest() {
    String inputLine = """
      This is a line above.
      This is text.
      This is a line below.
      """;
    String outputLine = """
      This is a line above.
      This is ~~>text.
      This is a line below.
      """;

    // Build the large input and expected output once (numberOfEntries + 1 copies).
    // String.repeat (Java 11+) avoids the quadratic cost of `+=` inside a loop.
    int numberOfEntries = 100000;
    String fileInput = inputLine.repeat(numberOfEntries + 1);
    String fileOutput = outputLine.repeat(numberOfEntries + 1);

    // Only the recipe run itself is timed.
    long startTime = System.nanoTime();
    rewriteRun(
      spec -> spec.recipe(new Find("text", null, null, null, null, null))
        .dataTable(TextMatches.Row.class, rows -> {
            // One match per repeated block of three lines.
            assertThat(rows).hasSize(numberOfEntries + 1);
            assertThat(rows.get(0).getMatch()).isEqualTo("This is ~~>text.");
        }),
      text(
        fileInput,
        fileOutput
      )
    );
    long elapsedMillis = (System.nanoTime() - startTime) / 1_000_000;
    System.out.println(elapsedMillis + "ms");
}

rewrite-core/src/test/java/org/openrewrite/text/FindTest.java

knutwannheden · 2024-12-10T17:36:02Z

If we want to improve performance even more, we should either avoid computing that line terminator index altogether or store it in an ArrayList and use binary search.

Personally, I don't see the point of precomputing this, at least not for the whole file (only for the range containing matches), and especially not before finding any matches. Also note that indexOf() and lastIndexOf() both have an overload accepting a starting index.
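A minimal sketch of the overload-based approach, assuming `\n` line terminators only (a `\r\n` file would leave a trailing `\r` on each line). `enclosingLine` is a hypothetical helper, not OpenRewrite code: it finds the line around a single match with `lastIndexOf(int, int)` and `indexOf(int, int)`, so no line index is precomputed and only the final line is materialized as a substring.

```java
public class EnclosingLineSketch {

    // Hypothetical helper (not part of OpenRewrite): returns the line that
    // contains the match [matchStart, matchEnd). Assumes '\n' terminators only.
    static String enclosingLine(String text, int matchStart, int matchEnd) {
        // Search backwards from the character before the match. lastIndexOf
        // returns -1 if there is no earlier '\n' (also for a negative fromIndex),
        // so adding 1 yields 0, the start of the text.
        int lineStart = text.lastIndexOf('\n', matchStart - 1) + 1;
        // Search forwards from the end of the match for the next terminator.
        int lineEnd = text.indexOf('\n', matchEnd);
        if (lineEnd < 0) {
            lineEnd = text.length(); // last line has no trailing terminator
        }
        return text.substring(lineStart, lineEnd);
    }

    public static void main(String[] args) {
        String text = "This is a line above.\nThis is text.\nThis is a line below.\n";
        int start = text.indexOf("text");
        System.out.println(enclosingLine(text, start, start + "text".length())); // prints "This is text."
    }
}
```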

The index (or some custom logic) may make sense for files with very long lines. So if we want to keep the index, we could compute it only for the part after the first match (plus the terminator before it) and use binary search.
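If the index is kept, the suggestion could look roughly like the following sketch. `LineIndexSketch` and `lineAt` are hypothetical names, and only `\n` terminators are considered. The class records terminator offsets starting from the one preceding the first match, stores them in an `ArrayList`, and locates the line for any later match offset with `Collections.binarySearch`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LineIndexSketch {

    private final String text;
    // Sorted offsets of '\n', recorded only from the terminator before the
    // first match onwards; the rest of the file before it is never scanned.
    private final List<Integer> newlines = new ArrayList<>();

    LineIndexSketch(String text, int firstMatchStart) {
        this.text = text;
        int before = text.lastIndexOf('\n', firstMatchStart - 1);
        if (before >= 0) {
            newlines.add(before);
        }
        for (int i = text.indexOf('\n', firstMatchStart); i >= 0; i = text.indexOf('\n', i + 1)) {
            newlines.add(i);
        }
    }

    // Returns the line containing offset p; p must be >= firstMatchStart.
    String lineAt(int p) {
        int pos = Collections.binarySearch(newlines, p);
        // Number of recorded terminators strictly before p.
        int count = pos >= 0 ? pos : -pos - 1;
        int lineStart = count > 0 ? newlines.get(count - 1) + 1 : 0;
        int lineEnd = count < newlines.size() ? newlines.get(count) : text.length();
        return text.substring(lineStart, lineEnd);
    }

    public static void main(String[] args) {
        String text = "header\nThis is text.\nmore text\n";
        int first = text.indexOf("text");
        LineIndexSketch index = new LineIndexSketch(text, first);
        for (int p = first; p >= 0; p = text.indexOf("text", p + 1)) {
            System.out.println(index.lineAt(p));
        }
    }
}
```

The choice of `ArrayList` matters here: per the `Collections.binarySearch` Javadoc, a list that does not implement `RandomAccess` (such as `LinkedList`) falls back to an iterator-based search with O(n) link traversals, which would negate most of the benefit.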

Implement some performance improvements on Find recipe

dd9b489

nielsdebruin added the enhancement New feature or request label Dec 9, 2024

nielsdebruin requested review from kmccarp and timtebeek December 9, 2024 14:45

nielsdebruin self-assigned this Dec 9, 2024

timtebeek requested a review from lkerford December 9, 2024 16:30

lkerford approved these changes Dec 9, 2024


Add extra tests

9bd3ac8

timtebeek reviewed Dec 10, 2024


rewrite-core/src/test/java/org/openrewrite/text/FindTest.java Outdated

Modify last test

4c375f9

Restore linked list

98be617

nielsdebruin force-pushed the refactor/find-recipe branch from 988c910 to 98be617 December 11, 2024 09:43

Merge branch 'main' into refactor/find-recipe

aef3020

timtebeek requested a review from knutwannheden December 11, 2024 10:31

More performance gains

0b0de0b

knutwannheden approved these changes Dec 11, 2024


timtebeek approved these changes Dec 11, 2024


timtebeek merged commit d789bcb into main Dec 11, 2024
0 of 2 checks passed

timtebeek deleted the refactor/find-recipe branch December 11, 2024 11:33
