Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements of Find recipe #4758

Merged
merged 6 commits into from
Dec 11, 2024
Merged

Conversation

nielsdebruin
Copy link
Contributor

What's changed?

The Find has been modified to increase its performance. Nearly all these optimizations are related to how Strings are handled.

Main changes

  • Replace basic String concatenation with a StringBuilder where we set the capacity to the exact length of the final String. Apart from setting the capacity exactly, this will mostly only affect OpenRewrite users using old JVM, since most modern JVM will often already generate optimized bytecode using a StringBuilder.
  • Significantly reduce the number of generated substrings, e.g., while looking up line ending on substrings.
  • Avoid recomputing common values, byte precomputing them outside the loop.

What's your motivation?

Make it faster.

Anyone you would like to review specifically?

@kmccarp @timtebeek

Checklist

  • I've added unit tests to cover both positive and negative cases
  • I've read and applied the recipe conventions and best practices
  • I've used the IntelliJ IDEA auto-formatter on affected files

@nielsdebruin nielsdebruin added the enhancement New feature or request label Dec 9, 2024
@nielsdebruin nielsdebruin self-assigned this Dec 9, 2024
@kmccarp
Copy link
Contributor

kmccarp commented Dec 9, 2024

@lkerford I guess @nielsdebruin beat you to it :) please take a look

@timtebeek timtebeek requested a review from lkerford December 9, 2024 16:30
Copy link
Contributor

@lkerford lkerford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to see how much of a difference this change would make with the performance, and I wasn't disappointed. The test I have shared below ran in 7753ms on the main branch vs 450ms on this branch. This is fantastic

@Test
    void findPerformanceTest() {
        String intputLine = """
          This is a line above.
          This is text.
          This is a line below.
          """;
        String outputLine = """
          This is a line above.
          This is ~~>text.
          This is a line below.
          """;

        int i=0;
        String fileInput= intputLine;
        String fileOutput= outputLine;
        int numberOfEntries = 100000;
        while (i < numberOfEntries){
            fileInput += intputLine;
            fileOutput += outputLine;
            i++;
        }
        long startTime = System.currentTimeMillis();

        rewriteRun(
          spec -> spec.recipe(new Find("text", null, null, null, null, null))
            .dataTable(TextMatches.Row.class, rows -> {
                assertThat(rows).hasSize(numberOfEntries+1);
                assertThat(rows.get(0).getMatch()).isEqualTo("This is ~~>text.");
            }),
          text(
            fileInput,
            fileOutput
          )
        );
        long elapsedTime = System.currentTimeMillis() - startTime;
        System.out.println(elapsedTime + "ms");
    }

@knutwannheden
Copy link
Contributor

If we want to improve the performance even more we should either avoid computing that line terminator index or use an ArrayList and then use binary search.

Personally I don't see the point of precomputing this, at least not for the whole file (only for the range containing matches) and especially not before finding any matches. Also note that both indexOf() and lastIndexOf() both have an overload accepting a starting point.

The index (or some custom logic) may make sense for files with very long lines. So if we want to keep the index we could compute the index for the part after the first match (and the terminator before it) and use binary search.

@timtebeek timtebeek merged commit d789bcb into main Dec 11, 2024
0 of 2 checks passed
@timtebeek timtebeek deleted the refactor/find-recipe branch December 11, 2024 11:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants