-
Notifications
You must be signed in to change notification settings - Fork 354
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvements of Find
recipe
#4758
Conversation
@lkerford I guess @nielsdebruin beat you to it :) please take a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to see how much of a difference this change would make with the performance, and I wasn't disappointed. The test I have shared below ran in 7753ms on the main branch vs 450ms on this branch. This is fantastic
@Test
void findPerformanceTest() {
String intputLine = """
This is a line above.
This is text.
This is a line below.
""";
String outputLine = """
This is a line above.
This is ~~>text.
This is a line below.
""";
int i=0;
String fileInput= intputLine;
String fileOutput= outputLine;
int numberOfEntries = 100000;
while (i < numberOfEntries){
fileInput += intputLine;
fileOutput += outputLine;
i++;
}
long startTime = System.currentTimeMillis();
rewriteRun(
spec -> spec.recipe(new Find("text", null, null, null, null, null))
.dataTable(TextMatches.Row.class, rows -> {
assertThat(rows).hasSize(numberOfEntries+1);
assertThat(rows.get(0).getMatch()).isEqualTo("This is ~~>text.");
}),
text(
fileInput,
fileOutput
)
);
long elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(elapsedTime + "ms");
}
If we want to improve the performance even more we should either avoid computing that line terminator index or use an ArrayList and then use binary search. Personally I don't see the point of precomputing this, at least not for the whole file (only for the range containing matches) and especially not before finding any matches. Also note that both indexOf() and lastIndexOf() both have an overload accepting a starting point. The index (or some custom logic) may make sense for files with very long lines. So if we want to keep the index we could compute the index for the part after the first match (and the terminator before it) and use binary search. |
988c910
to
98be617
Compare
What's changed?
The
Find
has been modified to increase its performance. Nearly all these optimizations are related to how Strings are handled.Main changes
StringBuilder
where we set the capacity to the exact length of the final String. Apart from setting the capacity exactly, this will mostly only affect OpenRewrite users using old JVM, since most modern JVM will often already generate optimized bytecode using a StringBuilder.What's your motivation?
Make it faster.
Anyone you would like to review specifically?
@kmccarp @timtebeek
Checklist