-
Notifications
You must be signed in to change notification settings - Fork 253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(NODE-5557): move DataView and Set allocation used for double parsing and utf8 validation to nested path #611
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @billouboq thanks so much for contributing! I have a couple questions for you and some cleanup to get this closer to ready
Can you share some performance numbers you're observing after this change?
@nbbeeken I just updated the code and add a new benchmark that is really close to my real world usage. Made some benchmark tests and here are the mean durations : No changes : 548.269 - 555 |
031fc4f
to
d6d8979
Compare
@W-A-James Hello, any chance this one get's merged ? |
Hi @billouboq, apologies for the long turnaround time on this PR. We are currently in the middle of reworking how we do performance testing for js-bson (see NODE-3654). As a result, we're putting a pause on re-reviewing and potentially merging this and other potential performance improvements until that work is done. At that point, with more robust benchmarking infrastructure in place, we will reassess this PR and come to a decision. Thanks for all your work on this repo. |
Hello @W-A-James, do you have any news about this ? It seems people are having performance issues with the new mongodb driver : |
Hello, I would like to know what's holding us from using that optimisation ? Is there something I can do to make it pass ? |
Hi @billouboq, thanks for reaching out, we recently were unblocked on this (see NODE-5557) with the completion of our suite of BSON benchmarks that Warren mentioned. Now we can see the effect of this change per data type and collectively on mixed documents. Do you think your document represents a case unique enough to be separate from the bestbuy test case? If so we can include this document as its own test. I am currently working on performance measuring and improvements, so I will be taking another look at this soon. I am interested in seeing the results of the double-focused tests we have they should reveal if this change caused a delay due to GC of the DataView on every iteration. I am thinking we can split the difference here by storing the dataview on a |
Thanks for the update
I created a test that was representing something close to what I could use in a production environment, so with lot's of different types of values. We can remove it, it was more a test to check how much improvements I could have in our own environment.
That might be a good idea indeed ! Don't hesitate to tell me what I should change or you can close that PR and make a new PR based on my code, up to you |
Awesome thanks for sticking with this after so much time :) Could you rebase this and move the dataview variable back up to the top as a let, and the double code path can use I will handle running the benchmarks on my end and report back the results, TIA! |
I just have small possible improvements to test first based on the same kind of changes from that PR. I will rebase after those testings |
d6d8979
to
a2e03d1
Compare
@nbbeeken I rebased and pushed only the needed changes 👍 Also, I do not have the time to dig more, but I feel like we could improve date deserialisation performance by skipping creating a "new Long" and just create a small function that take lowBits and hightBits and do only this part without the context "this" : // taken from Long class file
toNumber(): number {
if (this.unsigned) return (this.high >>> 0) * TWO_PWR_32_DBL + (this.low >>> 0);
return this.high * TWO_PWR_32_DBL + (this.low >>> 0);
} That way we avoid creating a new Long object just for the sake of converting it directly. But it should be benchmarked to make sure it's worse it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @billouboq for the help! LGTM
// bytes: { _id: new bson.Int32(2) }
bson.deserialize(bytes, {validation: {utf8:false}})
do_nothing x 851,290,397 ops/sec ±0.05% (196 runs sampled)
bson_deserialize_current x 3,191,767 ops/sec ±0.20% (193 runs sampled)
bson_deserialize_nested_dv x 6,522,438 ops/sec ±0.15% (194 runs sampled)
I ran our granular (per bson type) benchmarks and I see numbers corroborating this, for non-doubles we get way better performance, looks like DataView is not too great. I will file follow up tickets for your suggestion about Date's and I bet we can get Doubles to be even better if we use the readDoubleLE API that Node.js offers (when running in Node.js)
Filed:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you @billouboq!
Description
This is a small performance improvement that has 0 impact on the code behaviour.
Things can be improved a little bit by changing the UTF-8 validation option, but this is more work, will see if I can test it later
What is changing?
Remove uneeded Set and dataview creation.
Also added a new benchmark with a more "natural" object to serialize
Is there new documentation needed for these changes?
No documentation changes needed
What is the motivation for this change?
Increasing deserialization performance
Release Highlight
Deserialization performance increased
If BSON data does not contain Doubles and UTF8 validation is disabled the deserializer is careful to not allocate data structures needed to support that functionality. This has shown to greatly increase (2x-1.3x) the performance of the deserializer.
Thank you @billouboq for this contribution!
Double check the following
npm run check:lint
scripttype(NODE-xxxx)[!]: description
feat([NODE-1234](https://jira.mongodb.org/browse/NODE-1234))!: rewriting everything in coffeescript