Add config option `mongo_count_timeout` to skip the global count per request #1757
Codecov Report
@@ Coverage Diff @@
## master #1757 +/- ##
=========================================
Coverage ? 90.77%
=========================================
Files ? 74
Lines ? 4627
Branches ? 0
=========================================
Hits ? 4200
Misses ? 427
Partials ? 0
Force-pushed from 9791fb8 to 4a1beeb.
Title changed: `elide_data_returned` to skip the global count per request → `mongo_count_timeout` to skip the global count per request
Force-pushed from 4a1beeb to 06d41a6.
    mongo_count_timeout: int = Field(
        5,
        description="""Number of seconds to allow MongoDB to perform a full database count before falling back to `null`.
This operation can require a full COLLSCAN for empty queries which can be prohibitively slow if the database does not fit into the active set, hence a timeout can drastically speed-up response times.""",
Maybe I do not fully understand what you are writing here, but I think MongoDB should know how many entries there are in a collection, so for an empty filter the query should not be slow. For a more complex query for which MongoDB cannot drastically reduce the number of entries using one of the already existing indexes this would still be a useful feature though.
You are right, it shouldn't be slow, but previously for an empty filter we were just naively calling `count_documents`, which still does a full scan that can be very slow. Now I am using `estimated_document_count` for this case, which uses simple collection metadata to just return the number. For non-empty filters, however, the count can still make use of this timeout.
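For readers following along, the counting strategy described in this reply can be sketched roughly as below. This is only an illustration, not the code in this PR: the function name `count_with_timeout` and its `timeout_s` parameter are hypothetical, and the `try`/`except ImportError` stand-in exists only so the sketch can be exercised without pymongo installed. The pymongo calls themselves (`estimated_document_count`, `count_documents` with `maxTimeMS`, and `pymongo.errors.ExecutionTimeout`) are real.

```python
from typing import Any, Mapping, Optional

try:
    from pymongo.errors import ExecutionTimeout
except ImportError:
    # Stand-in so the sketch can be run without pymongo installed.
    class ExecutionTimeout(Exception):
        """Placeholder for pymongo.errors.ExecutionTimeout."""


def count_with_timeout(
    collection: Any,
    query: Mapping[str, Any],
    timeout_s: int = 5,
) -> Optional[int]:
    """Hypothetical helper: count matching documents, or return None on timeout.

    `collection` is assumed to behave like a pymongo Collection.
    """
    if not query:
        # Empty filter: read the count from collection metadata instead of scanning.
        return collection.estimated_document_count()
    try:
        # Non-empty filter: ask the server to abort the count after timeout_s seconds.
        return collection.count_documents(dict(query), maxTimeMS=1000 * timeout_s)
    except ExecutionTimeout:
        # The caller can then report the total as null rather than blocking the response.
        return None
```

With a real collection this would be called as `count_with_timeout(db["structures"], {"nelements": 2})`, where the collection name and filter are again only placeholders.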
I gave this a try on the MC server for the big database, and everything works well. See https://dev-optimade.materialscloud.org/archive/li-ion-conductors/v1/structures. It would be good to have this merged. Thanks! However, links:next is still broken for our APIs, but I realized it's also broken for releases
Force-pushed from 06d41a6 to 13521da.
It seems that our mongo implementation is very slow for large collections, in part because of the global structure count required for each filter. This PR adds the ability to disable that (and thus disable `data_returned`).
cc @eimrek,