Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add config option mongo_count_timeout to skip the global count per request #1757

Merged
merged 5 commits into from
Sep 27, 2023

Conversation

ml-evs
Copy link
Member

@ml-evs ml-evs commented Aug 24, 2023

It seems that our mongo implementation is very slow for large collections, in part because of the global structure count required for each filter. This PR adds the ability to disable that (and thus disable data_returned).

cc @eimrek,

@codecov
Copy link

codecov bot commented Aug 24, 2023

Codecov Report

❗ No coverage uploaded for pull request base (master@22f51a1). Click here to learn what that means.
The diff coverage is 77.77%.

@@            Coverage Diff            @@
##             master    #1757   +/-   ##
=========================================
  Coverage          ?   90.77%           
=========================================
  Files             ?       74           
  Lines             ?     4627           
  Branches          ?        0           
=========================================
  Hits              ?     4200           
  Misses            ?      427           
  Partials          ?        0           
Flag Coverage Δ
project 90.77% <77.77%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
optimade/server/config.py 93.75% <100.00%> (ø)
...made/server/entry_collections/entry_collections.py 95.83% <100.00%> (ø)
optimade/server/routers/utils.py 95.12% <ø> (ø)
optimade/server/entry_collections/mongo.py 91.42% <71.42%> (ø)

@ml-evs ml-evs force-pushed the ml-evs/add_data_returned_skip branch 2 times, most recently from 9791fb8 to 4a1beeb Compare September 18, 2023 21:23
@ml-evs ml-evs changed the title Add config option elide_data_returned to skip the global count per request Add config option mongo_count_timeout to skip the global count per request Sep 18, 2023
@ml-evs ml-evs force-pushed the ml-evs/add_data_returned_skip branch from 4a1beeb to 06d41a6 Compare September 18, 2023 21:31
@ml-evs ml-evs marked this pull request as ready for review September 22, 2023 13:30
mongo_count_timeout: int = Field(
5,
description="""Number of seconds to allow MongoDB to perform a full database count before falling back to `null`.
This operation can require a full COLLSCAN for empty queries which can be prohibitively slow if the database does not fit into the active set, hence a timeout can drastically speed-up response times.""",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I do not fully understand what you are writing here, but I think MongoDB should know how many entries there are in a collection, so for an empty filter the query should not be slow. For a more complex query for which MongoDB cannot drastically reduce the number of entries using one of the already existing indexes this would still be a useful feature though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it shouldn't be slow, but previously for an empty filter we were just naively calling count_documents which does still do a full scan that can be very slow. Now, I am using estimated_document_count for this case, which uses simple collection metadata to just return the number. However for filters, it can still make use of this timeout.

@eimrek
Copy link
Member

eimrek commented Sep 27, 2023

Gave this a try on the MC server for the big database, and everything works well.

See https://dev-optimade.materialscloud.org/archive/li-ion-conductors/v1/structures

Would be good to have this merged. Thanks!

However, the links:next is still broken for our APIs, but i realized it's broken also for releases 0.25.1 and 0.25.2, so it's not related to this change.

@ml-evs ml-evs added the server Issues pertaining to the example server implementation label Sep 27, 2023
@ml-evs ml-evs force-pushed the ml-evs/add_data_returned_skip branch from 06d41a6 to 13521da Compare September 27, 2023 18:22
@ml-evs ml-evs merged commit 7269566 into master Sep 27, 2023
11 checks passed
@ml-evs ml-evs deleted the ml-evs/add_data_returned_skip branch September 27, 2023 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
server Issues pertaining to the example server implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants