MongoDB slow for large databases #24
Comments
Yeah, the counting is a pain but I don't see an easy way around it. Each query needs to know the total number of structures it returns in the response, according to the spec. The only speed-up we could do would be to implement some session-based cursor so that we don't have to repeat queries with a new skip every request (and thus don't have to repeat the count). As the |
You can try installing from Materials-Consortia/optimade-python-tools#1757 and using the |
Reading this again, do you just mean that |
Thanks for the comments and the modification in Materials-Consortia/optimade-python-tools#1757. I'll test it when I have a moment. Regarding your last question, just to explain it a bit more: if I run
You can see that it just outputs the count. This takes 2+ minutes. If I do
I get the same result immediately. Of course, for more complicated queries, this probably doesn't hold. As for whether this is the only slow part, I'm not sure. Nothing else was in the profiling log. I suspect that if |
Hi @ml-evs, the changes in Materials-Consortia/optimade-python-tools#1757 indeed made the API fast, see here: https://dev-optimade.materialscloud.org/archive/li-ion-conductors/v1/structures However, there seems to be a small issue, the https://dev-optimade.materialscloud.org/archive/li-ion-conductors/v1/structures?page_offset=25 It shows the |
I spent some time investigating the slowness of the `li-ion-conductors` OPTIMADE database. Here's the info endpoint: https://dev-optimade.materialscloud.org/archive/li-ion-conductors/v1/info

Accessing the `/structures` endpoint takes over 2 minutes.

I turned on performance profiling (`db.setProfilingLevel(1)`) and here's the part of the log for accessing the `/structures` endpoint (only the slow commands should show up here, meaning everything else was fast, I think):

/structures profiling
Some key points:
- The slow command is the count, done as an aggregation (`$match` everything in the collection; group by `$_id` and then just sum the number).
- It runs as a full collection scan (`COLLSCAN`). I don't think this command, in its current state, could be sped up by using indexes.

I am wondering if this functionality could be implemented in a more efficient way. For example, `db.structures.count()` runs instantly.

Just for additional information, accessing a single structure was initially just as slow (2+ min), e.g. via
https://dev-optimade.materialscloud.org/archive/li-ion-conductors/v1/structures/5b5b4b01-5b7e-48ad-8e17-8077f9b0b5d2
But after I added the `id` index with `db.structures.createIndex({ id: 1 })`, it's fast now.

Pinging @ml-evs @unkcpz @superstar54 for comments/ideas regarding the "counting" speedup. I suspect this is probably something that should be adapted in https://github.com/Materials-Consortia/optimade-python-tools?
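
To make the counting idea concrete, here is a minimal Python sketch, assuming a pymongo-style collection object. `count_matching` is a hypothetical helper and not part of optimade-python-tools. With no filter it reads the count from collection metadata via `estimated_document_count()`, which avoids the scan; with a filter it uses `count_documents()`. The metadata count can be approximate (for example on sharded clusters or after an unclean shutdown), so this only fits cases where that is acceptable.

```python
from typing import Any, Mapping, Optional


def count_matching(collection: Any, query: Optional[Mapping[str, Any]] = None) -> int:
    """Count documents matching `query` (hypothetical helper).

    `collection` is assumed to behave like a pymongo Collection.
    """
    if not query:
        # No filter: read the count from collection metadata instead of
        # scanning every document. The result may be approximate.
        return collection.estimated_document_count()
    # Filtered query: an exact count is needed. This can use an index
    # if one covers the fields in `query`.
    return collection.count_documents(query)
```

For the unfiltered `/structures` listing this would take the metadata path, which is presumably why `db.structures.count()` returns instantly in the shell; filtered queries would still pay for an exact count.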