REST Error on pulling a large amount of data via patterns (Legacy API) #3234

matthewcarbone · 2023-08-09T15:30:49Z

Description

I am attempting to pull a "large" amount of data via the MPRester get_data() method, in this case just the Materials Project IDs consistent with a pattern, e.g.

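from pymatgen.ext.matproj import MPRester  # legacy API client

pattern = "Ti-O-*-*-*"  # All titanium oxides with 5 unique atom types
with MPRester(api_key) as mpr:  # api_key: your Materials Project API key
    result = mpr.get_data(pattern, prop="material_id")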

This leads to a REST error which suggests that the query is too large:

MPRestError: BSON document too large (39439050 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.. Content: b'{"valid_response": false, "error": "BSON document too large (39439050 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.", "version": {"db": "2020_09_08", "pymatgen": "2022.0.8", "rest": "2.0"}, "created_at": "2023-08-09T08:25:12.741976", "traceback": "Traceback (most recent call last):\n File \"/var/www/python/matgen_prod/materials_django/rest/rest.py\", line 95, in wrapped\n d = func(*args, **kwargs)\n File \"/var/www/python/matgen_prod/materials_django/materials/rest.py\", line 121, in get_vasp_property\n entries = mdb.mat_qe.get_entries(crit, False, supported_properties)\n File \"/opt/miniconda3/envs/mpprod3/lib/python3.8/site-packages/matgendb/query_engine.py\", line 301, in get_entries\n for c in self.query(fields, criteria):\n File \"/opt/miniconda3/envs/mpprod3/lib/python3.8/site-packages/matgendb/query_engine.py\", line 654, in _result_generator\n for r in self._results:\n File \"/opt/miniconda3/envs/mpprod3/lib/python3.8/site-packages/pymongo/cursor.py\", line 1189, in next\n if len(self.__data) or self._refresh():\n File \"/opt/miniconda3/envs/mpprod3/lib/python3.8/site-packages/pymongo/cursor.py\", line 1104, in _refresh\n self.__send_message(q)\n File \"/opt/miniconda3/envs/mpprod3/lib/python3.8/site-packages/pymongo/cursor.py\", line 930, in __send_message\n response = client._send_message_with_response(\n File \"/opt/miniconda3/envs/mpprod3/lib/python3.8/site-packages/pymongo/mongo_client.py\", line 1138, in _send_message_with_response\n return self._reset_on_error(\n File \"/opt/miniconda3/envs/mpprod3/lib/python3.8/site-packages/pymongo/mongo_client.py\", line 1156, in _reset_on_error\n return func(*args, **kwargs)\n File \"/opt/miniconda3/envs/mpprod3/lib/python3.8/site-packages/pymongo/server.py\", line 105, in send_message_with_response\n sock_info.send_message(data, 
max_doc_size)\n File \"/opt/miniconda3/envs/mpprod3/lib/python3.8/site-packages/pymongo/pool.py\", line 593, in send_message\n raise DocumentTooLarge(\npymongo.errors.DocumentTooLarge: BSON document too large (39439050 bytes) - the connected server supports BSON document sizes up to 16777216 bytes.\n"}'

Repro

See above

Expected behavior

I believe there should be some mechanism to split up the query. It seems a bit odd that I cannot pull data like this, and I am not sure what the alternative would be. Again, I am only attempting to get the Materials Project IDs themselves, not even structural data.

Is this fixed in the new API? Regardless, I'd think it should work everywhere.

Thanks!

Environment

macOS Ventura 13.4.1 (M1)
pymatgen==2022.5.26


shyuep · 2023-08-10T15:04:27Z

1. Please use the new API.
2. This is actually not a good way to use this method. You can easily get all the mpids AND formulas/chemical systems in one shot and just postprocess that data to get the mpids of the specific systems.
3. Even if you prefer to use this method, having multiple wildcards makes it very difficult due to the sheer number of combinations. You can always use one wildcard with a loop on the other element.
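A minimal sketch of the two workarounds suggested above, written as pure Python so it runs without an API key. The helper names `expand_wildcard` and `filter_by_chemsys`, the `chemsys` field name, and the sample records are all hypothetical and only illustrate the idea; they are not part of pymatgen.

```python
def expand_wildcard(pattern, elements):
    """Replace the first '*' in a dash-separated pattern with each candidate
    element, skipping elements already fixed in the pattern.
    (Hypothetical helper, not a pymatgen function.)"""
    parts = pattern.split("-")
    if "*" not in parts:
        return [pattern]
    idx = parts.index("*")
    fixed = set(parts) - {"*"}
    expanded = []
    for el in elements:
        if el in fixed:
            continue
        new_parts = parts.copy()
        new_parts[idx] = el
        expanded.append("-".join(new_parts))
    return expanded


def filter_by_chemsys(records, required, nelements):
    """Postprocess locally: keep IDs whose chemical system contains all
    `required` elements and has exactly `nelements` distinct elements.
    Assumes each record has 'material_id' and 'chemsys' keys (an assumption)."""
    required = set(required)
    ids = []
    for rec in records:
        els = set(rec["chemsys"].split("-"))
        if required <= els and len(els) == nelements:
            ids.append(rec["material_id"])
    return ids


# Made-up records standing in for a bulk download of IDs and chemical systems.
sample_records = [
    {"material_id": "example-1", "chemsys": "Ba-La-O-Sr-Ti"},
    {"material_id": "example-2", "chemsys": "O-Ti"},
    {"material_id": "example-3", "chemsys": "Ba-Ca-La-O-Sr"},
]

sub_patterns = expand_wildcard("Ti-O-*-*-*", ["Li", "Ti", "Na"])
selected = filter_by_chemsys(sample_records, required=("Ti", "O"), nelements=5)
```

Each sub-pattern could then be sent as its own, smaller request, while the local filter avoids wildcard patterns entirely once the IDs and chemical systems have been downloaded.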

matthewcarbone · 2023-08-10T15:17:09Z

@shyuep, with due respect, none of these points answered my question. I would prefer if you reopened the issue so we can discuss how to make this feature better!

1. I would very much like to, but I can't for my use case. For instance, see "FEFFDictSet write_input appears to be bugged in some cases" (#3187).
2. I'm aware. I set the pulled properties to just the Materials IDs in order to demonstrate that even pulling the minimum amount of data leads to this error. Even so, what method are you referring to here?
3. This is a reasonable recommendation, but IMO it should be implemented under the hood in PMG. I don't think the user should have to deal with this type of subtlety. Do you agree?

shyuep · 2023-08-10T18:00:05Z

I just fixed the FEFFDictSet issue. That should allow you to use the new API.
As for implementing it under the hood, the premise is that you are asking for 92 x 92 x 92 = 778,688 chemical systems (each * is approximately 92 elements of the periodic table), each with tens, if not hundreds, of structures. There is a lot of overlap in there too (the number of chemical systems requested even exceeds the total number of structures in the Materials Project). So this is not a reasonable query, and it could not be handled even if we did a loop. In fact, the reasonable query in your case would be to find all materials containing Ti and O, and then set nelements=5 to fix the total number of elements.
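To make the counting argument and the suggested query concrete, here is a rough sketch. The ordered count reproduces the figure quoted above; the unordered count shows how much of that is overlap. The criteria dict uses MongoDB's `$all` operator together with the `nelements` field named above; the `elements` field name is an assumption about the legacy database schema, and `ti_o_criteria` and `matches` are hypothetical helpers for illustration only.

```python
from math import comb

N_ELEMENTS = 92  # approximate element count quoted in the thread

# Each of the three wildcards independently takes ~92 values.
ordered_systems = N_ELEMENTS ** 3

# Ignoring order and repeats: choose 3 distinct elements from the 90
# that are neither Ti nor O.
unique_systems = comb(N_ELEMENTS - 2, 3)


def ti_o_criteria(required=("Ti", "O"), nelements=5):
    """Build a Mongo-style criteria dict: all `required` elements present
    and exactly `nelements` distinct elements in total.
    The 'elements' field name is an assumption."""
    return {"elements": {"$all": list(required)}, "nelements": nelements}


def matches(doc, criteria):
    """Local stand-in for the server-side filter, for illustration only."""
    required = set(criteria["elements"]["$all"])
    return required <= set(doc["elements"]) and doc["nelements"] == criteria["nelements"]


criteria = ti_o_criteria()
```

If the legacy client's `query` method accepts such a dict in your pymatgen version, something like `mpr.query(criteria=criteria, properties=["material_id"])` would express the whole request as a single small query rather than hundreds of thousands of chemical systems.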

shyuep closed this as completed Aug 10, 2023
