Partial data #467

Merged Jun 16, 2023 (62 commits)
Changes from 12 commits
Commits
70f6d9b
Add header for parial data appendix
rartino Jun 8, 2023
9a5a36f
First paragraph of partial data appendix
rartino Jun 8, 2023
0680b2f
Adding a JSON-API response example and to partial data examples.
sauliusg Jun 8, 2023
4063e13
Updating the partial response examples.
sauliusg Jun 8, 2023
1a7230f
A format of partial data URLs agreed with Giovanni.
sauliusg Jun 8, 2023
14b9e9d
Removing scaffold comments.
sauliusg Jun 8, 2023
104ed78
Fixinhg the formatting: removing trailing blanks, unfolding text lines.
sauliusg Jun 8, 2023
9fa3b29
Updating the partial data examples to be consistent with the new
sauliusg Jun 8, 2023
6ec5a11
Checking spelling, updating the ".words.lst" file.
sauliusg Jun 8, 2023
92e7c12
Full text of partial data format appendix
rartino Jun 8, 2023
cc3da46
Merge branch 'partial_data' of https://github.com/rartino/OPTIMADE in…
sauliusg Jun 8, 2023
7a1f684
Slight changes in the text.
sauliusg Jun 8, 2023
0245999
Apply suggestions from review
rartino Jun 8, 2023
06d6444
Apply suggestions from review
rartino Jun 8, 2023
3e5fa16
Delete trailing whitespace
rartino Jun 8, 2023
7eddd27
Fix descriptio of the data -> meta fields in the JSON response format
rartino Jun 8, 2023
3e1f04c
Fixing the "next" link definition.
sauliusg Jun 9, 2023
05eacc2
Update optimade.rst
rartino Jun 10, 2023
beaaeef
Apply suggestions from review
rartino Jun 10, 2023
50c355e
Update based on review
rartino Jun 10, 2023
7a92260
Revert unneseccary change to .words.lst
rartino Jun 11, 2023
8f4db09
Apply suggestions from review
rartino Jun 12, 2023
16d60f6
Slightly change the format of the markers
rartino Jun 12, 2023
e109706
Improve clarity for when number of lines does not match response_range
rartino Jun 12, 2023
34bdf2a
Remove trailing whitespace
rartino Jun 12, 2023
961f5b7
Apply suggestions from review
rartino Jun 14, 2023
874bd52
Apply suggestions from review
rartino Jun 15, 2023
7b314af
Add a key to the header to identify the format as OPTIMADE partial data
rartino Jun 15, 2023
6faf8db
Remove trailing whitespace
rartino Jun 15, 2023
316df78
Clarify handling of missing items in partial data
rartino Jun 15, 2023
b080cf2
Change markers to be more detectable in stream
rartino Jun 15, 2023
bd93804
Change markers to be more detectable in stream
rartino Jun 15, 2023
10bc845
Change markers to be more detectable in stream
rartino Jun 15, 2023
39d9ae5
Change format to representation to avoid a clash in terms and fieldnames
rartino Jun 15, 2023
2a24c1a
Enable for efficient parsing of responses a server knows has no refer…
rartino Jun 15, 2023
9d9e26e
Change format to representation to avoid a clash in terms and fieldnames
rartino Jun 15, 2023
ff5a27c
Rename partial_data_url and url to link to better conform to JSON API…
rartino Jun 15, 2023
8ae1928
Rename partial_data_url and url to link to better conform to JSON API…
rartino Jun 15, 2023
d8a11cb
Rename partial_data_url and url to link to better conform to JSON API…
rartino Jun 15, 2023
11900c5
Rename partial_data_url and url to link to better conform to JSON API…
rartino Jun 15, 2023
1b4093e
Remove trailing whitespace
rartino Jun 15, 2023
496b6ca
Change representation to layout to not confuse with URL representatio…
rartino Jun 15, 2023
4d906a2
Remove accidental leftover text.
rartino Jun 15, 2023
b6ab3ae
Fix segment incorrectly placed
rartino Jun 15, 2023
ee4c1e3
Fix braces in partial data examples
rartino Jun 15, 2023
1b0d1a6
Make returned_range RECOMMENDED and move a sentence that had ended up…
rartino Jun 15, 2023
1b9c607
Fix whitespace
rartino Jun 15, 2023
562d651
Improve formulation about partial data URLs
rartino Jun 15, 2023
498d169
Slightly adjust wording
rartino Jun 15, 2023
e5e6046
Slightly adjust wording
rartino Jun 15, 2023
4906c4f
Slightly adjust wording
rartino Jun 15, 2023
864450d
Slightly adjust wording
rartino Jun 15, 2023
e574106
Minor reformulations
rartino Jun 15, 2023
336ef21
Minor reformulations
rartino Jun 15, 2023
93ee583
Rearrange some text to be more logical
rartino Jun 15, 2023
edf4f25
Clarify optimade-partial-data/format field futureproofing
rartino Jun 15, 2023
5b13315
Minor reformulations and adjustments
rartino Jun 15, 2023
2cfe8c0
Allow an inline item_schema in addition to the link
rartino Jun 15, 2023
4e9fb4d
Fix missing quotation marks
rartino Jun 15, 2023
b50d93d
Minor language corrections from review
rartino Jun 16, 2023
dfc24d4
Add sentence about implementations decision on what is partial data
rartino Jun 16, 2023
a0aa533
Merge branch 'develop' into partial_data
rartino Jun 16, 2023
6 changes: 5 additions & 1 deletion .words.lst
@@ -1,4 +1,4 @@
personal_ws-1.1 en 205
personal_ws-1.1 en 209
ABNF
ACM
Aa
@@ -86,6 +86,7 @@ bandgap
bd
booktitle
boolean
bzip
calc
cartesian
checksums
@@ -115,18 +116,21 @@ exclusiveMinimum
exmpl
fieldname
firstname
hdf
howpublished
href
html
http
hydrogens
hydroperoxide
implementers
incrementing
internaldb
javascript
json
jsonapi
jsonc
jsonlines
kvak
lastname
libc
155 changes: 155 additions & 0 deletions optimade.rst
@@ -3200,6 +3200,158 @@ Relationships with files may be used to relate an entry with any number of :entr
Appendices
==========

OPTIMADE JSON lines partial data format
---------------------------------------
The OPTIMADE JSON lines partial data format is a lightweight format for transmitting property data that are too large to fit in a single OPTIMADE response.
In this case, the usual OPTIMADE response gives the value :val:`null` for the property, and the per-entry metadata specifies a URL that can be used to fetch the missing data in this format.
See `Per-property metadata`_ for information on the per-entry and per-property metadata format.

.. _slice object:

The JSON Lines specification is used to transmit long array data in a "streaming JSON" way [... ref ...].
To aid the definition of the JSON lines partial data format below, we first define a "slice object" to be a JSON object describing slices of arrays.
The slice object has the following OPTIONAL fields:

- :field:`"start"`: Integer.
  The slice starts at the value with the given index (inclusive).
  The default is 0, i.e., the value at the start of the array.
- :field:`"stop"`: Integer.
  The slice ends at the value with the given index (inclusive).
  The default is the last value of the array.
- :field:`"step"`: Integer.
  The absolute difference in index between two subsequent values that are included in the slice.
  The default is 1, i.e., every value in the range indicated by :field:`start` and :field:`stop` is included in the slice.
  For example, a value of 2 denotes a slice of every second value in the array.
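
As an illustration only, the slice object semantics described above could be sketched in Python as follows.
The function name ``slice_indices`` and the ``array_length`` parameter are hypothetical and not part of the specification; the sketch assumes non-negative indices and a positive step.

```python
def slice_indices(slice_obj, array_length):
    """Expand a slice object into the list of array indices it covers.

    Assumption: indices are non-negative and "step" is a positive integer.
    Both "start" and "stop" are inclusive, as in the definition above.
    """
    start = slice_obj.get("start", 0)               # default: start of the array
    stop = slice_obj.get("stop", array_length - 1)  # default: last value of the array
    step = slice_obj.get("step", 1)                 # default: every value
    if step < 1:
        raise ValueError("step must be a positive integer")
    # range() excludes its upper bound, hence stop + 1 for an inclusive stop.
    return list(range(start, stop + 1, step))


print(slice_indices({"start": 10, "stop": 20, "step": 2}, 100))
# [10, 12, 14, 16, 18, 20]
```

Note that, unlike Python's own slices, the :field:`stop` index is inclusive here, which is why the sketch adds 1 before calling ``range``.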

Furthermore, we also define the following special markers:

- The "end-of-data-marker" is this exact JSON: :val:`[["end"], ""]`.
- A "reference-marker" is this exact JSON: :val:`[["ref"], "URL"]`, where :val:`"URL"` is to be replaced with a URL being referenced.
- A "next-marker" is this exact JSON: :val:`[["next"], "URL"]`, where :val:`"URL"` is to be replaced with the target URL for the next link.

These JSON markers have been deliberately designed as lists with items of mixed data types, and thus cannot be encountered inside the actual data of an OPTIMADE property.
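
A client needs to tell markers apart from ordinary data values.
The following Python sketch shows one possible way to do so, using the marker spellings that appear in the examples of this appendix (``"end"``, ``"ref"`` and ``"next"``).
The function name ``classify_marker`` is hypothetical.

```python
def classify_marker(value):
    """Return (kind, url) if value is a special marker, otherwise None.

    kind is "end", "ref" or "next"; url is None for the end-of-data-marker.
    """
    # A marker is a two-item list: a one-item list holding a tag, then a string.
    if not (isinstance(value, list) and len(value) == 2):
        return None
    tag, url = value
    if not (isinstance(tag, list) and len(tag) == 1 and isinstance(url, str)):
        return None
    kind = tag[0]
    if kind == "end" and url == "":
        return ("end", None)
    if kind in ("ref", "next") and url:
        return (kind, url)
    return None


print(classify_marker([["ref"], "https://example.db.org/value2"]))
# ('ref', 'https://example.db.org/value2')
```

Because the first item of a marker is a list and the second is a string, the check cannot match a homogeneous list of numbers or strings.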

The full response MUST be valid `json lines <https://jsonlines.org/>`__ that adheres to the format:

- The first line is a header object (defined below).
- The following lines are data lines adhering to the formats described below.
- The final line is either an end-of-data-marker (indicating that there is no more data to be given), or a next-marker indicating that more data is available, which can be obtained by retrieving data from the provided URL.
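
The three-part structure above (header, data lines, final marker) could be checked by a client roughly as in the following Python sketch.
The function name ``split_response`` is hypothetical, the marker spellings follow the examples in this appendix, and the choice to skip blank lines is an assumption of the sketch rather than a requirement of the format.

```python
import json


def split_response(text):
    """Split a partial data response into (header, data_lines, final_marker)."""
    # Each non-empty line is parsed as one JSON value.
    lines = [json.loads(line) for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("a response needs at least a header and a final marker")
    header, *data_lines, final = lines
    if not isinstance(header, dict) or "format" not in header:
        raise ValueError("first line must be a header object with a 'format' key")
    is_final_marker = (
        isinstance(final, list)
        and len(final) == 2
        and final[0] in (["end"], ["next"])
        and isinstance(final[1], str)
    )
    if not is_final_marker:
        raise ValueError("last line must be an end-of-data-marker or a next-marker")
    return header, data_lines, final


text = '{"format": "dense"}\n123\n345\n[["next"], "https://example.db.org/value4"]\n'
header, data_lines, final = split_response(text)
print(header, data_lines, final)
```

A streaming client would process lines as they arrive instead of reading the whole text first; the sketch reads everything at once only to keep the structure easy to see.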

The first line MUST be a JSON object providing header information.
The header object MUST contain the key:

- :field:`"format"`: String.
  A string either equal to :val:`"dense"` or :val:`"sparse"` to indicate whether the returned format is dense or sparse.

The header object MAY also contain the key:

- :field:`"returned_range"`: Object.
  A `slice object`_ representing the range of data present in the response.
  Once the client has encountered an end-of-data-marker (defined above), any data not covered by the encountered slices are to be assigned the value :val:`null`.
  If the format is :val:`"dense"` and :field:`returned_range` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker (defined above).
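
To make the role of :field:`returned_range` in a dense response more concrete, the following Python sketch pairs each data line with an array index.
It assumes that the i-th data line corresponds to the i-th index of the range, which is an interpretation made for this illustration; the function name ``dense_items`` is hypothetical.

```python
from itertools import count


def dense_items(header, data_lines):
    """Pair each dense data line with the array index it is assumed to represent."""
    if header.get("format") != "dense":
        raise ValueError("expected a header with format 'dense'")
    # If returned_range is omitted, the data is a continuous range from index 0.
    returned_range = header.get("returned_range", {})
    start = returned_range.get("start", 0)
    step = returned_range.get("step", 1)
    # Assumption: data lines follow the range in order, one index per line.
    return list(zip(count(start, step), data_lines))


header = {"format": "dense", "returned_range": {"start": 10, "stop": 20, "step": 2}}
print(dense_items(header, [123, 345, -12.6]))
# [(10, 123), (12, 345), (14, -12.6)]
```

The sketch does not check the number of lines against :field:`stop`, since a response may end with a next-marker before the whole range has been delivered.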

The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the format as :val:`"dense"` or :val:`"sparse"`.

- **Dense format:** In the dense partial data format, each data line reproduces one list item in the OPTIMADE list property being transmitted in JSON format.
  If OPTIMADE list properties are embedded inside the item, they can either be included in full or replaced with a reference-marker.
  If a list is replaced by a reference-marker, the client MAY use the provided URL to obtain the list items, which are then also provided in the JSON lines partial data format.

- **Sparse format for one-dimensional list:** When the response sparsely communicates items for a one-dimensional OPTIMADE list property, each data line contains a JSON array in the format:

  - The first item is the index of the item provided.
  - The second item is a JSON representation of the item, in the same format as the lines in the dense format.
    In the same way as for the dense format, reference-markers are allowed for data that does not fit in the response.

- **Sparse format for multi-dimensional lists:** Specifically for the case that the OPTIMADE property represents a series of directly hierarchically embedded lists, the server MAY represent them using a sparse multi-dimensional format.
  In this case, each data line contains a JSON array in the format:

  - All items except the last item are coordinates providing indices in the embedded dimensions in the order of outermost to innermost.
  - The last item is a JSON representation of the item at those coordinates, in the same format as the lines in the dense format.
    In the same way as for the dense format, reference-markers are allowed for data that does not fit in the response.
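
Both sparse layouts share the same shape: one or more integer indices followed by the item.
A minimal Python sketch of splitting such a data line is given below; the function name ``parse_sparse_line`` is hypothetical, and the final marker line is assumed to have been removed before calling it.

```python
def parse_sparse_line(line):
    """Split one parsed sparse data line into (coordinates, item).

    A one-dimensional line yields a 1-tuple of coordinates; a multi-dimensional
    line yields one coordinate per embedded dimension, outermost first.
    """
    if not isinstance(line, list) or len(line) < 2:
        raise ValueError("a sparse data line needs at least one index and an item")
    *coords, item = line
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not all(isinstance(c, int) and not isinstance(c, bool) for c in coords):
        raise ValueError("coordinates must be integers")
    return tuple(coords), item


print(parse_sparse_line([3, 5, 19, 10]))
# ((3, 5, 19), 10)
```

The returned item may itself be a reference-marker, in which case the client can decide whether to follow the URL.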


Examples
--------

An example of an OPTIMADE JSON-API response that contains a link to a partial data protocol URL:

.. code:: json

    {
      "data": {
        "type": "structures",
        "id": "2345678",
        "attributes": {
          "a": null
        }
      },
      "meta": {
        "partial_data_urls": {
          "a": [
            {
              "format": "plain-jsonlines",
              "url": "https://example.db.org/assets/partial_values/structures/2345678/a/default_format"
            },
            {
              "format": "bzip2-jsonlines",
              "url": "https://example.db.org/assets/partial_values/structures/2345678/a/bzip2_format"
            },
            {
              "format": "hdf5",
              "url": "https://example.db.org/assets/partial_values/structures/2345678/a/hdf5"
            }
          ]
        },
        "property_metadata": {
          "a": {}
        }
      }
    }
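
A client receiving a response like the one above has to choose one of the offered URLs.
The Python sketch below shows one way this might be done, using the field names :field:`partial_data_urls`, :field:`format` and :field:`url` exactly as they appear in the example; the function name ``pick_partial_data_url`` and the ``preferred`` parameter are hypothetical.

```python
def pick_partial_data_url(response, prop, preferred=("plain-jsonlines",)):
    """Return the URL for the first preferred format offered for prop, or None."""
    entries = response.get("meta", {}).get("partial_data_urls", {}).get(prop, [])
    by_format = {entry["format"]: entry["url"] for entry in entries}
    for fmt in preferred:
        if fmt in by_format:
            return by_format[fmt]
    return None


response = {
    "data": {"type": "structures", "id": "2345678", "attributes": {"a": None}},
    "meta": {
        "partial_data_urls": {
            "a": [
                {"format": "plain-jsonlines",
                 "url": "https://example.db.org/assets/partial_values/structures/2345678/a/default_format"},
                {"format": "hdf5",
                 "url": "https://example.db.org/assets/partial_values/structures/2345678/a/hdf5"},
            ]
        }
    },
}
print(pick_partial_data_url(response, "a"))
```

Returning ``None`` when no preferred format is offered leaves it to the caller to decide whether to fall back to another format or to treat the property as unavailable.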

An example of a dense response for partial array data, scalar values:

.. code:: json

    {"format": "dense", "returned_range": {"start": 10, "stop": 20, "step": 2}}
    123
    345
    -12.6
    [["next"], "https://example.db.org/value4"]

An example of a dense response for partial array data, multidimensional array values:

.. code:: json

    {"format": "dense", "returned_range": {"start": 10, "stop": 20, "step": 2}}
    [[10,20,21], [30,40,50]]
    [["ref"], "https://example.db.org/value2"]
    [[11, 110], [["ref"], "https://example.db.org/value3"], [550, 333]]
    [["next"], "https://example.db.org/value4"]

An example of a sparse response for partial array data with aggregated dimensions, one-dimensional array values:

.. code:: json

    {"format": "sparse"}
    [3,5,19, [10,20,21,30]]
    [30,15,9, [["ref"], "https://example.db.org/value1"]]
    [["next"], "https://example.db.org/"]

An example of a sparse response for partial array data with aggregated dimensions, scalar values:

.. code:: json

    {"format": "sparse"}
    [3,5,19, 10]
    [30,15,9, 31]
    [["next"], "https://example.db.org/"]

An example of a sparse response for partial array data with aggregated dimensions, multidimensional array:

.. code:: json

    {"format": "sparse"}
    [3,5,19, [ [10,20,21], [30,40,50] ] ]
    [3,7,19, [["ref"], "https://example.db.org/value2"]]
    [4,5,19, [ [11, 110], [["ref"], "https://example.db.org/value3"], [550, 333] ] ]
    [["end"], ""]

The Filter Language EBNF Grammar
--------------------------------

@@ -3421,3 +3573,6 @@ The strings below contain Extended Regular Expressions (EREs) to recognize ident
#BEGIN ERE strings
"([^\"]|\\.)*"
#END ERE strings