From 70f6d9b28be40a9c9813e285aba08cd4b9da411e Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 8 Jun 2023 12:24:15 +0200 Subject: [PATCH 01/60] Add header for partial data appendix --- optimade.rst | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/optimade.rst b/optimade.rst index 2947c3fcf..75a2328cc 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3200,6 +3200,11 @@ Relationships with files may be used to relate an entry with any number of :entr Appendices ========== +OPTIMADE partial data format +---------------------------- + + + The Filter Language EBNF Grammar -------------------------------- @@ -3421,3 +3426,6 @@ The strings below contain Extended Regular Expressions (EREs) to recognize ident #BEGIN ERE strings "([^\"]|\\.)*" #END ERE strings + + + From 9a5a36fa79df6e4071b54f3941459a285f410c3d Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 8 Jun 2023 12:44:04 +0200 Subject: [PATCH 02/60] First paragraph of partial data appendix --- optimade.rst | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/optimade.rst b/optimade.rst index 75a2328cc..82c3424be 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3200,11 +3200,19 @@ Relationships with files may be used to relate an entry with any number of :entr Appendices ========== -OPTIMADE partial data format ----------------------------- +OPTIMADE partial data protocol +------------------------------ +The OPTIMADE partial data protocol is a lightweight REST protocol for transmission of property data which is too large to fit in a single response. +The OPTIMADE response can in this case assign the property the value :val:`null` in the response, and in the per-property metadata for that property specify a URL that can be used to fetch the missing data using the OPTIMADE partial data protocol. +See `Per_property_metadata`_ for information on the format of the per-property metadata. +This section describes the REST interface and response format provided via this URL.
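As an illustration of the client side of this flow, the following minimal Python sketch returns the URL from which an omitted (:val:`null`) property value can be fetched. It assumes the layout that a later patch in this series (PATCH 15/60) defines: a :field:`partial_data_urls` object inside the resource object's :field:`meta` field, whose values are lists of items with :field:`format` and :field:`url` keys. The function name `partial_data_url` is hypothetical and not part of any OPTIMADE library.

```python
def partial_data_url(resource, prop, fmt="json lines"):
    """Return the URL serving the omitted value of `prop` in format `fmt`, or None.

    `resource` is a single JSON API resource object (a dict) taken from the
    `data` field of an OPTIMADE response. Assumed (hypothetical) layout:
    resource["meta"]["partial_data_urls"][prop] == [{"format": ..., "url": ...}, ...]
    """
    attributes = resource.get("attributes", {})
    # Only a property that is present with the value null can have been omitted.
    if prop not in attributes or attributes[prop] is not None:
        return None
    candidates = resource.get("meta", {}).get("partial_data_urls", {}).get(prop, [])
    for candidate in candidates:
        if candidate.get("format") == fmt:
            return candidate.get("url")
    return None


# Resource object modelled on the example in this series, with the metadata
# placed under the resource object's "meta" field.
resource = {
    "type": "structures",
    "id": "2345678",
    "attributes": {"a": None},
    "meta": {
        "partial_data_urls": {
            "a": [
                {"format": "plain-jsonlines",
                 "url": "https://example.db.org/assets/partial_values/structures/2345678/a/default_format"},
                {"format": "hdf5",
                 "url": "https://example.db.org/assets/partial_values/structures/2345678/a/hdf5"},
            ]
        }
    },
}
print(partial_data_url(resource, "a", fmt="plain-jsonlines"))
```

The format name is matched exactly, so a client that prefers one format over another can simply call the function once per preferred name.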
+Examples +-------- + + The Filter Language EBNF Grammar -------------------------------- From 0680b2fecf04b40316973274c7db226073603ddf Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Saulius=20Gra=C5=BEulis?= Date: Thu, 8 Jun 2023 16:51:36 +0300 Subject: [PATCH 03/60] Adding a JSON-API response example and to partial data examples. --- optimade.rst | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 70 insertions(+), 2 deletions(-) diff --git a/optimade.rst b/optimade.rst index 82c3424be..f5b724c8f 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3207,12 +3207,80 @@ The OPTIMADE response can in this case assign the property the value :val:`null` See `Per_property_metadata`_ for information on the format of the per-property metadata. This section describes the REST interface and response format provided via this URL. - - Examples -------- +An example of an OPTIMADE JSON-API response that contains a link to a partial data protocol URL: +.. code:: json + { + "data": { + "type": "structures", + "id": "2345678", + "attributes": { + "a": null + } + } + "meta": { + "partial_data_urls": { // either this... 
+ "a": [ + { + "format": "plain-jsonlines", + "url": "https://example.db.org/assets/partial_values/structures/2345678/a/default_format" + }, + { + "format": "bzip2-jsonlines", + "url": "https://example.db.org/assets/partial_values/structures/2345678/a/bzip2_format" + }, + { + "format": "hdf5", + "url": "https://example.db.org/assets/partial_values/structures/2345678/a/hdf5" + } + ] + } + "property_metadata": { // or this...: + "a": [ + { + "format": "plain-jsonlines", + "url": "https://example.db.org/assets/partial_values/structures/2345678/a/default_format" + }, + { + "format": "bzip2-jsonlines", + "url": "https://example.db.org/assets/partial_values/structures/2345678/a/bzip2_format" + }, + { + "format": "hdf5", + "url": "https://example.db.org/assets/partial_values/structures/2345678/a/hdf5" + } + ] + } + } + } + + +An example of a dense response for a partial array data (based on the array of + +.. code:: json + {"format": "dense", "returned_range": {"start": 1, "stop": 20, "step": 2}} + +An example of a sparse response for a partial array data with aggregated dimensions: + +.. code:: json + {"format": "sparse"} + [3,5,19, [10,20,21,30]] + [3,5,19, [["ext"], "https://example.db.org/value1"]] + [["next"], "https://example.db.org/"] + +a[][3][5][19][][] + +.. code:: json + {"format": "sparse"} + [3,5,19, [ [10,20,21], [30,40,50] ] + [3,7,19, [["ext"], "https://example.db.org/value2"]] + [4,5,19, [ [11, 110], [["ext"], "https://example.db.org/value3"], [550, 333]] + [["next"], null] + + The Filter Language EBNF Grammar -------------------------------- From 4063e135942abd88080c049952b538c6f388a21a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Saulius=20Gra=C5=BEulis?= Date: Thu, 8 Jun 2023 16:58:36 +0300 Subject: [PATCH 04/60] Updating the partial response examples. 
--- optimade.rst | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/optimade.rst b/optimade.rst index f5b724c8f..705f23440 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3258,10 +3258,14 @@ An example of an OPTIMADE JSON-API response that contains a link to a partial da } -An example of a dense response for a partial array data (based on the array of +An example of a dense response for a partial array data: .. code:: json - {"format": "dense", "returned_range": {"start": 1, "stop": 20, "step": 2}} + {"format": "dense", "returned_range": {"start": 10, "stop": 20, "step": 2}} + [[10,20,21], [30,40,50]] + [["ext"], "https://example.db.org/value2"]] + [[11, 110], [["ext"], "https://example.db.org/value3"], [550, 333]] + [["next"], "https://example.db.org/value4"] An example of a sparse response for a partial array data with aggregated dimensions: From 1a7230fbfc237a3efce071edce360beca002dfca Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Saulius=20Gra=C5=BEulis?= Date: Thu, 8 Jun 2023 17:07:36 +0300 Subject: [PATCH 05/60] A format of partial data URLs agreed with Giovanni. 
--- optimade.rst | 17 +++-------------- 1 file changed, 3 insertions(+), 14 deletions(-) diff --git a/optimade.rst b/optimade.rst index 705f23440..2c87cb42e 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3239,20 +3239,9 @@ An example of an OPTIMADE JSON-API response that contains a link to a partial da ] } "property_metadata": { // or this...: - "a": [ - { - "format": "plain-jsonlines", - "url": "https://example.db.org/assets/partial_values/structures/2345678/a/default_format" - }, - { - "format": "bzip2-jsonlines", - "url": "https://example.db.org/assets/partial_values/structures/2345678/a/bzip2_format" - }, - { - "format": "hdf5", - "url": "https://example.db.org/assets/partial_values/structures/2345678/a/hdf5" - } - ] + "a": { + + } } } } From 14b9e9dec7c8ef06168e44352b798459138da3d6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Saulius=20Gra=C5=BEulis?= Date: Thu, 8 Jun 2023 17:10:05 +0300 Subject: [PATCH 06/60] Removing scaffold comments. --- optimade.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/optimade.rst b/optimade.rst index 2c87cb42e..dea00bcbd 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3222,7 +3222,7 @@ An example of an OPTIMADE JSON-API response that contains a link to a partial da } } "meta": { - "partial_data_urls": { // either this... + "partial_data_urls": { "a": [ { "format": "plain-jsonlines", @@ -3238,7 +3238,7 @@ An example of an OPTIMADE JSON-API response that contains a link to a partial da } ] } - "property_metadata": { // or this...: + "property_metadata": { "a": { } From 104ed78856de296c0d3af7dd9d4cb62c1a460e1a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Saulius=20Gra=C5=BEulis?= Date: Thu, 8 Jun 2023 17:34:25 +0300 Subject: [PATCH 07/60] Fixing the formatting: removing trailing blanks, unfolding text lines.
--- optimade.rst | 33 ++++++++++++++++++++++++--------- 1 file changed, 24 insertions(+), 9 deletions(-) diff --git a/optimade.rst b/optimade.rst index dea00bcbd..608ab8b5b 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3246,9 +3246,17 @@ An example of an OPTIMADE JSON-API response that contains a link to a partial da } } - -An example of a dense response for a partial array data: - +An example of a dense response for a partial array data, scalar values: + +.. code:: json + {"format": "dense", "returned_range": {"start": 10, "stop": 20, "step": 2}} + 123 + 345 + -12.6 + [["next"], "https://example.db.org/value4"] + +An example of a dense response for a partial array data, multidimensional array values: + .. code:: json {"format": "dense", "returned_range": {"start": 10, "stop": 20, "step": 2}} [[10,20,21], [30,40,50]] @@ -3256,23 +3264,30 @@ An example of a dense response for a partial array data: [[11, 110], [["ext"], "https://example.db.org/value3"], [550, 333]] [["next"], "https://example.db.org/value4"] -An example of a sparse response for a partial array data with aggregated dimensions: - +An example of a sparse response for a partial array data with aggregated dimensions, single dimension array: + .. code:: json {"format": "sparse"} - [3,5,19, [10,20,21,30]] - [3,5,19, [["ext"], "https://example.db.org/value1"]] + [3,5,19, [10,20,21,30]] + [30,15,9, [["ext"], "https://example.db.org/value1"]] [["next"], "https://example.db.org/"] -a[][3][5][19][][] +An example of a sparse response for a partial array data with aggregated dimensions, scalar values: + +.. code:: json + {"format": "sparse"} + [3,5,19, 10] + [30,15,9, 31] + [["next"], "https://example.db.org/"] +An example of a sparse response for a partial array data with aggregated dimensions, multidimensional array: + .. 
code:: json {"format": "sparse"} [3,5,19, [ [10,20,21], [30,40,50] ] [3,7,19, [["ext"], "https://example.db.org/value2"]] [4,5,19, [ [11, 110], [["ext"], "https://example.db.org/value3"], [550, 333]] [["next"], null] - The Filter Language EBNF Grammar -------------------------------- From 9fa3b291c6250c50ab1b9e50d2065da0e31a2835 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Saulius=20Gra=C5=BEulis?= Date: Thu, 8 Jun 2023 18:10:13 +0300 Subject: [PATCH 08/60] Updating the partial data examples to be consistent with the new description text. --- optimade.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/optimade.rst b/optimade.rst index 608ab8b5b..5d88c8b85 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3261,7 +3261,7 @@ An example of a dense response for a partial array data, multidimensional array {"format": "dense", "returned_range": {"start": 10, "stop": 20, "step": 2}} [[10,20,21], [30,40,50]] [["ext"], "https://example.db.org/value2"]] - [[11, 110], [["ext"], "https://example.db.org/value3"], [550, 333]] + [[11, 110], [["ref"], "https://example.db.org/value3"], [550, 333]] [["next"], "https://example.db.org/value4"] An example of a sparse response for a partial array data with aggregated dimensions, single dimension array: @@ -3269,7 +3269,7 @@ An example of a sparse response for a partial array data with aggregated dimensi .. 
code:: json {"format": "sparse"} [3,5,19, [10,20,21,30]] - [30,15,9, [["ext"], "https://example.db.org/value1"]] + [30,15,9, [["ref"], "https://example.db.org/value1"]] [["next"], "https://example.db.org/"] An example of a sparse response for a partial array data with aggregated dimensions, scalar values: @@ -3287,7 +3287,7 @@ An example of a sparse response for a partial array data with aggregated dimensi [3,5,19, [ [10,20,21], [30,40,50] ] [3,7,19, [["ext"], "https://example.db.org/value2"]] [4,5,19, [ [11, 110], [["ext"], "https://example.db.org/value3"], [550, 333]] - [["next"], null] + [["end"], ""] The Filter Language EBNF Grammar -------------------------------- From 6ec5a119ef4fcbeaec63099bb7e3ae3a2473ce25 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Saulius=20Gra=C5=BEulis?= Date: Thu, 8 Jun 2023 18:15:42 +0300 Subject: [PATCH 09/60] Checking spelling, updating the ".words.lst" file. --- .words.lst | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/.words.lst b/.words.lst index 5325fe295..6cc667b7a 100644 --- a/.words.lst +++ b/.words.lst @@ -1,4 +1,4 @@ -personal_ws-1.1 en 205 +personal_ws-1.1 en 209 ABNF ACM Aa @@ -86,6 +86,7 @@ bandgap bd booktitle boolean +bzip calc cartesian checksums @@ -115,6 +116,7 @@ exclusiveMinimum exmpl fieldname firstname +hdf howpublished href html @@ -122,11 +124,13 @@ http hydrogens hydroperoxide implementers +incrementing internaldb javascript json jsonapi jsonc +jsonlines kvak lastname libc From 92e7c127a2cb39bce0d4100ca59652a2ed2bd444 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 8 Jun 2023 17:53:15 +0200 Subject: [PATCH 10/60] Full text of partial data format appendix --- optimade.rst | 74 +++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 68 insertions(+), 6 deletions(-) diff --git a/optimade.rst b/optimade.rst index 705f23440..5351959e7 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3200,12 +3200,74 @@ Relationships with files may be used to relate an entry with any 
number of :entr Appendices ========== -OPTIMADE partial data protocol ------------------------------- -The OPTIMADE partial data protocol is a lightweight REST protocol for transmission of property data which is too large to fit in a single response. -The OPTIMADE response can in this case assign the property the value :val:`null` in the response, and in the per-property metadata for that property specify a URL that can be used to fetch the missing data using the OPTIMADE partial data protocol. -See `Per_property_metadata`_ for information on the format of the per-property metadata. -This section describes the REST interface and response format provided via this URL. +OPTIMADE JSON lines partial data format +--------------------------------------- +The OPTIMADE JSON lines partial data format is a lightweight format for transmitting property data too large to fit in a single OPTIMADE response. +In this case, the usual OPTIMADE response gives the value :val:`null` for the property, and the per-entry metadata specifies a URL that can be used to fetch the missing data in this format. +See `Per-property metadata`_ for information on the per-entry and per-property metadata format. + +.. _slice object: + +To aid the definition of the "json lines" format below, we first define a "slice object" to be a JSON object describing slices of arrays. +The dictionary has the following OPTIONAL fields: + +- :field:`"start"`: Integer. + The slice starts at the value with the given index (inclusive). + The default is 0, i.e., the value at the start of the array. +- :field:`"stop"` + The slice ends at the value with the given index (inclusive). + The default is the last value of the array. +- :field:`"step"` + The absolute difference in index between two subsequent values that are included in the slice. + The default is 1, i.e., every value in the range indicated by :field:`start` and :field:`stop` is included in the slice. 
+ For example, a value of 2 denotes a slice of every second value in the array. + +Furthermore, we also define the following special markers: + +- The "end-of-data--marker" is this exact JSON: :val:`[["end"], ""]`. +- A "reference-marker" is this exact JSON: :val:`[["ref"], "URL"]`, where :val:`"URL"` is to be replaced with a URL being referenced. +- A "next-marker" is this exact JSON: :val:`[["end"], "URK"]`, where :val:`"URL"` is to be replaced with the target URL for the next link. + +These JSON markers have been deliberately designed as lists with items of mixed data types, and thus cannot be encountered inside the actual data of an OPTIMADE property. + +The full response MUST be valid `json lines `__ that adheres to the format: + +- The first line is a header object (defined below) +- The following lines are data lines adhering to the formats described below. +- The final line is either an end-of-data--marker (indicating that there is no more data to be given), or a next-marker indicating that more data is available, which can be obtained by retriving data from the provided URL. + +The first line MUST be a JSON object providing header information. +The header object MUST contain the key: + +- :field:`"format"`: String. + A string either equal to :val:`"dense"` or :val:`"sparse"` to indicate whether the returned format is dense or sparse. + +The header object MAY also contain the key: + +- :field:`"returned_range"`: Object. + A `slice object`_ representing the range of data present in the response. + Once the client has encountered an end-of-data--marker (defined below), any data not covered by the encountered slices are to be assigned the value :val:`null`. + If the format is `"dense"` and :field:`returned_range` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data--marker or next-marker (defined below). 
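The slice object defined above differs from a Python slice in one respect worth making explicit: :field:`stop` is inclusive. The following minimal Python sketch expands a slice object into the array indices it selects, assuming the defaults stated in this patch (:field:`start` 0, :field:`stop` the last index, :field:`step` 1); a later patch in the series refines the meaning of an omitted :field:`stop`. The function name `slice_indices` and its `length` parameter are hypothetical.

```python
def slice_indices(slice_obj, length):
    """Expand a slice object (a dict) into the list of indices it selects.

    `length` is the length of the full array; it is only needed to resolve
    the default for "stop" (the last index), as described in this patch.
    """
    start = slice_obj.get("start", 0)
    stop = slice_obj.get("stop", length - 1)
    step = slice_obj.get("step", 1)
    if step < 1:
        raise ValueError("step must be a positive integer")
    # Python's range() excludes its end point, so add 1 to make "stop" inclusive.
    return list(range(start, stop + 1, step))


print(slice_indices({"start": 10, "stop": 20, "step": 2}, length=100))
```

For the header used in the examples of this series, `{"start": 10, "stop": 20, "step": 2}`, the inclusive :field:`stop` means index 20 is selected, giving six indices from 10 to 20.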
+ +The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the format as :val:`"dense"` or :val:`sparse`. + +- **Dense format:** In the dense partial data format, each data line reproduces one list item in the OPTIMADE list property being transmitted in JSON format. + If OPTIMADE list properties are embedded inside the item, they can either be included in full or replaced with a reference-marker. + If a list is replaced by a reference marker, the client MAY use the provided URL to obtain the list items, which is then also provided in the JSON lines partial data format. + +- **Sparse format for one-dimensional list:** When the response sparsely communicates items for a one-dimensional OPTIMADE list property, each data line contains a JSON array on the format: + + - The first item is the index of the item provided. + - The second item is a JSON representation of the item, on the same format as the lines in the dense format. + In the same way as for the dense format, reference-markers are allowed for data that does not fit in the response. + +- **Sparse format for multi-dimensional lists:** Specifically for the case that the OPTIMADE property represents a series of directly hierarchically embedded lists, the server MAY represent them using a sparse multi-dimensional format. + In this case, each data line contains a JSON array on the format: + + - All items except the last item are coordinates providing indices in the embedded dimensions in the order of outermost to innermost. + - The last item is a JSON representation of the item at those coordinates, on the same format as the lines in the dense format. + In the same way as for the dense format, reference-markers are allowed for data that does not fit in the response. 
+ Examples -------- From 7a1f684a0dfe85facea97c24d4f6d1025d527d17 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Saulius=20Gra=C5=BEulis?= Date: Thu, 8 Jun 2023 18:58:45 +0300 Subject: [PATCH 11/60] Slight changes in the text. --- optimade.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index af9769a1f..eb8d37056 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3202,12 +3202,13 @@ Appendices OPTIMADE JSON lines partial data format --------------------------------------- -The OPTIMADE JSON lines partial data format is a lightweight format for transmitting property data too large to fit in a single OPTIMADE response. +The OPTIMADE JSON lines partial data format is a lightweight format for transmitting property data that are too large to fit in a single OPTIMADE response. In this case, the usual OPTIMADE response gives the value :val:`null` for the property, and the per-entry metadata specifies a URL that can be used to fetch the missing data in this format. See `Per-property metadata`_ for information on the per-entry and per-property metadata format. .. _slice object: +The JSON Lines specification is used to transmit long array data in a "streaming JSON" way [... ref ...]. To aid the definition of the "json lines" format below, we first define a "slice object" to be a JSON object describing slices of arrays. 
The dictionary has the following OPTIONAL fields: From 0245999df2f980471ce8256dfd10eac4de43b12f Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 8 Jun 2023 22:47:15 +0200 Subject: [PATCH 12/60] Apply suggestions from review Co-authored-by: Andrius Merkys Co-authored-by: Johan Bergsma <29785380+JPBergsma@users.noreply.github.com> --- .words.lst | 4 ++-- optimade.rst | 12 ++++++------ 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/.words.lst b/.words.lst index 6cc667b7a..9afb8515d 100644 --- a/.words.lst +++ b/.words.lst @@ -1,4 +1,4 @@ -personal_ws-1.1 en 209 +personal_ws-1.1 en 209 ABNF ACM Aa @@ -207,4 +207,4 @@ xy yacc zeo zeolites -ångström +Ã¥ngström diff --git a/optimade.rst b/optimade.rst index eb8d37056..d78d950dc 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3227,7 +3227,7 @@ Furthermore, we also define the following special markers: - The "end-of-data--marker" is this exact JSON: :val:`[["end"], ""]`. - A "reference-marker" is this exact JSON: :val:`[["ref"], "URL"]`, where :val:`"URL"` is to be replaced with a URL being referenced. -- A "next-marker" is this exact JSON: :val:`[["end"], "URK"]`, where :val:`"URL"` is to be replaced with the target URL for the next link. +- A "next-marker" is this exact JSON: :val:`[["end"], "URL"]`, where :val:`"URL"` is to be replaced with the target URL for the next link. These JSON markers have been deliberately designed as lists with items of mixed data types, and thus cannot be encountered inside the actual data of an OPTIMADE property. @@ -3235,7 +3235,7 @@ The full response MUST be valid `json lines `__ that adh - The first line is a header object (defined below) - The following lines are data lines adhering to the formats described below. -- The final line is either an end-of-data--marker (indicating that there is no more data to be given), or a next-marker indicating that more data is available, which can be obtained by retriving data from the provided URL. 
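The final-line rule lets a client assemble data that spans several responses by following next-markers. The following minimal Python sketch assumes a caller-supplied `fetch(url)` function that returns a response body as text (hypothetical; in a real client it would perform an HTTP GET), and uses the next-marker form `[["next"], "URL"]` that PATCH 16/60 later settles on. The function name `collect_data_lines` is also hypothetical.

```python
import json

END_MARKER = [["end"], ""]


def collect_data_lines(first_url, fetch):
    """Follow next-markers starting at first_url; return all decoded data lines.

    The header line of each response is skipped here; a real client would
    also check that the headers of consecutive responses agree.
    """
    data, url, visited = [], first_url, set()
    while url is not None:
        if url in visited:
            raise ValueError(f"next-marker loop detected at {url}")
        visited.add(url)
        lines = [json.loads(line) for line in fetch(url).splitlines() if line.strip()]
        if len(lines) < 2:
            raise ValueError(f"response from {url} has no final marker")
        *body, last = lines[1:]  # drop the header line, split off the final line
        data.extend(body)
        if last == END_MARKER:
            url = None
        elif isinstance(last, list) and len(last) == 2 and last[0] == ["next"]:
            url = last[1]
        else:
            raise ValueError(f"response from {url} does not end with an end or next marker")
    return data


# In-memory stand-in for two chained responses.
pages = {
    "https://example.db.org/part1": '{"format": "dense"}\n123\n345\n[["next"], "https://example.db.org/part2"]',
    "https://example.db.org/part2": '{"format": "dense"}\n-12.6\n[["end"], ""]',
}
print(collect_data_lines("https://example.db.org/part1", pages.__getitem__))
```

Tracking visited URLs is a defensive choice of this sketch, so that a misconfigured server cannot send the client into an endless loop.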
+- The final line is either an end-of-data--marker (indicating that there is no more data to be given), or a next-marker indicating that more data is available, which can be obtained by retrieving data from the provided URL. The first line MUST be a JSON object providing header information. The header object MUST contain the key: @@ -3263,7 +3263,7 @@ The format of data lines of the response (i.e., all lines except the first and t In the same way as for the dense format, reference-markers are allowed for data that does not fit in the response. - **Sparse format for multi-dimensional lists:** Specifically for the case that the OPTIMADE property represents a series of directly hierarchically embedded lists, the server MAY represent them using a sparse multi-dimensional format. - In this case, each data line contains a JSON array on the format: + In this case, each data line contains a JSON array in the format of: - All items except the last item are coordinates providing indices in the embedded dimensions in the order of outermost to innermost. - The last item is a JSON representation of the item at those coordinates, on the same format as the lines in the dense format. @@ -3323,7 +3323,7 @@ An example of a dense response for a partial array data, multidimensional array .. code:: json {"format": "dense", "returned_range": {"start": 10, "stop": 20, "step": 2}} [[10,20,21], [30,40,50]] - [["ext"], "https://example.db.org/value2"]] + [["ref"], "https://example.db.org/value2"] [[11, 110], [["ref"], "https://example.db.org/value3"], [550, 333]] [["next"], "https://example.db.org/value4"] @@ -3348,8 +3348,8 @@ An example of a sparse response for a partial array data with aggregated dimensi .. 
code:: json {"format": "sparse"} [3,5,19, [ [10,20,21], [30,40,50] ] - [3,7,19, [["ext"], "https://example.db.org/value2"]] - [4,5,19, [ [11, 110], [["ext"], "https://example.db.org/value3"], [550, 333]] + [3,7,19, [["ref"], "https://example.db.org/value2"]] + [4,5,19, [ [11, 110], [["ref"], "https://example.db.org/value3"], [550, 333]] [["end"], ""] The Filter Language EBNF Grammar From 06d6444005e94180fdb57c47700be48272643bb0 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 8 Jun 2023 23:21:34 +0200 Subject: [PATCH 13/60] Apply suggestions from review --- optimade.rst | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/optimade.rst b/optimade.rst index d78d950dc..750ffabae 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3217,7 +3217,9 @@ The dictionary has the following OPTIONAL fields: The default is 0, i.e., the value at the start of the array. - :field:`"stop"` The slice ends at the value with the given index (inclusive). - The default is the last value of the array. + If omitted, the end of the slice is not specified. + If the end of the slice is not specified when used to express the values included in a response, the client has to count the number of items to know the end. + If the slice refers to a requested range of items, to omit :field:`stop` has the same meaning as specifying the last index of the array. - :field:`"step"` The absolute difference in index between two subsequent values that are included in the slice. The default is 1, i.e., every value in the range indicated by :field:`start` and :field:`stop` is included in the slice. @@ -3245,10 +3247,11 @@ The header object MUST contain the key: The header object MAY also contain the key: -- :field:`"returned_range"`: Object. - A `slice object`_ representing the range of data present in the response. 
- Once the client has encountered an end-of-data--marker (defined below), any data not covered by the encountered slices are to be assigned the value :val:`null`. - If the format is `"dense"` and :field:`returned_range` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data--marker or next-marker (defined below). +- :field:`"returned_ranges"`: Array of Object. + For dense data and sparse data of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. + Once the client has encountered an end-of-data--marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. + If the field :field:`"format"` is `"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data--marker or next-marker. + In the specific case of a hierarchy of list properties represented as a sparse multi-dimensional array, if the field :field:`"returned_ranges"` is given, it MUST contain one slice object per dimension of the multi-dimensional array, representing slices for each dimension that cover the data given in the response. The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the format as :val:`"dense"` or :val:`sparse`. @@ -3312,7 +3315,7 @@ An example of an OPTIMADE JSON-API response that contains a link to a partial da An example of a dense response for a partial array data, scalar values: .. 
code:: json - {"format": "dense", "returned_range": {"start": 10, "stop": 20, "step": 2}} + {"format": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} 123 345 -12.6 @@ -3321,7 +3324,7 @@ An example of a dense response for a partial array data, scalar values: An example of a dense response for a partial array data, multidimensional array values: .. code:: json - {"format": "dense", "returned_range": {"start": 10, "stop": 20, "step": 2}} + {"format": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} [[10,20,21], [30,40,50]] [["ref"], "https://example.db.org/value2"] [[11, 110], [["ref"], "https://example.db.org/value3"], [550, 333]] From 3e5fa1658331d32e38f6556d234431cb4d6457ac Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 8 Jun 2023 23:32:31 +0200 Subject: [PATCH 14/60] Delete trailing whitespace --- optimade.rst | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/optimade.rst b/optimade.rst index 750ffabae..468131ea6 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3219,7 +3219,7 @@ The dictionary has the following OPTIONAL fields: The slice ends at the value with the given index (inclusive). If omitted, the end of the slice is not specified. If the end of the slice is not specified when used to express the values included in a response, the client has to count the number of items to know the end. - If the slice refers to a requested range of items, to omit :field:`stop` has the same meaning as specifying the last index of the array. + If the slice refers to a requested range of items, to omit :field:`stop` has the same meaning as specifying the last index of the array. - :field:`"step"` The absolute difference in index between two subsequent values that are included in the slice. The default is 1, i.e., every value in the range indicated by :field:`start` and :field:`stop` is included in the slice. 
@@ -3306,7 +3306,7 @@ An example of an OPTIMADE JSON-API response that contains a link to a partial da } "property_metadata": { "a": { - + } } } @@ -3339,7 +3339,7 @@ An example of a sparse response for a partial array data with aggregated dimensi [["next"], "https://example.db.org/"] An example of a sparse response for a partial array data with aggregated dimensions, scalar values: - + .. code:: json {"format": "sparse"} [3,5,19, 10] @@ -3347,14 +3347,14 @@ An example of a sparse response for a partial array data with aggregated dimensi [["next"], "https://example.db.org/"] An example of a sparse response for a partial array data with aggregated dimensions, multidimensional array: - + .. code:: json {"format": "sparse"} [3,5,19, [ [10,20,21], [30,40,50] ] [3,7,19, [["ref"], "https://example.db.org/value2"]] [4,5,19, [ [11, 110], [["ref"], "https://example.db.org/value3"], [550, 333]] [["end"], ""] - + The Filter Language EBNF Grammar -------------------------------- @@ -3576,6 +3576,3 @@ The strings below contain Extended Regular Expressions (EREs) to recognize ident #BEGIN ERE strings "([^\"]|\\.)*" #END ERE strings - - - From 7eddd27215bb21adf811e0cbcc1e29fd29ee916d Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Fri, 9 Jun 2023 01:04:13 +0200 Subject: [PATCH 15/60] Fix description of the data -> meta fields in the JSON response format --- optimade.rst | 28 ++++++++++++++++++++++++---- 1 file changed, 24 insertions(+), 4 deletions(-) diff --git a/optimade.rst b/optimade.rst index 468131ea6..4e738b00f 100644 --- a/optimade.rst +++ b/optimade.rst @@ -593,6 +593,23 @@ Every response SHOULD contain the following fields, and MUST contain at least :f - **data**: The schema of this value varies by endpoint, it can be either a *single* `JSON API resource object `__ or a *list* of JSON API resource objects.
Every resource object needs the :field:`type` and :field:`id` fields, and its attributes (described in section `API Endpoints`_) need to be in a dictionary corresponding to the :field:`attributes` field. + The :field:`data` field MAY also contain a :field:`meta` field with the following keys: + + - **property_metadata**: an object containing per-entry and per-property metadata. + The keys are the names of the fields in :field:`attributes` for which metadata is available. + The values belonging to these keys are dictionaries containing the relevant metadata fields. + + - **partial_data_urls**: an object used to list URL:s which can be used to fetch data that has been omitted from the :field:`data` part of the response. + The keys are the names of the fields in :field:`attributes` for which partial data URLs are available. + Each value is a list of items that MUST have the following keys: + + - **format**: String. + A name of the format provided via this URL. + One of the items SHOULD be "json lines", which refers to the format in `OPTIMADE JSON lines partial data format`_. + + - **url**: String. + The URL from which the data can be fetched. + The response MAY also return resources related to the primary data in the field: - **links**: `JSON API links `__ is REQUIRED for implementing pagination. @@ -915,7 +932,8 @@ OPTIONALLY it can also contain the following fields: - **self**: the entry's URL -- **meta**: a `JSON API meta object `__ that contains non-standard meta-information about the object. +- **meta**: a `JSON API meta object `__ that is used to communicate metadata. + See `JSON Response Schema: Common Fields`_ for more information about this field. - **relationships**: a dictionary containing references to other entries according to the description in section `Relationships`_ encoded as `JSON API Relationships `__. 
The OPTIONAL human-readable description of the relationship MAY be provided in the :field:`description` field inside the :field:`meta` dictionary of the JSON API resource identifier object. @@ -3203,12 +3221,14 @@ Appendices OPTIMADE JSON lines partial data format --------------------------------------- The OPTIMADE JSON lines partial data format is a lightweight format for transmitting property data that are too large to fit in a single OPTIMADE response. -In this case, the usual OPTIMADE response gives the value :val:`null` for the property, and the per-entry metadata specifies a URL that can be used to fetch the missing data in this format. -See `Per-property metadata`_ for information on the per-entry and per-property metadata format. +The format is based on `JSON Lines `__, which allows for streaming handling of large datasets. + +To communicate a property using this format, the usual OPTIMADE response gives the value :val:`null` for the property. +Furthermore, a URL is given which can be used to fetch the missing data. +For responses that use the JSON response format, a subfield :field:`partial_data_urls` of the resource object metadata field, :field:`meta`, is used, see `JSON Response Schema: Common Fields`_. .. _slice object: -The JSON Lines specification is used to transmit long array data in a "streaming JSON" way [... ref ...]. To aid the definition of the "json lines" format below, we first define a "slice object" to be a JSON object describing slices of arrays. The dictionary has the following OPTIONAL fields: From 3e1f04ca4743782629f6572871560d0bd1ffbe4a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Saulius=20Gra=C5=BEulis?= Date: Fri, 9 Jun 2023 10:11:07 +0300 Subject: [PATCH 16/60] Fixing the "next" link definition. 
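
This change gives the next-marker its own tag (it was previously written with the "end" tag of the end-of-data-marker). As a minimal, non-normative illustration of why distinct tags matter, the Python sketch below shows how a client might recognise the three special markers in a decoded JSON line. The function name `classify_marker` is hypothetical and not part of OPTIMADE; the sketch assumes the marker definitions exactly as written in the draft: `[["end"], ""]`, `[["ref"], "URL"]` and `[["next"], "URL"]`.

```python
import json
from typing import Any, Optional, Tuple

# Tags used by the special markers in the draft partial data format.
MARKER_TAGS = ("end", "ref", "next")


def classify_marker(value: Any) -> Optional[Tuple[str, str]]:
    """Return (tag, url) if `value` is a special marker, otherwise None.

    A marker is a two-item list: a one-element list holding the tag string,
    followed by a string (a URL, or "" for the end-of-data-marker).
    Because OPTIMADE lists cannot mix data types, ordinary property data
    never has this shape.
    """
    if not (isinstance(value, list) and len(value) == 2):
        return None
    tag_list, url = value
    if not (isinstance(tag_list, list) and len(tag_list) == 1):
        return None
    tag = tag_list[0]
    if not isinstance(tag, str) or tag not in MARKER_TAGS or not isinstance(url, str):
        return None
    # The end-of-data-marker is the exact JSON [["end"], ""].
    if tag == "end" and url != "":
        return None
    return tag, url


# Usage with lines taken from the examples in the draft text:
print(classify_marker(json.loads('[["next"], "https://example.db.org/value4"]')))
# -> ('next', 'https://example.db.org/value4')
print(classify_marker(json.loads('[["end"], ""]')))
# -> ('end', '')
print(classify_marker(json.loads("[[10,20,21], [30,40,50]]")))
# -> None
```

Checking the shape before comparing the tag keeps the test safe for arbitrary decoded JSON (for example, nested lists in the tag position are rejected rather than raising an error).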
--- optimade.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index 4e738b00f..f6ea61589 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3249,7 +3249,7 @@ Furthermore, we also define the following special markers: - The "end-of-data--marker" is this exact JSON: :val:`[["end"], ""]`. - A "reference-marker" is this exact JSON: :val:`[["ref"], "URL"]`, where :val:`"URL"` is to be replaced with a URL being referenced. -- A "next-marker" is this exact JSON: :val:`[["end"], "URL"]`, where :val:`"URL"` is to be replaced with the target URL for the next link. +- A "next-marker" is this exact JSON: :val:`[["next"], "URL"]`, where :val:`"URL"` is to be replaced with the target URL for the next link. These JSON markers have been deliberately designed as lists with items of mixed data types, and thus cannot be encountered inside the actual data of an OPTIMADE property. From 05eacc224f114e9be4826bfdb7b2b5ccfb33e95f Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Sat, 10 Jun 2023 23:10:11 +0200 Subject: [PATCH 17/60] Update optimade.rst MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Saulius Gražulis --- optimade.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/optimade.rst b/optimade.rst index f6ea61589..6e706dc29 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3248,6 +3248,7 @@ The dictionary has the following OPTIONAL fields: Furthermore, we also define the following special markers: - The "end-of-data--marker" is this exact JSON: :val:`[["end"], ""]`. +The "end-of-data--marker" marker is chosen so that it is a valid JSON object but *not* a valid OPTIMADE value (an OPTIMADE object may not contain values of different types in a list as of v 1.1), which make sure that a valid value will never be misinterpreted as the "end..." marker. 
- A "reference-marker" is this exact JSON: :val:`[["ref"], "URL"]`, where :val:`"URL"` is to be replaced with a URL being referenced. - A "next-marker" is this exact JSON: :val:`[["next"], "URL"]`, where :val:`"URL"` is to be replaced with the target URL for the next link. From beaaeef2fe4ba162bd87bf1eff85e2b302431827 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Sat, 10 Jun 2023 23:33:15 +0200 Subject: [PATCH 18/60] Apply suggestions from review Co-authored-by: Giovanni Pizzi --- optimade.rst | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/optimade.rst b/optimade.rst index 6e706dc29..3e4d77e30 100644 --- a/optimade.rst +++ b/optimade.rst @@ -599,7 +599,7 @@ Every response SHOULD contain the following fields, and MUST contain at least :f The keys are the names of the fields in :field:`attributes` for which metadata is available. The values belonging to these keys are dictionaries containing the relevant metadata fields. - - **partial_data_urls**: an object used to list URL:s which can be used to fetch data that has been omitted from the :field:`data` part of the response. + - **partial_data_urls**: an object used to list URLs which can be used to fetch data that has been omitted from the :field:`data` part of the response. The keys are the names of the fields in :field:`attributes` for which partial data URLs are available. Each value is a list of items that MUST have the following keys: @@ -3243,7 +3243,9 @@ The dictionary has the following OPTIONAL fields: - :field:`"step"` The absolute difference in index between two subsequent values that are included in the slice. The default is 1, i.e., every value in the range indicated by :field:`start` and :field:`stop` is included in the slice. - For example, a value of 2 denotes a slice of every second value in the array. + Hence, a value of 2 denotes a slice of every second value in the array. 
+ +For example, for the array `["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]` a slice that specifies start=1, end=7, and step=3 refers to the items `["b", "e", "h"]`. Furthermore, we also define the following special markers: @@ -3269,7 +3271,7 @@ The header object MUST contain the key: The header object MAY also contain the key: - :field:`"returned_ranges"`: Array of Object. - For dense data and sparse data of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. + For dense data, and sparse data of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. Once the client has encountered an end-of-data--marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. If the field :field:`"format"` is `"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data--marker or next-marker. In the specific case of a hierarchy of list properties represented as a sparse multi-dimensional array, if the field :field:`"returned_ranges"` is given, it MUST contain one slice object per dimension of the multi-dimensional array, representing slices for each dimension that cover the data given in the response. @@ -3282,15 +3284,15 @@ The format of data lines of the response (i.e., all lines except the first and t - **Sparse format for one-dimensional list:** When the response sparsely communicates items for a one-dimensional OPTIMADE list property, each data line contains a JSON array on the format: - - The first item is the index of the item provided. - - The second item is a JSON representation of the item, on the same format as the lines in the dense format. 
- In the same way as for the dense format, reference-markers are allowed for data that does not fit in the response. + - The first item is the (zero-based) index of the item provided. + - The second item is a JSON representation of the item, with the same format as the lines in the dense format. + In the same way as for the dense format, reference-markers are allowed for data that does not fit in the response (see example below). -- **Sparse format for multi-dimensional lists:** Specifically for the case that the OPTIMADE property represents a series of directly hierarchically embedded lists, the server MAY represent them using a sparse multi-dimensional format. +- **Sparse format for multi-dimensional lists:** We provide a sparse format specifically for the case that the OPTIMADE property represents a series of directly hierarchically embedded lists (i.e., a multidimensional sparse array). Then, the server MAY represent them using the following sparse multi-dimensional format. In this case, each data line contains a JSON array in the format of: - - All items except the last item are coordinates providing indices in the embedded dimensions in the order of outermost to innermost. - - The last item is a JSON representation of the item at those coordinates, on the same format as the lines in the dense format. + - All items except the last item are integer zero-based indices of the value being provided in this line; these indices refer to the embedded dimensions in the order of outermost to innermost. + - The last item is a JSON representation of the item at those coordinates, with the same format as the lines in the dense format. In the same way as for the dense format, reference-markers are allowed for data that does not fit in the response. 
@@ -3333,7 +3335,7 @@ An example of an OPTIMADE JSON-API response that contains a link to a partial da } } -An example of a dense response for a partial array data, scalar values: +An example of a dense response for a partial array data, scalar values. The request returns the first three items and provides the next-marker link to continue fetching data: .. code:: json {"format": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} @@ -3342,7 +3344,7 @@ An example of a dense response for a partial array data, scalar values: -12.6 [["next"], "https://example.db.org/value4"] -An example of a dense response for a partial array data, multidimensional array values: +An example of a dense response for a partial array data, multidimensional array values. Item with index 10 in the original list (the first one provided in the response since start=10) is provided explicitly in the response. The item with index 12 in the list (the second provided, since start=10 and step=2) is not provided and only referenced. The third provided item (index 14 in the original list) is only partially returned: it is a list of three items, the first and last ar explicitly provided, the second one is only referenced. .. 
code:: json {"format": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} From 50c355eb9c9f19cb4787d31bffca31d18c0160d2 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Sun, 11 Jun 2023 00:58:27 +0200 Subject: [PATCH 19/60] Update based on review --- .words.lst | 4 +- optimade.rst | 347 +++++++++++++++++++++++++++------------------------ 2 files changed, 184 insertions(+), 167 deletions(-) diff --git a/.words.lst b/.words.lst index 9afb8515d..6cc667b7a 100644 --- a/.words.lst +++ b/.words.lst @@ -1,4 +1,4 @@ -personal_ws-1.1 en 209 +personal_ws-1.1 en 209 ABNF ACM Aa @@ -207,4 +207,4 @@ xy yacc zeo zeolites -Ã¥ngström +ångström diff --git a/optimade.rst b/optimade.rst index 3e4d77e30..538e04eca 100644 --- a/optimade.rst +++ b/optimade.rst @@ -442,6 +442,54 @@ For example, the following query can be sent to API implementations `exmpl1` and :filter:`filter=_exmpl1_band_gap<2.0 OR _exmpl2_band_gap<2.5` +Transmission of large property values +------------------------------------- + +A property value may be too large to fit in a single response. +OPTIMADE provides a mechanism for a client to handle such properties by fetching them in separate series of requests. + +In this case, the response to the initial query gives the value :val:`null` for the property. +A list of one or more data URLs together with their respective partial data formats are given in the response. +How this list is provided is response format-dependent. +For the JSON response format, see the description of the :field:`partial_data_urls` field inside :field:`meta` inside :field:`data` in the section `JSON Response Schema: Common Fields`_. + +The default partial data format is named "jsonlines" and is described in the Appendix `OPTIMADE JSON lines partial data format`_. +An implementation SHOULD always include this format as one of alternative partial data formats provided for a property that has been omitted from the response to the initial query. 
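
The slice object and the dense/sparse data-line rules in this update can be illustrated with a minimal, non-normative Python sketch of a client reading one page of a "jsonlines" partial data response. `slice_indices` and `read_partial_response` are hypothetical helper names. The sketch assumes the draft semantics as written here: `start` defaults to 0, `step` defaults to 1, `stop` is inclusive, and dense data without `returned_ranges` is contiguous from index 0. It does not follow reference-markers or next-markers. It uses the field name `stop` throughout; note that the slice example in the draft text spells this field `end`.

```python
import json
from typing import Any, Dict, Iterator, Optional, Tuple


def slice_indices(slice_obj: Dict[str, int], length: Optional[int] = None) -> Iterator[int]:
    """Yield the array indices covered by a slice object.

    Draft semantics: "start" defaults to 0, "step" defaults to 1 and
    "stop" is inclusive. Without "stop", iterate up to `length - 1` if a
    length is known, otherwise without bound.
    """
    start = slice_obj.get("start", 0)
    step = slice_obj.get("step", 1)
    stop = slice_obj.get("stop")
    if stop is None and length is not None:
        stop = length - 1
    i = start
    while stop is None or i <= stop:
        yield i
        i += step


def read_partial_response(text: str) -> Tuple[str, Dict[Any, Any], Optional[str]]:
    """Parse one page of a JSON lines partial data response.

    Returns (format, items, next_url). `items` maps an index (dense and
    one-dimensional sparse) or a tuple of indices (multi-dimensional sparse)
    to the decoded value; reference-markers inside values are left as-is.
    `next_url` is None when the page ends with the end-of-data-marker.
    """
    lines = [json.loads(ln) for ln in text.splitlines() if ln.strip()]
    header, body, last = lines[0], lines[1:-1], lines[-1]
    if not (isinstance(last, list) and len(last) == 2 and last[0] in (["end"], ["next"])):
        raise ValueError("last line must be an end-of-data-marker or a next-marker")
    next_url = last[1] if last[0] == ["next"] else None

    items: Dict[Any, Any] = {}
    if header["format"] == "dense":
        # Without returned_ranges, data starts at index 0 and is contiguous.
        ranges = header.get("returned_ranges", [{}])
        indices = slice_indices(ranges[0])
        for value in body:
            index = next(indices, None)
            if index is None:
                raise ValueError("more data lines than returned_ranges covers")
            items[index] = value
    elif header["format"] == "sparse":
        # All items but the last are zero-based indices, outermost first.
        for row in body:
            *coords, value = row
            items[coords[0] if len(coords) == 1 else tuple(coords)] = value
    else:
        raise ValueError(f"unknown format: {header['format']!r}")
    return header["format"], items, next_url


letters = list("abcdefghij")
print([letters[i] for i in slice_indices({"start": 1, "stop": 7, "step": 3})])
# -> ['b', 'e', 'h']

dense_page = "\n".join([
    '{"format": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]}',
    "123",
    "345",
    "-12.6",
    '[["next"], "https://example.db.org/value4"]',
])
print(read_partial_response(dense_page))
# -> ('dense', {10: 123, 12: 345, 14: -12.6}, 'https://example.db.org/value4')
```

Keying multi-dimensional sparse values by a tuple of indices mirrors the outermost-to-innermost coordinate order in the draft. A complete client would additionally fetch the URLs carried by reference-markers and next-markers and merge the resulting pages.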
+Implementations MAY provide links to their own non-standard formats, but non-standard format names MUST be prefixed by a database-provider-specific prefix. + +Below follows an example of the data and meta parts in a response using the JSON response format that communicates that the property value has been omitted from the response, with three different URLs for different partial data formats provided. + +.. code:: jsonc + { + // ... + "data": { + "type": "structures", + "id": "2345678", + "attributes": { + "a": null + } + } + "meta": { + "partial_data_urls": { + "a": [ + { + "format": "jsonlines", + "url": "https://example.org/optimade/v1.2/extensions/partial_data/structures/2345678/a/default_format" + }, + { + "format": "_exmpl_bzip2_jsonlines", + "url": "https://db.example.org/assets/partial_values/structures/2345678/a/bzip2_format" + }, + { + "format": "_exmpl_hdf5", + "url": "https://cloud.example.org/ACCHSORJGIHWOSJZG" + } + ] + } + } + // ... + } + Responses ========= @@ -595,20 +643,19 @@ Every response SHOULD contain the following fields, and MUST contain at least :f The :field:`data` field MAY also contain a :field:`meta` field with the following keys: - - **property_metadata**: an object containing per-entry and per-property metadata. - The keys are the names of the fields in :field:`attributes` for which metadata is available. - The values belonging to these keys are dictionaries containing the relevant metadata fields. - - **partial_data_urls**: an object used to list URLs which can be used to fetch data that has been omitted from the :field:`data` part of the response. The keys are the names of the fields in :field:`attributes` for which partial data URLs are available. Each value is a list of items that MUST have the following keys: - **format**: String. A name of the format provided via this URL. - One of the items SHOULD be "json lines", which refers to the format in `OPTIMADE JSON lines partial data format`_. 
+ One of the items SHOULD be "jsonlines", which refers to the format in `OPTIMADE JSON lines partial data format`_. - **url**: String. The URL from which the data can be fetched. + There is no requirement on the syntax or format of the URL. + + For more information about the mechanism to transmit large property values, including an example of the format of :field:`partial_data_urls`, see `Transmission of large property values`_. The response MAY also return resources related to the primary data in the field: @@ -3218,166 +3265,6 @@ Relationships with files may be used to relate an entry with any number of :entr Appendices ========== -OPTIMADE JSON lines partial data format ---------------------------------------- -The OPTIMADE JSON lines partial data format is a lightweight format for transmitting property data that are too large to fit in a single OPTIMADE response. -The format is based on `JSON Lines `__, which allows for streaming handling of large datasets. - -To communicate a property using this format, the usual OPTIMADE response gives the value :val:`null` for the property. -Furthermore, a URL is given which can be used to fetch the missing data. -For responses that use the JSON response format, a subfield :field:`partial_data_urls` of the resource object metadata field, :field:`meta`, is used, see `JSON Response Schema: Common Fields`_. - -.. _slice object: - -To aid the definition of the "json lines" format below, we first define a "slice object" to be a JSON object describing slices of arrays. -The dictionary has the following OPTIONAL fields: - -- :field:`"start"`: Integer. - The slice starts at the value with the given index (inclusive). - The default is 0, i.e., the value at the start of the array. -- :field:`"stop"` - The slice ends at the value with the given index (inclusive). - If omitted, the end of the slice is not specified. 
- If the end of the slice is not specified when used to express the values included in a response, the client has to count the number of items to know the end. - If the slice refers to a requested range of items, to omit :field:`stop` has the same meaning as specifying the last index of the array. -- :field:`"step"` - The absolute difference in index between two subsequent values that are included in the slice. - The default is 1, i.e., every value in the range indicated by :field:`start` and :field:`stop` is included in the slice. - Hence, a value of 2 denotes a slice of every second value in the array. - -For example, for the array `["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]` a slice that specifies start=1, end=7, and step=3 refers to the items `["b", "e", "h"]`. - -Furthermore, we also define the following special markers: - -- The "end-of-data--marker" is this exact JSON: :val:`[["end"], ""]`. -The "end-of-data--marker" marker is chosen so that it is a valid JSON object but *not* a valid OPTIMADE value (an OPTIMADE object may not contain values of different types in a list as of v 1.1), which make sure that a valid value will never be misinterpreted as the "end..." marker. -- A "reference-marker" is this exact JSON: :val:`[["ref"], "URL"]`, where :val:`"URL"` is to be replaced with a URL being referenced. -- A "next-marker" is this exact JSON: :val:`[["next"], "URL"]`, where :val:`"URL"` is to be replaced with the target URL for the next link. - -These JSON markers have been deliberately designed as lists with items of mixed data types, and thus cannot be encountered inside the actual data of an OPTIMADE property. - -The full response MUST be valid `json lines `__ that adheres to the format: - -- The first line is a header object (defined below) -- The following lines are data lines adhering to the formats described below. 
-- The final line is either an end-of-data--marker (indicating that there is no more data to be given), or a next-marker indicating that more data is available, which can be obtained by retrieving data from the provided URL. - -The first line MUST be a JSON object providing header information. -The header object MUST contain the key: - -- :field:`"format"`: String. - A string either equal to :val:`"dense"` or :val:`"sparse"` to indicate whether the returned format is dense or sparse. - -The header object MAY also contain the key: - -- :field:`"returned_ranges"`: Array of Object. - For dense data, and sparse data of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. - Once the client has encountered an end-of-data--marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. - If the field :field:`"format"` is `"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data--marker or next-marker. - In the specific case of a hierarchy of list properties represented as a sparse multi-dimensional array, if the field :field:`"returned_ranges"` is given, it MUST contain one slice object per dimension of the multi-dimensional array, representing slices for each dimension that cover the data given in the response. - -The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the format as :val:`"dense"` or :val:`sparse`. - -- **Dense format:** In the dense partial data format, each data line reproduces one list item in the OPTIMADE list property being transmitted in JSON format. 
- If OPTIMADE list properties are embedded inside the item, they can either be included in full or replaced with a reference-marker. - If a list is replaced by a reference marker, the client MAY use the provided URL to obtain the list items, which is then also provided in the JSON lines partial data format. - -- **Sparse format for one-dimensional list:** When the response sparsely communicates items for a one-dimensional OPTIMADE list property, each data line contains a JSON array on the format: - - - The first item is the (zero-based) index of the item provided. - - The second item is a JSON representation of the item, with the same format as the lines in the dense format. - In the same way as for the dense format, reference-markers are allowed for data that does not fit in the response (see example below). - -- **Sparse format for multi-dimensional lists:** We provide a sparse format specifically for the case that the OPTIMADE property represents a series of directly hierarchically embedded lists (i.e., a multidimensional sparse array). Then, the server MAY represent them using the following sparse multi-dimensional format. - In this case, each data line contains a JSON array in the format of: - - - All items except the last item are integer zero-based indices of the value being provided in this line; these indices refer to the embedded dimensions in the order of outermost to innermost. - - The last item is a JSON representation of the item at those coordinates, with the same format as the lines in the dense format. - In the same way as for the dense format, reference-markers are allowed for data that does not fit in the response. - - -Examples --------- - -An example of an OPTIMADE JSON-API response that contains a link to a partial data protocol URL: - -.. 
code:: json - { - "data": { - "type": "structures", - "id": "2345678", - "attributes": { - "a": null - } - } - "meta": { - "partial_data_urls": { - "a": [ - { - "format": "plain-jsonlines", - "url": "https://example.db.org/assets/partial_values/structures/2345678/a/default_format" - }, - { - "format": "bzip2-jsonlines", - "url": "https://example.db.org/assets/partial_values/structures/2345678/a/bzip2_format" - }, - { - "format": "hdf5", - "url": "https://example.db.org/assets/partial_values/structures/2345678/a/hdf5" - } - ] - } - "property_metadata": { - "a": { - - } - } - } - } - -An example of a dense response for a partial array data, scalar values. The request returns the first three items and provides the next-marker link to continue fetching data: - -.. code:: json - {"format": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} - 123 - 345 - -12.6 - [["next"], "https://example.db.org/value4"] - -An example of a dense response for a partial array data, multidimensional array values. Item with index 10 in the original list (the first one provided in the response since start=10) is provided explicitly in the response. The item with index 12 in the list (the second provided, since start=10 and step=2) is not provided and only referenced. The third provided item (index 14 in the original list) is only partially returned: it is a list of three items, the first and last ar explicitly provided, the second one is only referenced. - -.. code:: json - {"format": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} - [[10,20,21], [30,40,50]] - [["ref"], "https://example.db.org/value2"] - [[11, 110], [["ref"], "https://example.db.org/value3"], [550, 333]] - [["next"], "https://example.db.org/value4"] - -An example of a sparse response for a partial array data with aggregated dimensions, single dimension array: - -.. 
code:: json - {"format": "sparse"} - [3,5,19, [10,20,21,30]] - [30,15,9, [["ref"], "https://example.db.org/value1"]] - [["next"], "https://example.db.org/"] - -An example of a sparse response for a partial array data with aggregated dimensions, scalar values: - -.. code:: json - {"format": "sparse"} - [3,5,19, 10] - [30,15,9, 31] - [["next"], "https://example.db.org/"] - -An example of a sparse response for a partial array data with aggregated dimensions, multidimensional array: - -.. code:: json - {"format": "sparse"} - [3,5,19, [ [10,20,21], [30,40,50] ] - [3,7,19, [["ref"], "https://example.db.org/value2"]] - [4,5,19, [ [11, 110], [["ref"], "https://example.db.org/value3"], [550, 333]] - [["end"], ""] - The Filter Language EBNF Grammar -------------------------------- @@ -3599,3 +3486,133 @@ The strings below contain Extended Regular Expressions (EREs) to recognize ident #BEGIN ERE strings "([^\"]|\\.)*" #END ERE strings + +OPTIMADE JSON lines partial data format +--------------------------------------- +The OPTIMADE JSON lines partial data format is a lightweight format for transmitting property data that are too large to fit in a single OPTIMADE response. +The format is based on `JSON Lines `__, which allows for streaming handling of large datasets. +Note: since the below definition references both JSON fields and OPTIMADE properties, the data type names depend on context: for JSON they are, e.g., "array" and "object" and for OPTIMADE properties they are, e.g., "list" and "dictionary". + +.. _slice object: + +To aid the definition of the format below, we first define a "slice object" to be a JSON object describing slices of arrays. +The dictionary has the following OPTIONAL fields: + +- :field:`"start"`: Integer. + The slice starts at the value with the given index (inclusive). + The default is 0, i.e., the value at the start of the array. +- :field:`"stop"`: Integer. + The slice ends at the value with the given index (inclusive). 
+ If omitted, the end of the slice is not specified. + If the slice is used to express the values included in a response and :field:`stop` is omitted, the client has to count the number of items to know the end. + If the slice is used to request a range of items, to omit :field:`stop` has the same meaning as specifying the last index of the array. +- :field:`"step"`: Integer. + The absolute difference in index between two subsequent values that are included in the slice. + The default is 1, i.e., every value in the range indicated by :field:`start` and :field:`stop` is included in the slice. + Hence, a value of 2 denotes a slice of every second value in the array. + +For example, for the array `["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]` the slice object `{"start":1, "end":7, "step": 3}` refers to the items `["b", "e", "h"]`. + +Furthermore, we also define the following special markers: + +- The "end-of-data-marker" is this exact JSON: :val:`[["end"], ""]`. +- A "reference-marker" is this exact JSON: :val:`[["ref"], "URL"]`, where :val:`"URL"` is to be replaced with a URL being referenced. +- A "next-marker" is this exact JSON: :val:`[["next"], "URL"]`, where :val:`"URL"` is to be replaced with the target URL for the next link. + +There is no requirement on the syntax or format of the URLs provided in these markers. +The data provided via the URLs MUST be the JSON lines partial data format, i.e., the markers cannot be used to link to partial data provided in other formats. +The markers have been deliberately designed to be valid JSON objects but *not* valid OPTIMADE property values. +Since the OPTIMADE list data type is defined as list of values of the same data type or :val:`null`, the above markers cannot be encountered inside the actual data of an OPTIMADE property. 
+ +The full response MUST be valid `JSON Lines `__ that adheres to the following format: + +- The first line is a header object (defined below) +- The following lines are data lines adhering to the formats described below. +- The final line is either an end-of-data-marker (indicating that there is no more data to be given), or a next-marker indicating that more data is available, which can be obtained by retrieving data from the provided URL. + +The first line MUST be a JSON object providing header information. +The header object MUST contain the key: + +- :field:`"format"`: String. + A string either equal to :val:`"dense"` or :val:`"sparse"` to indicate whether the returned format is dense or sparse. + +The header object MAY also contain the key: + +- :field:`"returned_ranges"`: Array of Object. + For dense data, and sparse data of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. + Once the client has encountered an end-of-data-marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. + If the field :field:`"format"` is `"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data--marker or next-marker. + In the specific case of a hierarchy of list properties represented as a sparse multi-dimensional array, if the field :field:`"returned_ranges"` is given, it MUST contain one slice object per dimension of the multi-dimensional array, representing slices for each dimension that cover the data given in the response. + +The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the format as :val:`"dense"` or :val:`sparse`. 
+ +- **Dense format:** In the dense partial data format, each data line reproduces one list item in the OPTIMADE list property being transmitted in JSON format. + If OPTIMADE list properties are embedded inside the item, they can either be included in full or replaced with a reference-marker. + If a list is replaced by a reference marker, the client MAY use the provided URL to obtain the list items. + +- **Sparse format for one-dimensional list:** When the response sparsely communicates items for a one-dimensional OPTIMADE list property, each data line contains a JSON array on the format: + + - The first item is the zero-based index of the item provided. + - The second item is a JSON representation of the item, with the same format as the lines in the dense format. + In the same way as for the dense format, reference-markers are allowed for data that does not fit in the response (see example below). + +- **Sparse format for multi-dimensional lists:** We provide a sparse format specifically for the case that the OPTIMADE property represents a series of directly hierarchically embedded lists (i.e., a multidimensional sparse array). + Then, the server MAY represent them using the following sparse multi-dimensional format for a number of aggregated dimensions. + In this case, each data line contains a JSON array in the format of: + + - All items except the last item are integer zero-based indices of the value being provided in this line; these indices refer to the aggregated dimensions in the order of outermost to innermost. + - The last item is a JSON representation of the item at those coordinates, with the same format as the lines in the dense format. + In the same way as for the dense format, reference-markers are allowed for data that does not fit in the response. + +Examples +-------- + +Below follows an example of a dense response for a partial array data of integer values. 
+The request returns the first three items and provides the next-marker link to continue fetching data: + +.. code:: json + {"format": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} + 123 + 345 + -12.6 + [["next"], "https://example.db.org/value4"] + +Below follows an example of a dense response for a list property as a partial array of multidimensional array values. +The item with index 10 in the original list is provided explicitly in the response and is the first one provided in the response since start=10. +The item with index 12 in the list, the second data item provided since start=10 and step=2, is not included only referenced. +The third provided item (index 14 in the original list) is only partially returned: it is a list of three items, the first and last are explicitly provided, the second one is only referenced. + +.. code:: json + {"format": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} + [[10,20,21], [30,40,50]] + [["ref"], "https://example.db.org/value2"] + [[11, 110], [["ref"], "https://example.db.org/value3"], [550, 333]] + [["next"], "https://example.db.org/value4"] + +Below follows an example of the sparse format for multi-dimensional lists with three aggregated dimensions. +The underlying property value can be taken to be sparse data in lists in four dimensions of 10000 x 10000 x 10000 x N, where the innermost list is a non-sparse list of abitrary length of numbers. +The only non-null items in the outer three dimensions are, say, [3,5,19], [30,15,9], and [42,54,17]. +The response below communicates the first item explicitly; the second one by defering the innermost list using a reference-marker; and the third item is not included in this response, but defered to another page via a next-marker. + +.. 
code:: json + {"format": "sparse"} + [3,5,19, [10,20,21,30]] + [30,15,9, [["ref"], "https://example.db.org/value1"]] + [["next"], "https://example.db.org/"] + +An example of the sparse format for multi-dimensional lists with three aggregated dimensions and integer values: + +.. code:: json + {"format": "sparse"} + [3,5,19, 10] + [30,15,9, 31] + [["next"], "https://example.db.org/"] + +An example of the sparse format for multi-dimensional lists with three aggregated dimensions and values that are multidimensional lists of integers of arbitrary lengths: + +.. code:: json + {"format": "sparse"} + [3,5,19, [ [10,20,21], [30,40,50] ] + [3,7,19, [["ref"], "https://example.db.org/value2"]] + [4,5,19, [ [11, 110], [["ref"], "https://example.db.org/value3"], [550, 333]] + [["end"], ""] From 7a92260d9e08954723a4f3298790ff6e9c7433b6 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Sun, 11 Jun 2023 14:02:59 +0200 Subject: [PATCH 20/60] Revert unneseccary change to .words.lst --- .words.lst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.words.lst b/.words.lst index 6cc667b7a..a058cd2d3 100644 --- a/.words.lst +++ b/.words.lst @@ -1,4 +1,4 @@ -personal_ws-1.1 en 209 +personal_ws-1.1 en 205 ABNF ACM Aa @@ -207,4 +207,4 @@ xy yacc zeo zeolites -ångström +Ã¥ngström From 8f4db09c2b1a62523c36586a5ecac93a540386b1 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Mon, 12 Jun 2023 16:37:56 +0200 Subject: [PATCH 21/60] Apply suggestions from review Co-authored-by: Andrius Merkys Co-authored-by: Giovanni Pizzi --- optimade.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/optimade.rst b/optimade.rst index 538e04eca..4cae2cd88 100644 --- a/optimade.rst +++ b/optimade.rst @@ -641,7 +641,7 @@ Every response SHOULD contain the following fields, and MUST contain at least :f - **data**: The schema of this value varies by endpoint, it can be either a *single* `JSON API resource object `__ or a *list* of JSON API resource objects. 
Every resource object needs the :field:`type` and :field:`id` fields, and its attributes (described in section `API Endpoints`_) need to be in a dictionary corresponding to the :field:`attributes` field. - The :field:`data` field MAY also contain a :field:`meta` field with the following keys: + Every resource object MAY also contain a :field:`meta` field with the following keys: - **partial_data_urls**: an object used to list URLs which can be used to fetch data that has been omitted from the :field:`data` part of the response. The keys are the names of the fields in :field:`attributes` for which partial data URLs are available. @@ -3541,10 +3541,10 @@ The header object MAY also contain the key: - :field:`"returned_ranges"`: Array of Object. For dense data, and sparse data of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. Once the client has encountered an end-of-data-marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. - If the field :field:`"format"` is `"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data--marker or next-marker. + If the field :field:`"format"` is `"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker. In the specific case of a hierarchy of list properties represented as a sparse multi-dimensional array, if the field :field:`"returned_ranges"` is given, it MUST contain one slice object per dimension of the multi-dimensional array, representing slices for each dimension that cover the data given in the response. 
-The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the format as :val:`"dense"` or :val:`sparse`. +The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the format as :val:`"dense"` or :val:`"sparse"`. - **Dense format:** In the dense partial data format, each data line reproduces one list item in the OPTIMADE list property being transmitted in JSON format. If OPTIMADE list properties are embedded inside the item, they can either be included in full or replaced with a reference-marker. From 16d60f6d4e817242058f5ce4db7ca5ab1bf0f7e1 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Mon, 12 Jun 2023 16:43:47 +0200 Subject: [PATCH 22/60] Slightly change the format of the markers --- optimade.rst | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/optimade.rst b/optimade.rst index 4cae2cd88..08498a123 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3515,9 +3515,9 @@ For example, for the array `["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]` Furthermore, we also define the following special markers: -- The "end-of-data-marker" is this exact JSON: :val:`[["end"], ""]`. -- A "reference-marker" is this exact JSON: :val:`[["ref"], "URL"]`, where :val:`"URL"` is to be replaced with a URL being referenced. -- A "next-marker" is this exact JSON: :val:`[["next"], "URL"]`, where :val:`"URL"` is to be replaced with the target URL for the next link. +- The "end-of-data-marker" is this exact JSON: :val:`["end", [""]]`. +- A "reference-marker" is this exact JSON: :val:`["ref", ["URL"]]`, where :val:`"URL"` is to be replaced with a URL being referenced. +- A "next-marker" is this exact JSON: :val:`["next", ["URL"]]`, where :val:`"URL"` is to be replaced with the target URL for the next link. 
There is no requirement on the syntax or format of the URLs provided in these markers. The data provided via the URLs MUST be the JSON lines partial data format, i.e., the markers cannot be used to link to partial data provided in other formats. @@ -3575,7 +3575,7 @@ The request returns the first three items and provides the next-marker link to c 123 345 -12.6 - [["next"], "https://example.db.org/value4"] + ["next", ["https://example.db.org/value4"]] Below follows an example of a dense response for a list property as a partial array of multidimensional array values. The item with index 10 in the original list is provided explicitly in the response and is the first one provided in the response since start=10. @@ -3585,9 +3585,9 @@ The third provided item (index 14 in the original list) is only partially return .. code:: json {"format": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} [[10,20,21], [30,40,50]] - [["ref"], "https://example.db.org/value2"] - [[11, 110], [["ref"], "https://example.db.org/value3"], [550, 333]] - [["next"], "https://example.db.org/value4"] + ["ref", ["https://example.db.org/value2"]] + [[11, 110], ["ref", ["https://example.db.org/value3"]], [550, 333]] + ["next", ["https://example.db.org/value4"]] Below follows an example of the sparse format for multi-dimensional lists with three aggregated dimensions. The underlying property value can be taken to be sparse data in lists in four dimensions of 10000 x 10000 x 10000 x N, where the innermost list is a non-sparse list of abitrary length of numbers. @@ -3597,8 +3597,8 @@ The response below communicates the first item explicitly; the second one by def .. 
code:: json {"format": "sparse"} [3,5,19, [10,20,21,30]] - [30,15,9, [["ref"], "https://example.db.org/value1"]] - [["next"], "https://example.db.org/"] + [30,15,9, ["ref", ["https://example.db.org/value1"]]] + ["next", ["https://example.db.org/"]] An example of the sparse format for multi-dimensional lists with three aggregated dimensions and integer values: @@ -3606,13 +3606,13 @@ An example of the sparse format for multi-dimensional lists with three aggregate {"format": "sparse"} [3,5,19, 10] [30,15,9, 31] - [["next"], "https://example.db.org/"] + ["next", ["https://example.db.org/"]] An example of the sparse format for multi-dimensional lists with three aggregated dimensions and values that are multidimensional lists of integers of arbitrary lengths: .. code:: json {"format": "sparse"} [3,5,19, [ [10,20,21], [30,40,50] ] - [3,7,19, [["ref"], "https://example.db.org/value2"]] - [4,5,19, [ [11, 110], [["ref"], "https://example.db.org/value3"], [550, 333]] - [["end"], ""] + [3,7,19, ["ref", ["https://example.db.org/value2"]]] + [4,5,19, [ [11, 110], ["ref", ["https://example.db.org/value3"]], [550, 333]] + ["end", [""]] From e109706ec9f49bd7a1c525e49009186293798944 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Tue, 13 Jun 2023 01:38:04 +0200 Subject: [PATCH 23/60] Improve clarity for when number of lines does not match response_range --- optimade.rst | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/optimade.rst b/optimade.rst index 08498a123..24814776c 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3503,9 +3503,7 @@ The dictionary has the following OPTIONAL fields: The default is 0, i.e., the value at the start of the array. - :field:`"stop"`: Integer. The slice ends at the value with the given index (inclusive). - If omitted, the end of the slice is not specified. - If the slice is used to express the values included in a response and :field:`stop` is omitted, the client has to count the number of items to know the end. 
- If the slice is used to request a range of items, to omit :field:`stop` has the same meaning as specifying the last index of the array. + If omitted, the end of the slice is the last index of the array. - :field:`"step"`: Integer. The absolute difference in index between two subsequent values that are included in the slice. The default is 1, i.e., every value in the range indicated by :field:`start` and :field:`stop` is included in the slice. @@ -3541,7 +3539,8 @@ The header object MAY also contain the key: - :field:`"returned_ranges"`: Array of Object. For dense data, and sparse data of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. Once the client has encountered an end-of-data-marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. - If the field :field:`"format"` is `"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker. + If the field :field:`"format"` is `"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker. + If :field:`"returned_ranges"` is included and the client encounters a next or end-of-data-marker before receiving all lines indicated by the slice, it should proceed by not assigning any values to those items, i.e., this is not an error. 
In the specific case of a hierarchy of list properties represented as a sparse multi-dimensional array, if the field :field:`"returned_ranges"` is given, it MUST contain one slice object per dimension of the multi-dimensional array, representing slices for each dimension that cover the data given in the response. The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the format as :val:`"dense"` or :val:`"sparse"`. From 34bdf2a2e0ac9d4c446740928be12b827c85dc16 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Tue, 13 Jun 2023 01:50:27 +0200 Subject: [PATCH 24/60] Remove trailing whitespace --- optimade.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index 24814776c..6d32aaf16 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3539,7 +3539,7 @@ The header object MAY also contain the key: - :field:`"returned_ranges"`: Array of Object. For dense data, and sparse data of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. Once the client has encountered an end-of-data-marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. - If the field :field:`"format"` is `"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker. + If the field :field:`"format"` is `"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker. 
If :field:`"returned_ranges"` is included and the client encounters a next or end-of-data-marker before receiving all lines indicated by the slice, it should proceed by not assigning any values to those items, i.e., this is not an error. In the specific case of a hierarchy of list properties represented as a sparse multi-dimensional array, if the field :field:`"returned_ranges"` is given, it MUST contain one slice object per dimension of the multi-dimensional array, representing slices for each dimension that cover the data given in the response. From 961f5b784f8e3578910f2f65cb104678d341db6d Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Wed, 14 Jun 2023 17:19:36 +0200 Subject: [PATCH 25/60] Apply suggestions from review Co-authored-by: Antanas Vaitkus --- optimade.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/optimade.rst b/optimade.rst index 6d32aaf16..cf5ec18a5 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3524,7 +3524,7 @@ Since the OPTIMADE list data type is defined as list of values of the same data The full response MUST be valid `JSON Lines `__ that adheres to the following format: -- The first line is a header object (defined below) +- The first line is a header object (defined below). - The following lines are data lines adhering to the formats described below. - The final line is either an end-of-data-marker (indicating that there is no more data to be given), or a next-marker indicating that more data is available, which can be obtained by retrieving data from the provided URL. @@ -3591,7 +3591,7 @@ The third provided item (index 14 in the original list) is only partially return Below follows an example of the sparse format for multi-dimensional lists with three aggregated dimensions. The underlying property value can be taken to be sparse data in lists in four dimensions of 10000 x 10000 x 10000 x N, where the innermost list is a non-sparse list of abitrary length of numbers. 
The only non-null items in the outer three dimensions are, say, [3,5,19], [30,15,9], and [42,54,17]. -The response below communicates the first item explicitly; the second one by defering the innermost list using a reference-marker; and the third item is not included in this response, but defered to another page via a next-marker. +The response below communicates the first item explicitly; the second one by deferring the innermost list using a reference-marker; and the third item is not included in this response, but deferred to another page via a next-marker. .. code:: json {"format": "sparse"} From 874bd52ba1e2ba2458d4f0ae87ba60e75231c29a Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 02:25:16 +0200 Subject: [PATCH 26/60] Apply suggestions from review Co-authored-by: Matthew Evans <7916000+ml-evs@users.noreply.github.com> --- optimade.rst | 58 ++++++++++++++++++++++++++-------------------------- 1 file changed, 29 insertions(+), 29 deletions(-) diff --git a/optimade.rst b/optimade.rst index cf5ec18a5..467041a48 100644 --- a/optimade.rst +++ b/optimade.rst @@ -451,43 +451,43 @@ OPTIMADE provides a mechanism for a client to handle such properties by fetching In this case, the response to the initial query gives the value :val:`null` for the property. A list of one or more data URLs together with their respective partial data formats are given in the response. How this list is provided is response format-dependent. -For the JSON response format, see the description of the :field:`partial_data_urls` field inside :field:`meta` inside :field:`data` in the section `JSON Response Schema: Common Fields`_. +For the JSON response format, see the description of the :field:`partial_data_urls` field, nested under :field:`data` and then :field:`meta`, in the section `JSON Response Schema: Common Fields`_. The default partial data format is named "jsonlines" and is described in the Appendix `OPTIMADE JSON lines partial data format`_. 
An implementation SHOULD always include this format as one of alternative partial data formats provided for a property that has been omitted from the response to the initial query. Implementations MAY provide links to their own non-standard formats, but non-standard format names MUST be prefixed by a database-provider-specific prefix. -Below follows an example of the data and meta parts in a response using the JSON response format that communicates that the property value has been omitted from the response, with three different URLs for different partial data formats provided. +Below follows an example of the :field:`data` and :field:`meta` parts of a response using the JSON response format that communicates that the property value has been omitted from the response, with three different URLs for different partial data formats provided. .. code:: jsonc - { + { // ... "data": { - "type": "structures", - "id": "2345678", - "attributes": { - "a": null - } - } - "meta": { + "type": "structures", + "id": "2345678", + "attributes": { + "a": null + } + "meta": { "partial_data_urls": { - "a": [ - { - "format": "jsonlines", - "url": "https://example.org/optimade/v1.2/extensions/partial_data/structures/2345678/a/default_format" - }, - { - "format": "_exmpl_bzip2_jsonlines", - "url": "https://db.example.org/assets/partial_values/structures/2345678/a/bzip2_format" - }, - { - "format": "_exmpl_hdf5", - "url": "https://cloud.example.org/ACCHSORJGIHWOSJZG" - } - ] + "a": [ + { + "format": "jsonlines", + "url": "https://example.org/optimade/v1.2/extensions/partial_data/structures/2345678/a/default_format" + }, + { + "format": "_exmpl_bzip2_jsonlines", + "url": "https://db.example.org/assets/partial_values/structures/2345678/a/bzip2_format" + }, + { + "format": "_exmpl_hdf5", + "url": "https://cloud.example.org/ACCHSORJGIHWOSJZG" + } + ] } + } } - // ... + // ... 
} Responses @@ -3490,7 +3490,7 @@ The strings below contain Extended Regular Expressions (EREs) to recognize ident OPTIMADE JSON lines partial data format --------------------------------------- The OPTIMADE JSON lines partial data format is a lightweight format for transmitting property data that are too large to fit in a single OPTIMADE response. -The format is based on `JSON Lines `__, which allows for streaming handling of large datasets. +The format is based on `JSON Lines `__, which enables streaming of JSON data. Note: since the below definition references both JSON fields and OPTIMADE properties, the data type names depend on context: for JSON they are, e.g., "array" and "object" and for OPTIMADE properties they are, e.g., "list" and "dictionary". .. _slice object: @@ -3509,13 +3509,13 @@ The dictionary has the following OPTIONAL fields: The default is 1, i.e., every value in the range indicated by :field:`start` and :field:`stop` is included in the slice. Hence, a value of 2 denotes a slice of every second value in the array. -For example, for the array `["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]` the slice object `{"start":1, "end":7, "step": 3}` refers to the items `["b", "e", "h"]`. +For example, for the array `["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]` the slice object `{"start": 1, "end": 7, "step": 3}` refers to the items `["b", "e", "h"]`. Furthermore, we also define the following special markers: - The "end-of-data-marker" is this exact JSON: :val:`["end", [""]]`. -- A "reference-marker" is this exact JSON: :val:`["ref", ["URL"]]`, where :val:`"URL"` is to be replaced with a URL being referenced. -- A "next-marker" is this exact JSON: :val:`["next", ["URL"]]`, where :val:`"URL"` is to be replaced with the target URL for the next link. +- A "reference-marker" is this exact JSON: :val:`["ref", [""]]`, where :val:`""` is to be replaced with a URL being referenced. 
+- A "next-marker" is this exact JSON: :val:`["next", [""]]`, where :val:`""` is to be replaced with the target URL for the next link. There is no requirement on the syntax or format of the URLs provided in these markers. The data provided via the URLs MUST be the JSON lines partial data format, i.e., the markers cannot be used to link to partial data provided in other formats. From 7b314afdab0c0f62ce54c18a23a6939eac13fa35 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 02:43:12 +0200 Subject: [PATCH 27/60] Add a key to the header to identify the format as OPTIMADE partial data --- optimade.rst | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/optimade.rst b/optimade.rst index 467041a48..b0b63edd5 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3531,6 +3531,13 @@ The full response MUST be valid `JSON Lines `__ that adh The first line MUST be a JSON object providing header information. The header object MUST contain the key: +- :field:`"optimade-partial-data"`: Object. + An object that identifying the response as being on OPTIMADE partial data format. + It MUST contain the following key: + + - :field:`"partial-data-format"`: String. + Specifies the minor version of the partial data format used. The string MUST be of the format "MAJOR.MINOR", referring to the version of the OPTIMADE standard that describes the format. The version number string MUST NOT be prefixed by, e.g., "v". In implementations of the present version of the standard, the value MUST be exactly :val:`1.2`. + - :field:`"format"`: String. A string either equal to :val:`"dense"` or :val:`"sparse"` to indicate whether the returned format is dense or sparse. 
From 6faf8db50c7189136f0c96ef2f56affdadfc6203 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 08:23:43 +0200 Subject: [PATCH 28/60] Remove trailing whitespace --- optimade.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/optimade.rst b/optimade.rst index b0b63edd5..a2a709baa 100644 --- a/optimade.rst +++ b/optimade.rst @@ -485,7 +485,7 @@ Below follows an example of the :field:`data` and :field:`meta` parts of a respo } ] } - } + } } // ... } @@ -3534,9 +3534,9 @@ The header object MUST contain the key: - :field:`"optimade-partial-data"`: Object. An object that identifying the response as being on OPTIMADE partial data format. It MUST contain the following key: - + - :field:`"partial-data-format"`: String. - Specifies the minor version of the partial data format used. The string MUST be of the format "MAJOR.MINOR", referring to the version of the OPTIMADE standard that describes the format. The version number string MUST NOT be prefixed by, e.g., "v". In implementations of the present version of the standard, the value MUST be exactly :val:`1.2`. + Specifies the minor version of the partial data format used. The string MUST be of the format "MAJOR.MINOR", referring to the version of the OPTIMADE standard that describes the format. The version number string MUST NOT be prefixed by, e.g., "v". In implementations of the present version of the standard, the value MUST be exactly :val:`1.2`. - :field:`"format"`: String. A string either equal to :val:`"dense"` or :val:`"sparse"` to indicate whether the returned format is dense or sparse. 
From 316df7864622e6f7ac2dd6f6a223eb238c233f53 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 08:38:03 +0200 Subject: [PATCH 29/60] Clarify handling of missing items in partial data --- optimade.rst | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/optimade.rst b/optimade.rst index a2a709baa..f55bc2878 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3546,8 +3546,11 @@ The header object MAY also contain the key: - :field:`"returned_ranges"`: Array of Object. For dense data, and sparse data of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. Once the client has encountered an end-of-data-marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. - If the field :field:`"format"` is `"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker. - If :field:`"returned_ranges"` is included and the client encounters a next or end-of-data-marker before receiving all lines indicated by the slice, it should proceed by not assigning any values to those items, i.e., this is not an error. + If the field :field:`"format"` is `"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data (possibly with a `step` between continuous indices) from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker. +If :field:`"returned_ranges"` is included and the client encounters a next-marker before receiving all lines indicated by the slice, it should proceed by not assigning any values to those items, i.e., this is not an error. 
+Since the remaining values are not assigned a value, they will be :val:`null` if they are not assigned in another response retrieved via a next link encountered before the end-of-data-marker. +(Since there is no requirement that values are assigned in order between responses, it is possible the omitted values have already been assigned. +In that case they shall remain as assigned, i.e., they are not overwritten by :val:`null` in this situation.) In the specific case of a hierarchy of list properties represented as a sparse multi-dimensional array, if the field :field:`"returned_ranges"` is given, it MUST contain one slice object per dimension of the multi-dimensional array, representing slices for each dimension that cover the data given in the response. The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the format as :val:`"dense"` or :val:`"sparse"`. From b080cf25e02b2d50dde47fd55518c10a54aa8656 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 08:54:15 +0200 Subject: [PATCH 30/60] Change markers to be more detectable in stream --- optimade.rst | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/optimade.rst b/optimade.rst index f55bc2878..2098ad57c 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3513,14 +3513,17 @@ For example, for the array `["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]` Furthermore, we also define the following special markers: -- The "end-of-data-marker" is this exact JSON: :val:`["end", [""]]`. -- A "reference-marker" is this exact JSON: :val:`["ref", [""]]`, where :val:`""` is to be replaced with a URL being referenced. -- A "next-marker" is this exact JSON: :val:`["next", [""]]`, where :val:`""` is to be replaced with the target URL for the next link. +- The "end-of-data-marker" is this exact JSON: :val:`["PARTIAL-DATA-END", [""]]`. 
+- A "reference-marker" is this exact JSON: :val:`["PARTIAL-DATA-REF", [""]]`, where :val:`""` is to be replaced with a URL being referenced. +- A "next-marker" is this exact JSON: :val:`["PARTIAL-DATA-NEXT", [""]]`, where :val:`""` is to be replaced with the target URL for the next link. There is no requirement on the syntax or format of the URLs provided in these markers. The data provided via the URLs MUST be the JSON lines partial data format, i.e., the markers cannot be used to link to partial data provided in other formats. The markers have been deliberately designed to be valid JSON objects but *not* valid OPTIMADE property values. -Since the OPTIMADE list data type is defined as list of values of the same data type or :val:`null`, the above markers cannot be encountered inside the actual data of an OPTIMADE property. +Since the OPTIMADE list data type is defined as a list of values of the same data type or :val:`null`, the above markers cannot be encountered inside the actual data of an OPTIMADE property. + + Implementation note: the unusual string values for the markers should make it possible to, with a high level of precision, determine lines that do not need further processing for potential reference-markers via a pre-scanning step just on the raw JSON text data (or, alternatively, by hooking into the string parser used by the JSON parser to trigger the additional processing only when these strings are detected). + This should help performance when parsing partial data with only occasional reference-markers. The full response MUST be valid `JSON Lines `__ that adheres to the following format: @@ -3594,9 +3597,9 @@ The third provided item (index 14 in the original list) is only partially return .. 
code:: json {"format": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} [[10,20,21], [30,40,50]] - ["ref", ["https://example.db.org/value2"]] - [[11, 110], ["ref", ["https://example.db.org/value3"]], [550, 333]] - ["next", ["https://example.db.org/value4"]] + ["PARTIAL-DATA-REF", ["https://example.db.org/value2"]] + [[11, 110], ["PARTIAL-DATA-REF", ["https://example.db.org/value3"]], [550, 333]] + ["PARTIAL-DATA-NEXT", ["https://example.db.org/value4"]] Below follows an example of the sparse format for multi-dimensional lists with three aggregated dimensions. The underlying property value can be taken to be sparse data in lists in four dimensions of 10000 x 10000 x 10000 x N, where the innermost list is a non-sparse list of abitrary length of numbers. @@ -3615,13 +3618,13 @@ An example of the sparse format for multi-dimensional lists with three aggregate {"format": "sparse"} [3,5,19, 10] [30,15,9, 31] - ["next", ["https://example.db.org/"]] + ["PARTIAL-DATA-NEXT", ["https://example.db.org/"]] An example of the sparse format for multi-dimensional lists with three aggregated dimensions and values that are multidimensional lists of integers of arbitrary lengths: .. 
code:: json {"format": "sparse"} [3,5,19, [ [10,20,21], [30,40,50] ] - [3,7,19, ["ref", ["https://example.db.org/value2"]]] - [4,5,19, [ [11, 110], ["ref", ["https://example.db.org/value3"]], [550, 333]] - ["end", [""]] + [3,7,19, ["PARTIAL-DATA-REF", ["https://example.db.org/value2"]]] + [4,5,19, [ [11, 110], ["PARTIAL-DATA-REF", ["https://example.db.org/value3"]], [550, 333]] + ["PARTIAL-DATA-END", [""]] From bd93804d5a10eaa19e478d1040896e9f6b5b6803 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 08:54:31 +0200 Subject: [PATCH 31/60] Change markers to be more detectable in stream --- optimade.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/optimade.rst b/optimade.rst index 2098ad57c..58d333a40 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3609,8 +3609,8 @@ The response below communicates the first item explicitly; the second one by def .. code:: json {"format": "sparse"} [3,5,19, [10,20,21,30]] - [30,15,9, ["ref", ["https://example.db.org/value1"]]] - ["next", ["https://example.db.org/"]] + [30,15,9, ["PARTIAL-DATA-REF", ["https://example.db.org/value1"]]] + ["PARTIAL-DATA-NEXT", ["https://example.db.org/"]] An example of the sparse format for multi-dimensional lists with three aggregated dimensions and integer values: From 10bc845bf9eb5087763e2c859635ef90f151da96 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 08:55:18 +0200 Subject: [PATCH 32/60] Change markers to be more detectable in stream --- optimade.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index 58d333a40..e5fbe6efe 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3587,7 +3587,7 @@ The request returns the first three items and provides the next-marker link to c 123 345 -12.6 - ["next", ["https://example.db.org/value4"]] + ["PARTIAL-DATA-NEXT", ["https://example.db.org/value4"]] Below follows an example of a dense response for a list property as a partial array of 
multidimensional array values. The item with index 10 in the original list is provided explicitly in the response and is the first one provided in the response since start=10. From 39d9ae5949a9aa9dd9ecf3d51c8412f0ab24bed6 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 09:04:53 +0200 Subject: [PATCH 33/60] Change format to representation to avoid a clash in terms and fieldnames --- optimade.rst | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/optimade.rst b/optimade.rst index e5fbe6efe..efb365902 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3538,10 +3538,10 @@ The header object MUST contain the key: An object that identifying the response as being on OPTIMADE partial data format. It MUST contain the following key: - - :field:`"partial-data-format"`: String. + - :field:`"format"`: String. Specifies the minor version of the partial data format used. The string MUST be of the format "MAJOR.MINOR", referring to the version of the OPTIMADE standard that describes the format. The version number string MUST NOT be prefixed by, e.g., "v". In implementations of the present version of the standard, the value MUST be exactly :val:`1.2`. -- :field:`"format"`: String. +- :field:`"representation"`: String. A string either equal to :val:`"dense"` or :val:`"sparse"` to indicate whether the returned format is dense or sparse. The header object MAY also contain the key: @@ -3549,32 +3549,32 @@ The header object MAY also contain the key: - :field:`"returned_ranges"`: Array of Object. For dense data, and sparse data of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. Once the client has encountered an end-of-data-marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. 
- If the field :field:`"format"` is `"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data (possibly with a `step` between continuous indices) from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker. + If the field :field:`"representation"` is `"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data (possibly with a `step` between continuous indices) from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker. If :field:`"returned_ranges"` is included and the client encounters a next-marker before receiving all lines indicated by the slice, it should proceed by not assigning any values to those items, i.e., this is not an error. Since the remaining values are not assigned a value, they will be :val:`null` if they are not assigned in another response retrieved via a next link encountered before the end-of-data-marker. (Since there is no requirement that values are assigned in order between responses, it is possible the omitted values have already been assigned. In that case they shall remain as assigned, i.e., they are not overwritten by :val:`null` in this situation.) In the specific case of a hierarchy of list properties represented as a sparse multi-dimensional array, if the field :field:`"returned_ranges"` is given, it MUST contain one slice object per dimension of the multi-dimensional array, representing slices for each dimension that cover the data given in the response. -The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the format as :val:`"dense"` or :val:`"sparse"`. 
+The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the representation as :val:`"dense"` or :val:`"sparse"`. -- **Dense format:** In the dense partial data format, each data line reproduces one list item in the OPTIMADE list property being transmitted in JSON format. +- **Dense representation:** In the dense partial data representation, each data line reproduces one list item in the OPTIMADE list property being transmitted in JSON format. If OPTIMADE list properties are embedded inside the item, they can either be included in full or replaced with a reference-marker. If a list is replaced by a reference marker, the client MAY use the provided URL to obtain the list items. -- **Sparse format for one-dimensional list:** When the response sparsely communicates items for a one-dimensional OPTIMADE list property, each data line contains a JSON array on the format: +- **Sparse representation for one-dimensional list:** When the response sparsely communicates items for a one-dimensional OPTIMADE list property, each data line contains a JSON array on the format: - The first item is the zero-based index of the item provided. - The second item is a JSON representation of the item, with the same format as the lines in the dense format. - In the same way as for the dense format, reference-markers are allowed for data that does not fit in the response (see example below). + In the same way as for the dense representation, reference-markers are allowed for data that does not fit in the response (see example below). -- **Sparse format for multi-dimensional lists:** We provide a sparse format specifically for the case that the OPTIMADE property represents a series of directly hierarchically embedded lists (i.e., a multidimensional sparse array). - Then, the server MAY represent them using the following sparse multi-dimensional format for a number of aggregated dimensions. 
+- **Sparse representation for multi-dimensional lists:** We provide a sparse representation specifically for the case that the OPTIMADE property represents a series of directly hierarchically embedded lists (i.e., a multidimensional sparse array). + Then, the server MAY represent them using the following sparse multi-dimensional representation for a number of aggregated dimensions. In this case, each data line contains a JSON array in the format of: - All items except the last item are integer zero-based indices of the value being provided in this line; these indices refer to the aggregated dimensions in the order of outermost to innermost. - - The last item is a JSON representation of the item at those coordinates, with the same format as the lines in the dense format. - In the same way as for the dense format, reference-markers are allowed for data that does not fit in the response. + - The last item is a JSON representation of the item at those coordinates, with the same format as the lines in the dense representation. + In the same way as for the dense representation, reference-markers are allowed for data that does not fit in the response. Examples -------- @@ -3583,7 +3583,7 @@ Below follows an example of a dense response for a partial array data of integer The request returns the first three items and provides the next-marker link to continue fetching data: .. code:: json - {"format": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} + {"representation": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} 123 345 -12.6 @@ -3595,13 +3595,13 @@ The item with index 12 in the list, the second data item provided since start=10 The third provided item (index 14 in the original list) is only partially returned: it is a list of three items, the first and last are explicitly provided, the second one is only referenced. .. 
code:: json - {"format": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} + {"representation": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} [[10,20,21], [30,40,50]] ["PARTIAL-DATA-REF", ["https://example.db.org/value2"]] [[11, 110], ["PARTIAL-DATA-REF", ["https://example.db.org/value3"]], [550, 333]] ["PARTIAL-DATA-NEXT", ["https://example.db.org/value4"]] -Below follows an example of the sparse format for multi-dimensional lists with three aggregated dimensions. +Below follows an example of the sparse representation for multi-dimensional lists with three aggregated dimensions. The underlying property value can be taken to be sparse data in lists in four dimensions of 10000 x 10000 x 10000 x N, where the innermost list is a non-sparse list of abitrary length of numbers. The only non-null items in the outer three dimensions are, say, [3,5,19], [30,15,9], and [42,54,17]. The response below communicates the first item explicitly; the second one by deferring the innermost list using a reference-marker; and the third item is not included in this response, but deferred to another page via a next-marker. @@ -3612,18 +3612,18 @@ The response below communicates the first item explicitly; the second one by def [30,15,9, ["PARTIAL-DATA-REF", ["https://example.db.org/value1"]]] ["PARTIAL-DATA-NEXT", ["https://example.db.org/"]] -An example of the sparse format for multi-dimensional lists with three aggregated dimensions and integer values: +An example of the sparse representation for multi-dimensional lists with three aggregated dimensions and integer values: .. 
code:: json - {"format": "sparse"} + {"representation": "sparse"} [3,5,19, 10] [30,15,9, 31] ["PARTIAL-DATA-NEXT", ["https://example.db.org/"]] -An example of the sparse format for multi-dimensional lists with three aggregated dimensions and values that are multidimensional lists of integers of arbitrary lengths: +An example of the sparse representation for multi-dimensional lists with three aggregated dimensions and values that are multidimensional lists of integers of arbitrary lengths: .. code:: json - {"format": "sparse"} + {"representation": "sparse"} [3,5,19, [ [10,20,21], [30,40,50] ] [3,7,19, ["PARTIAL-DATA-REF", ["https://example.db.org/value2"]]] [4,5,19, [ [11, 110], ["PARTIAL-DATA-REF", ["https://example.db.org/value3"]], [550, 333]] From 2a24c1ac36e348ab05b1087fce831d6a4fc76be2 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 09:09:50 +0200 Subject: [PATCH 34/60] Enable for efficient parsing of responses a server knows has no reference markers --- optimade.rst | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/optimade.rst b/optimade.rst index efb365902..cf4e91c0a 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3548,6 +3548,10 @@ The header object MAY also contain the key: - :field:`"returned_ranges"`: Array of Object. For dense data, and sparse data of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. + +- :field:`"has_references"`: Boolean. + An optional boolean to indicate whether any of the data lines in the response contains a reference marker. + By including this field and giving it the value :val:`false`, a server MAY indicate that the client does not have to process any of the lines to detect reference markers. Once the client has encountered an end-of-data-marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. 
If the field :field:`"representation"` is `"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data (possibly with a `step` between continuous indices) from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker. If :field:`"returned_ranges"` is included and the client encounters a next-marker before receiving all lines indicated by the slice, it should proceed by not assigning any values to those items, i.e., this is not an error. From 9d9e26e0b4dcec7807f9cc8759f205ef9bd24c02 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 09:10:12 +0200 Subject: [PATCH 35/60] Change format to representation to avoid a clash in terms and fieldnames --- optimade.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index cf4e91c0a..d66e3b47c 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3611,7 +3611,7 @@ The only non-null items in the outer three dimensions are, say, [3,5,19], [30,15 The response below communicates the first item explicitly; the second one by deferring the innermost list using a reference-marker; and the third item is not included in this response, but deferred to another page via a next-marker. .. 
code:: json - {"format": "sparse"} + {"representation": "sparse"} [3,5,19, [10,20,21,30]] [30,15,9, ["PARTIAL-DATA-REF", ["https://example.db.org/value1"]]] ["PARTIAL-DATA-NEXT", ["https://example.db.org/"]] From ff5a27ce7c6a26ae1ea7b33314c746ccc671d0f1 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 10:33:51 +0200 Subject: [PATCH 36/60] Rename partial_data_url and url to link to better conform to JSON API naming --- optimade.rst | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/optimade.rst b/optimade.rst index d66e3b47c..c0be040c6 100644 --- a/optimade.rst +++ b/optimade.rst @@ -451,7 +451,7 @@ OPTIMADE provides a mechanism for a client to handle such properties by fetching In this case, the response to the initial query gives the value :val:`null` for the property. A list of one or more data URLs together with their respective partial data formats are given in the response. How this list is provided is response format-dependent. -For the JSON response format, see the description of the :field:`partial_data_urls` field, nested under :field:`data` and then :field:`meta`, in the section `JSON Response Schema: Common Fields`_. +For the JSON response format, see the description of the :field:`partial_data_links` field, nested under :field:`data` and then :field:`meta`, in the section `JSON Response Schema: Common Fields`_. The default partial data format is named "jsonlines" and is described in the Appendix `OPTIMADE JSON lines partial data format`_. An implementation SHOULD always include this format as one of alternative partial data formats provided for a property that has been omitted from the response to the initial query. 
@@ -473,15 +473,15 @@ Below follows an example of the :field:`data` and :field:`meta` parts of a respo "a": [ { "format": "jsonlines", - "url": "https://example.org/optimade/v1.2/extensions/partial_data/structures/2345678/a/default_format" + "link": "https://example.org/optimade/v1.2/extensions/partial_data/structures/2345678/a/default_format" }, { "format": "_exmpl_bzip2_jsonlines", - "url": "https://db.example.org/assets/partial_values/structures/2345678/a/bzip2_format" + "link": "https://db.example.org/assets/partial_values/structures/2345678/a/bzip2_format" }, { "format": "_exmpl_hdf5", - "url": "https://cloud.example.org/ACCHSORJGIHWOSJZG" + "link": "https://cloud.example.org/ACCHSORJGIHWOSJZG" } ] } @@ -643,7 +643,7 @@ Every response SHOULD contain the following fields, and MUST contain at least :f Every resource object MAY also contain a :field:`meta` field with the following keys: - - **partial_data_urls**: an object used to list URLs which can be used to fetch data that has been omitted from the :field:`data` part of the response. + - **partial_data_links**: an object used to list URLs which can be used to fetch data that has been omitted from the :field:`data` part of the response. The keys are the names of the fields in :field:`attributes` for which partial data URLs are available. Each value is a list of items that MUST have the following keys: @@ -651,11 +651,11 @@ Every response SHOULD contain the following fields, and MUST contain at least :f A name of the format provided via this URL. One of the items SHOULD be "jsonlines", which refers to the format in `OPTIMADE JSON lines partial data format`_. - - **url**: String. - The URL from which the data can be fetched. - There is no requirement on the syntax or format of the URL. + - **link**: String. + A `JSON API link `__ that points to a location from which the omitted data can be fetched. + There is no requirement on the syntax or format for the link URL. 
- For more information about the mechanism to transmit large property values, including an example of the format of :field:`partial_data_urls`, see `Transmission of large property values`_. + For more information about the mechanism to transmit large property values, including an example of the format of :field:`partial_data_links`, see `Transmission of large property values`_. The response MAY also return resources related to the primary data in the field: From 8ae1928e7ce887842ee3702a732a6b90ccf4df90 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 10:34:11 +0200 Subject: [PATCH 37/60] Rename partial_data_url and url to link to better conform to JSON API naming --- optimade.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index c0be040c6..1adea3df4 100644 --- a/optimade.rst +++ b/optimade.rst @@ -469,7 +469,7 @@ Below follows an example of the :field:`data` and :field:`meta` parts of a respo "a": null } "meta": { - "partial_data_urls": { + "partial_data_links": { "a": [ { "format": "jsonlines", From d8a11cbee9e4d90196f2d45a4c30019d51437222 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 10:35:30 +0200 Subject: [PATCH 38/60] Rename partial_data_url and url to link to better conform to JSON API naming --- optimade.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index 1adea3df4..ed9290265 100644 --- a/optimade.rst +++ b/optimade.rst @@ -457,7 +457,7 @@ The default partial data format is named "jsonlines" and is described in the App An implementation SHOULD always include this format as one of alternative partial data formats provided for a property that has been omitted from the response to the initial query. Implementations MAY provide links to their own non-standard formats, but non-standard format names MUST be prefixed by a database-provider-specific prefix. 
-Below follows an example of the :field:`data` and :field:`meta` parts of a response using the JSON response format that communicates that the property value has been omitted from the response, with three different URLs for different partial data formats provided. +Below follows an example of the :field:`data` and :field:`meta` parts of a response using the JSON response format that communicates that the property value has been omitted from the response, with three different links for different partial data formats provided. .. code:: jsonc { From 11900c542c2f149d5d5c4cbe248f725e3eee0cf3 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 10:37:24 +0200 Subject: [PATCH 39/60] Rename partial_data_url and url to link to better conform to JSON API naming --- optimade.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/optimade.rst b/optimade.rst index ed9290265..989a02725 100644 --- a/optimade.rst +++ b/optimade.rst @@ -643,12 +643,12 @@ Every response SHOULD contain the following fields, and MUST contain at least :f Every resource object MAY also contain a :field:`meta` field with the following keys: - - **partial_data_links**: an object used to list URLs which can be used to fetch data that has been omitted from the :field:`data` part of the response. - The keys are the names of the fields in :field:`attributes` for which partial data URLs are available. + - **partial_data_links**: an object used to list links which can be used to fetch data that has been omitted from the :field:`data` part of the response. + The keys are the names of the fields in :field:`attributes` for which partial data links are available. Each value is a list of items that MUST have the following keys: - **format**: String. - A name of the format provided via this URL. + A name of the format provided via this link. One of the items SHOULD be "jsonlines", which refers to the format in `OPTIMADE JSON lines partial data format`_. - **link**: String. 
From 1b4093e91b812daa03226b4ed8d06bdab878eb9c Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 11:41:49 +0200 Subject: [PATCH 40/60] Remove trailing whitespace --- optimade.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/optimade.rst b/optimade.rst index 989a02725..3845c6f37 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3523,7 +3523,7 @@ The markers have been deliberately designed to be valid JSON objects but *not* v Since the OPTIMADE list data type is defined as a list of values of the same data type or :val:`null`, the above markers cannot be encountered inside the actual data of an OPTIMADE property. Implementation note: the unusual string values for the markers should make it possible to, with a high level of precision, determine lines that do not need further processing for potential reference-markers via a pre-scanning step just on the raw JSON text data (or, alternatively, by hooking into the string parser used by the JSON parser to trigger the additional processing only when these strings are detected). - This should help performance when parsing partial data with only occasional reference-markers. + This should help performance when parsing partial data with only occasional reference-markers. The full response MUST be valid `JSON Lines `__ that adheres to the following format: @@ -3548,7 +3548,7 @@ The header object MAY also contain the key: - :field:`"returned_ranges"`: Array of Object. For dense data, and sparse data of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. - + - :field:`"has_references"`: Boolean. An optional boolean to indicate whether any of the data lines in the response contains a reference marker. By including this field and giving it the value :val:`false`, a server MAY indicate that the client does not have to process any of the lines to detect reference markers. 
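To make the marker rules and the implementation note concrete, here is a minimal Python sketch of a client reading one response in the JSON Lines partial data format as drafted at this point in the series: a header line, data lines, and a final end-of-data-marker or next-marker. It uses the raw-text prescreen for `PARTIAL-DATA-REF` and honours `"has_references": false` in the header. The function names are hypothetical; the sketch does not assign values to list indices or follow any links, and it assumes reference-markers may be nested inside lists and dictionaries within a data line.

```python
import json

# Marker tags as drafted in this patch series.
END = "PARTIAL-DATA-END"
NEXT = "PARTIAL-DATA-NEXT"
REF = "PARTIAL-DATA-REF"


def as_marker(value):
    """Return (tag, url) if `value` is a partial data marker, else None."""
    if (isinstance(value, list) and len(value) == 2
            and value[0] in (END, NEXT, REF)
            and isinstance(value[1], list) and len(value[1]) == 1
            and isinstance(value[1][0], str)):
        return value[0], value[1][0]
    return None


def collect_refs(value, found):
    """Recursively append the URLs of all reference-markers in `value`."""
    marker = as_marker(value)
    if marker is not None and marker[0] == REF:
        found.append(marker[1])
    elif isinstance(value, list):
        for item in value:
            collect_refs(item, found)
    elif isinstance(value, dict):
        for item in value.values():
            collect_refs(item, found)
    return found


def read_partial_data(text):
    """Split one partial data response into (header, data, refs, next_url).

    `next_url` is None when the response ends with the end-of-data-marker.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = json.loads(lines[0])
    tag, url = as_marker(json.loads(lines[-1])) or (None, None)
    if tag not in (END, NEXT):
        raise ValueError("last line must be an end-of-data-marker or next-marker")
    # A header with "has_references": false lets the client skip all scanning.
    may_have_refs = header.get("has_references", True)
    data, refs = [], []
    for raw in lines[1:-1]:
        value = json.loads(raw)
        # Cheap prescreen on the raw text: only walk the parsed structure
        # when the reference tag string actually occurs on the line.
        if may_have_refs and REF in raw:
            collect_refs(value, refs)
        data.append(value)
    return header, data, refs, (url if tag == NEXT else None)
```

Because the marker shapes (a tag string followed by a one-element list of strings) cannot be valid OPTIMADE list values, the structural check in `as_marker` only needs to run on lines that pass the substring test.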
From 496b6ca16e6773b781b8ac88a91641171ad0f772 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 11:47:26 +0200 Subject: [PATCH 41/60] Change representation to layout to not confuse with URL representation; formatting fixes --- optimade.rst | 103 +++++++++++++++++++++++++++++++++------------------ 1 file changed, 67 insertions(+), 36 deletions(-) diff --git a/optimade.rst b/optimade.rst index 3845c6f37..37484a0c3 100644 --- a/optimade.rst +++ b/optimade.rst @@ -460,6 +460,7 @@ Implementations MAY provide links to their own non-standard formats, but non-sta Below follows an example of the :field:`data` and :field:`meta` parts of a response using the JSON response format that communicates that the property value has been omitted from the response, with three different links for different partial data formats provided. .. code:: jsonc + { // ... "data": { @@ -3509,21 +3510,22 @@ The dictionary has the following OPTIONAL fields: The default is 1, i.e., every value in the range indicated by :field:`start` and :field:`stop` is included in the slice. Hence, a value of 2 denotes a slice of every second value in the array. -For example, for the array `["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]` the slice object `{"start": 1, "end": 7, "step": 3}` refers to the items `["b", "e", "h"]`. +For example, for the array :val:`["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]` the slice object :val:`{"start": 1, "stop": 7, "step": 3}` refers to the items :val:`["b", "e", "h"]`. Furthermore, we also define the following special markers: -- The "end-of-data-marker" is this exact JSON: :val:`["PARTIAL-DATA-END", [""]]`. -- A "reference-marker" is this exact JSON: :val:`["PARTIAL-DATA-REF", [""]]`, where :val:`""` is to be replaced with a URL being referenced. -- A "next-marker" is this exact JSON: :val:`["PARTIAL-DATA-NEXT", [""]]`, where :val:`""` is to be replaced with the target URL for the next link. 
+- The *end-of-data-marker* is this exact JSON: :val:`["PARTIAL-DATA-END", [""]]`. +- A *reference-marker* is this exact JSON: :val:`["PARTIAL-DATA-REF", [""]]`, where :val:`""` is to be replaced with a URL being referenced. +- A *next-marker* is this exact JSON: :val:`["PARTIAL-DATA-NEXT", [""]]`, where :val:`""` is to be replaced with the target URL for the next link. There is no requirement on the syntax or format of the URLs provided in these markers. The data provided via the URLs MUST be the JSON lines partial data format, i.e., the markers cannot be used to link to partial data provided in other formats. The markers have been deliberately designed to be valid JSON objects but *not* valid OPTIMADE property values. Since the OPTIMADE list data type is defined as a list of values of the same data type or :val:`null`, the above markers cannot be encountered inside the actual data of an OPTIMADE property. - Implementation note: the unusual string values for the markers should make it possible to, with a high level of precision, determine lines that do not need further processing for potential reference-markers via a pre-scanning step just on the raw JSON text data (or, alternatively, by hooking into the string parser used by the JSON parser to trigger the additional processing only when these strings are detected). - This should help performance when parsing partial data with only occasional reference-markers. + **Implementation note:** the recognizable string values for the markers should make it possible to, with a high level of precision, determine lines that do not need to be further processed for reference-markers by prescreening the raw JSON text data lines for the relevant string (alternatively, this screening can be done by the string parser used by the JSON parser). 
+ The underlying design idea is that for lines that have reference-markers, the time it takes to process the data structure to locate the markers should be negligible compared to the time it takes to resolve and handle the large data they reference. + Hence, the most relevant optimization is to avoid spending time processing data structures to find markers for lines where there are none. The full response MUST be valid `JSON Lines `__ that adheres to the following format: @@ -3532,62 +3534,87 @@ The full response MUST be valid `JSON Lines `__ that adh - The final line is either an end-of-data-marker (indicating that there is no more data to be given), or a next-marker indicating that more data is available, which can be obtained by retrieving data from the provided URL. The first line MUST be a JSON object providing header information. -The header object MUST contain the key: +The header object MUST contain the keys: - :field:`"optimade-partial-data"`: Object. An object that identifying the response as being on OPTIMADE partial data format. + It MUST contain the following key: - :field:`"format"`: String. Specifies the minor version of the partial data format used. The string MUST be of the format "MAJOR.MINOR", referring to the version of the OPTIMADE standard that describes the format. The version number string MUST NOT be prefixed by, e.g., "v". In implementations of the present version of the standard, the value MUST be exactly :val:`1.2`. -- :field:`"representation"`: String. - A string either equal to :val:`"dense"` or :val:`"sparse"` to indicate whether the returned format is dense or sparse. + It MAY contain the following keys: -The header object MAY also contain the key: +- :field:`"layout"`: String. + A string either equal to :val:`"dense"` or :val:`"sparse"` to indicate whether the returned format uses a dense or sparse layout. -- :field:`"returned_ranges"`: Array of Object. 
- For dense data, and sparse data of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. +The header object MAY also contain the keys: + +- :field:`"property_name"`: String. + The name of the property being provided. + +- :field:`"entry"`: Object. + An object that MUST have the following two keys: + + - :field:`"id"`: String. + The id of the entry of the property being provided. + + - :field:`"type"`: String. + The type of the entry of the property being provided. - :field:`"has_references"`: Boolean. An optional boolean to indicate whether any of the data lines in the response contains a reference marker. By including this field and giving it the value :val:`false`, a server MAY indicate that the client does not have to process any of the lines to detect reference markers. Once the client has encountered an end-of-data-marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. - If the field :field:`"representation"` is `"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data (possibly with a `step` between continuous indices) from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker. -If :field:`"returned_ranges"` is included and the client encounters a next-marker before receiving all lines indicated by the slice, it should proceed by not assigning any values to those items, i.e., this is not an error. -Since the remaining values are not assigned a value, they will be :val:`null` if they are not assigned in another response retrieved via a next link encountered before the end-of-data-marker. -(Since there is no requirement that values are assigned in order between responses, it is possible the omitted values have already been assigned. 
-In that case they shall remain as assigned, i.e., they are not overwritten by :val:`null` in this situation.) - In the specific case of a hierarchy of list properties represented as a sparse multi-dimensional array, if the field :field:`"returned_ranges"` is given, it MUST contain one slice object per dimension of the multi-dimensional array, representing slices for each dimension that cover the data given in the response. -The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the representation as :val:`"dense"` or :val:`"sparse"`. +- :field:`"links"`: Object. + + An object to provide relevant links for the property being provided. + It MAY contain the following key: + + - :field:`base_url`: String. + The base URL of the implementation serving the database to which this property belongs. + +- :field:`"returned_ranges"`: Array of Object. + For dense data, and sparse data of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. + +If the field :field:`"layout"` is :val:`"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker. +If :field:`"returned_ranges"` is included and the client encounters a next-marker before receiving all lines indicated by the slice, it should proceed by not assigning any values to the corresponding items, i.e., this is not an error. +Since the remaining values are not assigned a value, they will be :val:`null` if they are not assigned values by another response retrieved via a next link encountered before the end-of-data-marker. +(Since there is no requirement that values are assigned in a specific order between responses, it is possible that the omitted values are already assigned. 
+In that case the values shall remain as assigned, i.e., they are not overwritten by :val:`null` in this situation.) +In the specific case of a hierarchy of list properties represented as a sparse multi-dimensional array, if the field :field:`"returned_ranges"` is given, it MUST contain one slice object per dimension of the multi-dimensional array, representing slices for each dimension that cover the data given in the response. -- **Dense representation:** In the dense partial data representation, each data line reproduces one list item in the OPTIMADE list property being transmitted in JSON format. +The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the layout as :val:`"dense"` or :val:`"sparse"`. + +- **Dense layout:** In the dense partial data layout, each data line reproduces one list item in the OPTIMADE list property being transmitted in JSON format. If OPTIMADE list properties are embedded inside the item, they can either be included in full or replaced with a reference-marker. If a list is replaced by a reference marker, the client MAY use the provided URL to obtain the list items. -- **Sparse representation for one-dimensional list:** When the response sparsely communicates items for a one-dimensional OPTIMADE list property, each data line contains a JSON array on the format: +- **Sparse layout for one-dimensional list:** When the response sparsely communicates items for a one-dimensional OPTIMADE list property, each data line contains a JSON array on the format: - The first item is the zero-based index of the item provided. - - The second item is a JSON representation of the item, with the same format as the lines in the dense format. - In the same way as for the dense representation, reference-markers are allowed for data that does not fit in the response (see example below). 
+ - The second item is a JSON layout of the item, with the same format as the lines in the dense format. + In the same way as for the dense layout, reference-markers are allowed for data that does not fit in the response (see example below). -- **Sparse representation for multi-dimensional lists:** We provide a sparse representation specifically for the case that the OPTIMADE property represents a series of directly hierarchically embedded lists (i.e., a multidimensional sparse array). - Then, the server MAY represent them using the following sparse multi-dimensional representation for a number of aggregated dimensions. +- **Sparse layout for multi-dimensional lists:** We provide a sparse layout specifically for the case that the OPTIMADE property represents a series of directly hierarchically embedded lists (i.e., a multidimensional sparse array). + Then, the server MAY represent them using the following sparse multi-dimensional layout for a number of aggregated dimensions. In this case, each data line contains a JSON array in the format of: - All items except the last item are integer zero-based indices of the value being provided in this line; these indices refer to the aggregated dimensions in the order of outermost to innermost. - - The last item is a JSON representation of the item at those coordinates, with the same format as the lines in the dense representation. - In the same way as for the dense representation, reference-markers are allowed for data that does not fit in the response. + - The last item is a JSON layout of the item at those coordinates, with the same format as the lines in the dense layout. + In the same way as for the dense layout, reference-markers are allowed for data that does not fit in the response. Examples --------- +~~~~~~~~ Below follows an example of a dense response for a partial array data of integer values. The request returns the first three items and provides the next-marker link to continue fetching data: .. 
code:: json - {"representation": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} + + {"optimade-partial-data": {"format": "1.2.0"}, "layout": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} 123 345 -12.6 @@ -3599,35 +3626,39 @@ The item with index 12 in the list, the second data item provided since start=10 The third provided item (index 14 in the original list) is only partially returned: it is a list of three items, the first and last are explicitly provided, the second one is only referenced. .. code:: json - {"representation": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} + + {"optimade-partial-data": {"format": "1.2.0"}, "layout": "dense", "returned_ranges": [{"start": 10, "stop": 20, "step": 2}]} [[10,20,21], [30,40,50]] ["PARTIAL-DATA-REF", ["https://example.db.org/value2"]] [[11, 110], ["PARTIAL-DATA-REF", ["https://example.db.org/value3"]], [550, 333]] ["PARTIAL-DATA-NEXT", ["https://example.db.org/value4"]] -Below follows an example of the sparse representation for multi-dimensional lists with three aggregated dimensions. +Below follows an example of the sparse layout for multi-dimensional lists with three aggregated dimensions. The underlying property value can be taken to be sparse data in lists in four dimensions of 10000 x 10000 x 10000 x N, where the innermost list is a non-sparse list of abitrary length of numbers. The only non-null items in the outer three dimensions are, say, [3,5,19], [30,15,9], and [42,54,17]. The response below communicates the first item explicitly; the second one by deferring the innermost list using a reference-marker; and the third item is not included in this response, but deferred to another page via a next-marker. .. 
code:: json - {"representation": "sparse"} + + {"optimade-partial-data": {"format": "1.2.0"}, "layout": "sparse"} [3,5,19, [10,20,21,30]] [30,15,9, ["PARTIAL-DATA-REF", ["https://example.db.org/value1"]]] ["PARTIAL-DATA-NEXT", ["https://example.db.org/"]] -An example of the sparse representation for multi-dimensional lists with three aggregated dimensions and integer values: +An example of the sparse layout for multi-dimensional lists with three aggregated dimensions and integer values: .. code:: json - {"representation": "sparse"} + + {"optimade-partial-data": {"format": "1.2.0"}, "layout": "sparse"} [3,5,19, 10] [30,15,9, 31] ["PARTIAL-DATA-NEXT", ["https://example.db.org/"]] -An example of the sparse representation for multi-dimensional lists with three aggregated dimensions and values that are multidimensional lists of integers of arbitrary lengths: +An example of the sparse layout for multi-dimensional lists with three aggregated dimensions and values that are multidimensional lists of integers of arbitrary lengths: .. code:: json - {"representation": "sparse"} + + {"optimade-partial-data": {"format": "1.2.0"}, "layout": "sparse"} [3,5,19, [ [10,20,21], [30,40,50] ] [3,7,19, ["PARTIAL-DATA-REF", ["https://example.db.org/value2"]]] [4,5,19, [ [11, 110], ["PARTIAL-DATA-REF", ["https://example.db.org/value3"]], [550, 333]] From 4d906a2b07246c69f7fbb69b512ed7c3b47900d8 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 11:51:37 +0200 Subject: [PATCH 42/60] Remove accidental leftover text. --- optimade.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/optimade.rst b/optimade.rst index 37484a0c3..996a5815d 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3544,8 +3544,6 @@ The header object MUST contain the keys: - :field:`"format"`: String. Specifies the minor version of the partial data format used. The string MUST be of the format "MAJOR.MINOR", referring to the version of the OPTIMADE standard that describes the format. 
The version number string MUST NOT be prefixed by, e.g., "v". In implementations of the present version of the standard, the value MUST be exactly :val:`1.2`. - It MAY contain the following keys: - - :field:`"layout"`: String. A string either equal to :val:`"dense"` or :val:`"sparse"` to indicate whether the returned format uses a dense or sparse layout. From b6ab3aed1cc6c3f71ae19a7d048c99523911e184 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 11:54:33 +0200 Subject: [PATCH 43/60] Fix segment incorrectly placed --- optimade.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index 996a5815d..cddaada59 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3564,7 +3564,6 @@ The header object MAY also contain the keys: - :field:`"has_references"`: Boolean. An optional boolean to indicate whether any of the data lines in the response contains a reference marker. By including this field and giving it the value :val:`false`, a server MAY indicate that the client does not have to process any of the lines to detect reference markers. - Once the client has encountered an end-of-data-marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. - :field:`"links"`: Object. @@ -3577,6 +3576,8 @@ The header object MAY also contain the keys: - :field:`"returned_ranges"`: Array of Object. For dense data, and sparse data of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. +Once the client has encountered an end-of-data-marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. 
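The Python sketch below illustrates the header rules in this patch by checking a header line before any data lines are read. It is not a reference implementation. It assumes the whole first line is available as a string and that the client only understands major version 1. The names `parse_header` and `SUPPORTED_MAJOR` are hypothetical. The sketch applies the MAJOR.MINOR rule strictly, so it rejects the `"1.2.0"` strings used in the example responses.

```python
import json
import re

# Assumption: this client only understands the 1.x family of the format.
SUPPORTED_MAJOR = 1
_VERSION_RE = re.compile(r"([0-9]+)\.([0-9]+)")


def parse_header(first_line: str) -> dict:
    """Decode the header line of a partial data response and check it.

    Raises ValueError when the client should not try to parse the rest
    of the response.
    """
    header = json.loads(first_line)
    if not isinstance(header, dict):
        raise ValueError("header line must be a JSON object")
    ident = header.get("optimade-partial-data")
    if not isinstance(ident, dict):
        raise ValueError("missing 'optimade-partial-data' object")
    version = ident.get("format")
    match = _VERSION_RE.fullmatch(version) if isinstance(version, str) else None
    if match is None:
        raise ValueError(f"format {version!r} is not of the form MAJOR.MINOR")
    if int(match.group(1)) != SUPPORTED_MAJOR:
        raise ValueError(f"unrecognized major version in {version!r}")
    layout = header.get("layout")
    if layout not in ("dense", "sparse"):
        raise ValueError(f"layout must be 'dense' or 'sparse', got {layout!r}")
    return header
```

For example, `parse_header('{"optimade-partial-data": {"format": "1.2"}, "layout": "dense"}')` returns the decoded header. A `"v1.2"` or `"2.0"` format string raises `ValueError`.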
+ If the field :field:`"layout"` is :val:`"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker. If :field:`"returned_ranges"` is included and the client encounters a next-marker before receiving all lines indicated by the slice, it should proceed by not assigning any values to the corresponding items, i.e., this is not an error. Since the remaining values are not assigned a value, they will be :val:`null` if they are not assigned values by another response retrieved via a next link encountered before the end-of-data-marker. From ee4c1e36cf7a7752536d0e4b2c0194cf421a47b9 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 13:09:32 +0200 Subject: [PATCH 44/60] Fix braces in partial data examples Co-authored-by: Johan Bergsma <29785380+JPBergsma@users.noreply.github.com> --- optimade.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/optimade.rst b/optimade.rst index cddaada59..0ae4a4faa 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3658,7 +3658,7 @@ An example of the sparse layout for multi-dimensional lists with three aggregate .. 
code:: json {"optimade-partial-data": {"format": "1.2.0"}, "layout": "sparse"} - [3,5,19, [ [10,20,21], [30,40,50] ] + [3,5,19, [ [10,20,21], [30,40,50] ] ] [3,7,19, ["PARTIAL-DATA-REF", ["https://example.db.org/value2"]]] - [4,5,19, [ [11, 110], ["PARTIAL-DATA-REF", ["https://example.db.org/value3"]], [550, 333]] + [4,5,19, [ [11, 110], ["PARTIAL-DATA-REF", ["https://example.db.org/value3"]], [550, 333]]] ["PARTIAL-DATA-END", [""]] From 1b0d1a6ddc4ea3899a8523804dfa9970f05a017e Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 13:21:59 +0200 Subject: [PATCH 45/60] Make returned_range RECOMMENDED and move a sentence that had ended up elsewhere --- optimade.rst | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/optimade.rst b/optimade.rst index 0ae4a4faa..0cb2519bf 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3547,6 +3547,12 @@ The header object MUST contain the keys: - :field:`"layout"`: String. A string either equal to :val:`"dense"` or :val:`"sparse"` to indicate whether the returned format uses a dense or sparse layout. +The following key is RECOMMENDED in the header object: + +- :field:`"returned_ranges"`: Array of Object. + For dense layout, and sparse layout of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. + In the specific case of a hierarchy of list properties represented as a sparse multi-dimensional array, if the field :field:`"returned_ranges"` is given, it MUST contain one slice object per dimension of the multi-dimensional array, representing slices for each dimension that cover the data given in the response. + The header object MAY also contain the keys: - :field:`"property_name"`: String. @@ -3573,8 +3579,6 @@ The header object MAY also contain the keys: - :field:`base_url`: String. The base URL of the implementation serving the database to which this property belongs. 
-- :field:`"returned_ranges"`: Array of Object. - For dense data, and sparse data of one dimensional list properties, the array contains a single element which is a `slice object`_ representing the range of data present in the response. Once the client has encountered an end-of-data-marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. @@ -3583,7 +3587,6 @@ If :field:`"returned_ranges"` is included and the client encounters a next-marke Since the remaining values are not assigned a value, they will be :val:`null` if they are not assigned values by another response retrieved via a next link encountered before the end-of-data-marker. (Since there is no requirement that values are assigned in a specific order between responses, it is possible that the omitted values are already assigned. In that case the values shall remain as assigned, i.e., they are not overwritten by :val:`null` in this situation.) -In the specific case of a hierarchy of list properties represented as a sparse multi-dimensional array, if the field :field:`"returned_ranges"` is given, it MUST contain one slice object per dimension of the multi-dimensional array, representing slices for each dimension that cover the data given in the response. The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the layout as :val:`"dense"` or :val:`"sparse"`. From 1b9c6073fa3e4153e5c0cd86dff7067371e21cc0 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 13:23:05 +0200 Subject: [PATCH 46/60] Fix whitespace --- optimade.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index 0cb2519bf..f0a1d1274 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3579,7 +3579,6 @@ The header object MAY also contain the keys: - :field:`base_url`: String. The base URL of the implementation serving the database to which this property belongs. 
- Once the client has encountered an end-of-data-marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. If the field :field:`"layout"` is :val:`"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker. From 562d651ca7e7aaf2dfd4625c427c3b946c1d7f12 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 13:27:29 +0200 Subject: [PATCH 47/60] Improve formulation about partial data URLs --- optimade.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index f0a1d1274..139782b0e 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3519,7 +3519,7 @@ Furthermore, we also define the following special markers: - A *next-marker* is this exact JSON: :val:`["PARTIAL-DATA-NEXT", [""]]`, where :val:`""` is to be replaced with the target URL for the next link. There is no requirement on the syntax or format of the URLs provided in these markers. -The data provided via the URLs MUST be the JSON lines partial data format, i.e., the markers cannot be used to link to partial data provided in other formats. +When data is fetched from these URLs the response MUST be the JSON lines partial data format, i.e., the markers cannot be used to link to partial data provided in other formats. The markers have been deliberately designed to be valid JSON objects but *not* valid OPTIMADE property values. Since the OPTIMADE list data type is defined as a list of values of the same data type or :val:`null`, the above markers cannot be encountered inside the actual data of an OPTIMADE property. 
From 498d169730dacf8badbbda24616d876a3f8db9f2 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 13:28:38 +0200 Subject: [PATCH 48/60] Slightly adjust wording --- optimade.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index 139782b0e..0e1ada4f3 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3519,7 +3519,7 @@ Furthermore, we also define the following special markers: - A *next-marker* is this exact JSON: :val:`["PARTIAL-DATA-NEXT", [""]]`, where :val:`""` is to be replaced with the target URL for the next link. There is no requirement on the syntax or format of the URLs provided in these markers. -When data is fetched from these URLs the response MUST be the JSON lines partial data format, i.e., the markers cannot be used to link to partial data provided in other formats. +When data is fetched from these URLs the response MUST use the JSON lines partial data format, i.e., the markers cannot be used to link to partial data provided in other formats. The markers have been deliberately designed to be valid JSON objects but *not* valid OPTIMADE property values. Since the OPTIMADE list data type is defined as a list of values of the same data type or :val:`null`, the above markers cannot be encountered inside the actual data of an OPTIMADE property. From e5e60463a57a9880ef4ce30f3e98fcb13e9c7cb3 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 13:37:01 +0200 Subject: [PATCH 49/60] Slightly adjust wording --- optimade.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index 0e1ada4f3..04fae6d2c 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3523,7 +3523,7 @@ When data is fetched from these URLs the response MUST use the JSON lines partia The markers have been deliberately designed to be valid JSON objects but *not* valid OPTIMADE property values. 
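The following Python sketch makes the marker definitions concrete. It recognizes the three markers by their exact shape: a two-element list whose first element is the tag string and whose second element is a one-element list holding the URL. The function name `classify_marker` is hypothetical. An OPTIMADE list holds values of a single type (or null), so according to the text a decoded value with this shape cannot be ordinary property data.

```python
from typing import Any, Optional, Tuple

MARKER_TAGS = {
    "PARTIAL-DATA-END": "end",
    "PARTIAL-DATA-REF": "ref",
    "PARTIAL-DATA-NEXT": "next",
}


def classify_marker(value: Any) -> Optional[Tuple[str, str]]:
    """Return (kind, url) if value is a partial data marker, else None.

    kind is "end", "ref" or "next". Markers have the exact shape
    ["PARTIAL-DATA-<KIND>", ["<url>"]]; the end-of-data-marker carries
    an empty string in place of the URL.
    """
    if (
        isinstance(value, list)
        and len(value) == 2
        and isinstance(value[0], str)
        and value[0] in MARKER_TAGS
        and isinstance(value[1], list)
        and len(value[1]) == 1
        and isinstance(value[1][0], str)
    ):
        return MARKER_TAGS[value[0]], value[1][0]
    return None
```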
Since the OPTIMADE list data type is defined as a list of values of the same data type or :val:`null`, the above markers cannot be encountered inside the actual data of an OPTIMADE property. - **Implementation note:** the recognizable string values for the markers should make it possible to, with a high level of precision, determine lines that do not need to be further processed for reference-markers by prescreening the raw JSON text data lines for the relevant string (alternatively, this screening can be done by the string parser used by the JSON parser). + **Implementation note:** the recognizable string values for the markers should make it possible to prescreen the raw JSON text data lines for the reference-marker string to determine which lines that one can exclude from further processing to resolve references (alternatively, this screening can be done by the string parser used by the JSON parser). The undelying design idea is that for lines that have reference-markers, the time it takes to process the data structure to locate the markers should be negliable compared to the time it takes to resolve and handle the large data they reference. Hence, the most relevant optimization is to avoid spending time processing data structures to find markers for lines where there are none. From 4906c4ff7d727f499b11222dfd64be7f29fe02cb Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 13:38:18 +0200 Subject: [PATCH 50/60] Slightly adjust wording --- optimade.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index 04fae6d2c..4183bdbed 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3523,7 +3523,7 @@ When data is fetched from these URLs the response MUST use the JSON lines partia The markers have been deliberately designed to be valid JSON objects but *not* valid OPTIMADE property values. 
Since the OPTIMADE list data type is defined as a list of values of the same data type or :val:`null`, the above markers cannot be encountered inside the actual data of an OPTIMADE property. - **Implementation note:** the recognizable string values for the markers should make it possible to prescreen the raw JSON text data lines for the reference-marker string to determine which lines that one can exclude from further processing to resolve references (alternatively, this screening can be done by the string parser used by the JSON parser). + **Implementation note:** the recognizable string values for the markers should make it possible to prescreen the raw text for the JSON data lines for the reference-marker string to determine which lines that one can exclude from further processing to resolve references (alternatively, this screening can be done by the string parser used by the JSON parser). The undelying design idea is that for lines that have reference-markers, the time it takes to process the data structure to locate the markers should be negliable compared to the time it takes to resolve and handle the large data they reference. Hence, the most relevant optimization is to avoid spending time processing data structures to find markers for lines where there are none. From 864450dd84bcc7bd572a622f1471ec87976f945d Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 13:39:39 +0200 Subject: [PATCH 51/60] Slightly adjust wording --- optimade.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index 4183bdbed..dbd2d083d 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3523,7 +3523,7 @@ When data is fetched from these URLs the response MUST use the JSON lines partia The markers have been deliberately designed to be valid JSON objects but *not* valid OPTIMADE property values. 
Since the OPTIMADE list data type is defined as a list of values of the same data type or :val:`null`, the above markers cannot be encountered inside the actual data of an OPTIMADE property. - **Implementation note:** the recognizable string values for the markers should make it possible to prescreen the raw text for the JSON data lines for the reference-marker string to determine which lines that one can exclude from further processing to resolve references (alternatively, this screening can be done by the string parser used by the JSON parser). + **Implementation note:** the recognizable string values for the markers should make it possible to prescreen the raw text of the JSON data lines for the reference-marker string to determine which lines that one can exclude from further processing to resolve references (alternatively, this screening can be done by the string parser used by the JSON parser). The undelying design idea is that for lines that have reference-markers, the time it takes to process the data structure to locate the markers should be negliable compared to the time it takes to resolve and handle the large data they reference. Hence, the most relevant optimization is to avoid spending time processing data structures to find markers for lines where there are none. From e5741068f1f954cf2d74a89b7d6684efb71fb69e Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 13:47:28 +0200 Subject: [PATCH 52/60] Minor reformulations --- optimade.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index dbd2d083d..a5482c05f 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3572,7 +3572,6 @@ The header object MAY also contain the keys: By including this field and giving it the value :val:`false`, a server MAY indicate that the client does not have to process any of the lines to detect reference markers. - :field:`"links"`: Object. - An object to provide relevant links for the property being provided. 
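The implementation note above can be sketched in Python as a cheap substring test on the raw line, followed by a walk through the decoded structure only when that test succeeds. This is an illustrative sketch rather than a reference implementation, and `find_references` is a hypothetical name. The prescreen assumes that the server writes the tag `PARTIAL-DATA-REF` literally. If the tag were written with JSON `\uXXXX` escape sequences, a plain substring search would miss it. A client could also skip the test entirely when the header's `has_references` field is `false`.

```python
import json
from typing import Any, List, Tuple

REF_TAG = "PARTIAL-DATA-REF"


def find_references(raw_line: str) -> Tuple[Any, List[Tuple[tuple, str]]]:
    """Decode one data line and locate the reference-markers inside it.

    Returns (value, refs). Each entry of refs is (path, url), where path
    lists the list indices or object keys that lead from the line's
    top-level value down to the marker.
    """
    value = json.loads(raw_line)
    # Prescreen: if the tag never appears in the raw text, skip the walk.
    # Assumes the tag is not written using JSON \uXXXX escapes.
    if REF_TAG not in raw_line:
        return value, []

    refs: List[Tuple[tuple, str]] = []

    def walk(node: Any, path: tuple) -> None:
        if (isinstance(node, list) and len(node) == 2 and node[0] == REF_TAG
                and isinstance(node[1], list) and len(node[1]) == 1):
            refs.append((path, node[1][0]))
        elif isinstance(node, list):
            for index, child in enumerate(node):
                walk(child, path + (index,))
        elif isinstance(node, dict):
            for key, child in node.items():
                walk(child, path + (key,))

    walk(value, ())
    return value, refs
```

The design follows the note's reasoning. Lines without markers pay only for a substring search. The recursive walk runs only on lines where a reference is likely, and for those lines resolving the referenced data is expected to dominate the cost.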
It MAY contain the following key: From 336ef21f135ec4b1b58448b6ff74791b97e4c1a0 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 13:47:59 +0200 Subject: [PATCH 53/60] Minor reformulations --- optimade.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/optimade.rst b/optimade.rst index a5482c05f..1821d3c2e 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3537,7 +3537,7 @@ The first line MUST be a JSON object providing header information. The header object MUST contain the keys: - :field:`"optimade-partial-data"`: Object. - An object that identifying the response as being on OPTIMADE partial data format. + An object identifying the response as being on OPTIMADE partial data format. It MUST contain the following key: @@ -3569,7 +3569,7 @@ The header object MAY also contain the keys: - :field:`"has_references"`: Boolean. An optional boolean to indicate whether any of the data lines in the response contains a reference marker. - By including this field and giving it the value :val:`false`, a server MAY indicate that the client does not have to process any of the lines to detect reference markers. + A value of :val:`false` means that the client does not have to process any of the lines to detect reference markers, which may speed up the parsing. - :field:`"links"`: Object. An object to provide relevant links for the property being provided. From 93ee58345862785fb0c858ab186fd050104f55d1 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 13:59:34 +0200 Subject: [PATCH 54/60] Rearrange some text to be more logical --- optimade.rst | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/optimade.rst b/optimade.rst index 1821d3c2e..100c735c3 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3578,19 +3578,12 @@ The header object MAY also contain the keys: - :field:`base_url`: String. The base URL of the implementation serving the database to which this property belongs. 
-Once the client has encountered an end-of-data-marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. - -If the field :field:`"layout"` is :val:`"dense"` and :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker. -If :field:`"returned_ranges"` is included and the client encounters a next-marker before receiving all lines indicated by the slice, it should proceed by not assigning any values to the corresponding items, i.e., this is not an error. -Since the remaining values are not assigned a value, they will be :val:`null` if they are not assigned values by another response retrieved via a next link encountered before the end-of-data-marker. -(Since there is no requirement that values are assigned in a specific order between responses, it is possible that the omitted values are already assigned. -In that case the values shall remain as assigned, i.e., they are not overwritten by :val:`null` in this situation.) - The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the layout as :val:`"dense"` or :val:`"sparse"`. - **Dense layout:** In the dense partial data layout, each data line reproduces one list item in the OPTIMADE list property being transmitted in JSON format. If OPTIMADE list properties are embedded inside the item, they can either be included in full or replaced with a reference-marker. If a list is replaced by a reference marker, the client MAY use the provided URL to obtain the list items. + If the field :field:`"returned_ranges"` is omitted, then the client MUST assume that the data is a continuous range of data from the start of the array up to the number of elements given until reaching the end-of-data-marker or next-marker. 
- **Sparse layout for one-dimensional list:** When the response sparsely communicates items for a one-dimensional OPTIMADE list property, each data line contains a JSON array on the format: @@ -3606,6 +3599,14 @@ The format of data lines of the response (i.e., all lines except the first and t - The last item is a JSON layout of the item at those coordinates, with the same format as the lines in the dense layout. In the same way as for the dense layout, reference-markers are allowed for data that does not fit in the response. +If the final line of the response is a next-marker, the client MAY continue fetching the data for the property by retriving another partial data response from the provided URL. +If the final line is an end-of-data-marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. + +If :field:`"returned_ranges"` is included in the response and the client encounters a next-marker before receiving all lines indicated by the slice, it should proceed by not assigning any values to the corresponding items, i.e., this is not an error. +Since the remaining values are not assigned a value, they will be :val:`null` if they are not assigned values by another response retrieved via a next link encountered before the end-of-data-marker. +(Since there is no requirement that values are assigned in a specific order between responses, it is possible that the omitted values are already assigned. +In that case the values shall remain as assigned, i.e., they are not overwritten by :val:`null` in this situation.) 
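The Python sketch below applies the client-side rules above to the dense layout. It follows next-markers, never overwrites an already assigned value with null, and fills whatever no response covered with null once the end-of-data-marker arrives. The `fetch` callable is a hypothetical stand-in for an HTTP GET that returns the response body as text. Defaulting `start` to 0 when `returned_ranges` is omitted follows the text. Defaulting `step` to 1 is an assumption. The sketch reads only the first slice's `start` and `step`, so it does not depend on whether `stop` is inclusive.

```python
import json
from typing import Any, Callable, Dict, List, Optional


def _is_marker(value: Any, tag: str) -> bool:
    """True if value has the exact shape [tag, ["<url>"]]."""
    return (isinstance(value, list) and len(value) == 2 and value[0] == tag
            and isinstance(value[1], list) and len(value[1]) == 1)


def fetch_dense_property(
    url: str,
    fetch: Callable[[str], str],
    length: Optional[int] = None,
    max_pages: int = 1000,
) -> List[Any]:
    """Assemble a dense-layout list property by following next-markers.

    fetch is a caller-supplied (hypothetical) function that returns the
    body of the partial data response at a URL. Reference-markers inside
    items are left unresolved.
    """
    assigned: Dict[int, Any] = {}
    for _ in range(max_pages):
        lines = fetch(url).splitlines()
        header = json.loads(lines[0])
        if header.get("layout") != "dense":
            raise ValueError("this sketch only handles the dense layout")
        first_slice = (header.get("returned_ranges") or [{}])[0]
        start = first_slice.get("start", 0)  # omitted: start of the array
        step = first_slice.get("step", 1)    # assumption: default step of 1
        # Only indices present on this page are assigned; anything it omits
        # keeps whatever an earlier page provided (never reset to null).
        for offset, raw in enumerate(lines[1:-1]):
            assigned[start + offset * step] = json.loads(raw)
        final = json.loads(lines[-1])
        if _is_marker(final, "PARTIAL-DATA-END"):
            break
        if not _is_marker(final, "PARTIAL-DATA-NEXT"):
            raise ValueError("last line must be an end- or next-marker")
        url = final[1][0]
    else:
        raise RuntimeError(f"gave up after {max_pages} pages")
    size = length if length is not None else max(assigned, default=-1) + 1
    # After the end-of-data-marker, items no response covered become null.
    return [assigned.get(index) for index in range(size)]
```

Passing `length` lets the caller size the result from other metadata, such as the per-property metadata. Without it, the list ends at the highest index that any response provided. The `max_pages` guard is only a safety net against a server that loops on next-markers.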
+ Examples ~~~~~~~~ From edf4f2547e42a21eafd6d30d1b5186e8c26fbb34 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 14:04:32 +0200 Subject: [PATCH 55/60] Clarify optimade-partial-data/format field futureproofing --- optimade.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/optimade.rst b/optimade.rst index 100c735c3..644267bb6 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3543,6 +3543,7 @@ The header object MUST contain the keys: - :field:`"format"`: String. Specifies the minor version of the partial data format used. The string MUST be of the format "MAJOR.MINOR", referring to the version of the OPTIMADE standard that describes the format. The version number string MUST NOT be prefixed by, e.g., "v". In implementations of the present version of the standard, the value MUST be exactly :val:`1.2`. + A client MUST NOT expect to be able to parse the format if the field is not a string of the format MAJOR.MINOR or if the MAJOR version number is unrecognized. - :field:`"layout"`: String. A string either equal to :val:`"dense"` or :val:`"sparse"` to indicate whether the returned format uses a dense or sparse layout. From 5b13315fac4cf0c5f47d055dc810c842ae95a865 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 14:32:18 +0200 Subject: [PATCH 56/60] Minor reformulations and adjustments --- optimade.rst | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/optimade.rst b/optimade.rst index 644267bb6..a1a76301b 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3516,6 +3516,7 @@ Furthermore, we also define the following special markers: - The *end-of-data-marker* is this exact JSON: :val:`["PARTIAL-DATA-END", [""]]`. - A *reference-marker* is this exact JSON: :val:`["PARTIAL-DATA-REF", [""]]`, where :val:`""` is to be replaced with a URL being referenced. + A reference-marker MUST only occur in a place where the property being communicated could have an embedded list. 
- A *next-marker* is this exact JSON: :val:`["PARTIAL-DATA-NEXT", [""]]`, where :val:`""` is to be replaced with the target URL for the next link. There is no requirement on the syntax or format of the URLs provided in these markers. @@ -3579,6 +3580,11 @@ The header object MAY also contain the keys: - :field:`base_url`: String. The base URL of the implementation serving the database to which this property belongs. + - :field:`"item_schema"`: String. + A URL to a JSON Schema that validates the data lines of the response. + The format SHOULD be the relevant partial extract of a valid property definition as described in `Property Definitions`_. + If a schema is provided, it MUST be a valid JSON schema using the same version of JSON schema as described in that section. + The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the layout as :val:`"dense"` or :val:`"sparse"`. - **Dense layout:** In the dense partial data layout, each data line reproduces one list item in the OPTIMADE list property being transmitted in JSON format. @@ -3588,23 +3594,22 @@ The format of data lines of the response (i.e., all lines except the first and t - **Sparse layout for one-dimensional list:** When the response sparsely communicates items for a one-dimensional OPTIMADE list property, each data line contains a JSON array on the format: - - The first item is the zero-based index of the item provided. - - The second item is a JSON layout of the item, with the same format as the lines in the dense format. - In the same way as for the dense layout, reference-markers are allowed for data that does not fit in the response (see example below). + - The first item of the array is the zero-based index of list property item being provided by this line. + - The second item of the array is the list property item located at the indicated index, represented using the same format as each line in the dense layout. 
+ In the same way as for the dense layout, reference-markers are allowed inside the item data for embedded lists that do not fit in the response (see example below). -- **Sparse layout for multi-dimensional lists:** We provide a sparse layout specifically for the case that the OPTIMADE property represents a series of directly hierarchically embedded lists (i.e., a multidimensional sparse array). - Then, the server MAY represent them using the following sparse multi-dimensional layout for a number of aggregated dimensions. - In this case, each data line contains a JSON array in the format of: +- **Sparse layout for multi-dimensional lists:** the server MAY use a specific sparse layout for the case that the OPTIMADE property represents a series of directly hierarchically embedded lists (i.e., a multidimensional sparse array). + In this case, each data line contains a JSON array of the format: - - All items except the last item are integer zero-based indices of the value being provided in this line; these indices refer to the aggregated dimensions in the order of outermost to innermost. - - The last item is a JSON layout of the item at those coordinates, with the same format as the lines in the dense layout. - In the same way as for the dense layout, reference-markers are allowed for data that does not fit in the response. + - All array items except the last one are integer zero-based indices of the list property item being provided by this line; these indices refer to the aggregated dimensions in the order of outermost to innermost. + - The last item of the array is the list property item located at the indicated coordinates, represented using the same format as each line in the dense layout. + In the same way as for the dense layout, reference-markers are allowed inside the item data for embedded lists that do not fit in the response (see example below). 
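Both sparse layouts put the coordinates first and the item last, so one hypothetical parser can handle them together. A one-dimensional line yields a single-element coordinate tuple. A multi-dimensional line yields one index per aggregated dimension, outermost first. The Python sketch below is illustrative, and its names are hypothetical. It returns the items keyed by coordinate tuple, together with the next-marker URL if there is one. Reference-markers inside items are returned unresolved.

```python
import json
from typing import Any, Dict, Optional, Tuple


def parse_sparse_response(
    body: str,
) -> Tuple[Dict[Tuple[int, ...], Any], Optional[str]]:
    """Parse one sparse-layout partial data response.

    Returns (items, next_url). items maps zero-based coordinates, outermost
    dimension first, to the item on that line. next_url is the target of a
    final next-marker, or None after an end-of-data-marker.
    """
    lines = body.splitlines()
    if json.loads(lines[0]).get("layout") != "sparse":
        raise ValueError("expected a sparse layout header")
    items: Dict[Tuple[int, ...], Any] = {}
    for raw in lines[1:-1]:
        entry = json.loads(raw)
        if not isinstance(entry, list) or len(entry) < 2:
            raise ValueError(f"malformed sparse data line: {raw}")
        *coords, item = entry
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not all(isinstance(c, int) and not isinstance(c, bool) for c in coords):
            raise ValueError(f"non-integer index in line: {raw}")
        items[tuple(coords)] = item
    final = json.loads(lines[-1])
    if (isinstance(final, list) and len(final) == 2
            and isinstance(final[1], list) and len(final[1]) == 1):
        if final[0] == "PARTIAL-DATA-NEXT":
            return items, final[1][0]
        if final[0] == "PARTIAL-DATA-END":
            return items, None
    raise ValueError("last line must be an end- or next-marker")
```

Applied to the three-dimensional sparse example response, this sketch yields the keys `(3, 5, 19)` and `(30, 15, 9)` and the next URL `https://example.db.org/`.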
If the final line of the response is a next-marker, the client MAY continue fetching the data for the property by retriving another partial data response from the provided URL. -If the final line is an end-of-data-marker, any data not covered by any of the encountered slices are to be assigned the value :val:`null`. +If the final line is an end-of-data-marker, any data not covered by any of the responses is to be assigned the value :val:`null`. If :field:`"returned_ranges"` is included in the response and the client encounters a next-marker before receiving all lines indicated by the slice, it should proceed by not assigning any values to the corresponding items, i.e., this is not an error. -Since the remaining values are not assigned a value, they will be :val:`null` if they are not assigned values by another response retrieved via a next link encountered before the end-of-data-marker. +Since the remaining items are left unassigned, they will be :val:`null` unless they are assigned values by another response retrieved via a next link encountered before the final end-of-data-marker. (Since there is no requirement that values are assigned in a specific order between responses, it is possible that the omitted values are already assigned. In that case the values shall remain as assigned, i.e., they are not overwritten by :val:`null` in this situation.) From 2cfe8c058c3b3b28b1640b264b4f3009b769e75a Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 15:31:26 +0200 Subject: [PATCH 57/60] Allow an inline item_schema in addition to the link --- optimade.rst | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/optimade.rst b/optimade.rst index a1a76301b..dd131b55c 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3573,6 +3573,11 @@ The header object MAY also contain the keys: An optional boolean to indicate whether any of the data lines in the response contains a reference marker.
A value of :val:`false` means that the client does not have to process any of the lines to detect reference markers, which may speed up the parsing. +- :field:`"item_schema"`: Object. + An object that represents a JSON Schema that validates the data lines of the response. + The format SHOULD be the relevant partial extract of a valid property definition as described in `Property Definitions`_. + If a schema is provided, it MUST be a valid JSON Schema using the same version of JSON Schema as described in that section. + - :field:`"links"`: Object. An object to provide relevant links for the property being provided. It MAY contain the following key: @@ -3580,11 +3585,9 @@ The header object MAY also contain the keys: - :field:`base_url`: String. The base URL of the implementation serving the database to which this property belongs. - - :field:`"item_schema"`: String. - A URL to a JSON Schema that validates the data lines of the response. - The format SHOULD be the relevant partial extract of a valid property definition as described in `Property Definitions`_. - If a schema is provided, it MUST be a valid JSON schema using the same version of JSON schema as described in that section. - + - :field:`"item_describedby"`: String. + A URL to an external JSON Schema that validates the data lines of the response. + The format of and requirements for this schema are the same as for the inline schema field :field:`"item_schema"`. The format of data lines of the response (i.e., all lines except the first and the last) depends on whether the header object specifies the layout as :val:`"dense"` or :val:`"sparse"`. - **Dense layout:** In the dense partial data layout, each data line reproduces one list item in the OPTIMADE list property being transmitted in JSON format.
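The next-marker and end-of-data-marker rules in the patches above can be illustrated with a minimal Python sketch of a client that collects a one-dimensional sparse property across chained responses. Several names and details here are assumptions:

- ``collect_sparse_1d`` and the ``fetch`` callable are hypothetical names.
- The property length is assumed to be known in advance.
- The header shown is simplified, and only its ``layout`` key is read.
- The example.org URLs are made up.
- The end-of-data-marker is assumed to be ``["PARTIAL-DATA-END", [""]]``; its exact definition is not part of this excerpt.

```python
import json
from typing import Any, Callable, List, Optional

NEXT_TAG = "PARTIAL-DATA-NEXT"
# Assumption: the end-of-data-marker is ["PARTIAL-DATA-END", [""]]; its exact
# definition is not part of this excerpt.
END_TAG = "PARTIAL-DATA-END"


def collect_sparse_1d(first_url: str, length: int,
                      fetch: Callable[[str], str]) -> List[Optional[Any]]:
    """Gather a one-dimensional sparse property by following next-markers.

    `fetch` is a hypothetical callable returning the raw JSON Lines text of one
    partial data response; `length` is assumed to be known in advance.
    """
    values: List[Optional[Any]] = [None] * length  # unassigned items stay null
    url: Optional[str] = first_url
    while url is not None:
        lines = fetch(url).splitlines()
        header = json.loads(lines[0])
        if header.get("layout") != "sparse":
            raise ValueError("this sketch only handles the sparse layout")
        for raw in lines[1:-1]:  # data lines: everything but the first and last
            index, item = json.loads(raw)
            values[index] = item  # items omitted here are left untouched, not nulled
        marker = json.loads(lines[-1])
        if marker[0] == NEXT_TAG:
            url = marker[1][0]  # continue with the next partial data response
        elif marker[0] == END_TAG:
            url = None  # anything still None is null
        else:
            raise ValueError("final line must be a next-marker or end-of-data-marker")
    return values


# Two hypothetical responses, keyed by made-up URLs.
pages = {
    "https://example.org/p1": '{"layout": "sparse"}\n[0, 1.5]\n[3, 2.5]\n'
                              '["PARTIAL-DATA-NEXT", ["https://example.org/p2"]]',
    "https://example.org/p2": '{"layout": "sparse"}\n[1, 4.0]\n'
                              '["PARTIAL-DATA-END", [""]]',
}
print(collect_sparse_1d("https://example.org/p1", 5, pages.__getitem__))
# [1.5, 4.0, None, 2.5, None]
```

The sketch only assigns the items that a response provides. As a result, values already set by an earlier response are never overwritten with :val:`null` by a later one, which is the behavior the text requires.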
From 4e9fb4d7423364ecb6b998e41fd9ec8a6efb0dff Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Thu, 15 Jun 2023 15:34:55 +0200 Subject: [PATCH 58/60] Fix missing quotation marks --- optimade.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/optimade.rst b/optimade.rst index dd131b55c..92ffe082e 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3582,7 +3582,7 @@ The header object MAY also contain the keys: An object to provide relevant links for the property being provided. It MAY contain the following key: - - :field:`base_url`: String. + - :field:`"base_url"`: String. The base URL of the implementation serving the database to which this property belongs. - :field:`"item_describedby"`: String. From b50d93d637624c3f598262e0f3eadd1438974dcb Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Fri, 16 Jun 2023 03:35:17 +0200 Subject: [PATCH 59/60] Minor language corrections from review Co-authored-by: Giovanni Pizzi --- optimade.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/optimade.rst b/optimade.rst index 92ffe082e..2d77fe4ab 100644 --- a/optimade.rst +++ b/optimade.rst @@ -3524,8 +3524,8 @@ When data is fetched from these URLs the response MUST use the JSON lines partia The markers have been deliberately designed to be valid JSON objects but *not* valid OPTIMADE property values. Since the OPTIMADE list data type is defined as a list of values of the same data type or :val:`null`, the above markers cannot be encountered inside the actual data of an OPTIMADE property. - **Implementation note:** the recognizable string values for the markers should make it possible to prescreen the raw text of the JSON data lines for the reference-marker string to determine which lines that one can exclude from further processing to resolve references (alternatively, this screening can be done by the string parser used by the JSON parser). 
- The undelying design idea is that for lines that have reference-markers, the time it takes to process the data structure to locate the markers should be negliable compared to the time it takes to resolve and handle the large data they reference. + **Implementation note:** the recognizable string values for the markers should make it possible to prescreen the raw text of the JSON data lines for the reference-marker string to determine which lines can be excluded from further processing to resolve references (alternatively, this screening can be done by the string parser used by the JSON parser). + The underlying design idea is that for lines that have reference-markers, the time it takes to process the data structure to locate the markers should be negligible compared to the time it takes to resolve and handle the large data they reference. Hence, the most relevant optimization is to avoid spending time processing data structures to find markers for lines where there are none. The full response MUST be valid `JSON Lines `__ that adheres to the following format: From dfc24d49f339f9ef20e762e815609dd175456194 Mon Sep 17 00:00:00 2001 From: Rickard Armiento Date: Fri, 16 Jun 2023 03:36:16 +0200 Subject: [PATCH 60/60] Add sentence about implementations decision on what is partial data --- optimade.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/optimade.rst b/optimade.rst index 2d77fe4ab..74a34d166 100644 --- a/optimade.rst +++ b/optimade.rst @@ -447,6 +447,7 @@ Transmission of large property values A property value may be too large to fit in a single response. OPTIMADE provides a mechanism for a client to handle such properties by fetching them in separate series of requests. +It is up to the implementation to decide which values are too large to represent in a single response, and this decision MAY change between responses. In this case, the response to the initial query gives the value :val:`null` for the property.
A list of one or more data URLs together with their respective partial data formats are given in the response.
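The implementation note in patch 59 suggests prescreening the raw text of each data line before searching the parsed structure for reference-markers. A minimal Python sketch of that idea follows. It assumes that reference-markers have the form ``["PARTIAL-DATA-REF", ["<url>"]]``, which is defined elsewhere in the partial data format and not reproduced in this excerpt. The function names ``find_refs`` and ``parse_data_lines`` and the example.org URL are hypothetical.

```python
import json
from typing import Any, Iterator, List, Tuple

# Assumption: reference-markers have the form ["PARTIAL-DATA-REF", ["<url>"]],
# as defined elsewhere in the partial data format (not shown in this excerpt).
REF_TAG = "PARTIAL-DATA-REF"


def find_refs(value: Any, path: Tuple[int, ...] = ()) -> Iterator[Tuple[Tuple[int, ...], str]]:
    """Yield (path, url) for every reference-marker nested inside a parsed value."""
    if isinstance(value, list):
        if len(value) == 2 and value[0] == REF_TAG and isinstance(value[1], list):
            yield path, value[1][0]
            return
        for i, item in enumerate(value):
            yield from find_refs(item, path + (i,))


def parse_data_lines(raw_lines: List[str]) -> Tuple[List[Any], List[Tuple[int, Tuple[int, ...], str]]]:
    """Parse data lines, walking only those whose raw text contains the marker string."""
    items: List[Any] = []
    refs: List[Tuple[int, Tuple[int, ...], str]] = []
    for line_no, raw in enumerate(raw_lines):
        item = json.loads(raw)
        items.append(item)
        # Cheap substring test on the raw text: a line without the marker string
        # cannot contain a marker, so its parsed structure is never walked.
        if REF_TAG in raw:
            for path, url in find_refs(item):
                refs.append((line_no, path, url))
    return items, refs


lines = ['[1.0, 2.0]',
         '[["PARTIAL-DATA-REF", ["https://example.org/big"]], [3.0, 4.0]]']
items, refs = parse_data_lines(lines)
print(refs)  # [(1, (0,), 'https://example.org/big')]
```

A false positive, such as a string value that happens to contain the marker text, only costs an unnecessary walk of that line. No marker is missed, because markers must appear with this exact text. This reflects the design goal of skipping structure traversal for lines that contain no markers.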