Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OPTIMADE JSON Lines specification appendix #531

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

ml-evs
Copy link
Member

@ml-evs ml-evs commented Jul 19, 2024

This PR begins the work on #471, by defining the conventions on top of JSON Lines for deserializing an entire OPTIMADE API into a file.

optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
@ml-evs ml-evs changed the title [WIP] OPTIMADE JSON Lines specification appendix OPTIMADE JSON Lines specification appendix Jul 29, 2024
Fix code block formatting

Update optimade.rst

Apply suggestions from code review

Apply suggestions from code review

Try to fix formatting again

Fix formatting and add note about provider fields
@ml-evs ml-evs force-pushed the ml-evs/jsonlines-draft branch from 6929f10 to c5f1f53 Compare July 29, 2024 20:51
@ml-evs ml-evs force-pushed the ml-evs/jsonlines-draft branch from 96b0231 to 78e158c Compare July 29, 2024 21:09
@ml-evs ml-evs marked this pull request as ready for review July 30, 2024 16:55
@ml-evs ml-evs requested a review from eimrek July 30, 2024 16:55
Copy link
Member

@eimrek eimrek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ml-evs, and sorry for the delays. Now had some time to take a look. I like the proposal and it can be used for optimade-maker as well. But I wrote some comments for consideration.


The `JSON Lines <https://jsonlines.org/>`__ format enforces the following rules:

- each line is a valid JSON object,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- each line is a valid JSON object,
- each line is a valid JSON value,

Is this more precise?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, not sure, to me value implies something stored under a key, whereas object is the smallest chunk of data that can be parsed entirely as JSON?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on https://jsonlines.org/ they mention "values" (as even e.g. just strings or integers are supported). But I agree that in the vast majority of cases, people will use objects.

optimade.rst Outdated Show resolved Hide resolved

- The next line MAY contain a standard OPTIMADE ``meta`` object, following the same rules described in `JSON Response Schema: Common Fields`_, where every MUST and SHOULD rule can be reinterpreted as a MAY rule.
- The next block of lines provides the ``info`` endpoint responses.
- First the base info response MUST be provided, following the description at `Base Info Endpoint`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MUST

This breaks current optimade-maker jsonl files (well, just the li-ion-conductors one). Would a MAY, in principle, work here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite important IMO, we don't provide any other way of identifying the file (and in fact we might want to even expand this with a version of the JSONL specification like {'x-optimade': {'api_version': 1.1, 'jsonl_version': 1.2} if we want the JSONL file to be able to store OPTIMADE data of any version. Would you be against simply adding it to the files where it is missing atm?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, not sure if my comment was precise enough. I was commenting on the base info endpoint, which doesn't contain any crucial info as far as i understand, but I see how it can be useful to know what follows in the next lines.

- The next line MAY contain a standard OPTIMADE ``meta`` object, following the same rules described in `JSON Response Schema: Common Fields`_, where every MUST and SHOULD rule can be reinterpreted as a MAY rule.
- The next block of lines provides the ``info`` endpoint responses.
- First the base info response MUST be provided, following the description at `Base Info Endpoint`_.
- The next lines MUST contain the entry info endpoint responses for the all entry types present later in the file, as described in `Entry Listing Info Endpoints`_. These MUST appear in alphabetical order by entry type name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These MUST appear in alphabetical order by entry type name.

Fine for me, but is it needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to relax this constraint, probably we'll never need to worry about an OPTIMADE JSONL file containing 100s of entry types (in which case you could use alphabetical order to short-circuit the search for a given entry type).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, i see, but anyway, fine for me. good to keep consistent if we require the entry blocks to be sorted, where this is more valuable, i think.

Comment on lines +4554 to +4555
{"type": "structures", "id": "1", "attributes": {...}}
{"type": "references", "id": "2", "attributes": {...}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

references block should be ahead of structures, as required above. But again, is the alphabetical ordering needed?


- The next line MAY contain a standard OPTIMADE ``meta`` object, following the same rules described in `JSON Response Schema: Common Fields`_, where every MUST and SHOULD rule can be reinterpreted as a MAY rule.
- The next block of lines provides the ``info`` endpoint responses.
- First the base info response MUST be provided, following the description at `Base Info Endpoint`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, not sure if my comment was precise enough. I was commenting on the base info endpoint, which doesn't contain any crucial info as far as i understand, but I see how it can be useful to know what follows in the next lines.

- The next line MAY contain a standard OPTIMADE ``meta`` object, following the same rules described in `JSON Response Schema: Common Fields`_, where every MUST and SHOULD rule can be reinterpreted as a MAY rule.
- The next block of lines provides the ``info`` endpoint responses.
- First the base info response MUST be provided, following the description at `Base Info Endpoint`_.
- The next lines MUST contain the entry info endpoint responses for the all entry types present later in the file, as described in `Entry Listing Info Endpoints`_. These MUST appear in alphabetical order by entry type name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, i see, but anyway, fine for me. good to keep consistent if we require the entry blocks to be sorted, where this is more valuable, i think.

Comment on lines +4554 to +4555
{"type": "structures", "id": "1", "attributes": {...}}
{"type": "references", "id": "2", "attributes": {...}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{"type": "structures", "id": "1", "attributes": {...}}
{"type": "references", "id": "2", "attributes": {...}}
{"type": "references", "id": "1", "attributes": {...}}
{"type": "structures", "id": "2", "attributes": {...}}

switched em

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants