[Feature request]: Integration with JSON-LD and BioSchemas #68

Open
M-casado opened this issue Jun 22, 2023 · 2 comments

@M-casado

Summary

Integration of JSON-LD syntax into the validation of a JSON document

Motivation

The idea would be to expand the interpretation of a JSON-LD document with its context so that it can be further validated following Schema.org and BioSchemas types and profiles.

This request is open for discussion, given that I'm not fully sure about the implications of interpreting JSON-LD alongside JSON Schema, or whether the benefits outweigh the changes.

Details

The way I envision this feature is by interpreting a JSON-LD document within its context of types and profiles. It could be done in either of two ways:

  • Interpreting the schema. If a schema declares a BioSchemas/Schema.org type (e.g. Person), then Biovalidator would interpret it, fetch the BioSchemas/Schema.org definition of that type and apply it during validation. For example, if the JSON Schema looked like the following, Biovalidator would infer that the schema to apply is the one defined by Schema.org for the Person type.
{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "$id": "https://raw.githubusercontent.com/EbiEga/ega-metadata-schema/main/schemas/person.json",
  "type": "object",
  "required": ["person"],
  "additionalProperties": false,
  "properties": {
    "person": {
      "@context": "https://schema.org/",
      "@type": "Person"
    }
  }
}

# If there is a place where "https://schema.org/Person" has its definition in raw format, the following could also be done (?)
{
  "$schema": "https://json-schema.org/draft/2019-09/schema",
  "$id": "https://raw.githubusercontent.com/EbiEga/ega-metadata-schema/main/schemas/person.json",
  "type": "object",
  "required": ["person"],
  "additionalProperties": false,
  "properties": {
    "person": {
      "$ref": "https://schema.org/Person"
    }
  }
}
  • Interpreting the data. If the data declares a BioSchemas/Schema.org type (e.g. Person), then Biovalidator would interpret it as such and apply it during validation. For example, if the JSON data looked like the following, Biovalidator would infer that an extra layer to apply is the one defined by Schema.org for the Person type. To be honest, I do not trust this approach, given that it would mean the tool conditions the validation on the data, rather than relying entirely on the schema for it.
{
  "@context": "https://schema.org",
  "@type": "Person",
  ...
}
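
Either approach starts by spotting objects that carry `@context` and `@type` and expanding the type to a full IRI. The following is a minimal sketch of that step only, assuming `@context` is a single string used as a vocabulary prefix (real JSON-LD context processing is much richer). The helper names `type_iri` and `find_typed_nodes` are hypothetical and are not part of Biovalidator.

```python
from typing import Any, Iterator, Optional, Tuple


def type_iri(node: dict) -> Optional[str]:
    """Return the expanded IRI of a node's @type, or None if it cannot be derived.

    Assumption: @context is a plain string acting as a vocabulary prefix.
    """
    context, node_type = node.get("@context"), node.get("@type")
    if not isinstance(context, str) or not isinstance(node_type, str):
        return None
    if "://" in node_type:  # already an absolute IRI
        return node_type
    return context.rstrip("/") + "/" + node_type


def find_typed_nodes(doc: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (JSON Pointer, type IRI) for every typed object in a schema or in data."""
    if isinstance(doc, dict):
        iri = type_iri(doc)
        if iri is not None:
            yield path, iri
        for key, value in doc.items():
            # Escape '~' and '/' so that the path stays a valid JSON Pointer.
            token = key.replace("~", "~0").replace("/", "~1")
            yield from find_typed_nodes(value, f"{path}/{token}")
    elif isinstance(doc, list):
        for index, item in enumerate(doc):
            yield from find_typed_nodes(item, f"{path}/{index}")
```

Run on the first schema above, this reports the `person` subschema at `/properties/person` with the type `https://schema.org/Person`; run on the data example, it reports the document root. What to fetch for that IRI, and how to turn it into validation rules, is left open here.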

Use-cases

  • Re-using Schema.org and BioSchemas directly from the source by referencing their definitions
@M-casado

M-casado commented Jun 23, 2023

Regarding the validation of the properties, I found JSON Pointers appended to URIs very handy for taking advantage of Bioschemas keeping their validation definitions on GitHub. There is an error when Biovalidator compiles schemas that already have an $id, which seems easy to solve (I hope), but I haven't done enough testing on this.

For example, let's say we at the EGA want to allow for a Gene entity in one of our schemas. Instead of defining it ourselves, we re-use what is already in Bioschemas, and use their bespoke type Gene. Since the validation of the entity is within the keyword $validation of their JSON-LD, we can reference it as "https://raw.githubusercontent.com/BioSchemas/specifications/master/Phenotype/jsonld/Phenotype_v0.2-DRAFT.json#/@graph[0]/$validation". Now, there are some caveats, like:

  • The fact that everything is flat within the array @graph, so we would be hoping that the correct item is the first one, which may well not be the case.
  • The fact that the JSON Pointer notation defined in RFC 6901 has no bracket syntax for array indexing: an array item is addressed by a plain numeric segment (e.g. /@graph/0 rather than /@graph[0]), so the reference would have to be written that way to resolve.
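
The second caveat can be checked with a small RFC 6901 resolver. This is a minimal sketch, not what Biovalidator or its underlying validator does internally. `SAMPLE` is a hypothetical stand-in document, not the actual Bioschemas file.

```python
from typing import Any

# Hypothetical stand-in for a Bioschemas JSON-LD file with a flat @graph array.
SAMPLE = {
    "@graph": [
        {"@id": "bioschemas:Phenotype", "$validation": {"type": "object"}},
    ]
}


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON Pointer (the part after '#') against a document."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for raw in pointer[1:].split("/"):
        # Unescape '~1' before '~0', as RFC 6901 requires.
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if not (token.isascii() and token.isdigit()):
                raise KeyError(f"{token!r} is not a valid array index")
            current = current[int(token)]
        else:
            current = current[token]
    return current
```

With this resolver, `/@graph/0/$validation` returns the `$validation` object, while `/@graph[0]/$validation` raises a `KeyError`, because `@graph[0]` is read as a single object key that does not exist.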

These may be solved were we to allow the interpretation of @id (e.g. "@id": "bioschemas:Phenotype" could be referenced anywhere like in any JSON-LD).

Following what I mentioned above, we could directly use them as follows:

{
  "schema": {
    "$ref": "https://raw.githubusercontent.com/BioSchemas/specifications/master/Phenotype/jsonld/Phenotype_v0.2-DRAFT.json#/@graph[0]/$validation"
  },
  "data": { }
}
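
The @id idea mentioned earlier would remove the dependence on array position. A minimal sketch of such a lookup follows, assuming the fetched JSON-LD keeps its nodes in a flat @graph array and each relevant node carries both `@id` and `$validation`. The function names and the `SAMPLE_GRAPH` document are hypothetical.

```python
from typing import Any, Optional

# Hypothetical stand-in document; the target node is deliberately not first.
SAMPLE_GRAPH = {
    "@graph": [
        {"@id": "bioschemas:Other", "$validation": {"type": "string"}},
        {"@id": "bioschemas:Phenotype", "$validation": {"type": "object"}},
    ]
}


def find_by_id(jsonld: dict, node_id: str) -> Optional[dict]:
    """Return the first node in the flat @graph array whose @id equals node_id."""
    for node in jsonld.get("@graph", []):
        if isinstance(node, dict) and node.get("@id") == node_id:
            return node
    return None


def validation_schema(jsonld: dict, node_id: str) -> Any:
    """Return the $validation subschema of the node identified by node_id."""
    node = find_by_id(jsonld, node_id)
    if node is None:
        raise KeyError(f"no node with @id {node_id!r} in @graph")
    return node["$validation"]
```

Here `validation_schema(SAMPLE_GRAPH, "bioschemas:Phenotype")` picks the second node regardless of its position, which a positional reference to the first item would miss.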

The benefits of this are the common ones: fewer schemas for the same entities, increased interoperability and findability (due to Google rich results). But it may well not work 😅

@sneumann

Hi, I only stumbled on your suggestion now, and while I don't fully understand it, it seems to be in line (well and beyond :-) with what I had in mind in #66 where @theisuru gave a few comments. Yours, Steffen
