- FAIRtracks - metadata standard for genomic tracks
FAIRtracks is a set of JSON Schemas developed through the ELIXIR implementation study: "FAIRification of Genomic Tracks", as a minimal standard for genomic track metadata. For more information on the implementation study, please check out:
FAIRtracks v1.0.2
The FAIRtracks standard consists of a main JSON Schema and a set of subschemas. A JSON document of track metadata must validate against the main FAIRtracks JSON Schema to be said to follow the standard.

The main FAIRtracks JSON Schema is simply named `fairtracks.schema.json` and is documented here:

| Title | JSON Schema | Schema documentation | Example JSON document |
|---|---|---|---|
| FAIRtracks JSON Schema | fairtracks.schema.json | fairtracks.md | fairtracks.example.json |

This top-level FAIRtracks JSON Schema contains, in addition to some general metadata fields, four arrays of JSON sub-documents for the four main object types in FAIRtracks: `studies`, `experiments`, `samples`, and `tracks`. Each of these object types is described in a separate sub-schema:

| Title | JSON Schema | Schema documentation | Example JSON document |
|---|---|---|---|
| Study | fairtracks_study.schema.json | fairtracks_study.md | fairtracks_study.example.json |
| Experiment | fairtracks_experiment.schema.json | fairtracks_experiment.md | fairtracks_experiment.example.json |
| Sample | fairtracks_sample.schema.json | fairtracks_sample.md | fairtracks_sample.example.json |
| Track | fairtracks_track.schema.json | fairtracks_track.md | fairtracks_track.example.json |

FAIRtracks also contains the following convenience sub-schemas:

| Title | JSON Schema | Schema documentation | Example JSON document |
|---|---|---|---|
| Phenotype | fairtracks_phenotype.schema.json | fairtracks_phenotype.md | fairtracks_phenotype.example.json |
| Contact | fairtracks_contact.schema.json | fairtracks_contact.md | fairtracks_contact.example.json |

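As a rough illustration of the top-level structure described above, the following sketch builds an empty document skeleton and checks that the four main object arrays are present. Only the four array names come from the standard; the helper `missing_object_arrays` is hypothetical, the general metadata fields are left out, and the check is no substitute for validating against the actual JSON Schemas.

```python
import json

# The four array names come from the standard; everything else is illustrative.
OBJECT_TYPES = ("studies", "experiments", "samples", "tracks")


def missing_object_arrays(document):
    """Return the names of the main object arrays that are absent or not lists."""
    return [key for key in OBJECT_TYPES if not isinstance(document.get(key), list)]


# An empty skeleton: general metadata fields and the properties of the
# individual objects are deliberately omitted.
skeleton = {key: [] for key in OBJECT_TYPES}

print(json.dumps(skeleton, indent=2))
print(missing_object_arrays(skeleton))
```

A real document would fill each array with objects following the corresponding sub-schema.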
- Linux-like shell with "bash". Mac OS X will do, but you probably need to install either XCode (from the App Store) or the XCode Command line tools.
- Python >= 3.6
- Node.js >= v10 and npm >= 3.10.8
- git (relatively recent version is probably best)
- On Mac OS X, all the above can be installed using HomeBrew.
- An OPML editor is also recommended, but not required. See OPML editors below for more information.
- Create a personal fork in GitHub ("Fork" button).
- Clone the fork to your computer (e.g., `git clone https://github.com/myusername/fairtracks_standard.git`).
- Run `make raw`, and edit the raw OPML files to your liking. For more information about the `make` targets, see below.
- Run `make` or `make all`.
- Repeat editing the raw OPML files and running `make` until you are satisfied with the changes.
- Run `make rawclean` to remove the raw OPML files before committing.
- Commit and push your changes to a feature branch in your personal fork and create a pull request, as described in the standard GitHub Flow workflow.
- Once the Pull Request is accepted:
  - Pull the latest changes in the `master` branch to your local repo.
  - Rebase your feature branch on top of `master`.
  - Make sure that all commits are consistently built. The automatically installed `git-hooks` will also check for consistency. To make a commit consistent, rebuild it with the `rebuild_all.sh` script. To clean up previous commits, use interactive rebase as described under 1b. make git-hooks below.
- Force push your feature branch to your personal fork, which should update the pull request, and notify us.

There is an inherent order to the different types of files in this repo, defined in the `Makefile`. The FAIRtracks standard is almost fully defined in the OPML files found under `json/overview`, with just a small bit of top-level logic being handled by opml_to_json.py. All the JSON Schema and JSON example files are automatically generated from the OPML files. This automatic file generation is handled by various `make` targets:

These `make` targets are run automatically when needed by the other `make` targets, but are also available for manual use.

a. make venv
- Autogenerates a Python virtual environment in the `.venv` directory, if not already present. In case the Python executable you want to link up to the virtual environment is located in a non-standard path, you can set the environment variable `PYTHON_EXE` before the first `make venv` command. For instance: `PYTHON_EXE=/path/to/my/python3 make .venv`

b. make git-hooks
- Installs the version-controlled git hooks into the local repo. The git hooks make sure that:
  - All changed files are committed together
  - All secondary files have been recompiled with `make`

  The checks are run before git commits or remote pushes are finalized.

  It is especially important that the git hooks are installed before merging or rebasing is done, as the SHA256 signatures of the JSON files may then need to be recalculated (by `make`) on merged/rebased commits. To fix such issues (which will appear when trying to push to GitHub) one will need to carry out an interactive rebase:

  1. Start interactive rebase: `git rebase -i $FIRST_COMMIT^`, where `$FIRST_COMMIT` is the first commit that needs editing (you can find this in the log messages from the failed remote push).
  2. In the editor that appears, replace `pick` with `edit` for the commits that need editing. You should also at this point plan to clean up your commits by reordering or squashing them, as well as improving the commit messages.
  3. `./rebuild_all.sh`
  4. For all changed files: `git add $FILE`
  5. `git commit --amend`
  6. `git rebase --continue`
  7. Repeat steps 3 to 6 for all commits selected for editing.

c. make jsonschema2md
- Installs the node package "jsonschema2md" which is used to generate the JSON Schema documentation. The package is installed under "node_modules", together with all its dependencies.
The following process should be followed when changing the contents of the FAIRtracks standard itself:

a. make raw
- This makes copies of the existing *.opml files into similarly named *.raw.opml files. The raw OPML files are made to be opened for editing in specialized outlining tools. As such tools vary in the exact content of the exported OPML files, the raw OPML files need to be compiled into standardized, cleaned-up versions before they are committed to git.
- You only need to run `make raw` once. If you accidentally run the command twice, any existing raw OPML files will be renamed to *.raw.opml.old.
- The raw OPML files are ignored by git and can be edited in an OPML editor of choice. See OPML file format below for more information.
- Be sure to delete the raw OPML files (with `make rawclean`) before carrying out any git commands. This is important because changing branches, for example, will not change the raw OPML files, since they are ignored by git. Thus, if one fails to remove the raw OPML files before switching commits, `make` will just regenerate the previous commit on top of the new one.

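The copy-and-rename behaviour described above can be pictured with a small sketch. This is not the Makefile's actual recipe: the function `make_raw` and the temporary directory are hypothetical stand-ins, and only the `*.opml`, `*.raw.opml`, and `*.raw.opml.old` naming is taken from the text.

```python
import shutil
import tempfile
from pathlib import Path


def make_raw(directory):
    """Copy every X.opml to X.raw.opml, keeping an earlier raw file as X.raw.opml.old."""
    created = []
    # sorted() materializes the listing first, so files created below are not revisited.
    for source in sorted(Path(directory).glob("*.opml")):
        if source.name.endswith(".raw.opml"):
            continue  # skip raw files left over from an earlier run
        raw = source.with_name(source.stem + ".raw.opml")
        if raw.exists():
            # Second run: preserve the earlier raw copy under the .old suffix.
            raw.replace(raw.with_name(raw.name + ".old"))
        shutil.copyfile(source, raw)
        created.append(raw.name)
    return created


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "fairtracks.opml").write_text("<opml/>")
    make_raw(tmp)
    make_raw(tmp)  # running twice moves the first raw copy aside
    names = sorted(p.name for p in Path(tmp).iterdir())

print(names)
```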
b. make or make all
- After the raw OPML files have been edited, `make` runs:
  - `make opml` to generate cleaned-up, standardized versions of the raw OPML files.
  - `make json` to generate JSON Schema files and related example JSON files from the cleaned-up OPML files.
  - `make docs` to generate Markdown documentation files under the `docs` directory.

All the generated JSON Schema files, as well as the top-level JSON example file, include a stable SHA256 signature of their contents.
a. make signature
- Computes and prints the stable SHA256 signature for all the JSON files.
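To illustrate what a "stable" signature can mean, here is a minimal sketch that hashes a canonical serialization of a JSON document, so that key order and whitespace do not affect the result. The canonicalization rule (sorted keys, compact separators), the exclusion of the signature field, and the field name `sha256` are all assumptions made for this example; the repository's own implementation may differ on every point.

```python
import hashlib
import json

SIGNATURE_KEY = "sha256"  # hypothetical field name, not taken from the schemas


def stable_signature(document):
    """Hash a canonical serialization so key order and whitespace do not matter."""
    # Leave out the signature field itself, so that a file can be re-signed.
    payload = {k: v for k, v in document.items() if k != SIGNATURE_KEY}
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"title": "Example", "type": "object"}
b = {"type": "object", "title": "Example", SIGNATURE_KEY: "stale"}
print(stable_signature(a) == stable_signature(b))  # True: same content, same hash
```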
b. make rawclean
- Removes all raw OPML and related .old files.
- Should only be run if you are sure that all changes in the raw OPML files have propagated to other files, i.e. you should make sure that you have run `make` first.
- Raw OPML files must be removed prior to running any `git` command, as explained above, section 2a.

c. make clean
- Runs `make rawclean`, in addition to removing the virtual environment in the `.venv` directory, the git hooks, and the `node_modules` directory.

OPML is a standard file format defined specifically for outlining software.
Raw OPML files can be edited with dedicated outlining tools, but as the format is a subtype of XML, one can also use generic XML editors:
- On Mac OS, we recommend using the commercial tool OmniOutliner, as there are really no open source alternatives with similar user interface.
- As an open source, platform-agnostic alternative, we recommend TreeLine.
- The OPML files can of course also be edited manually, in which case you can ignore the raw OPML files completely.

Each `<outline>` tag defines a JSON property, with the hierarchy defined by the XML hierarchy.

The details for each JSON property are defined by a set of possible attributes for each tag. Many of the standard JSON Schema keywords are directly supported:

| Attribute | Description |
|---|---|
| `_name` | The name of the JSON property. |
| `const` | Constant value (the only value allowed). |
| `default` | Default value if no value is provided. |
| `description` | Human-readable description of the property. |
| `enum` | Set of values allowed, separated by `\|`. |
| `examples` | Set of example values, separated by `\|`. All properties must have the same number of examples (or none) within each JSON Schema. |
| `format` | Format of current string property. Supports all of the standard JSON Schema formats, and in addition we support two custom formats: "curie" and "term", for respectively Identifiers.org-resolvable CURIEs and ontology terms. |
| `minItems` | Minimal number of items in current array property. |
| `pattern` | Regexp format for current string property. |
| `ref` | JSON Pointer to another JSON Schema to import under the property. |
| `required` | If `"true"` the current property is required. |
| `title` | Title of the JSON Schema. |
| `type` | Data type of the current property: string, object, array, number, or boolean. |

In addition to the standard JSON keywords detailed above, a set of extended attributes has been defined:

| Attribute | Description |
|---|---|
| `ancestors` | Ontology labels, separated by `\|`, used to validate properties in `term` format. At least one of these terms must be an ancestor of the value in one of the specified ontologies. |
| `autogenerated` | If `true`, the contents of the current property will be filled automatically by the FAIRtracks autogenerate service (to be implemented later). |
| `comments` | Comments that will remain in the OPML files only. |
| `constIf` | If the specified `if_property` has the specified `if_value`, the current property must follow the specified `then_value`, interpreted as `const`. |
| `foreignProperty` | JSON Pointer to a linked identifier property in another schema. Two JSON documents, one following the current schema and the other following the foreign schema, are related if the values in the two linked properties are the same. |
| `matchType` | Validation rule. For properties in `curie` format: either `basic`, `loose`, or `canonical`. For properties in `term` format: either `exact`, `suffix`, or `label`. |
| `namespace` | Namespaces, separated by `\|`, registered in http://identifiers.org. Used to validate `curie` values. |
| `ontology` | URLs to downloadable ontologies in OWL format, separated by `\|`. To be used to validate properties in `term` format, which is used for ontology `term_id` properties. |
| `ontologyTermPair` | Pair of JSON Pointers in the format `id=IDPTR;label=LABELPTR`, where `IDPTR` and `LABELPTR` are JSON Pointers to, respectively, an ontology term id and its corresponding (primary) label. Currently only pointers to child properties are supported, e.g. `id=0/term_id;label=0/term_label`. To be used in autogeneration and validation. |
| `requireAnyOf` | For every level of the object hierarchy, at least one of the properties with `requireAnyOf="true"` at that level is `required`. |
| `requireIf` | If the specified `if_property` has the specified `if_value`, the current property is `required`. |
| `unique` | If `"true"` the value of the current property must be unique across all JSON documents. |

For more information, please visit the FAIRtracks validator GitHub repository (see VALIDATION.md for directions).


The `constIf` and `requireIf` attributes require the value to follow a specific pattern:

| Pattern part | Description | Attribute(s) | Obligatory | Example |
|---|---|---|---|---|
| `if_property=` | Relative JSON Pointer to property to check | `constIf`, `requireIf` | Yes | `2/technique/term_id=` |
| `if_value` | Value to check for | `constIf`, `requireIf` | Yes | `http://purl.obolibrary.org/obo/OBI_0001853` |
| `;` | If-then delimiter | `constIf` | Yes | `;` |
| `then_property=` | Relative JSON Pointer to property to acquire `const` value | `constIf` | No | `1/term_id=` |
| `then_value` | `const` value for `then_property` | `constIf` | Yes | `http://purl.obolibrary.org/obo/SO_0000685` |
| `\|` | Pattern delimiter (between patterns if more than one) | `constIf`, `requireIf` | No | |

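To make the pattern above concrete, the sketch below splits a `constIf` or `requireIf` value into its parts. It is an illustration only, not the validator's parser: the function `parse_conditions` is hypothetical, and it assumes that values never contain `|` or `;` and that a `then_value` without a `then_property` does not itself start with something that looks like a relative JSON Pointer followed by `=`.

```python
import re

# A relative JSON Pointer prefix such as "2/technique/term_id=" or "1/term_id=".
POINTER_PREFIX = re.compile(r"^(\d+(?:/[^=;|]*)?)=(.*)$", re.DOTALL)


def parse_conditions(value, with_then):
    """Parse a constIf (with_then=True) or requireIf (with_then=False) value."""
    rules = []
    for pattern in value.split("|"):  # "|" separates multiple patterns
        if_part, _, then_part = pattern.partition(";")  # ";" is the if-then delimiter
        match = POINTER_PREFIX.match(if_part)
        if match is None:
            raise ValueError(f"missing if_property in {pattern!r}")
        rule = {"if_property": match.group(1), "if_value": match.group(2)}
        if with_then:
            if not then_part:
                raise ValueError(f"missing then_value in {pattern!r}")
            then_match = POINTER_PREFIX.match(then_part)
            if then_match:
                rule["then_property"] = then_match.group(1)
                rule["then_value"] = then_match.group(2)
            else:
                # then_property is optional; the whole part is the const value.
                rule["then_property"] = None
                rule["then_value"] = then_part
        rules.append(rule)
    return rules


EXAMPLE = (
    "2/technique/term_id=http://purl.obolibrary.org/obo/OBI_0001853;"
    "1/term_id=http://purl.obolibrary.org/obo/SO_0000685"
)
rules = parse_conditions(EXAMPLE, with_then=True)
print(rules)
```

The example value is assembled from the Example column of the table above.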
In order to support multiple OPML editors, the first `<outline>` tag in the OPML files (the one with `_text="#title"`) should contain all attributes in alphabetical order, each with an attached value (typically `"."` or `"0"`). These attributes are ignored for that line (as it is just used to generate the `title` of the JSON Schema).

When adding, removing, or renaming attributes:
- Please update the first `<outline>` tag (with `_text="#title"`) for all OPML files, as described directly above.
- In most cases, new attributes should also be added to the `ATTRIBS_TO_IMPORT` constant in the opml_to_json.py script, in the order in which they should appear in the generated JSON Schemas.

- Please visit the VALIDATION.md document.