This section describes how to calculate metrics for several datasets produced by Open Targets. Most of these datasets are generated by the Open Targets Platform ETL, but a limited subset of metrics can also be computed on pre-ETL data.
The only run-specific parameter that must be set before a run is the version of the upcoming Open Targets release:
export OT_RELEASE=...
The format of this variable, as well as the name of the resulting metrics output, depends on the type of run. For post-ETL runs, the date on which the run completed is appended to the output name:
| Run type | OT_RELEASE format | Example metrics output name |
|---|---|---|
| Pre-ETL | YY.MM_pre | 23.12_pre |
| Post-ETL, regular | YY.MM | 23.12_2023-10-31 |
| Post-ETL, PPP | partners/YY.MM | 23.12_ppp_2023-11-24 |
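The mapping in the table can be expressed as a small bash function, which is handy for checking an OT_RELEASE value before submitting a job. This is a minimal sketch that only mirrors the three rows above; the function name metrics_output_name is hypothetical and is not part of the metric-calculation code, and the completion date is passed in explicitly because it is only known once a post-ETL run has finished.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the metric-calculation package): derives the
# expected metrics output name from an OT_RELEASE value, following the naming
# table. $2 is the completion date (YYYY-MM-DD), used only for post-ETL runs.
metrics_output_name() {
  local release="$1" completed_on="$2"
  if [[ "$release" =~ ^[0-9]{2}\.[0-9]{2}_pre$ ]]; then
    # Pre-ETL: the release string is used as-is, no date is appended.
    echo "$release"
  elif [[ "$release" =~ ^partners/([0-9]{2}\.[0-9]{2})$ ]]; then
    # Post-ETL, PPP: the "partners/" prefix becomes "_ppp", then the date.
    echo "${BASH_REMATCH[1]}_ppp_${completed_on}"
  elif [[ "$release" =~ ^[0-9]{2}\.[0-9]{2}$ ]]; then
    # Post-ETL, regular: the release followed by the completion date.
    echo "${release}_${completed_on}"
  else
    echo "unrecognised OT_RELEASE format: $release" >&2
    return 1
  fi
}
```

For example, metrics_output_name "$OT_RELEASE" "$(date +%F)" prints the name a post-ETL run finishing today would be expected to have, and any value outside the three documented formats is rejected with a non-zero exit status.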
export PROJECT=open-targets-eu-dev
export REGION=europe-west1
export BUCKET=gs://ot-release-metrics
export SUBNET=ot-dataproc-serverless
gcloud dataproc batches submit pyspark \
--container-image ${IMAGE} \
--region ${REGION} \
--project ${PROJECT} \
--deps-bucket ${BUCKET} \
--subnet ${SUBNET} \
--files config/config.yaml \
--properties "spark.executor.cores=16" \
metric-calculation/src/metric_calculation/ \
-- \
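The submit command expands several environment variables, and one of them, IMAGE, is not set anywhere in this section; an unset variable would silently pass an empty value to gcloud. A minimal pre-flight guard, sketched here under the assumption that the commands run in bash, can catch this before the batch is submitted. The function name require_vars is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: fails (exit status 1) and lists every named
# variable that is unset or empty, so a submit command does not run with
# blank flag values such as "--container-image ''".
require_vars() {
  local missing=() name
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named $name.
    if [[ -z "${!name:-}" ]]; then
      missing+=("$name")
    fi
  done
  if (( ${#missing[@]} > 0 )); then
    echo "missing required variables: ${missing[*]}" >&2
    return 1
  fi
}
```

Running require_vars OT_RELEASE PROJECT REGION BUCKET SUBNET IMAGE immediately before gcloud dataproc batches submit pyspark stops the run early and names every missing variable at once, rather than failing on the first one.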
If a recently completed run does not appear in the app, click the “Refresh list of runs” button.