Metric calculation

This section describes how to calculate metrics for several datasets produced by Open Targets. Most of them are generated by the Open Targets Platform ETL; however, a limited subset of metrics can be computed on pre-ETL data as well.

The only parameter which needs to be set before the run is the version of the upcoming Open Targets release: export OT_RELEASE=...

The format of this variable, and the name of the resulting metrics output, depend on the type of run. For post-ETL runs, the date on which the run completed is appended to the output name:

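| Run type          | OT_RELEASE format | Example metrics output name |
|-------------------|-------------------|-----------------------------|
| Pre-ETL           | YY.MM_pre         | 23.12_pre                   |
| Post-ETL, regular | YY.MM             | 23.12_2023-10-31            |
| Post-ETL, PPP     | partners/YY.MM    | 23.12_ppp_2023-11-24        |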
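
As a rough illustration of the naming convention in the table above, the following shell sketch predicts the output name for a given OT_RELEASE value. The function name `metrics_output_name` is hypothetical and not part of this repository, and the rules (including the YYYY-MM-DD date format) are inferred only from the three examples shown. The actual naming logic in the metrics code may differ.

```shell
# Hypothetical helper (not part of the repository): predict the metrics
# output name from OT_RELEASE, following the examples in the table.
# Usage: metrics_output_name <OT_RELEASE> [<run date as YYYY-MM-DD>]
metrics_output_name() {
    release="$1"
    # Assumption: the appended timestamp is the completion date; default to today.
    run_date="${2:-$(date +%Y-%m-%d)}"
    case "$release" in
        # Pre-ETL: the output name equals OT_RELEASE, no date appended.
        *_pre)      printf '%s\n' "$release" ;;
        # Post-ETL, PPP: drop the "partners/" prefix, add "_ppp" and the date.
        partners/*) printf '%s_ppp_%s\n' "${release#partners/}" "$run_date" ;;
        # Post-ETL, regular: append the date.
        *)          printf '%s_%s\n' "$release" "$run_date" ;;
    esac
}

metrics_output_name 23.12_pre                 # prints 23.12_pre
metrics_output_name 23.12 2023-10-31          # prints 23.12_2023-10-31
metrics_output_name partners/23.12 2023-11-24 # prints 23.12_ppp_2023-11-24
```

A helper like this can make it easier to spot which entry in the app corresponds to a run that has just finished.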

Submit job to Dataproc

export IMAGE=europe-west1-docker.pkg.dev/open-targets-eu-dev/ot-release-metrics/metric-calculation:latest
export PROJECT=open-targets-eu-dev
export REGION=europe-west1
export BUCKET=gs://ot-release-metrics
export SUBNET=ot-dataproc-serverless
gcloud dataproc batches submit pyspark \
    --container-image ${IMAGE} \
    --region ${REGION} \
    --project ${PROJECT} \
    --deps-bucket ${BUCKET} \
    --subnet ${SUBNET} \
    --files config/config.yaml \
    --properties "spark.executor.cores=16" \
    metric-calculation/src/metric_calculation/metrics.py \
    -- \
    metric_calculation.ot_release=${OT_RELEASE}
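
Because the command above passes OT_RELEASE straight through to the job, a quick check before submitting may catch typos early. The sketch below is one possible pre-flight check. The function name `is_valid_ot_release` is hypothetical, and the accepted patterns are assumptions inferred from the three formats listed earlier, not rules enforced by the metrics code itself.

```shell
# Hypothetical pre-flight check (not part of the repository): succeed only if
# the argument matches one of the three documented OT_RELEASE formats.
is_valid_ot_release() {
    case "$1" in
        [0-9][0-9].[0-9][0-9]_pre)      return 0 ;;  # Pre-ETL: YY.MM_pre
        [0-9][0-9].[0-9][0-9])          return 0 ;;  # Post-ETL, regular: YY.MM
        partners/[0-9][0-9].[0-9][0-9]) return 0 ;;  # Post-ETL, PPP: partners/YY.MM
        *)                              return 1 ;;  # Anything else, including empty
    esac
}

# Example: warn before calling gcloud if the variable is unset or malformed.
if ! is_valid_ot_release "${OT_RELEASE:-}"; then
    echo "OT_RELEASE is unset or does not match a known format: '${OT_RELEASE:-}'" >&2
fi
```

The check only warns here. In a wrapper script, the `echo` could be followed by `exit 1` to stop before the batch is submitted.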

Updating the Streamlit app

If a recently completed run doesn't show up in the app, click on the “Refresh list of runs” button.