MIMIC-IV Concepts

  • buildmimic - Scripts to build MIMIC-IV in various relational database management systems (RDBMS); PostgreSQL is a popular open-source option
  • concepts - SQL scripts to extract data from MIMIC-IV including demographics, organ failure scores, severity of illness scores, durations of treatment, and so on. These concepts are written in the BigQuery dialect.
  • notebooks - Jupyter notebooks to demonstrate how to use the data in MIMIC-IV
  • mapping - Mapping concepts within MIMIC-IV to standard ontologies

Concepts

The MIMIC-IV concepts are written in an SQL syntax compatible with BigQuery. The BigQuery physionet-data.mimiciv_derived dataset contains the output of the SQL scripts present in the concepts folder. These tables are generated using the code in the latest release on GitHub. Access to this dataset is available to MIMIC-IV approved users: see the cloud instructions.

Generating the concepts

If you just want to use the data generated by the concepts scripts, you can access each table as physionet-data.mimiciv_derived.* on BigQuery. See the cloud instructions for access details.

These concepts assume the output schema is mimiciv_derived. If you would like a different schema, you will need to make a few edits to the scripts.
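As a rough illustration of the kind of edit involved, the sketch below rewrites schema-qualified table references in a script's text. It assumes the scripts refer to the output schema with the literal prefix `mimiciv_derived.`; the function name `retarget_schema`, the target name `my_schema`, and the sample query are hypothetical and not part of this repository.

```python
import re


def retarget_schema(sql: str, old: str = "mimiciv_derived", new: str = "my_schema") -> str:
    """Replace schema-qualified references `old.<table>` with `new.<table>`."""
    # \b stops the pattern from matching inside a longer identifier
    # (for example `xmimiciv_derived.`), and re.escape guards against
    # regex metacharacters in the schema name.
    pattern = re.compile(rf"\b{re.escape(old)}\.")
    # A function replacement avoids backslash interpretation in `new`.
    return pattern.sub(lambda _match: f"{new}.", sql)


# Hypothetical query text, used only to show the substitution.
example = "SELECT * FROM mimiciv_derived.age"
print(retarget_schema(example, new="my_schema"))
```

A plain text substitution like this does not parse the SQL, so it would also rewrite the prefix inside comments or string literals; reviewing the result by hand is advisable.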

All concepts are originally written in the BigQuery Standard SQL dialect. A Python package converts these BigQuery scripts into other dialects such as PostgreSQL. To generate the concepts in PostgreSQL, see the MIMIC-IV postgresql concepts subfolder. See below for how scripts in non-BigQuery dialects were generated.

BigQuery

Generating the concepts requires the Google Cloud SDK to be installed. A shell script, make_concepts.sh, is provided which iterates over each folder and creates a table with the same name as the concept file. Concept names have been chosen to avoid collisions.

Generating a single concept can be done by calling the Google Cloud SDK as follows:

bq query --use_legacy_sql=False --replace --destination_table=my_bigquery_dataset.age < demographics/age.sql
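
The per-folder loop described above can be sketched in Python. This is not the contents of make_concepts.sh, only an illustration of the same idea: it assumes concept files sit exactly one folder below the concepts directory, and the function name `bq_commands` is hypothetical. It only builds the command lines and does not call `bq` itself.

```python
from pathlib import Path


def bq_commands(concepts_dir: str, dataset: str) -> list[tuple[list[str], Path]]:
    """Return one (argv, sql_file) pair per *.sql file one folder below concepts_dir."""
    commands = []
    for sql_file in sorted(Path(concepts_dir).glob("*/*.sql")):
        table = sql_file.stem  # the table takes the name of the concept file
        argv = [
            "bq", "query",
            "--use_legacy_sql=False",
            "--replace",
            f"--destination_table={dataset}.{table}",
        ]
        commands.append((argv, sql_file))
    return commands


# To execute a pair, the SQL file is passed on standard input, for example:
#     with open(sql_file) as fh:
#         subprocess.run(argv, stdin=fh, check=True)
```

Files are visited in alphabetical order here. If some concepts depend on tables produced by other concepts, an explicit order would be needed, so the provided shell script remains the reference.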

PostgreSQL

The postgres folder contains concepts in a PostgreSQL-compatible dialect.

DuckDB

The duckdb folder contains concepts in a DuckDB-compatible dialect.

Transpile

The Python package sqlglot is used to convert concepts from BigQuery syntax to the other syntaxes ("transpile"). This package parses the SQL into an abstract syntax tree (AST), after which it can be rewritten into a specific dialect. Not all functions are supported by sqlglot, so a helper package was written which adds support for the missing functions. Most of this process is done in the transpile.py file.
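
The parse-then-render step can be sketched with sqlglot alone. This is a minimal illustration, not the code in transpile.py: the function name `transpile_concept` is hypothetical, and because it omits the helper package, functions that sqlglot does not support may not convert correctly.

```python
def transpile_concept(sql: str, dialect: str = "postgres") -> str:
    """Parse BigQuery SQL into an AST, then render it in another dialect."""
    # Third-party dependency (pip install sqlglot), imported here so the
    # module can be loaded even where sqlglot is not installed.
    import sqlglot

    tree = sqlglot.parse_one(sql, read="bigquery")  # SQL text -> AST
    return tree.sql(dialect=dialect, pretty=True)   # AST -> target dialect text
```

Calling `transpile_concept(open("demographics/age.sql").read(), "duckdb")` would return the rendered SQL as a string; any dialect name that sqlglot recognises can be passed as the second argument.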

An entrypoint is provided for convenience. To transpile a single file, run:

# convert_file <source_file> <destination_file> --destination_dialect <dialect>
mimic_utils convert_file mimic-iv/concepts/demographics/age.sql age.sql --destination_dialect duckdb

To transpile all files in a folder, run:

# convert_folder <source_folder> <destination_folder> --destination_dialect <dialect>
mimic_utils convert_folder mimic-iv/concepts mimic-iv/concepts_duckdb --destination_dialect duckdb