Likelihood of Receiving a Treatment, across Race-Ethnicity for Septic Patients in the ICU
The goal of this project is to investigate racial and ethnic disparities among critically ill sepsis patients in the likelihood of receiving one of three life-sustaining treatments: renal replacement therapy (RRT), vasopressor use (VP), or mechanical ventilation (MV). The cohorts are curated from MIMIC IV (2008-2019).
Clone the repository by running the following command in your terminal:
git clone https://github.com/joamats/mit-sepsis-tx.git
Install the required Python packages by running the following command:
pip install -r src/setup/requirements.txt
MIMIC data can be found on PhysioNet, a repository of freely available medical research data managed by the MIT Laboratory for Computational Physiology. Because the data is sensitive, credentialing is required to access it.
Documentation for MIMIC-IV can be found here.
In this section, we explain how to set up GCP and your environment in order to run SQL queries through GCP right from your local Python setting. Follow these steps:
- Create a Google account if you don't have one and go to Google Cloud Platform
- Enable the BigQuery API
- Create a Service Account and download its JSON keys
- Place your JSON keys somewhere accessible to your project, for example in its parent folder
- Create a .env file with the command
cp env.example .env
- Update your .env file with the path to your JSON keys and the ID of your project in BigQuery
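Before running any query, it can help to confirm that the .env file is filled in correctly. The following is a minimal sketch using only the standard library; the variable names `KEYS_FILE` and `PROJECT_ID` are hypothetical, so replace them with whatever names appear in your env.example.

```python
from pathlib import Path


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_config(values):
    """Return a list of problems found in the configuration (empty if none)."""
    problems = []
    # KEYS_FILE and PROJECT_ID are hypothetical names used for illustration.
    keys_file = values.get("KEYS_FILE", "")
    if not keys_file:
        problems.append("KEYS_FILE is not set")
    elif not Path(keys_file).is_file():
        problems.append(f"JSON key file not found: {keys_file}")
    if not values.get("PROJECT_ID"):
        problems.append("PROJECT_ID is not set")
    return problems


if __name__ == "__main__" and Path(".env").is_file():
    for problem in check_config(load_env()):
        print(problem)
```

An empty list from `check_config` only means the file parses and the key file exists; it does not verify that the service account has access to BigQuery.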
After being credentialed on PhysioNet, you must sign the data use agreement and connect the database to GCP, either by requesting access or by uploading the data to your project.
Once all the tables required by the cohort generation query are in your project (you have to run all the auxiliary queries manually on BigQuery), run the following command to fetch the data as a dataframe and save it as a CSV file in your local project. Make sure you have all the required files and folders.
python3 src/py_scripts/get_data.py --sql "src/sql_queries/main.sql" --destination "data/MIMIC_data.csv"
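For orientation, here is a rough sketch of what a script with this interface might look like. It is not the repository's actual get_data.py: only the `--sql` and `--destination` flags come from the command above, while the function names and the `project_id` and `keys_file` parameters are assumptions. The BigQuery calls rely on the `google-cloud-bigquery` package with pandas support installed.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two flags used in the command above."""
    parser = argparse.ArgumentParser(
        description="Run a SQL query and save the result as CSV."
    )
    parser.add_argument("--sql", required=True, help="Path to the .sql file")
    parser.add_argument("--destination", required=True, help="Output CSV path")
    return parser.parse_args(argv)


def run_bigquery(sql, project_id, keys_file):
    """Run the query on BigQuery and return the result as a pandas DataFrame."""
    # Imported lazily so the rest of the sketch works without these packages.
    from google.cloud import bigquery
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(keys_file)
    client = bigquery.Client(project=project_id, credentials=credentials)
    return client.query(sql).to_dataframe()


def fetch_to_csv(sql_path, destination, run_query):
    """Read the SQL file, run it with `run_query`, and write the result to CSV."""
    sql = Path(sql_path).read_text()
    frame = run_query(sql)
    # Create the output folder (for example data/) if it does not exist yet.
    Path(destination).parent.mkdir(parents=True, exist_ok=True)
    frame.to_csv(destination, index=False)
    return destination
```

With the project ID and key path taken from your .env file, the pieces could be combined as `fetch_to_csv(args.sql, args.destination, lambda sql: run_bigquery(sql, project_id, keys_file))`. Passing the query runner as an argument keeps the file handling testable without a BigQuery connection.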
Then transform the data into a ready-to-use dataframe by running all the scripts in 2_preprocessing sequentially.
The ICD-9 to ICD-10 translation is based on this GitHub Repo.
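If you prefer not to launch each preprocessing script by hand, a small runner can do it. The sketch below assumes the scripts are Python files whose filenames sort in the intended execution order; both points are assumptions about the folder contents, and `run_scripts_in_order` is a hypothetical helper, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def run_scripts_in_order(folder, pattern="*.py"):
    """Run every matching script in `folder`, one at a time, sorted by filename."""
    scripts = sorted(Path(folder).glob(pattern))
    for script in scripts:
        print(f"Running {script.name}")
        # check=True raises CalledProcessError and stops at the first failure,
        # so later scripts never run on incomplete intermediate data.
        subprocess.run([sys.executable, str(script)], check=True)
    return [script.name for script in scripts]
```

Note that the sort is lexicographic, so a name starting with `10_` would sort before one starting with `2_`; zero-padded prefixes avoid that.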
Run the scripts in 3_models.