38 add data to train model #42

Draft
wants to merge 29 commits into
base: main
Changes from 25 commits
5e658b8
typos and broken links
thibaultdvx Mar 15, 2024
f0e9f85
more typos
thibaultdvx Mar 18, 2024
5d8bc0b
more typos and layout issues
thibaultdvx Mar 18, 2024
bcc244f
reduce the size of the test set
thibaultdvx Mar 18, 2024
ff06f79
typos
thibaultdvx Mar 18, 2024
b03a884
correction on the number of splits for kfold cross validation
thibaultdvx Mar 18, 2024
8d53534
correction of bug in data leakage check
thibaultdvx Mar 18, 2024
2d2e850
typos
thibaultdvx Mar 18, 2024
26f84cb
typos
thibaultdvx Mar 19, 2024
b251350
remove superfluous line in data leakage check
thibaultdvx Mar 19, 2024
c4fa99c
add missing directory
thibaultdvx Mar 19, 2024
5e6cccc
typos
thibaultdvx Mar 19, 2024
1a2c953
remove outdated comment (output dir is now an argument)
thibaultdvx Mar 19, 2024
e6aadd5
clarification on caps
thibaultdvx Mar 19, 2024
b92cab3
remove mention of prepare-data (not used yet)
thibaultdvx Mar 19, 2024
4bbd91f
Merge remote-tracking branch 'upstream/main'
thibaultdvx Mar 19, 2024
4272913
clarify training section
thibaultdvx Mar 19, 2024
bdf56da
reduce size of an image
thibaultdvx Mar 19, 2024
88447b3
typos and layout
thibaultdvx Mar 25, 2024
a65c8c0
Merge branch 'aramis-lab:main' into main
thibaultdvx Mar 25, 2024
4015394
minor change
thibaultdvx Mar 25, 2024
58964c7
Merge branch 'main' of https://github.com/thibaultdvx/clinicadl_handbook
thibaultdvx Mar 25, 2024
89a1ba1
correction on wrong CAPS folder
thibaultdvx Mar 26, 2024
38808ef
Merge branch 'main' of https://github.com/thibaultdvx/clinicadl_handbook
thibaultdvx Mar 26, 2024
729ef09
add new training dataset
thibaultdvx Mar 26, 2024
0bf0496
minor typo
thibaultdvx Mar 26, 2024
31a8a33
typo
thibaultdvx Mar 27, 2024
6961ade
add some info on multi-CNN
thibaultdvx Mar 27, 2024
e2af673
Merge branch '38-add-data-to-train-model' of https://github.com/thiba…
thibaultdvx Mar 27, 2024
86 changes: 42 additions & 44 deletions src/training_classification.py
@@ -142,17 +142,11 @@
# %% [markdown]
# ## Before starting
# If you failed to obtain the preprocessing using the `t1-linear` pipeline,
# please uncomment the next cell. You can extract tensors from this CAPS, but
# for the training part you will need a bigger dataset.
# please uncomment the next cell.
# %%
# !curl -k https://aramislab.paris.inria.fr/clinicadl/files/handbook_2023/data_oasis/CAPS_example.tar.gz -o oasisCaps.tar.gz
# !tar xf oasisCaps.tar.gz

# %% [markdown]
# If you have already downloaded the full dataset and converted it to
# CAPS, you can give the path to the dataset directory by changing
# the CAPS path. If not, just run it as written but the results will
# not be relevant.
# %% [markdown]
# To perform the feature extraction for our dataset, run the following cell:
# %%
@@ -198,6 +192,17 @@
# based on the labels and splits obtained in the [previous section](./label_extraction.ipynb).
# Of course, you can use another dataset, on which you will also have to perform
# labels extraction and data splitting.
#
# The purpose of this notebook is not to fully train a network, but rather to understand
# how ClinicaDL works. Therefore, we will keep working with a subset of OASIS-1. This new
# subset contains 10 T1w images, preprocessed with the `t1-linear` pipeline of Clinica. The
# `prepare-data` pipeline has already been performed on the dataset.
#
# You can remove your old `data_oasis` folder and download the new one:

# %%
# !curl -k https://aramislab.paris.inria.fr/clinicadl/files/handbook_2023/data_oasis/CAPS_example_train.tar.gz -o oasisCaps.tar.gz
# !tar xf oasisCaps.tar.gz

# %% [markdown]
# ## `train classification`
@@ -234,9 +239,9 @@
# %% [markdown]
# ### Prerequisites
#
# You need to execute `clinicadl tsvtools get-labels` and `clinicadl tsvtools
# {split|kfold}` commands prior to running this task to have the correct TSV file
# organization. Moreover, there should be a CAPS, obtained running the
# If you use your own dataset, you need to execute `clinicadl tsvtools get-labels`
# and `clinicadl tsvtools {split|kfold}` commands prior to running this task to have
# the correct TSV file organization. Moreover, there should be a CAPS, obtained by running the
# desired preprocessing pipeline.
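
The TSV organization matters because a participant who appears in both the training and the validation set of a split causes data leakage. The following is a minimal sketch of such a check, not ClinicaDL's own implementation. It assumes the split files carry a `participant_id` column; the helper name `find_shared_participants` and the toy IDs are hypothetical.

```python
import pandas as pd


def find_shared_participants(train_df: pd.DataFrame, validation_df: pd.DataFrame) -> set:
    """Return the participant IDs found in both sets (an empty set means no leakage)."""
    # Assumption: both tables have a `participant_id` column.
    train_ids = set(train_df["participant_id"])
    validation_ids = set(validation_df["participant_id"])
    return train_ids & validation_ids


# Toy tables standing in for the TSV files of one split (hypothetical IDs).
train = pd.DataFrame(
    {"participant_id": ["sub-01", "sub-02", "sub-03"], "session_id": ["ses-M00"] * 3}
)
validation = pd.DataFrame(
    {"participant_id": ["sub-04", "sub-03"], "session_id": ["ses-M00"] * 2}
)

shared = find_shared_participants(train, validation)
if shared:
    print(f"Possible data leakage, shared participants: {sorted(shared)}")
else:
    print("No participant is shared between the two sets.")
```

With real data, the two tables would be loaded with `pd.read_csv(path, sep="\t")` from the files of a given split.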
# %% [markdown]
# ### Running the task
@@ -252,7 +257,7 @@
# [CAPS](https://aramislab.paris.inria.fr/clinica/docs/public/latest/CAPS/Introduction/)
# hierarchy. In case of multi-cohort training, must be a path to a TSV file.
# - `PREPROCESSING_JSON` (str) is the name of the preprocessing json file stored
# in the `CAPS_DIRECTORY` that corresponds to the `clinicadl extract` output.
# in the `CAPS_DIRECTORY` that corresponds to the `clinicadl prepare-data` output.
# This will be used to load the correct tensor inputs with the wanted
# preprocessing.
# - `TSV_DIRECTORY` (Path) is the input folder of a TSV file tree generated by
@@ -299,21 +304,26 @@
# The default label for the classification task is `diagnosis` but as long as it
# is a categorical variable, it can be of any type.
# %% [markdown]
# The next cell train a `resnet18` to classify 2D slices of t1-linear MRI by
# The next cells train `resnet18` networks to classify 2D slices of t1-linear MRI by
# diagnosis (AD or CN).
# Please note that the purpose of this notebook is not to fully train a network
# because we don't have enough data. The objective is to understand how ClinicaDL
# works and make inferences using pretrained models in the next section.
# Please note once again that we don't expect any interesting results with a
# network trained on only 10 MRI images. That's why we will train the networks for
# only a few epochs.
#
# Let's first train a **single-CNN** on all slice locations (actually 4 networks are
# trained, one for each split):


# %%
# 2D-slice single-CNN training
#!clinicadl train classification -h
!clinicadl train classification data_oasis/CAPS_example slice_classification_t1 data_oasis/split/4_fold/ data_oasis/maps_classification_2D_slice_resnet18 --n_splits 4 --architecture resnet18

!clinicadl train classification data_oasis/CAPS_example slice_classification_t1 data_oasis/split/4_fold/ data_oasis/maps_classification_2D_slice_resnet18 --n_splits 4 --architecture resnet18 --batch_size 8 --epochs 5
# %% [markdown]
# Then, let's train a **multi-CNN** (i.e. one CNN is trained per slice location).
# 168 models will be trained, so this command may take a while. If you don't want
# to run it, the results can be downloaded a few lines further on.
# %%
# 2D-slice multi-CNN training
!clinicadl train classification data_oasis/CAPS_example slice_classification_t1 data_oasis/split/4_fold/ data_oasis/maps_classification_2D_slice_multi --n_splits 4 --architecture resnet18 --multi_network
!clinicadl train classification data_oasis/CAPS_example slice_classification_t1 data_oasis/split/4_fold/ data_oasis/maps_classification_2D_slice_multi --n_splits 4 --split 0 --architecture resnet18 --batch_size 2 --epochs 1 --multi_network

# %% [markdown]
# The `clinicadl train` command outputs a MAPS structure in which there are only
@@ -363,12 +373,12 @@
#```

# You can find more information about MAPS structure on our
# [documentation](https://clinicadl.readthedocs.io/en/latest/Introduction/#maps-definition)
# [documentation](https://clinicadl.readthedocs.io/en/latest/Introduction/#maps-definition).

# %% [markdown]
# # Inference using pretrained models
#
# (If you failed to train the model please uncomment the next cell)
# If you failed to train the model, please uncomment the next cells:
# %%
!curl -k https://aramislab.paris.inria.fr/clinicadl/files/handbook_2023/data_oasis/maps_classification_2D_slice_multi.tar.gz -o maps_classification_2D_slice_multi.tar.gz
!tar xf maps_classification_2D_slice_multi.tar.gz
@@ -377,23 +387,12 @@
!curl -k https://aramislab.paris.inria.fr/clinicadl/files/handbook_2023/data_oasis/maps_classification_2D_slice_resnet.tar.gz -o maps_classification_2D_slice_resnet.tar.gz
!tar xf maps_classification_2D_slice_resnet.tar.gz

# %% [markdown]
# If you failed to train the model, you also need to download the TSV files with
# the list of participants for each split used for the training because `clinicadl
# tsvtools split` and `clinicadl tsvtools kfold` commands randomly split data so
# you can have data leakage error (see previous [notebook](notebooks/labels_extraction.ipynb)
# for more information about data leakage).

# %%
!curl -k https://aramislab.paris.inria.fr/clinicadl/files/handbook_2023/data_oasis/split.tar.gz -o training_split.tar.gz
!tar xf training_split.tar.gz

# %% [markdown]
# The `predict` functionality performs individual prediction and metrics
# computation on a set of data using models trained with `clinicadl train` or
# `clinicadl random-search` tasks.
# It can also use any pretrained models if they are structured like a
# [MAPS](https://clinicadl.readthedocs.io/en/latest/Introduction/#maps-definition)
# [MAPS](https://clinicadl.readthedocs.io/en/latest/Introduction/#maps-definition).

# %% [markdown]
# ### Running the task
@@ -403,33 +402,32 @@
# clinicadl predict [OPTIONS] INPUT_MAPS_DIRECTORY DATA_GROUP
#```
# where:
# - INPUT_MAPS_DIRECTORY (Path) is the path to the MAPS of the pretrained model.
# - DATA_GROUP (str) is the name of the data group used for the prediction.
# - `INPUT_MAPS_DIRECTORY` (Path) is the path to the MAPS of the pretrained model.
# - `DATA_GROUP` (str) is the name of the data group used for the prediction.

# ```{warning}
# For ClinicaDL, a data group is linked to a list of participants / sessions and
# a CAPS directory. When performing a prediction, interpretation or tensor
# serialization the user must give a data group. If this data group does not
# exist, the user MUST give a caps_directory and a participants_tsv. If this
# data group already exists, the user MUST not give any caps_directory or
# participants_tsv, or set overwrite to True.
# serialization, the user must give a data group. If this data group does not
# exist (in the MAPS), the user MUST give a `caps_directory` and a `participants_tsv`. If this
# data group already exists, the user MUST NOT give any `caps_directory` or
# `participants_tsv`, unless `overwrite` is set to True.
# ```
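
The data group rule in the warning above can be sketched as a small validation function. This is only an illustration of the rule as stated, not ClinicaDL's actual code: the function name `check_data_group_args` and its parameters are hypothetical, and the case of an existing group with `overwrite` set but no new inputs is left unchecked because the text does not cover it.

```python
from typing import Optional


def check_data_group_args(
    group_exists: bool,
    caps_directory: Optional[str] = None,
    participants_tsv: Optional[str] = None,
    overwrite: bool = False,
) -> None:
    """Raise ValueError when the arguments break the data group rule."""
    if not group_exists:
        # New data group: both inputs are required to define it.
        if caps_directory is None or participants_tsv is None:
            raise ValueError(
                "New data group: caps_directory and participants_tsv are required."
            )
        return
    # Existing data group: redefining it is only allowed with overwrite=True.
    if (caps_directory is not None or participants_tsv is not None) and not overwrite:
        raise ValueError(
            "Existing data group: drop caps_directory/participants_tsv "
            "or set overwrite to True."
        )


# A new group defined with both inputs passes the check silently.
check_data_group_args(
    group_exists=False,
    caps_directory="data_oasis/CAPS_example",
    participants_tsv="data_oasis/split/test_baseline.tsv",
)
```

Running the same `predict` command twice with the same data group name and the same inputs corresponds to the second branch, which is why a rerun needs either a new group name or the overwrite option.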

# If you want to add optional arguments, you can check the
# [documentation](https://clinicadl.readthedocs.io/en/latest/Predict/).

# %%
# !clinicadl predict -h
!clinicadl predict data_oasis/maps_classification_2D_slice_resnet18 'test-Oasis2' --participants_tsv ./data_oasis/split/test_baseline.tsv --caps_directory data_oasis/CAPS_example
!clinicadl predict data_oasis/maps_classification_2D_slice_resnet18 'test-Oasis' --participants_tsv ./data_oasis/split/test_baseline.tsv --caps_directory data_oasis/CAPS_example

# %%
!clinicadl predict data_oasis/maps_classification_2D_slice_multi 'test-Oasis' --participants_tsv ./data_oasis/split/test_baseline.tsv --caps_directory data_oasis/CAPS_example

# %% [markdown]
# Results are stored in the MAPS of path `model_path`, according to the
# Results are stored in the MAPS, according to the
# following file system:
# ```text
# model_path>
# <maps_directory>
# ├── split-0
# ├── ...
# └── split-<i>
@@ -447,6 +445,6 @@
# running the next cell:
# %%
import pandas as pd
metrics = pd.read_csv("data_oasis/maps_classification_2D_slice_resnet18/split-0/best-loss/test-Oasis/test-OASIS_slice_level_metrics.tsv", sep="\t")
metrics = pd.read_csv("data_oasis/maps_classification_2D_slice_resnet18/split-0/best-loss/test-Oasis/test-Oasis_slice_level_metrics.tsv", sep="\t")
metrics.head()
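
The cell above reads the metrics of `split-0` only. A minimal sketch for gathering the same file across all splits follows. It assumes every split uses the path layout seen above (`split-<i>/best-loss/<data_group>/<data_group>_slice_level_metrics.tsv`); the helper name `load_metrics_per_split` is hypothetical.

```python
from pathlib import Path

import pandas as pd


def load_metrics_per_split(maps_dir, data_group, selection="best-loss", level="slice"):
    """Concatenate the metrics TSV of every split found in a MAPS directory."""
    frames = []
    for split_dir in sorted(Path(maps_dir).glob("split-*")):
        # Assumed layout, copied from the path used for split-0 above.
        tsv = split_dir / selection / data_group / f"{data_group}_{level}_level_metrics.tsv"
        if tsv.is_file():
            df = pd.read_csv(tsv, sep="\t")
            df.insert(0, "split", split_dir.name)  # remember where each row came from
            frames.append(df)
    if not frames:
        raise FileNotFoundError(f"No metrics file found for '{data_group}' in {maps_dir}")
    return pd.concat(frames, ignore_index=True)


maps_dir = Path("data_oasis/maps_classification_2D_slice_resnet18")
if maps_dir.is_dir():  # only runs once the MAPS has been trained or downloaded
    all_metrics = load_metrics_per_split(maps_dir, "test-Oasis")
    print(all_metrics)
```

Keeping the split name as a column makes it easy to compare folds, or to average a metric over them with `groupby`.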
# %%