Introduction
The slideToolKit is a collection of open-source scripts to handle each step from digital whole-slide images (WSI) to the storage of your results. A common slideToolKit workflow consists of four consecutive steps.
In the first step, acquisition, whole slide images are collected and converted to TIFF files. In the second step, preparation, all the required files are created and organized. The third step, tiles, creates multiple manageable tiles to count. The fourth step, analysis, is the actual tissue analysis and saves the results in a meaningful data set.
A set of tools is designed for each step. Instructions on how to use each tool can be found by running it with the --help flag (e.g. slideConvert --help).
Here you can find a graphical workflow for the slideToolKit.
Most slide scanners can, in addition to their own proprietary format, store digital slides as pyramid TIFF files. If needed, the slideToolKit uses the Bio-Formats library to convert other microscopy formats into the compatible pyramid TIFF format (Bio-Formats supports over 120 different file formats, openmicroscopy.org). TIFF is a tag-based file format for raster images. A TIFF file can hold multiple images in a single file; this is known as a multi-layered TIFF. The term "Pyramid TIFF" describes a multi-layered TIFF file that wraps a sequence of raster images, each representing the same image at increasing resolutions. The different layers contain, among others, the slide label and multiple enlargements of the tissue on the slide.
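To make the pyramid idea concrete, the sketch below lists the pixel dimensions of each layer under the assumption that every layer halves the width and height of the previous one. The function name, the factor of 2 and the example dimensions are illustrative assumptions and are not taken from the slideToolKit; real scanners choose their own downsample factors and also store label and macro images as extra layers.

```python
def pyramid_levels(width, height, levels, factor=2):
    """Return (width, height) per pyramid layer, largest first.

    Assumes each layer is the previous one downsampled by `factor`
    (a common but hypothetical choice; scanners differ).
    """
    dims = []
    for level in range(levels):
        scale = factor ** level
        # Integer division mirrors how pixel grids are truncated on downsampling.
        dims.append((max(1, width // scale), max(1, height // scale)))
    return dims


if __name__ == "__main__":
    for level, (w, h) in enumerate(pyramid_levels(80000, 60000, 4)):
        print(f"layer {level}: {w} x {h} px")
```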
Some slides do not have proper filenames. Sometimes you want the filename to match the content of the barcode exactly; sometimes you want all your slides to start with the project name (e.g. MyProject.original.slide.name.TIF). slideRename makes it easy to rename multiple slides at once.
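The project-prefix naming scheme can be sketched in a few lines of Python. This is not how slideRename is implemented; the helper names below are hypothetical, and the sketch only plans the renames without touching any files.

```python
from pathlib import Path


def prefixed_name(slide_path, project):
    """Return a new path whose filename starts with `project.`."""
    path = Path(slide_path)
    # Keep the original filename (including extension) after the prefix.
    return path.with_name(f"{project}.{path.name}")


def plan_renames(slide_paths, project):
    """Map each original path to its prefixed path without touching disk."""
    return {Path(p): prefixed_name(p, project) for p in slide_paths}


if __name__ == "__main__":
    plan = plan_renames(["scans/original.slide.name.TIF"], "MyProject")
    for old, new in plan.items():
        print(f"{old} -> {new}")
```

After inspecting the plan, each pair could be applied with Path.rename; keeping planning and renaming separate makes a dry run possible.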
To read whole slide images, the open-source libTIFF and OpenSlide libraries are used. These libraries are also used to extract metadata of the scanned slides. This metadata holds descriptive information about the slide, for example scan time and date, magnification, image compression, pixels per micrometer, and the presence of different layers.
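As an illustration of metadata extraction, the sketch below summarizes a few standard OpenSlide properties. It assumes the openslide-python package is installed and that the scanner populated the properties; the function names and the selection of fields are our own and do not reflect what slideInfo reports.

```python
def summarize_properties(properties):
    """Pick a few commonly used fields from an OpenSlide-style property mapping."""
    def as_float(key):
        value = properties.get(key)
        return float(value) if value is not None else None

    mpp_x = as_float("openslide.mpp-x")  # micrometers per pixel
    return {
        "vendor": properties.get("openslide.vendor"),
        "objective_power": as_float("openslide.objective-power"),
        "microns_per_pixel": mpp_x,
        # Pixels per micrometer is the reciprocal of micrometers per pixel.
        "pixels_per_micron": (1.0 / mpp_x) if mpp_x else None,
    }


def read_slide_summary(path):
    """Open a slide with openslide-python and summarize its metadata."""
    import openslide  # imported lazily so the helper above works without it

    slide = openslide.OpenSlide(path)
    try:
        return summarize_properties(slide.properties)
    finally:
        slide.close()
```

Missing properties come back as None rather than raising, since not every scanner format records magnification or pixel size.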
For image processing we use ImageMagick in the bash versions of some scripts, and OpenSlide and OpenCV in the python versions. ImageMagick is a command-line image manipulation tool that is fast, highly adjustable and capable of handling big pyramid TIFF files. Generally we recommend using the python versions, as these are faster than the bash versions and more flexible in the image formats they can read through the OpenSlide and OpenCV libraries.
The tools designed for step 1:
- slideConvert: Convert different file types of WSI to TIFF format.
- slideDirectory: Create staging directories to process images.
- slideInfo: Fetch slide metadata (resolution, dates, magnification, et cetera).
- slideLookup: Look up a list of virtual slides.
- slideRename: Rename virtual slides; this method supports auto-renaming using barcodes.
In the following steps, multiple output files are created for each slide. For each digital slide, a staging directory is constructed in which the slide and all output data concerning the slide are stored.
Thumbnails contain a photo of the whole slide, including the label. This makes it easy to identify your slides.
In digital image manipulation, a mask defines what part of the image will be analyzed and what part will be hidden. Usually a mask is defined as black (hidden) or white (not hidden). The slideToolKit creates a mask and a miniature version of the whole slide image using convert (from the ImageMagick library). To create the mask, the image is first blurred, which removes dust and speckles. Next, the white background is identified using a fuzzy, non-stringent selection and replaced with black. Settings for blur and fuzziness can be found and changed in the slideMask tool. Generated masks can be adjusted manually in an image editor of choice (such as the freely available GNU Image Manipulation Program, GIMP). Sometimes this is necessary to remove unwanted areas on the whole slide image (like marker stripes or air bubbles under the coverslip).
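The blur-then-threshold idea behind the mask can be illustrated in pure Python on a small grayscale grid. This is a conceptual sketch only: slideMask itself relies on ImageMagick, and the mean filter, the default radius and the default fuzz value here are assumptions chosen for the example.

```python
def box_blur(gray, radius=1):
    """Blur a 2D list of 0-255 gray values with a (2*radius+1)^2 mean filter."""
    rows, cols = len(gray), len(gray[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Average over the neighbourhood, clipped at the image border.
            window = [
                gray[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = sum(window) / len(window)
    return out


def make_mask(gray, radius=1, fuzz=0.2):
    """Return a mask: 0 (hidden) for near-white background, 255 elsewhere.

    `fuzz` is the tolerated distance from pure white as a fraction of 255.
    """
    threshold = 255 * (1 - fuzz)
    blurred = box_blur(gray, radius)
    return [[0 if v >= threshold else 255 for v in row] for row in blurred]
```

With these defaults a single dark pixel on a white background is averaged away by the blur and ends up hidden, while a larger dark region survives as white (not hidden) in the mask.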
The tools designed for step 2:
- slideDirectory: Create a staging directory per slide.
- slideExtract: Extract a slide thumbnail, including label, and a scaled macro version of the WSI (in .png format).
- slideMacro: Create a scaled macro version from a slide (in .png format).
- slideNormalize: Create a normalized version from a macro version of a given slide (in .png format).
- slideThumb: Create a slide thumbnail, including label, from a slide (in .png format).
Because the former slideMask was sometimes unable to make proper masks, especially when the contrast between tissue and background is very low, we created slideEntropyMasker (python version) and slideEMask (C++ program).
The following (legacy) masking tools are available:
- slideEMask: Create a scaled masked macro version from a slide (in .png format) using image entropy.
- slideEntropyMasker: Create a scaled masked macro version from a slide (in .png format) using image entropy.
- slideMask: Create a scaled masked macro version from a slide (in .png format) using ImageMagick.
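Entropy-based masking rests on the observation that empty background is nearly uniform (low entropy) while tissue is textured (high entropy). The sketch below shows that principle with Shannon entropy per image block. It is not the algorithm used by slideEntropyMasker or slideEMask; the block size, the threshold and the function names are assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (in bits) of an iterable of gray values."""
    pixels = list(pixels)
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_mask(gray, patch=2, threshold=0.5):
    """Mark each patch x patch block as tissue (255) if its entropy exceeds threshold."""
    rows, cols = len(gray), len(gray[0])
    mask = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, patch):
        for c0 in range(0, cols, patch):
            row_range = range(r0, min(r0 + patch, rows))
            col_range = range(c0, min(c0 + patch, cols))
            block = [gray[r][c] for r in row_range for c in col_range]
            value = 255 if shannon_entropy(block) > threshold else 0
            # Fill the whole block with the decision for that block.
            for r in row_range:
                for c in col_range:
                    mask[r][c] = value
    return mask
```

Because the decision depends on local variation rather than on absolute brightness, this kind of mask can still separate pale tissue from background when a brightness threshold would not.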
Image analysis of whole, memory-intensive 20x representations of the digitized slides is currently impossible due to hardware and software limitations. The goal of this step is to create multiple smaller images (i.e. 'tiles') from the 20x magnified WSI. An upscaled version of the mask is placed over the 20x WSI (in our example this is layer 3 of the multi-layered TIFF). Image manipulation on 20x-sized WSI requires large amounts of computer RAM. To make it possible for computers without sufficient RAM to handle these files, the slideToolKit uses a memory-mapped disk file of the program memory. Using these disk-mapped memory files (ImageMagick .mpc files), the slideToolKit can efficiently extract all tiles. Without a mask, a faster and more memory-efficient method based on the openslide library is used.
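The relationship between the low-resolution mask and the full-resolution tiles can be sketched as follows. Instead of upscaling the mask, this example maps each tile back to the mask cells it covers, which gives the same selection with less memory. The function names, the minimum tissue fraction and the integer scale factor are hypothetical and not taken from slide2Tiles.

```python
def tile_origins(width, height, tile_size):
    """Yield (x, y) top-left corners of a non-overlapping tile grid."""
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield x, y


def tissue_tiles(mask, scale, tile_size, min_fraction=0.05):
    """Return tile origins (full-resolution pixels) that overlap enough tissue.

    `mask` is a low-resolution 2D list (nonzero = tissue) and `scale` is the
    integer factor between mask pixels and full-resolution pixels.
    """
    mask_h, mask_w = len(mask), len(mask[0])
    kept = []
    for x, y in tile_origins(mask_w * scale, mask_h * scale, tile_size):
        # Map the tile back to the mask cells it covers (ceiling on the far edge).
        rows = range(y // scale, min(mask_h, -(-(y + tile_size) // scale)))
        cols = range(x // scale, min(mask_w, -(-(x + tile_size) // scale)))
        cells = [mask[r][c] for r in rows for c in cols]
        if cells and sum(1 for v in cells if v) / len(cells) >= min_fraction:
            kept.append((x, y))
    return kept
```

Only the returned origins need to be read from the full-resolution layer, so background regions never have to be loaded.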
The tools designed for step 3:
- slide2Tiles: Cut a virtual slide into tiles (bash version).
At this step, multiple tiles containing tissue data have been made, and the different objects in this tissue can now be identified. Although you can use any image analysis program from here on, we prefer CellProfiler. CellProfiler is designed to quantitatively measure phenotypes from thousands of images automatically, without training in computer vision or programming. CellProfiler can run using a graphical user interface (GUI) or a command-line interface (CLI). In CellProfiler's GUI, different algorithms for image analysis are available as individual modules that can be modified and placed in sequential order to form a pipeline. Such a pipeline can be used to identify and measure biological objects and features in images. Pipelines can be stored and reused in future projects. The CLI can be used to run the pipeline for actual image analysis.
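A headless CLI run of a stored pipeline can be prepared from Python as sketched below. The flags shown (-c, -r, -p, -i, -o) are the ones we understand CellProfiler's command line to accept; verify them with cellprofiler --help for your installed version. The function name and the example file names are hypothetical, and this is not the command the slideToolKit scripts issue.

```python
def cellprofiler_command(pipeline, image_dir, output_dir, executable="cellprofiler"):
    """Build an argument list for a headless CellProfiler run."""
    return [
        executable,
        "-c",                   # run headless (no GUI)
        "-r",                   # run the pipeline immediately
        "-p", str(pipeline),    # pipeline file
        "-i", str(image_dir),   # default input folder
        "-o", str(output_dir),  # default output folder
    ]


if __name__ == "__main__":
    # Hypothetical file and folder names, printed rather than executed.
    print(" ".join(cellprofiler_command("pipeline.cppipe", "tiles", "results")))
```

Passing the resulting list to subprocess.run with check=True would execute the analysis; building the command as a list avoids shell quoting problems with paths that contain spaces.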
An illustrated example of how to create pipelines in CellProfiler is described by Vokes and Carpenter in their manuscript "Using CellProfiler for Automatic Identification and Measurement of Biological Objects in Images".
CellProfiler is able to output its measurements in .gct and/or .csv format. The .csv files are commonly used data files and can be imported into nearly every statistical program.
The tools designed for step 4:
- slideAppend: Append the output from CellProfiler, which is per tile, into one .csv file. The .gct format can also be used, with slideAppendGCT.sh.
- slideJobChecker: Check the output of the given step from the slideToolKit.
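Appending per-tile results into a single file amounts to concatenating CSV files while writing the header only once. The sketch below shows that idea with Python's csv module; it is not the slideAppend implementation, and the function name and the column names in the demo are made up.

```python
import csv
import io


def append_csv(tile_csvs, out):
    """Concatenate per-tile CSV streams into `out`, writing the header once.

    Assumes every per-tile file has the same columns in the same order.
    Returns the number of data rows written.
    """
    writer = csv.writer(out, lineterminator="\n")
    header = None
    rows_written = 0
    for stream in tile_csvs:
        reader = csv.reader(stream)
        tile_header = next(reader, None)
        if tile_header is None:
            continue  # skip empty files
        if header is None:
            header = tile_header
            writer.writerow(header)
        elif tile_header != header:
            raise ValueError("per-tile CSV headers differ")
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written


if __name__ == "__main__":
    # In-memory stand-ins for two per-tile result files (hypothetical columns).
    tiles = [io.StringIO("Tile,Count\nX0Y0,12\n"), io.StringIO("Tile,Count\nX1Y0,7\n")]
    combined = io.StringIO()
    append_csv(tiles, combined)
    print(combined.getvalue(), end="")
```

Raising on a header mismatch is a deliberate choice here: silently merging files with different columns would produce a data set that looks valid but is misaligned.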
CellProfiler is also able to output .sql files, where the .sql file contains the structure of the data and the .csv file contains the actual measurements. With the (no longer supported) legacy slideSQLheader you can extract the SQL structure and add it as a header row to the .csv file.
The slideToolKit contains a collection of scripts meant to be used manually and separately as needed, but it can also be used as an all-in-one workflow. For this purpose the above steps are captured in a few separate bash scripts, which can be called locally with slideQuantifyOSX on macOS, or on a SLURM-based Linux server with slideQuantify.
- slideQuantifyOSX or slideQuantify: Main script to run the slideToolKit on a given collection of WSI from steps 1 through 4. Run locally (slideQuantifyOSX) or on a SLURM-based Linux server (slideQuantify). Executes the following sequentially:
  - slideQuantify_1_expresshist_mask.sh / slideQuantify_mask.sh: Creates thumbnails and scaled (masked) macro versions of a given set of images.
  - slideQuantify_2_expresshist_tile.sh / slideQuantify_tiling.sh: Creates image tiles from the macro version while masking non-tissue areas using the masked images.
  - slideQuantify_3_tile_normalizing.sh / slideQuantify_normalizing.sh: Normalizes tiled images.
  - slideQuantify_4_cellprofiler.sh / slideQuantify_cellprofiler.sh: Runs a CellProfiler pipeline on a given set of tiled, masked, and normalized images.
  - slideQuantify_5_wrapup.sh / slideQuantify_wrapup.sh: Wraps up the results and produces the final dataset in .csv format.
Licence. The MIT License (MIT): http://opensource.org/licenses/MIT.
Copyright (c) 2014-2024, Bas G.L. Nelissen & Sander W. van der Laan, UMC Utrecht, Utrecht, the Netherlands.