
CentOS7: Conda version with modules

Sander W. van der Laan edited this page Sep 1, 2023 · 1 revision

These instructions are for CentOS7 on a high-performance computing (HPC) cluster, such as the HPC at the UMC Utrecht. An HPC is usually centrally managed, and users do not normally have administrator rights. Therefore, we advise installing your own conda distribution. Another advantage is that a virtual environment within a conda installation preserves the versions of the installed packages.

Step 1: check what is available on the system

Let's first check whether some packages are installed already.

for PACK in curl cvs gcc gcc-c++ gimp git libtool openjpeg perl svn vim wget giflib-devel libjpeg-devel libtiff-devel libpng-devel freetype-devel; do
	echo "* checking [ $PACK ]...."
	command -v "$PACK" || echo "  not found on PATH"
	echo "---------"
	echo ""
done

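Make a list of those that aren't installed; you can add them to your conda environment later. No worries: the section Controlling the virtual environment below explains how to install them. Note that gcc-c++, openjpeg and the *-devel entries are package names rather than commands, so command -v will not find them even when they are installed; on CentOS7 you can check these with rpm -q, e.g. rpm -q libpng-devel.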
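If you only want to see what is missing, the check can be condensed into a small function. This is a minimal sketch, assuming bash on CentOS7: each name is first looked up as a command on your PATH and, failing that, as an installed RPM package with rpm -q (skipped when rpm is not available). The function name missing_tools is hypothetical.

```shell
#!/usr/bin/env bash
# missing_tools (hypothetical helper): print each name that is neither a
# command on PATH nor, on RPM-based systems such as CentOS7, an installed package.
missing_tools() {
  local name
  for name in "$@"; do
    # found as an executable command, e.g. curl, git, wget
    if command -v "$name" >/dev/null 2>&1; then
      continue
    fi
    # found as an installed RPM package, e.g. gcc-c++, libpng-devel
    if command -v rpm >/dev/null 2>&1 && rpm -q "$name" >/dev/null 2>&1; then
      continue
    fi
    # neither: report it as missing
    printf '%s\n' "$name"
  done
}
```

Usage, with the list from above:

missing_tools curl cvs gcc gcc-c++ gimp git libtool openjpeg perl svn vim wget giflib-devel libjpeg-devel libtiff-devel libpng-devel freetype-devel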

Step 2: get conda

We use conda to get full control over the installation of the libraries and packages required for slideToolKit, which works with Python 3.7+. If you also want to use this environment for deep learning with TensorFlow, make sure you install the Python 3.7 version of conda, because TensorFlow is not yet optimized for newer Python versions.

Step 2A: update anaconda

Perhaps conda is already installed. Just make sure it is up-to-date.

conda update -n base conda

Step 2B: installation conda

You should install Anaconda (or Miniconda). We prefer Python 3.8, which is what Anaconda3-2021.05-Linux-x86_64 ships with; we use that release because it includes many other packages for 'omics' that we tend to use. A much lighter and faster alternative is mamba, a drop-in replacement for conda (see A note on installing with mamba below). Finding and updating packages is much faster, yet it has all the features of the 'regular' Anaconda or Miniconda. So, it's up to you. The process described below applies to Anaconda and Miniconda alike; installing with mamba works a bit differently.

wget https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-x86_64.sh

Next execute the following code.

bash Anaconda3-2021.05-Linux-x86_64.sh

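And follow the instructions. When asked where to install, choose the folder that the modulefile in Step 3 points to, for example /hpc/local/$MY_DISTRO/$MY_GROUP/software/Anaconda3_2021_05.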
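Alternatively, the installer can run without prompts: -b selects batch mode (by using it you accept the licence terms) and -p sets the installation prefix. A sketch, assuming MY_DISTRO and MY_GROUP are set in your shell and that you want the same folder the Step 3 modulefile points to; anaconda_prefix and install_anaconda are hypothetical helpers.

```shell
#!/usr/bin/env bash
# anaconda_prefix (hypothetical helper): the install folder used by the modulefile.
anaconda_prefix() {
  printf '/hpc/local/%s/%s/software/Anaconda3_2021_05\n' \
    "${MY_DISTRO:?set MY_DISTRO first}" "${MY_GROUP:?set MY_GROUP first}"
}

# install_anaconda (hypothetical helper): run the downloaded installer unattended.
install_anaconda() {
  local installer="${1:-Anaconda3-2021.05-Linux-x86_64.sh}" prefix
  # stop here if MY_DISTRO or MY_GROUP is missing, instead of passing an empty -p
  prefix="$(anaconda_prefix)" || return 1
  # -b: batch mode (no prompts), -p: installation prefix
  bash "$installer" -b -p "$prefix"
}
```

Usage: install_anaconda Anaconda3-2021.05-Linux-x86_64.sh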

Step 2C: wrap-up installation

If you don't want the base environment to be activated automatically every time you start a shell, you can execute the following command.

conda config --set auto_activate_base false

And don't forget to clean up afterwards.

rm -v Anaconda3-2021.05-Linux-x86_64.sh

Step 3: make a module for anaconda

You can create a modulefile to make loading this particular Anaconda installation easy and fun.

Create a text file with the following contents.

help(
[[ anaconda(version 3-8.202105) Anaconda, containing CellProfiler 4.1.3
]])

whatis("anaconda(version 3-8.202105) Anaconda, containing CellProfiler 4.1.3")

local version = "3-8.202105"

local base = "/hpc/local/$MY_DISTRO/$MY_GROUP/software/Anaconda3_2021_05"

conflict("anaconda")

prepend_path("PATH", pathJoin(base, "bin"))

Your administrator can tell you precisely where these modulefiles should be stored. In our case, they are in the folder /hpc/local/$MY_DISTRO/dhl_ec/etc/modulefiles/anaconda for our group. Save the text file as 3-8.202105.lua in that modulefiles directory; the file name without .lua is the version you use with module load below.

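Here, $MY_DISTRO is the name of your server, e.g. Rocky8 or CentOS7, and $MY_GROUP is the name of your group on the server, e.g. gendep or depcardio. These are placeholders: replace them with your own values in the modulefile, because it is read as Lua and shell variables inside a Lua string are not expanded.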
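If you prefer not to edit the file by hand, the sketch below writes the Step 3 modulefile with the real values filled in by bash, so the resulting Lua file contains literal paths only. The helper name write_anaconda_modulefile is hypothetical; the content is the modulefile shown above, saved as 3-8.202105.lua so that it matches module load anaconda/3-8.202105.

```shell
#!/usr/bin/env bash
# write_anaconda_modulefile (hypothetical helper)
# $1 = modulefiles directory, $2 = distro (e.g. CentOS7), $3 = group (e.g. dhl_ec)
write_anaconda_modulefile() {
  local moddir="$1" distro="$2" group="$3"
  mkdir -p "$moddir/anaconda" || return 1
  # Unquoted EOF: bash expands ${distro} and ${group} while writing the file.
  cat > "$moddir/anaconda/3-8.202105.lua" <<EOF
help(
[[ anaconda(version 3-8.202105) Anaconda, containing CellProfiler 4.1.3
]])

whatis("anaconda(version 3-8.202105) Anaconda, containing CellProfiler 4.1.3")

local version = "3-8.202105"

local base = "/hpc/local/${distro}/${group}/software/Anaconda3_2021_05"

conflict("anaconda")

prepend_path("PATH", pathJoin(base, "bin"))
EOF
}
```

Usage: write_anaconda_modulefile "/hpc/local/$MY_DISTRO/$MY_GROUP/etc/modulefiles" "$MY_DISTRO" "$MY_GROUP"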

Restart your shell:

source $HOME/.bashrc
source $HOME/.bash_profile
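Restarting the shell only makes your modulefiles visible if their directory is on the module search path. If your administrator has not set this up already (an assumption to check with them), you can add a module use line to ~/.bashrc once. A sketch; add_module_use is a hypothetical helper, and the directory follows the /hpc/local/$MY_DISTRO/$MY_GROUP/etc/modulefiles layout used above.

```shell
#!/usr/bin/env bash
# add_module_use (hypothetical helper): append "module use <dir>" to an rc file
# only if that exact line is not there yet, so repeated runs add nothing.
# $1 = rc file (e.g. ~/.bashrc), $2 = modulefiles directory
add_module_use() {
  local rcfile="$1" line="module use $2"
  # -q: quiet, -x: match the whole line, -F: fixed string rather than regex
  if ! grep -qxF "$line" "$rcfile" 2>/dev/null; then
    printf '%s\n' "$line" >> "$rcfile"
  fi
}
```

Usage: add_module_use ~/.bashrc "/hpc/local/$MY_DISTRO/$MY_GROUP/etc/modulefiles"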

You can now load your fresh anaconda installation:

module load anaconda/3-8.202105
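To confirm that the module really put this Anaconda first on your PATH (and not, for instance, another conda in your home directory), you can compare the location of the conda executable with the prefix from the modulefile. A sketch; conda_from is a hypothetical helper.

```shell
#!/usr/bin/env bash
# conda_from (hypothetical helper): succeed if the conda executable found on
# PATH lives under the given installation prefix.
conda_from() {
  local prefix="$1" found
  # `conda init` may define conda as a shell function; drop it in a subshell
  # so that command -v reports the executable on PATH instead.
  found="$(unset -f conda 2>/dev/null; command -v conda)" || {
    echo "conda is not on PATH" >&2
    return 1
  }
  case "$found" in
    "$prefix"/*) echo "OK: $found" ;;
    *) echo "conda on PATH is $found, not under $prefix" >&2; return 1 ;;
  esac
}
```

Usage: conda_from "/hpc/local/$MY_DISTRO/$MY_GROUP/software/Anaconda3_2021_05"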

Step 4: installation slideToolkit

Download and install the latest version of slideToolKit from GitHub. First create and go to the software directory, then download slideToolKit.

/hpc/local/$MY_DISTRO/$MY_GROUP: this is the folder where you are supposed to install software on your system (for us, /hpc/local/$MY_DISTRO/dhl_ec).

SOFTWARE="/hpc/local/$MY_DISTRO/$MY_GROUP/software"
mkdir -p "$SOFTWARE" && cd "$SOFTWARE"
if [ -d "$SOFTWARE/slideToolKit/.git" ]; then
	cd "$SOFTWARE/slideToolKit" && git pull
else
	cd "$SOFTWARE" && git clone https://github.com/swvanderlaan/slideToolKit.git
fi

Add symbolic links in ~/bin/, so that slideToolKit becomes available in your PATH (provided ~/bin is on your PATH). Adding the slideToolKit tools to your PATH makes it easier to access the slideToolKit commands.

mkdir -p ~/bin/ && ln -s -f -v /hpc/local/$MY_DISTRO/$MY_GROUP/software/slideToolKit/slide* ~/bin/
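The symbolic links only help if ~/bin is actually one of the entries of your PATH. A small check, assuming bash and ~/.bashrc as your startup file; path_contains is a hypothetical helper.

```shell
#!/usr/bin/env bash
# path_contains (hypothetical helper): succeed if directory $1 is an entry of
# the colon-separated list $2 (defaults to the current PATH).
path_contains() {
  local dir="$1" list="${2-$PATH}"
  # wrap both in colons so that /b does not match /bc
  case ":$list:" in
    *":$dir:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage, adding ~/bin to your startup file only when it is missing:

path_contains "$HOME/bin" || echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc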

Step 5: install a virtual environment

Next, create a virtual environment in which we will install the packages required for slideToolKit and CellProfiler. We created a YAML file that sets up the virtual environment and installs the required packages through conda and pip3. You can find it here: [PATHTO]/slideToolKit/conda_yml/anaconda3_8_2021_05_cp413.v2.yml, where [PATHTO] is the folder into which you cloned slideToolKit in Step 4.

Next, create the environment from that file.

conda env create -f [PATHTO]/slideToolKit/conda_yml/anaconda3_8_2021_05_cp413.v2.yml
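Running conda env create a second time fails because the environment already exists. A sketch that creates cp4 on the first run and otherwise updates it from the same YAML file with conda env update --prune, which also removes packages no longer listed. It assumes the YAML file names the environment cp4, as the activation command below suggests; env_exists and create_or_update_cp4 are hypothetical helpers that parse the conda env list output format shown further down.

```shell
#!/usr/bin/env bash
# env_exists (hypothetical helper): read `conda env list` output on stdin and
# succeed if an environment named $1 is listed.
env_exists() {
  # skip comment lines; the first column is the environment name
  awk -v name="$1" '!/^#/ && $1 == name { found = 1 } END { exit !found }'
}

# create_or_update_cp4 (hypothetical helper): $1 = path to the YAML file
create_or_update_cp4() {
  local yml="$1"
  if conda env list | env_exists cp4; then
    conda env update --name cp4 --file "$yml" --prune
  else
    conda env create --file "$yml"
  fi
}
```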

Controlling the virtual environment

Activating

Activating (or switching between) your virtual environment(s) is easy.

conda activate cp4

This modifies the PATH and shell variables so that your shell uses the specific Python set-up you (just) installed. You'll notice that the command prompt now indicates which conda environment you are currently in by prepending (cp4).
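In batch job scripts it is easy to forget the activation step. conda activate sets the CONDA_DEFAULT_ENV variable to the name of the active environment, so a script can check it before doing any work. A sketch; require_env is a hypothetical helper.

```shell
#!/usr/bin/env bash
# require_env (hypothetical helper): fail early when environment $1 is not active.
require_env() {
  if [ "${CONDA_DEFAULT_ENV:-}" != "$1" ]; then
    echo "Please run 'conda activate $1' first (active: ${CONDA_DEFAULT_ENV:-none})" >&2
    return 1
  fi
}
```

Usage at the top of a job script: require_env cp4 || exit 1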

Listing available environments

You can also list the available environments. I tend to forget what I installed, so it comes in handy for me 🙈.

conda env list

This results in the following for example:

# conda environments:
#
base                  *  /hpc/local/Rocky8/dhl_ec/software/Anaconda3_2021_05/
cp4                      /hpc/local/Rocky8/dhl_ec/software/Anaconda3_2021_05/envs/cp4
Installing additional python packages

The big advantage of using conda is that you can create a virtual environment that contains a specific set-up of Python packages for a specific purpose. Just select your virtual environment with -n cp4 and name the [package] you wish to install.

conda install -n cp4 [package]

Don't forget to specify the virtual environment; otherwise the package is installed into the currently active environment, which is base (the 'root' installation) if you have not activated one. You can use this to install any required packages that you found missing in Step 1.

Deactivating/exiting the virtual environment

Ending a virtual environment session, i.e. deactivating or exiting, is easy. This resets the PATH and shell variables to the settings you had before activating it.

conda deactivate

Deleting a virtual environment

You may want to delete a specific conda environment. You can do this by entering the following.

conda remove -n cp4 --all

Other useful commands

List all the conda environments available:

conda info --envs

Create a new environment named envname.

conda create --name envname

Remove an environment and all packages installed in it.

conda remove --name envname --all

Clone an existing environment.

conda create --name clone_envname --clone envname
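Before removing or changing an environment, it can help to save its exact package list, which is what preserving versions comes down to. A minimal sketch using conda env export; the helper export_env is hypothetical, and it uses CONDA_EXE (set by conda's shell hook) when available, falling back to conda on your PATH. You can later recreate the environment with conda env create -f on the saved file.

```shell
#!/usr/bin/env bash
# export_env (hypothetical helper): write the package list of an environment
# to a dated YAML file and print the file name.
# $1 = environment name, $2 = output directory (defaults to the current one)
export_env() {
  local name="$1" outdir="${2:-.}"
  local out="$outdir/$name-$(date +%Y%m%d).yml"
  "${CONDA_EXE:-conda}" env export --name "$name" > "$out" && echo "$out"
}
```

Usage: export_env cp4 ~/conda_backups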

Step 6: make a module for slideToolKit

Let's create a modulefile to make loading slideToolKit easy and fun.

Create a text file with the following contents.

help(
[[ slideToolKit(version 1.0) slideToolKit
]])

whatis("slideToolKit(version 1.0) slideToolKit")

local version = "1.0"

local base = "/hpc/local/$MY_DISTRO/$MY_GROUP/software/slideToolKit/" .. version 

conflict("slideToolKit")

prepend_path("PATH", base)

load("gnu-parallel/20170122")
prereq("gnu-parallel/20170122")

load("zlib/1.2.8")
prereq("zlib/1.2.8")

load("imagemagick/6.9.3-10")
prereq("imagemagick/6.9.3-10")

load("openslide/3.4.1")
prereq("openslide/3.4.1")

load("libdmtx/0.7.4")
prereq("libdmtx/0.7.4")

load("dmtx-utils/0.7.4")
prereq("dmtx-utils/0.7.4")

load("graphicsmagick/1.3.26")
prereq("graphicsmagick/1.3.26")

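Save the text file as /hpc/local/$MY_DISTRO/$MY_GROUP/etc/modulefiles/slideToolKit/version1.0.lua in the modulefiles directory, replacing $MY_DISTRO and $MY_GROUP with your own values, both in this path and in the modulefile. The folder name slideToolKit matches the name used with module load below.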
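The slideToolKit modulefile only loads if every module it load()s exists on your cluster. This sketch lists them so you can check each one with module avail followed by its name. required_modules is a hypothetical helper and assumes one load("name/version") call per line, as in the modulefile above.

```shell
#!/usr/bin/env bash
# required_modules (hypothetical helper): print the modules that a Lua
# modulefile loads with load("name/version"), one per line.
required_modules() {
  # keep only lines that start with load("...") and print what is inside the quotes
  sed -n 's/^[[:space:]]*load("\([^"]*\)").*$/\1/p' "$1"
}
```

Usage: run required_modules on your version1.0.lua, then call module avail for each name it prints.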

Restart your shell:

source $HOME/.bashrc
source $HOME/.bash_profile

You can now load your fresh slideToolKit installation:

module load slideToolKit/version1.0

Cleanup, restart & you're done!

Source your bash_profile.

source ~/.bash_profile

And make sure you load the new modules:

module load anaconda/3-8.202105 slideToolKit

Step 7: test the environment

And now you should not have any problem running the following script.

python slideToolKitTest.py

This will calculate your age, just for fun. Most importantly, though, it checks whether you have the right versions of OpenSlide, OpenCV and CellProfiler installed. The output should look like the following (the Python build details will differ from system to system).

Printing the installed versions.
* Python version:  3.8.12 | packaged by conda-forge | (default, Oct 12 2021, 21:50:38)
[Clang 11.1.0 ]
* OpenSlide version:  1.1.2
* OpenSlide library version:  3.4.1
* CellProfiler version:  4.1.3
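You can also query the installed versions directly from the active environment. A sketch, assuming Python 3.8+ (for importlib.metadata) and the pip distribution names used in the pip install command further down: openslide-python, opencv-contrib-python and cellprofiler. show_versions and the PYTHON variable are hypothetical.

```shell
#!/usr/bin/env bash
# show_versions (hypothetical helper): print the installed version of each
# Python distribution named on the command line, or "not installed".
# PYTHON defaults to the python on PATH (cp4 once it is activated).
show_versions() {
  "${PYTHON:-python}" - "$@" <<'EOF'
import sys
from importlib import metadata

for name in sys.argv[1:]:
    try:
        print(f"* {name}: {metadata.version(name)}")
    except metadata.PackageNotFoundError:
        print(f"* {name}: not installed")
EOF
}
```

Usage: show_versions openslide-python opencv-contrib-python cellprofiler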

A note on installing with mamba

Installing through mamba works a bit differently; so far I haven't been able to create a proper .yml file to help with this.

Step 1: get mamba

First, you'll need mamba. Download the right version for your system. Here I used MacOSX-x86_64 because we use Rosetta 2 on our Macs; on a CentOS7 cluster you need the Linux x86_64 installer instead.

wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-MacOSX-x86_64.sh --no-check-certificate

You might get a message regarding the certificates, hence the addition of --no-check-certificate.

Next, run the installer and follow the instructions. Make sure to close and restart your Terminal app (or log out and back in) once it's done.

bash Mambaforge-MacOSX-x86_64.sh

Step 2: create the cp4 environment

Now you're ready to create the cp4 environment.

mamba create --name cp4 python=3.8

Step 3: install the dependencies

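We need quite a few packages to be able to (automatically) read barcodes on whole-slide images, run analyses, etc. We install these first. Activate the environment before you do, so that these packages (and the pip packages in Step 4) end up in cp4 instead of base:

conda activate cp4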

mamba install --channel bioconda pip "numpy==1.*" matplotlib pandas openjdk scikit-learn mahotas gtk2 gtk3 "Jinja2==3.*" "inflect==5.*" "wxpython==4.*" "mysqlclient==1.*" "sentry-sdk==0.18.*" centrosome gensim FuzzyTM xarray python-javabridge bftools cairo freetype gettext giflib imagemagick java-jdk jpeg wmctrl zbar tclap

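The double quotes are important here: they stop the shell from treating the * in version specifications such as numpy==1.* as a filename wildcard.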

Step 4: install cellprofiler and opencv

Finally, we're ready to install cellprofiler and opencv to handle and analyse WSI.

pip install "cellprofiler==4.1.3" arrow pathlib opencv-contrib-python "openslide-python>=1.1.2"

Inspired by

https://www.pyimagesearch.com/2019/01/30/macos-mojave-install-tensorflow-and-keras-for-deep-learning/
https://medium.com/swlh/how-to-setup-your-python-projects-1eb5108086b1
https://towardsdatascience.com/how-to-successfully-install-anaconda-on-a-mac-and-actually-get-it-to-work-53ce18025f97
https://gist.github.com/rxaviers/7360908
https://github.com/CellProfiler/CellProfiler/wiki/Conda-Installation
