
How to start

Daniel Fischer edited this page Feb 8, 2023 · 2 revisions

Preparations

Download the pipeline

Setting up the project folder

First, create a new folder for the project. The easiest way is to create the folder where the raw data files will be stored, because mkdir -p then creates the entire path in one step:

mkdir -p /path/to/your/project/FASTQ/RAW

Then copy the required files into the project folder:

cd /path/to/your/project
cp ~/git/Snakebite-RNAseq/*.yaml .
cp ~/git/Snakebite-RNAseq/*.sh .

Samplesheet

Next, create the sample sheet. In most cases this can be done mechanically with the script below. Depending on the file names produced by your sequencer, you might need to adjust it slightly.

# Create the column files
  ls FASTQ/RAW | awk -F'_R' '{ print $1 }'| sort | uniq > rawsample
  awk -F'_S[0-9]' '{ print $1 }' rawsample > sample_name
  awk -F'_L' '{ print "L"$2 }' rawsample > lane
# Paste the proper file endings to the rawsamples
  sed -e 's/$/_R1_001.fastq.gz/' rawsample > read1
  sed -e 's/$/_R2_001.fastq.gz/' rawsample > read2
# Combine the files
  paste -d'\t' rawsample sample_name lane read1 read2 > samplesheet.tsv
# Add the header line
  sed -i '1s/^/rawsample\tsample_name\tlane\tread1\tread2\n/' samplesheet.tsv
# Remove the temporary files
  rm rawsample sample_name lane read1 read2
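
To see what the script produces before running it on real data, the same steps can be tried in a scratch directory. The sketch below is only an illustration: the file names (sampleA_S1_L001_R1_001.fastq.gz and so on) are hypothetical and assume a naming scheme of the form name_S<number>_L<lane>_R<1|2>_001.fastq.gz, the files are empty placeholders, and the header line is written with printf instead of sed -i so that the example does not depend on GNU sed.

```shell
# Dry run in a scratch directory with hypothetical, empty FASTQ files
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p FASTQ/RAW
touch FASTQ/RAW/sampleA_S1_L001_R1_001.fastq.gz FASTQ/RAW/sampleA_S1_L001_R2_001.fastq.gz
touch FASTQ/RAW/sampleB_S2_L002_R1_001.fastq.gz FASTQ/RAW/sampleB_S2_L002_R2_001.fastq.gz

# Create the column files (same commands as in the script above)
ls FASTQ/RAW | awk -F'_R' '{ print $1 }' | sort | uniq > rawsample
awk -F'_S[0-9]' '{ print $1 }' rawsample > sample_name
awk -F'_L' '{ print "L"$2 }' rawsample > lane
sed -e 's/$/_R1_001.fastq.gz/' rawsample > read1
sed -e 's/$/_R2_001.fastq.gz/' rawsample > read2

# Write the header first (printf used here instead of sed -i), then append the rows
printf 'rawsample\tsample_name\tlane\tread1\tread2\n' > samplesheet.tsv
paste -d'\t' rawsample sample_name lane read1 read2 >> samplesheet.tsv

# Remove the temporary files and show the result
rm rawsample sample_name lane read1 read2
cat samplesheet.tsv
```

With these placeholder files, the sheet should contain the header plus one row per sample and lane, for example sampleA_S1_L001, sampleA, L001 and the two read files. If your file names do not contain the _S, _L and _R parts in this order, the awk field separators are the places to adjust.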

Other files

Then fill in the information in the two YAML files, and that's it.

(more details on that will come...)
