How to start
Daniel Fischer edited this page Feb 8, 2023 · 2 revisions
Download the pipeline
First, prepare a new folder for the project. The easiest way is to create the folder in which the raw data files will be stored, because mkdir -p then creates the entire path in one step:
mkdir -p /path/to/your/project/FASTQ/RAW
Then copy the required files into the project folder (the paths below assume that the pipeline repository is located in ~/git/Snakebite-RNAseq):
cd /path/to/your/project
cp ~/git/Snakebite-RNAseq/*.yaml .
cp ~/git/Snakebite-RNAseq/*.sh .
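Before moving on, it can help to confirm that the folder layout and the copied files are in place. The following is only a minimal sketch and not part of the pipeline: the function name check_project_setup is hypothetical, and it assumes nothing beyond the layout described above (a FASTQ/RAW subfolder plus at least one .yaml and one .sh file in the project folder).

```shell
# Hypothetical helper (not part of Snakebite-RNAseq): report whether a project
# folder contains FASTQ/RAW and at least one *.yaml and one *.sh file.
check_project_setup() {
    cps_dir="$1"
    cps_status=0

    # The raw data folder created with mkdir -p
    if [ ! -d "$cps_dir/FASTQ/RAW" ]; then
        echo "missing folder: $cps_dir/FASTQ/RAW"
        cps_status=1
    fi

    # The configuration (*.yaml) and script (*.sh) files copied from the repository
    for cps_ext in yaml sh; do
        # ls fails if the glob does not match any file
        if ! ls "$cps_dir"/*."$cps_ext" > /dev/null 2>&1; then
            echo "no .$cps_ext files found in $cps_dir"
            cps_status=1
        fi
    done

    if [ "$cps_status" -eq 0 ]; then
        echo "project folder looks complete"
    fi
    return "$cps_status"
}
```

After defining the function, it could be called as check_project_setup /path/to/your/project; a non-zero exit status indicates that something is missing.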
Next, we create the sample sheet. In most cases this can be done mechanically as shown below. Depending on the file names produced by your sequencer, you might need to adjust the script slightly.
# Create the column files (assumes file names like <sample>_S<n>_L<lane>_R<1|2>_001.fastq.gz)
ls FASTQ/RAW | awk -F'_R' '{ print $1 }'| sort | uniq > rawsample
awk -F'_S[0-9]' '{ print $1 }' rawsample > sample_name
awk -F'_L' '{ print "L"$2 }' rawsample > lane
# Append the read file endings to the raw sample names
sed -e 's/$/_R1_001.fastq.gz/' rawsample > read1
sed -e 's/$/_R2_001.fastq.gz/' rawsample > read2
# Combine the files
paste -d'\t' rawsample sample_name lane read1 read2 > samplesheet.tsv
# Add the header line (the \t and \n escapes and the bare -i option require GNU sed)
sed -i '1s/^/rawsample\tsample_name\tlane\tread1\tread2\n/' samplesheet.tsv
# Remove the temporary files
rm rawsample sample_name lane read1 read2
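Because the sample sheet is derived mechanically from the file names, it is worth checking it before continuing. The sketch below is an illustration under stated assumptions rather than part of the pipeline: the function name check_samplesheet is hypothetical, and it assumes the five tab-separated columns and the FASTQ/RAW location used above. It reports rows with a wrong number of columns and read files that are listed in the sheet but do not exist.

```shell
# Hypothetical helper (not part of Snakebite-RNAseq): sanity-check a sample sheet.
# Usage: check_samplesheet <samplesheet.tsv> <folder containing the read files>
check_samplesheet() {
    cs_sheet="$1"
    cs_rawdir="$2"
    cs_status=0

    # Every row, including the header, should have exactly five tab-separated fields
    cs_bad=$(awk -F'\t' 'NF != 5 { print NR }' "$cs_sheet")
    if [ -n "$cs_bad" ]; then
        # The unquoted expansion joins the row numbers onto one line
        echo "rows without 5 columns:" $cs_bad
        cs_status=1
    fi

    # Columns 4 and 5 hold the read1 and read2 file names; the header line is skipped
    cs_missing=$(awk -F'\t' 'NR > 1 { print $4; print $5 }' "$cs_sheet" |
        while IFS= read -r cs_file; do
            [ -f "$cs_rawdir/$cs_file" ] || echo "missing read file: $cs_rawdir/$cs_file"
        done)
    if [ -n "$cs_missing" ]; then
        echo "$cs_missing"
        cs_status=1
    fi

    return "$cs_status"
}
```

From the project folder, the check could then be run as check_samplesheet samplesheet.tsv FASTQ/RAW. If it reports missing files, the file endings appended in the script above probably do not match the names produced by your sequencer.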
Then fill in the required information in the two YAML files, and that's it.
(More details on that will come...)