This repo contains analyses for processing nanopore sequencing data of Anabaena strains sequenced in the Azolla lab at Utrecht University. Specifically, we're looking for large insertion/deletion variants in sequenced strains created by RNA-guided transposition.
The Anabaena/Nostoc reference strain is Nostoc sp. PCC 7120, downloaded from: https://www.ncbi.nlm.nih.gov/assembly/GCF_000009705.1/
The analyses documented here follow two main approaches: first, a de novo approach that assembles the Anabaena genomes one by one, and second, a reference-based approach.
The de novo approach includes:
- de-novo assembly with flye (dir `denovo`)
- assembly polishing with medaka (dir `denovo/sample/polished-medaka`)
- assembly annotation with both prokka and bakta
- mapping of sample and reference strain reads with minimap2
- locating regions of interest with blat
- visualisation of all generated data with igv
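To illustrate how the assembly and polishing steps chain together, here is a minimal sketch that only builds the command lines without running them. The sample name, read path and thread count are hypothetical. `--nano-raw` assumes regular ONT reads, and the scripts in this repo may well use different options.

```python
from pathlib import Path


def denovo_commands(sample, reads, threads=8):
    """Build (not run) flye and medaka command lines for one sample.

    Assumption: output follows the denovo/<sample>/polished-medaka layout
    mentioned above; the flags are common defaults, not this repo's settings.
    """
    outdir = Path("denovo") / sample
    flye = [
        "flye", "--nano-raw", str(reads),
        "--out-dir", str(outdir),
        "--threads", str(threads),
    ]
    # flye writes its final assembly to <out-dir>/assembly.fasta
    medaka = [
        "medaka_consensus",
        "-i", str(reads),
        "-d", str(outdir / "assembly.fasta"),
        "-o", str(outdir / "polished-medaka"),
        "-t", str(threads),
    ]
    return [flye, medaka]


if __name__ == "__main__":
    # Hypothetical sample name and read file, for illustration only.
    for cmd in denovo_commands("sampleA", "reads/sampleA.fastq.gz"):
        print(" ".join(cmd))
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` once the paths match your own data.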
The reference-based approach accepts various reference-sample combinations and performs:
- mapping to a reference with minimap2, then variant calling with medaka
- mapping to a reference with ngmlr, then calling structural variants with sniffles
- extracting fasta files with insertion and deletion SVs
- locating regions of interest with blat
- visualisation of all generated data with igv
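The step of extracting fasta files with insertion and deletion SVs could look roughly like the sketch below. It assumes the SV caller writes `SVTYPE` in the INFO column and stores explicit sequences in the REF/ALT columns. The function name and the FASTA header layout are made up for illustration and are not necessarily what this repo does.

```python
def sv_records_to_fasta(vcf_lines, svtypes=("INS", "DEL")):
    """Yield FASTA records for insertion/deletion entries of a VCF.

    Assumptions: SVTYPE is present in INFO; inserted sequence is in ALT,
    deleted sequence is in REF. Sequences are taken as written, so any
    anchor base is included. Symbolic alleles (<INS>, <DEL>) are skipped
    because they carry no sequence.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, var_id, ref, alt = fields[:5]
        # INFO is a ';'-separated list of KEY=VALUE pairs and bare flags
        info = dict(
            item.split("=", 1) if "=" in item else (item, "")
            for item in fields[7].split(";")
        )
        svtype = info.get("SVTYPE")
        if svtype not in svtypes or alt.startswith("<"):
            continue
        seq = alt if svtype == "INS" else ref
        if not seq or set(seq.upper()) - set("ACGTN"):
            continue
        yield f">{var_id}_{svtype}_{chrom}:{pos}\n{seq}\n"
```

To use it, open the VCF produced by the SV caller, iterate over `sv_records_to_fasta(handle)` and write each record to an output file.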
A mapping table should match sample names with their appropriate reference, one reference per line followed by its samples, like so:
#ref samples
WT1 sampleA sampleB sampleC
WT2 sampleX sampleY sampleZ
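For illustration, a table in this layout can be read into a dictionary with a few lines of Python. This is a sketch that assumes whitespace-separated columns and `#` comment lines. The function name is hypothetical and not part of this repo.

```python
def parse_mapping_table(lines):
    """Return {reference: [samples, ...]} from a whitespace-separated table.

    Assumption: the first column is the reference, all following columns
    are samples; empty lines and lines starting with '#' are ignored.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ref, *samples = line.split()
        mapping.setdefault(ref, []).extend(samples)
    return mapping


if __name__ == "__main__":
    table = """\
#ref samples
WT1 sampleA sampleB sampleC
WT2 sampleX sampleY sampleZ
"""
    for ref, samples in parse_mapping_table(table.splitlines()).items():
        for sample in samples:
            print(f"{sample} -> {ref}")
```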
IGV snapshots are exported as PNG and SVG images. High-resolution PNG images are created by importing the SVG images into Inkscape, then exporting them as PNG again at dpi=1000.
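The SVG to PNG conversion can also be scripted instead of done by hand. The sketch below only builds the command line and assumes Inkscape 1.x, where `--export-filename` and `--export-dpi` are available (older releases use different options). The snapshot path is hypothetical.

```python
from pathlib import Path


def inkscape_png_command(svg, dpi=1000):
    """Build an Inkscape 1.x command that exports `svg` to PNG at `dpi`.

    The PNG is written next to the SVG, with the same base name.
    """
    svg = Path(svg)
    return [
        "inkscape", str(svg),
        f"--export-dpi={dpi}",
        f"--export-filename={svg.with_suffix('.png')}",
    ]


if __name__ == "__main__":
    # Hypothetical snapshot path, for illustration only.
    print(" ".join(inkscape_png_command("snapshots/region1.svg")))
```

Pass the returned list to `subprocess.run(cmd, check=True)` to perform the export.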