Our workflow first runs bbmap to deduplicate sequences. This creates a deduplicated FASTA file whose sequence headers combine the names of the duplicates, like dup1>dup2.
These sequences are then clustered by mmseqs2. mmseqs createtsv writes a two-column file: the first column is the representative sequence and the second column is a member sequence (one row for each sequence).
This script takes the output of mmseqs createtsv as its argument and returns a file in the same format, with each deduplicated sequence on its own line.
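The expansion step can be sketched roughly as follows. This is a minimal illustration, not the script itself: the function name `expand_deduped_rows` is hypothetical, and it assumes that duplicate names are joined by `>` with no `>` inside an individual name, and that a combined name in the representative column should be reported under its first name.

```python
import io


def expand_deduped_rows(tsv_in, tsv_out, sep=">"):
    """Write one row per original sequence name.

    tsv_in:  iterable of lines in the two-column createtsv layout
             (representative <TAB> member).
    tsv_out: writable text stream that receives the expanded rows.
    sep:     assumed separator between duplicate names in a header.
    """
    for line in tsv_in:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip rows that do not have both columns
        rep, member = fields[0], fields[1]
        # Assumption: the first name in a combined header stands in
        # for the whole group in the representative column.
        rep_name = rep.split(sep)[0]
        for name in member.split(sep):
            if name:
                tsv_out.write(f"{rep_name}\t{name}\n")


# Small in-memory demonstration with made-up sequence names.
example = io.StringIO("dup1>dup2\tdup1>dup2\ndup1>dup2\tseq3\n")
result = io.StringIO()
expand_deduped_rows(example, result)
print(result.getvalue(), end="")
```

For real files, the same function can be called with file handles opened on the createtsv output and on the destination path.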