Hello @jonassibbesen,
Just reaching out for a bit of assistance.
Out of 33,462 candidate SVs called with Manta, I have successfully genotyped 26,606 SVs. To compare the relative frequency of SV type and size with those identified with other pipelines, I would like to count the number of deletions, duplications, insertions and inversions, and estimate their sizes.
I have tried to add symbolic alleles as per this approach; however, only a few hundred deletions and insertions could be identified.
I have also tried to identify SVs that overlap with the candidate SV file before and after running bayesTyperTools convertAllele, and found that very few sites intersect or overlap (again, all deletions or insertions).
After trawling through the internet I wasn't able to find much about converting from the long-sequence SV annotation format to symbolic alleles, which has me thinking that I'm missing something super obvious. In the original paper for BayesTyper, how did you identify the different SV types for comparisons?
I have thought about splitting the candidate SV calls into groups by type and then carrying on from the convertAllele step, but I wasn't sure whether this would negatively affect the genotype outputs, especially since these type-specific converted VCFs did not overlap well with the final output from running all SVs together.
Anyway, really like the tool and super keen for any help or advice you may have!
janawold1 changed the title from "Identifying SV type and" to "Identifying SV type from output" on Feb 3, 2022
Thanks. If you are just interested in summary statistics (type, length, frequency etc.) of the genotyped variants you can use this script: "src/bayesTyperTools/scripts/getSummary.cpp" (binary should be under "bin/scripts" when compiled).
There is also a script for converting sequences to symbolic alleles ("convertSeqToAlleleId.cpp"), however I think the genotypes might be lost when using it. Also, the script only works for DEL, DUP and INV.
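For a quick sanity check outside of the BayesTyper scripts, a rough tally of type and size can be made directly from the sequence-resolved REF and ALT columns. The sketch below is not how getSummary works; it is a minimal illustration with hypothetical helper names (`classify_allele`, `summarise_vcf_lines`). It assumes that a longer ALT is an insertion, a shorter ALT is a deletion, and an equal-length ALT that is the exact reverse complement of REF is an inversion. It does not account for anchor bases, and duplications cannot be told apart from other insertions by length alone.

```python
from collections import Counter

# Translation table for complementing nucleotides (N is left as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify_allele(ref, alt):
    """Return (label, size) for one sequence-resolved REF/ALT pair.

    Assumption: labels come from a simple length comparison, plus an
    exact reverse-complement test for inversions. Anchor bases are ignored.
    """
    ref, alt = ref.upper(), alt.upper()
    if len(alt) > len(ref):
        return "INS", len(alt) - len(ref)
    if len(alt) < len(ref):
        return "DEL", len(ref) - len(alt)
    if len(ref) > 1 and alt == reverse_complement(ref):
        return "INV", len(ref)
    return "OTHER", len(ref)


def summarise_vcf_lines(lines, min_size=50):
    """Count alleles per label and collect their sizes from VCF text lines."""
    counts = Counter()
    sizes = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        for alt in alts:
            # Symbolic, missing and spanning-deletion alleles carry no sequence.
            if alt.startswith("<") or alt in (".", "*"):
                continue
            label, size = classify_allele(ref, alt)
            if size >= min_size:
                counts[label] += 1
                sizes.setdefault(label, []).append(size)
    return counts, sizes
```

To use it on an uncompressed VCF, something like `counts, sizes = summarise_vcf_lines(open("genotyped.vcf"))` should work, where the file name is only a placeholder. The `min_size` threshold of 50 bp is a common convention for SVs rather than anything taken from this thread.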
Regarding the small overlap between the pre- and post-convertAllele SV sets: I am really surprised by this, since convertAllele should not trim the alleles and only moves inversions (by a single base). Can you provide more details on how you did this comparison?
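One way to check whether a one-base shift explains part of the mismatch is to compare site positions with a small tolerance instead of requiring exact matches. This is only a sketch with hypothetical helper names (`load_sites`, `count_shared`); it compares chromosome and POS only, ignores the alleles themselves, and assumes both inputs are plain-text VCF lines.

```python
def load_sites(lines):
    """Collect (chromosome, position) pairs from VCF text lines."""
    sites = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos = line.split("\t")[:2]
        sites.add((chrom, int(pos)))
    return sites


def count_shared(sites_a, sites_b, tolerance=1):
    """Count sites in sites_a with a site in sites_b within `tolerance` bases."""
    shared = 0
    for chrom, pos in sites_a:
        offsets = range(-tolerance, tolerance + 1)
        if any((chrom, pos + offset) in sites_b for offset in offsets):
            shared += 1
    return shared
```

Running `count_shared` once with `tolerance=0` and once with `tolerance=1` shows how many additional sites match when a single-base move is allowed.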