Running BayesTyper on custom VCF file #32
Comments
That is weird. Did you remember to convert the symbolic alleles in the input VCF to sequences before running cluster? You can do this using If that is not the problem, would it then be possible to share the output from BayesTyper (both stdout and VCF file)?
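A quick way to check whether an input VCF still contains symbolic alleles is to scan the ALT column for entries written in angle brackets (for example `<DEL>`). The following is a minimal sketch, not part of BayesTyper; the function name `find_symbolic_alleles` and the example records are hypothetical, and it assumes a plain, tab-separated VCF.

```python
def find_symbolic_alleles(vcf_lines):
    """Return (chrom, pos, alt) for every ALT allele written as <...>."""
    hits = []
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alts = fields[0], fields[1], fields[4]
        for alt in alts.split(","):
            # Symbolic alleles are enclosed in angle brackets in the VCF format.
            if alt.startswith("<") and alt.endswith(">"):
                hits.append((chrom, int(pos), alt))
    return hits


# Hypothetical records for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=250",
    "chr1\t500\t.\tACGT\tA\t.\tPASS\t.",
]
print(find_symbolic_alleles(example))
```

If the returned list is non-empty, those records still need to be converted to explicit sequences before clustering.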
Thanks for your response. I didn't know about the
It looks like you have a lot of zero-inflation in your k-mer counts (see the high estimated variance). This also results in many of the genotypes being filtered due to a high number of zero-count k-mers across the variant (format attribute SAF=2). I am not sure why that is the case. Do the input reads have a high error rate? Or are you using a non-random subset of the reads or k-mers? It could also be that you are missing smaller variants around or in the SVs, but I would not expect the effect to be that severe. You could try to remove the filter using the Also, given the low number of input variants I would recommend running with the option Please let me know if you run into any other problems.
Come to think of it, incorrect SV breakpoints could also result in an inflated number of zero-count k-mers. Do you see that more deletions are filtered compared to insertions? However, that would only explain the filtering and not the really high k-mer count variance.
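The deletion versus insertion comparison can be checked with a small tally over the genotyped VCF. This is a rough sketch under stated assumptions: the alleles are explicit sequences, a record is a deletion when REF is longer than the first ALT and an insertion when it is shorter, and a genotype containing `.` counts as filtered. The helper names `classify` and `tally_missing_by_type` and the example records are hypothetical.

```python
from collections import Counter


def classify(ref, alt):
    """Label a record by comparing REF and ALT sequence lengths."""
    if len(ref) > len(alt):
        return "deletion"
    if len(alt) > len(ref):
        return "insertion"
    return "other"


def tally_missing_by_type(vcf_lines, sample_index=0):
    """Count called vs. missing genotypes per variant type for one sample."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt = fields[3], fields[4].split(",")[0]  # first ALT only
        fmt_keys = fields[8].split(":")
        sample = fields[9 + sample_index].split(":")
        gt = sample[fmt_keys.index("GT")]
        # Any '.' in the genotype (e.g. './.') is treated as missing.
        status = "missing" if "." in gt else "called"
        counts[(classify(ref, alt), status)] += 1
    return counts


# Hypothetical records for illustration only.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
    "chr1\t100\t.\tACGTACGT\tA\t.\tPASS\t.\tGT\t./.",
    "chr1\t300\t.\tA\tACCCCCC\t.\tPASS\t.\tGT\t0/1",
    "chr1\t700\t.\tGTTTT\tG\t.\tPASS\t.\tGT\t1/1",
]
for key, n in sorted(tally_missing_by_type(example).items()):
    print(key, n)
```

A clearly higher missing fraction among deletions than insertions would be consistent with the breakpoint explanation, although it would not explain the high k-mer count variance.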
I'm not doing any filtering on the reads or k-mers; however, I generally don't expect these breakpoints to be exact to the base pair. I tried rerunning with the above flags and now many more events get genotyped. Around 2800 (1000 deletions, 1800 insertions) are still missing genotypes, but accuracy among those that have been genotyped is somewhat reasonable (~83%).
BayesTyper does not perform well when the breakpoints are not exact because it relies on exactly matching k-mers. If you can share the latest results, I will take a look at the remaining filtered variants.
Thanks for your time! The new results are here. Btw, I combined the previous events with the prior events and genotyped the whole set, but now I'm only getting ~1500 of my events in the final genotype set (based on ACO value or
Thanks, I am happy to help. Would it be possible to also share the log files (stdout) from BayesTyper?
Here is the stdout for the clustering and genotyping stages:
Thanks again.
Hi all, I ran into the same problem: most events are skipped (./.). I do not think we can get the exact breakpoints for most SVs using short-read sequencing alone. A solution to this problem would be very welcome. Sincerely,
@Parsoa I have looked at the results and most of the remaining filtered genotypes are due to a low genotype posterior probability (GPP). By default BayesTyper filters all genotypes with a posterior below 0.99. You could try to decrease this threshold if you are interested in the remaining genotypes, but be aware that the results will be more uncertain. You can either rerun Regarding the combined set.
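To get a feel for how many genotypes a lower threshold would recover, the posterior can be read from the output VCF and compared against different cutoffs. The sketch below is illustrative only and makes two assumptions that should be checked against the actual VCF header: that the GPP format attribute holds one or more comma-separated probabilities, and that the largest value is the posterior of the best genotype. The function name `records_above_gpp` and the example records are hypothetical.

```python
def records_above_gpp(vcf_lines, min_gpp=0.99, sample_index=0):
    """Return (chrom, pos, best_gpp) for records whose best GPP >= min_gpp."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        keys = fields[8].split(":")
        values = fields[9 + sample_index].split(":")
        sample = dict(zip(keys, values))
        # Assumption: GPP is a comma-separated list of probabilities.
        raw = sample.get("GPP", ".")
        probs = [float(p) for p in raw.split(",") if p not in (".", "")]
        if probs and max(probs) >= min_gpp:
            kept.append((fields[0], int(fields[1]), max(probs)))
    return kept


# Hypothetical records for illustration only.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
    "chr1\t100\t.\tACGT\tA\t.\tPASS\t.\tGT:GPP\t./.:0.03,0.95,0.02",
    "chr1\t300\t.\tA\tACCC\t.\tPASS\t.\tGT:GPP\t0/1:0.002,0.995,0.003",
]
print(len(records_above_gpp(example, min_gpp=0.99)))
print(len(records_above_gpp(example, min_gpp=0.90)))
```

Comparing the counts at 0.99 and at a lower cutoff shows how many additional genotypes would be gained, at the cost of more uncertain calls.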
I've been trying to run BayesTyper on a VCF file (uploaded here, which is basically the merged set of calls from HG00514 and HG00733 from here with duplicates removed). BayesTyper's clustering puts all the input variants into the Complex category although they are deletions and insertions. The genotyping stage also runs without errors, but almost all the events are skipped (./.) and the remaining few are genotyped as '0/0'. Around 60% of the events here should get a 1/0 or 1/1 genotype.
Note that I have not combined the VCF file with BayesTyper's variation priors as I'm not interested in SNVs or any other variants. When I try to do that, however, all of my input events are skipped and the combined set includes only events from the prior.
This is the command I use for running BayesTyper:
Am I missing anything here? Thanks.