
Questions about generating CNV Input file #141

Open
Inceid opened this issue Feb 23, 2023 · 0 comments

Inceid commented Feb 23, 2023

Hello PhyloWGS Devs!

I'm currently trying to run PhyloWGS with copy number variation (CNV) data obtained from whole-exome sequencing. I called CNVs with a tool other than Battenberg or TITAN, and I am trying to convert those calls into a format similar to the provided cnv_data.txt.

However, I am having trouble understanding how to calculate a, the number of reference reads covering a given CNV.

  • Our copy number calls give us the integer copy numbers of each allele and the prevalence of the CNV, e.g. (2,1) with prevalence 0.25. We also have reference and variant read counts for each SSM.
  • How would I calculate a from the above information? My default presumption is to multiply the CNV prevalence by total read count, but I was wondering if you had a different recommendation.
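To make the presumption above concrete, here is a minimal sketch of it. This is only the calculation described in the bullet (CNV prevalence times total read count), not a confirmed PhyloWGS formula; the function name `presumed_a` and the total read count of 1000 are hypothetical, chosen for illustration.

```python
def presumed_a(cnv_prevalence: float, total_reads: int) -> int:
    """Presumed value of `a` for one CNV in one sample.

    ASSUMPTION: this encodes the presumption stated in the question
    (prevalence * total read count). Whether that is the intended
    definition of `a` is exactly what is being asked.
    """
    if not 0.0 <= cnv_prevalence <= 1.0:
        raise ValueError("cnv_prevalence must be between 0 and 1")
    return round(cnv_prevalence * total_reads)


# Prevalence 0.25 from the (2,1) example; 1000 total reads is hypothetical.
a = presumed_a(0.25, 1000)
d = 1000
print(a, d)
```

If this presumption is wrong, only the body of `presumed_a` would need to change; the `a` and `d` columns would still be filled with one value per sample.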

Additionally, I wanted to clarify my understanding of the example cnv_data.txt input file provided:

cnv	a	d	ssms	physical_cnvs
c0	66023,50883,62757,36056,58777	126755,100469,121941,71263,115417	s2,1,2;s4,0,1	chrom=1,start=1234,end=5678,major_cn=2,minor_cn=1,cell_prev=0.8;chrom=X,start=15,end=10000,major_cn=2,minor_cn=0,cell_prev=0.8;chrom=22,start=123,end=456,major_cn=1,minor_cn=0,cell_prev=0.8

This example shows that SSMs s2 and s4 overlap with CNV c0.

  • However, they appear to harbor different major/minor copy numbers. s2 harbors (1,2) whereas s4 harbors (0,1).
  • How can the same CNV have two different copy number states in the input?
  • Furthermore, why does the same CNV have different states in the last column? I see copy number states (2,1), (2,0), and (1,0) in the last column, but I thought a single CNV should have a single copy number state.
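To show the structure these questions refer to, here is a minimal sketch that splits the example row into its parts. It assumes the file is tab-separated, that entries in the ssms and physical_cnvs columns are separated by semicolons, and that fields within an entry are separated by commas, as the example suggests. The function name `parse_cnv_row` and the neutral labels `cn_first` and `cn_second` are hypothetical; what the two integers mean is part of the question.

```python
import csv
import io

# Header and c0 row copied from the example cnv_data.txt (tab-separated).
EXAMPLE = (
    "cnv\ta\td\tssms\tphysical_cnvs\n"
    "c0\t66023,50883,62757,36056,58777\t126755,100469,121941,71263,115417\t"
    "s2,1,2;s4,0,1\t"
    "chrom=1,start=1234,end=5678,major_cn=2,minor_cn=1,cell_prev=0.8;"
    "chrom=X,start=15,end=10000,major_cn=2,minor_cn=0,cell_prev=0.8;"
    "chrom=22,start=123,end=456,major_cn=1,minor_cn=0,cell_prev=0.8\n"
)


def parse_cnv_row(row):
    """Split one cnv_data.txt row into typed Python values."""
    ssms = []
    for triplet in row["ssms"].split(";"):
        ssm_id, cn_first, cn_second = triplet.split(",")
        ssms.append((ssm_id, int(cn_first), int(cn_second)))

    physical_cnvs = []
    for entry in row["physical_cnvs"].split(";"):
        # Each entry is a comma-separated list of key=value pairs.
        physical_cnvs.append(dict(kv.split("=", 1) for kv in entry.split(",")))

    return {
        "cnv": row["cnv"],
        "a": [int(x) for x in row["a"].split(",")],
        "d": [int(x) for x in row["d"].split(",")],
        "ssms": ssms,
        "physical_cnvs": physical_cnvs,
    }


reader = csv.DictReader(io.StringIO(EXAMPLE), delimiter="\t")
rows = [parse_cnv_row(r) for r in reader]

for ssm_id, cn_first, cn_second in rows[0]["ssms"]:
    print(ssm_id, cn_first, cn_second)
for region in rows[0]["physical_cnvs"]:
    print(region["chrom"], region["major_cn"], region["minor_cn"])
```

Parsed this way, the single row c0 carries two SSM triplets and three physical_cnvs entries, each entry on a different chromosome with its own major_cn and minor_cn but the same cell_prev of 0.8.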

Thanks!
