Missing cnv's in cnv_data.txt #134

ardydavari · 2021-03-02T00:30:50Z

I have been having an issue where the number of CNVs present in cnv_data.txt is much smaller than the number produced by parse_cnvs.py.

When I run parse_cnvs.py on my tumor sample, I get approximately 534 lines, with many regions that have non-diploid copy numbers.

biopsy_cnvs.txt (excerpt):

chromosome      start   end     copy_number     minor_cn        major_cn        cellular_prevalence
1       770502  7955500 4       0       4       0.76228
1       7969259 19193393        5       1       4       0.76228
1       19199091        37788546        4       0       4       0.76228
1       37848458        40188029        5       1       4       0.76228
1       40255487        47175369        4       0       4       0.76228
1       47182705        47737136        4       0       4       0.0843364667695
1       47182705        47737136        5       1       4       0.677943533231
1       47738318        48888191        5       1       4       0.436941290323
1       47738318        48888191        6       2       4       0.325338709677
1       48888622        79171641        4       0       4       0.76228
1       79196750        79316969        6       2       4       0.76228
1       79336373        83974480        4       0       4       0.76228
1       84227477        84735545        6       2       4       0.76228
1       84737421        85233983        4       0       4       0.76228
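The excerpt already lists some regions twice with different abnormal states; for example, 1:47182705-47737136 appears as major/minor 4/0 and 4/1. This may be the situation that the create_phylowgs_inputs.py filter quoted later in this thread describes as a "different" abnormal state. As a rough diagnostic, a sketch like the following could list such regions. It is a hypothetical helper, not part of PhyloWGS. It assumes the file is tab-separated with the header shown above, and that 1/1 is the normal state, which holds for autosomes only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical diagnostic, not part of PhyloWGS.
# Assumptions: tab-separated input with the header shown above, and
# (major, minor) == (1, 1) as the normal state (true for autosomes only).
NORMAL_STATE = (1, 1)


def conflicting_regions(lines):
    """Map (chrom, start, end) -> set of abnormal (major, minor) states,
    keeping only regions listed with more than one distinct abnormal state."""
    states = defaultdict(set)
    for row in csv.DictReader(lines, delimiter="\t"):
        state = (int(row["major_cn"]), int(row["minor_cn"]))
        if state == NORMAL_STATE:
            continue  # normal records never conflict
        states[(row["chromosome"], int(row["start"]), int(row["end"]))].add(state)
    return {region: found for region, found in states.items() if len(found) > 1}


# Three rows copied from the excerpt above.
example = io.StringIO(
    "chromosome\tstart\tend\tcopy_number\tminor_cn\tmajor_cn\tcellular_prevalence\n"
    "1\t770502\t7955500\t4\t0\t4\t0.76228\n"
    "1\t47182705\t47737136\t4\t0\t4\t0.0843364667695\n"
    "1\t47182705\t47737136\t5\t1\t4\t0.677943533231\n"
)
for (chrom, start, end), found in sorted(conflicting_regions(example).items()):
    print(f"{chrom}:{start}-{end}", sorted(found))
# prints: 1:47182705-47737136 [(4, 0), (4, 1)]
```

To check the full file, pass an open handle to /data/biopsy_cnvs.txt instead of the in-memory example.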

After running create_phylowgs_inputs.py (command below), I get only 17 CNVs (c0 to c16) in my final file:

create_phylowgs_inputs.py \
-s 5000 \
--cnvs biopsy=/data/biopsy_cnvs.txt \
--vcf-type biopsy=sanger \
biopsy=/data/biopsy.muts.vcf

cnv_data.txt

cnv     a       d       ssms    physical_cnvs
c0      135530  219000  s1699,3,3;s1700,3,3;s1701,3,3;s1702,3,3;s1703,3,3;s1704,3,3;s1705,3,3;s1706,3,3;s1707,3,3;s1708,3,3;s1709,3,3;s1710,3,3;s1711,0,3;s
c1      16253   25479   s1655,0,1;s1656,0,1     chrom=18,start=4518557,end=5017167,major_cn=1,minor_cn=0,cell_prev=0.724137931034
c2      120207  186989  s996,0,1        chrom=8,start=16302361,end=19961643,major_cn=1,minor_cn=0,cell_prev=0.714285714286
c3      55739   84363   s997,0,1        chrom=8,start=22409353,end=24060296,major_cn=1,minor_cn=0,cell_prev=0.678571428571
c4      151214  219000  s1527,1,2;s1528,1,2;s1529,1,2;s1530,1,2;s1531,1,2;s1532,1,2;s1533,1,2;s1534,1,2;s1535,1,2;s1536,1,2;s1537,1,2;s1538,1,2;s1539,1,2;s
c5      74672   92697           chrom=8,start=25388663,end=27202687,major_cn=2,minor_cn=1,cell_prev=0.388888888889
c6      40686   49728   s981,1,2        chrom=8,start=1511992,end=2485144,major_cn=2,minor_cn=1,cell_prev=0.363636363636
c7      2174    2652            chrom=8,start=2500261,end=2552162,major_cn=1,minor_cn=0,cell_prev=0.360396039604
c8      2390    2900            chrom=18,start=5018374,end=5075134,major_cn=2,minor_cn=1,cell_prev=0.351635514019
c9      37378   45136   s1625,1,2;s1626,1,2     chrom=17,start=21854462,end=22737746,major_cn=2,minor_cn=1,cell_prev=0.34375
c10     194113  219000  s259,1,2;s260,1,2;s261,1,2;s262,1,2;s263,1,2;s264,1,2;s265,1,2;s266,1,2;s267,1,2;s268,1,2;s269,1,2;s270,1,2;s271,1,2;s272,1,2;s273,
c11     195195  219000  s1263,1,2;s1264,1,2;s1265,1,2;s1266,1,2;s1267,1,2;s1268,1,2;s1269,1,2;s1270,1,2;s1271,1,2;s1272,1,2;s1273,1,2;s1274,1,2;s1275,1,2;s
c12     16957   18743   s1208,0,1       chrom=10,start=46165506,end=46532287,major_cn=1,minor_cn=0,cell_prev=0.190476190476
c13     168974  173802          chrom=8,start=27206516,end=30607739,major_cn=2,minor_cn=1,cell_prev=0.0555555555556
c14     40427   41520           chrom=8,start=13435023,end=14247547,major_cn=2,minor_cn=1,cell_prev=0.0526315789474
c15     19905   20430           chrom=8,start=212218,end=612014,major_cn=2,minor_cn=1,cell_prev=0.0513149454779
c16     39563   40578           chrom=8,start=9695182,end=10489271,major_cn=1,minor_cn=0,cell_prev=0.05

What happened to the other copy number variants?


shaghayeghsoudi · 2021-03-02T16:51:00Z

May I ask how you ran parse_cnvs.py? Did you have Battenberg CNVs? Did you do any specific filtering before running parse_cnvs.py on your CNV data?
I am getting a weird error message:

python ./parse_cnvs.py -f battenberg -c 0.27 data.test.battenberg.txt

Error:
  File "./parse_cnvs.py", line 195, in <module>
    main()
  File "./parse_cnvs.py", line 191, in main
    regions = parser.parse()
  File "./parse_cnvs.py", line 111, in parse
    end = int(fields[3 + self._field_offset])
ValueError: invalid literal for int() with base 10: '0.610923189999321'

I wonder whether you ran it in a different way. I would appreciate your answer.
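The traceback shows parse_cnvs.py reading the end coordinate from field 3 + self._field_offset and finding a decimal value there, which suggests the columns of data.test.battenberg.txt may not line up with the layout the battenberg parser expects. A quick way to check is to print the first rows with their field indices. The sketch below is a hypothetical helper, not part of PhyloWGS, and it assumes tab-separated input; the script name used in the usage note is also made up.

```python
import sys


def preview_fields(lines, n_rows=3):
    """Return the first n_rows lines (header included) split on tabs."""
    rows = []
    for line in lines:
        if len(rows) >= n_rows:
            break
        rows.append(line.rstrip("\r\n").split("\t"))
    return rows


def parses_as_int(text):
    """True if int() accepts text, as parse_cnvs.py requires for the end coordinate."""
    try:
        int(text)
    except ValueError:
        return False
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for row in preview_fields(handle):
            # Each entry: (field index, value, whether int() would accept it).
            print([(i, value, parses_as_int(value)) for i, value in enumerate(row)])
```

Usage: python preview_fields.py data.test.battenberg.txt, then compare the index that actually holds the end coordinate with the index parse_cnvs.py reads (3 plus the parser's _field_offset).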

ardydavari · 2021-03-03T00:24:58Z

I noticed that on lines 409-423 the CNVs are separated into two types.

I noticed that the ones in the else block are being included in cnv_data.txt, but I still don't understand what's happening to the other ones. Are they being treated like SNVs?

ardydavari · 2021-03-03T00:53:02Z

I also took a closer look at my cnv_data.txt file; it seems that many of the missing CNVs are listed in the physical_cnvs column of row c0. However, none of the other rows have multiple CNVs associated with them.
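To confirm this across the whole file, one could count the physical CNVs attached to each logical CNV row. The following is a minimal, hypothetical sketch, not part of PhyloWGS. It assumes cnv_data.txt is tab-separated with the header shown above, and, as an unverified assumption about the format, that several physical CNVs in one row are separated by ";" with each written as comma-separated key=value pairs.

```python
import csv

# Hypothetical helpers, not part of PhyloWGS.
# Assumptions: cnv_data.txt is tab-separated with the header shown above,
# several physical CNVs in one row are separated by ';', and each physical
# CNV is written as comma-separated key=value pairs.


def physical_cnvs_per_row(lines):
    """Map each logical CNV id (c0, c1, ...) to its number of physical CNVs."""
    counts = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        entries = [e for e in (row["physical_cnvs"] or "").split(";") if e]
        counts[row["cnv"]] = len(entries)
    return counts


def parse_physical_cnv(entry):
    """Turn 'chrom=8,start=...,cell_prev=...' into a dict of strings."""
    return dict(item.split("=", 1) for item in entry.split(","))


# Two rows copied from cnv_data.txt above (c5 has an empty ssms column).
example = [
    "cnv\ta\td\tssms\tphysical_cnvs\n",
    "c1\t16253\t25479\ts1655,0,1;s1656,0,1\tchrom=18,start=4518557,end=5017167,major_cn=1,minor_cn=0,cell_prev=0.724137931034\n",
    "c5\t74672\t92697\t\tchrom=8,start=25388663,end=27202687,major_cn=2,minor_cn=1,cell_prev=0.388888888889\n",
]
print(physical_cnvs_per_row(example))  # {'c1': 1, 'c5': 1}
```

Run on the full file, this would show whether c0 is the only row with more than one physical CNV, and parse_physical_cnv can be applied to each of c0's entries to list their coordinates.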

ardydavari · 2021-03-04T22:52:35Z

I found that if I change line 726 from return None to continue, the missing CNVs now show up.

phylowgs/parser/create_phylowgs_inputs.py

Lines 711 to 726 in 2205be1

for sampidx, cell_prev, major, minor in zip(cnv['sampidx'], cnv['cell_prev'], cnv['major_cn'], cnv['minor_cn']):
  # Region may be (clonal or subclonal) normal in a sample, so ignore such records.
  if self._is_region_normal_cn(chrom, major, minor):
    continue
  # Either we haven't observed an abnormal CN state in this region before,
  # or the observed abnormal state matches what we've already seen.
  if abnormal_state is None or abnormal_state == (major, minor):
    abnormal_state = (major, minor)
    filtered.append({'sampidx': sampidx, 'cell_prev': cell_prev, 'major_cn': major, 'minor_cn': minor})
    continue
  # The abnormal state (i.e., major & minor alleles) is *different* from
  # what we've seen before. The PWGS model doesn't currently account for
  # such cases, so ignore the region.
  else:
    return None
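To see what the one-line change does, here is a hypothetical, standalone re-creation of the quoted loop, for illustration only. The excerpt does not show the enclosing function. This sketch therefore assumes that the function returns filtered after the loop, that a None result means the caller drops the whole region, and that 1/1 is the normal state (autosomes).

```python
# Hypothetical re-creation of the quoted loop, for illustration only.
# Assumptions (not confirmed by the excerpt): (1, 1) is the normal state
# (autosomes), the enclosing function returns `filtered` after the loop,
# and a None result means the caller drops the whole region.
NORMAL_STATE = (1, 1)


def filter_states(records, skip_conflicts=False):
    """records: (sampidx, cell_prev, major, minor) tuples for one region.

    skip_conflicts=False mirrors the original `return None`;
    skip_conflicts=True mirrors the `continue` change described above.
    """
    abnormal_state = None
    filtered = []
    for sampidx, cell_prev, major, minor in records:
        if (major, minor) == NORMAL_STATE:
            continue  # region is normal in this sample: ignore the record
        if abnormal_state is None or abnormal_state == (major, minor):
            abnormal_state = (major, minor)
            filtered.append({"sampidx": sampidx, "cell_prev": cell_prev,
                             "major_cn": major, "minor_cn": minor})
        elif skip_conflicts:
            continue  # `continue` variant: skip only this conflicting record
        else:
            return None  # original behavior: drop the whole region
    return filtered


# The two records biopsy_cnvs.txt lists for 1:47182705-47737136,
# treated here as if they were two records of one region in sample 0.
region = [(0, 0.0843364667695, 4, 0), (0, 0.677943533231, 4, 1)]
print(filter_states(region))                       # None
print(filter_states(region, skip_conflicts=True))  # one-element list with the 4/0 record
```

Under these assumptions, return None discards the region, while continue keeps it with whichever abnormal state appears first (here 4/0, the record with the lower cellular prevalence) and ignores the other state. Because the code comment says the PWGS model does not account for regions with differing abnormal states, the change may alter what the model receives rather than simply restoring dropped records.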
