Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A bug seems to occur when editing contigs information of a vcf header. #1314

Open
Dentalium opened this issue Oct 21, 2024 · 1 comment
Open

Comments

@Dentalium
Copy link

Hi, dear developers and maintainers. I tried to use pysam to edit the contig information of the header of a vcf file, then write records into it. However, the contig name of the record newly written into the new file seems to encounter a bug. Here's my code:

import pysam
# 不用with语句
# 写入vcf文件
# 使用with语句
src=pysam.VariantFile("2mismatch.vcf.gz")
# 修改contig信息
# 建立空header
new_header = pysam.VariantHeader()
# 导入除了contig以外的records
for i in src.header.records:
    if i.key != 'contig':
        new_header.add_record(i)
        print(i)
# 加入samples,注意列名行不是records
new_header.add_samples(src.header.samples)

# 手动添加contig
new_header.contigs.add("Horvu_VADA_Un01G000200.1",2673)
new_header.contigs.add(id="mychr",length=100)

# 手动添加一行信息
new_header.add_meta(key="test meta",value="this is a test")

# 测试
print(list(new_header.contigs)[-2:])
print(new_header.contigs["mychr"].length)

#编辑新的
dst=pysam.VariantFile("test.vcf","w", header=new_header)
for rec in src.fetch("Horvu_VADA_Un01G000200.1"):
    # 染色体名和坐标
    print(rec.chrom)
    print(rec.pos)
    # 写入新文件
    dst.write(rec)
    print(rec)
dst.close()
src.close()

# 查看新文件
my_vcf=pysam.VariantFile("test.vcf")
print(my_vcf.header)
for rec in my_vcf.fetch():
    print(rec)

my_vcf.close()

Here's the output:

##fileformat=VCFv4.2

##FILTER=<ID=PASS,Description="All filters passed">

##source=minipileup-1.0-r11

##reference=rawdata/Vada.all.cds.fasta

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">

##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">

##bcftools_viewVersion=1.19+htslib-1.19

##bcftools_viewCommand=view -Oz 2mismatch.vcf; Date=Sat Oct 19 16:46:14 2024

['Horvu_VADA_Un01G000200.1', 'mychr']
100
Horvu_VADA_Un01G000200.1
1729
Horvu_VADA_Un01G000200.1	1729	.	C	G	1	.	.	GT:AD	[./.:0](https://vscode-remote+ssh-002dremote-002b10-002e10-002e200-002e58.vscode-resource.vscode-cdn.net/home/data/ljb/proj/scripts/Python/test/pysam/.:0),0	[./.:0](https://vscode-remote+ssh-002dremote-002b10-002e10-002e200-002e58.vscode-resource.vscode-cdn.net/home/data/ljb/proj/scripts/Python/test/pysam/.:0),0	[./.:0](https://vscode-remote+ssh-002dremote-002b10-002e10-002e200-002e58.vscode-resource.vscode-cdn.net/home/data/ljb/proj/scripts/Python/test/pysam/.:0),0	1/1:0,1

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##source=minipileup-1.0-r11
##reference=rawdata/Vada.all.cds.fasta
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##bcftools_viewVersion=1.19+htslib-1.19
##bcftools_viewCommand=view -Oz 2mismatch.vcf; Date=Sat Oct 19 16:46:14 2024
##contig=<ID=Horvu_VADA_Un01G000200.1,length=2673>
##contig=<ID=mychr,length=100>
##test meta=this is a test
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	[./result/02_align/2mismatch/L128-LGK2361_L1.dedup.bam](https://vscode-remote+ssh-002dremote-002b10-002e10-002e200-002e58.vscode-resource.vscode-cdn.net/home/data/ljb/proj/scripts/Python/test/pysam/result/02_align/2mismatch/L128-LGK2361_L1.dedup.bam)	[./result/02_align/2mismatch/L136-LGK2362_L1.dedup.bam](https://vscode-remote+ssh-002dremote-002b10-002e10-002e200-002e58.vscode-resource.vscode-cdn.net/home/data/ljb/proj/scripts/Python/test/pysam/result/02_align/2mismatch/L136-LGK2362_L1.dedup.bam)	[./result/02_align/2mismatch/L5-LGK2360_L1.dedup.bam](https://vscode-remote+ssh-002dremote-002b10-002e10-002e200-002e58.vscode-resource.vscode-cdn.net/home/data/ljb/proj/scripts/Python/test/pysam/result/02_align/2mismatch/L5-LGK2360_L1.dedup.bam)	[./result/02_align/2mismatch/47-1.dedup.bam](https://vscode-remote+ssh-002dremote-002b10-002e10-002e200-002e58.vscode-resource.vscode-cdn.net/home/data/ljb/proj/scripts/Python/test/pysam/result/02_align/2mismatch/47-1.dedup.bam)

mychr	1729	.	C	G	1	.	.	GT:AD	[./.:0](https://vscode-remote+ssh-002dremote-002b10-002e10-002e200-002e58.vscode-resource.vscode-cdn.net/home/data/ljb/proj/scripts/Python/test/pysam/.:0),0	[./.:0](https://vscode-remote+ssh-002dremote-002b10-002e10-002e200-002e58.vscode-resource.vscode-cdn.net/home/data/ljb/proj/scripts/Python/test/pysam/.:0),0	[./.:0](https://vscode-remote+ssh-002dremote-002b10-002e10-002e200-002e58.vscode-resource.vscode-cdn.net/home/data/ljb/proj/scripts/Python/test/pysam/.:0),0	1/1:0,1

I firstly copied all the header information from an old vcf file except contigs. Then I mannually copied a specific contig ("Horvu_VADA_Un01G000200.1"), and added a new one ("mychr"). After that I tried to copy a single record from the old file.
Although the output of the print(rec.chrom)andprint(rec)shows correct contig name ("Horvu_VADA_Un01G000200.1"), after writing into the new file and re-open the file, it wiredly become "mychr", which is the same as the newly added header information. Is that a bug? I am working with python 3.12.2 and pysam 0.22.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants
@Dentalium and others