gff file incompatibility? #9

Open
Ruth-hals opened this issue Mar 14, 2024 · 2 comments

Ruth-hals commented Mar 14, 2024

Hi Matthias,
Thank you for making such a nice tool!
I would like to use it, but I cannot seem to format my GFF file so that it is compatible.
What version of tabix should I be using?

I'm getting the following error:
annotation_fn=f'sorted_fixed_input.gff3.gz'
#create the IsoTools transcriptome object from the reference annotation
isoseq=Transcriptome.from_reference(annotation_fn)

  0%|                                                                                       | 0.00/15.1M [00:00<?, ?B/s]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[8], line 3
      1 annotation_fn=f'sorted_fixed_input.gff3.gz'
      2 #create the IsoTools transcriptome object from the reference annotation
----> 3 isoseq=Transcriptome.from_reference(annotation_fn)

File ~/anaconda3/envs/metacell/lib/python3.11/site-packages/isotools/transcriptome.py:55, in Transcriptome.from_reference(cls, reference_file, file_format, **kwargs)
     53 tr = cls()
     54 tr.chimeric = {}
---> 55 tr.data = import_ref_transcripts(reference_file, tr, file_format,  **kwargs)
     56 tr.infos = {'reference_file': reference_file, 'isotools_version': __version__}
     57 tr.filter = {'gene': DEFAULT_GENE_FILTER.copy(),
     58              'transcript': DEFAULT_TRANSCRIPT_FILTER.copy(),
     59              'reference': DEFAULT_REF_TRANSCRIPT_FILTER.copy()}

File ~/anaconda3/envs/metacell/lib/python3.11/site-packages/isotools/_transcriptome_io.py:1064, in import_ref_transcripts(fn, transcriptome, file_format, chromosomes, gene_categories, short_exon_th, **kwargs)
   1062     exons, transcripts, gene_infos, cds_start, cds_stop, skipped = _read_gtf_file(fn, chromosomes, **kwargs)
   1063 else:  # gff/gff3
-> 1064     exons, transcripts, gene_infos, cds_start, cds_stop, skipped = _read_gff_file(fn, chromosomes, **kwargs)
   1066 if skipped:
   1067     logger.info('skipped the following categories: %s', skipped)

File ~/anaconda3/envs/metacell/lib/python3.11/site-packages/isotools/_transcriptome_io.py:1012, in _read_gff_file(file_name, chromosomes, progress_bar)
   1010 with tqdm(total=path.getsize(file_name), unit_scale=True, unit='B', unit_divisor=1024, disable=not progress_bar) as pbar, TabixFile(file_name) as gff:
   1011     chrom_ids = get_gff_chrom_dict(gff, chromosomes)
-> 1012     for line in gff.fetch():
   1013         file_pos = gff.tell() >> 16  # the lower 16 bit are the position within the zipped block
   1014         if pbar.n < file_pos:

File ~/anaconda3/envs/metacell/lib/python3.11/site-packages/pysam/libctabix.pyx:499, in pysam.libctabix.TabixFile.fetch()

ValueError: could not create iterator, possible tabix version mismatch

Thank you very much for your help,
Best,
Ruth
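
For readers hitting the same error: `_read_gff_file` in the traceback opens the annotation with pysam's `TabixFile`, so the file has to be position-sorted, bgzip-compressed, and tabix-indexed. The following is a minimal sketch of that preparation, not the IsoTools authors' own procedure. The helper names `sort_gff_lines` and `prepare_gff` are made up for illustration, and the sketch assumes a GFF3 file without an embedded `##FASTA` section.

```python
import gzip
from pathlib import Path


def sort_gff_lines(lines):
    """Return GFF lines with '#' lines first, then features sorted by
    sequence ID and start coordinate (the order tabix expects)."""
    header, features = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # drop blank lines
        if line.startswith("#"):
            header.append(line)  # directives and comments go to the top
        else:
            features.append(line)

    def sort_key(line):
        fields = line.split("\t")
        return fields[0], int(fields[3])  # column 1 = seqid, column 4 = start

    # sorted() is stable, so features with equal keys keep their input order
    return header + sorted(features, key=sort_key)


def prepare_gff(src, dst):
    """Write a sorted copy of src to dst, then bgzip and index it.

    Assumption: pysam is installed. It is imported lazily so that
    sort_gff_lines can be used without it.
    """
    import pysam

    opener = gzip.open if str(src).endswith(".gz") else open
    with opener(src, "rt") as handle:
        sorted_lines = sort_gff_lines(handle)
    Path(dst).write_text("\n".join(sorted_lines) + "\n")
    # tabix_index compresses dst with bgzip and writes the .tbi index;
    # it returns the name of the compressed file
    return pysam.tabix_index(str(dst), preset="gff", force=True)
```

On the command line, the last step corresponds to running `bgzip` on the sorted file followed by `tabix -p gff` on the resulting `.gz` file. Building the index through pysam has the side effect that the same HTSlib build writes and reads the index, which may help when a version mismatch is suspected.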

@Ruth-hals
Author

Hi,
A more recent htslib version (HTSlib/1.17-GCC-12.2.0) solved my issue.

Thanks,
Best,
Ruth
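
Since the fix here was a newer HTSlib, it can help to record which versions are actually in play before re-indexing. The snippet below is a rough sketch using only the standard library. The function names are hypothetical, and it assumes that the `tabix` executable on the PATH supports `--version`, which older standalone tabix releases may not.

```python
import shutil
import subprocess
from importlib import metadata


def installed_version(package):
    """Return the installed version of a Python package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def tabix_cli_version():
    """Return the first line printed by `tabix --version`, or None."""
    exe = shutil.which("tabix")
    if exe is None:
        return None  # no tabix executable on the PATH
    try:
        result = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


def version_report():
    """Collect the versions relevant to reading a tabix-indexed GFF."""
    return {
        "pysam": installed_version("pysam"),
        "isotools": installed_version("isotools"),
        "tabix": tabix_cli_version(),
    }


if __name__ == "__main__":
    for name, version in version_report().items():
        print(f"{name}: {version or 'not found'}")
```

pysam installed from a prebuilt package typically ships its own copy of HTSlib, so the `tabix` that created the index and the library that reads it are not necessarily the same version. Comparing the two is a cheap first check when this error appears.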

@MatthiasLienhard
Owner

Hi, thank you for reporting. I will leave this open until I have fixed the dependency versions.

IceFreez3r added a commit to IceFreez3r/isotools that referenced this issue Dec 10, 2024
* Fixes for coordination test

fix producing TSS/PAS ASEvents where primary and alternative share the same start/end coordinate due to intron retention in the first exon
fix division by 0 when no coordinated events are found
filter_stats uses *kwargs
few more types

* Coordination min_dist_AB

Filter events if the distance of node A and B in an event is below a threshold
More types
Minor fixes of docs

* TSS/PAS only by first/last node

* Testing new heuristic to set the TSS

Don't use the peak calling directly anymore. Instead look upstream from the peak for a reference position and if one exists use that.

* Don't extend tss past ref exon ends

* Missed some renamed variables

* TSS correction toggle

* Remove duplicate params string

* Better min_dist_events docs

* Move dist_AB to _filter_event

* Adapt filter_stats call in run_isotools