Illumina paired-end files may have 1:N:0 and 2:N:0 #38
We welcome feedback and issue reporting for all bioBakery tools through our [Discourse site](https://forum.biobakery.org/c/pull-request/featurepull-request/). For users that would like to directly contribute to the tools we are happy to field PRs to address bug fixes. Please note the turnaround time on our end might be a bit long to field these but that does not mean we don't value the contribution! We currently don't accept PRs to add new functionality to tools but we would be happy to receive your feedback on [Discourse](https://forum.biobakery.org/c/pull-request/featurepull-request/).
Also, we will make sure to attribute your contribution in our User’s manual (README.md) and in any associated paper Acknowledgements.
Description
Add a new "if" branch to the utilities.get_reformatted_identifiers function.
It uses the regular expression r'\d+:N:\d+:(?=[A|T|C|G])' to match the problematic part of the sequence label,
and then simply deletes that part.
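The effect of the regular expression can be sketched in isolation as follows. This is only an illustration, not the actual code in utilities.get_reformatted_identifiers; the helper name `strip_read_flag` is hypothetical. The pattern is kept exactly as written above. Note that inside a character class the `|` is matched literally, so `[ATCG]` would be the tighter spelling.

```python
import re

# Pattern from the description: "<read>:N:<control>:" followed by an index base.
# The lookahead keeps the index sequence itself in the label.
READ_FLAG = re.compile(r'\d+:N:\d+:(?=[A|T|C|G])')

def strip_read_flag(label):
    """Hypothetical helper: delete the '1:N:0:' / '2:N:0:' part of a label."""
    return READ_FLAG.sub('', label)

r1 = "@A00456:506:H5GC2DSXY:1:1101:6027:1000 1:N:0:CAACACAG+CAAGGTAC"
r2 = "@A00456:506:H5GC2DSXY:1:1101:6027:1000 2:N:0:CAACACAG+CAAGGTAC"

print(strip_read_flag(r1))
# @A00456:506:H5GC2DSXY:1:1101:6027:1000 CAACACAG+CAAGGTAC
print(strip_read_flag(r1) == strip_read_flag(r2))
# True
```

After the deletion both mates carry the same label, so they can be matched as a pair.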
Related Issue
I found that some users always get empty output files for their two paired input files ([issue1](https://forum.biobakery.org/t/paired-end-data-results-in-unpaired-output/928/2), [issue2](https://forum.biobakery.org/t/no-reads-in-the-final-output-files-created/3419)).
I have the same issue. My sequence labels look like this:
R1: @A00456:506:H5GC2DSXY:1:1101:6027:1000 1:N:0:CAACACAG+CAAGGTAC
R2: @A00456:506:H5GC2DSXY:1:1101:6027:1000 2:N:0:CAACACAG+CAAGGTAC
So appending "#0/1" and "#0/2" in the old get_reformatted_identifiers does not solve this issue. I simply deleted that part of the label, and it worked.
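A quick way to see where the two labels above diverge is to split each one at the first space. This is a minimal sketch; `split_label` is a hypothetical helper, and it assumes the label consists of a read name, a space, and a trailing comment field, as in the example.

```python
r1 = "@A00456:506:H5GC2DSXY:1:1101:6027:1000 1:N:0:CAACACAG+CAAGGTAC"
r2 = "@A00456:506:H5GC2DSXY:1:1101:6027:1000 2:N:0:CAACACAG+CAAGGTAC"

def split_label(label):
    """Hypothetical helper: split a label into (read name, comment field)."""
    name, _, comment = label.partition(" ")
    return name, comment

name1, comment1 = split_label(r1)
name2, comment2 = split_label(r2)

same_name = name1 == name2           # the read names are identical
same_comment = comment1 == comment2  # the comments differ: "1:N:0:..." vs "2:N:0:..."

print(same_name, same_comment)
# True False
```

The mates share a read name and differ only in the leading read number of the comment field, so any comparison that uses the whole label treats them as different reads.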
Screenshots (if appropriate):