Incorporate bug fixes and update version number #46

SPPearce · 2023-07-06T09:58:40Z

This pull request incorporates the bug fixes included in #31, #37, as well as some further changes:
I have been using this version "in production" for the last 2 years, running over 3000 samples through NGSCheckMate in batches. The tool is extremely useful, and I am ready to include it into several Nextflow workflows in the nf-core community, but can not without the fix that #31 provides to prevent divide by zero crashes in particular.

@SooLee , please can you take a look at this PR, and release a new version of the tool.

BAM mode no longer crashes if samples with few reads (i.e. failed samples) is provided, causing a divide by zero error (incorporating Fix dataset reading and pearson correlation is 0 #31)
split the directory only once if the dir contains "=" in the name (incorporating Update ncm.py #37)
The output_tag prefix is added to the output_corr_matrix.txt file
Making the size of the dendrogram pdf scale better as more (100+) samples are added
Ensured a dendrogram is generated if only two samples are compared
Changed the dendrogram background colour to white from grey
Updated the version information from v1.0 to v1.0.1

SPPearce · 2023-07-06T10:03:06Z

@SooLee , I see that you have been active on the repository, it would be great if we can get a new tagged version of the tool especially with the divide by zero fix, so that I can update the bioconda container (which will update the docker/singularity images on biocontainers) and then I can incorporate the tool properly into pipelines in the nf-core project.

SPPearce · 2023-07-14T09:11:54Z

@SooLee , please can you look at this?

SPPearce · 2023-08-13T19:19:59Z

@SooLee , are you able to look at this please?

SooLee · 2023-08-13T21:09:03Z

@sejooning Can you please look at this and merge?

SPPearce · 2023-08-14T13:11:26Z

Hi @SooLee , there has been no activity on github from @sejooning for several years, have you been able to contact them outside of github?

Thanks

aleelab · 2023-08-14T14:40:38Z

Hi Simon, I'm contacting Sejoon to see if he can work on this. Thank you for your interest in our tool and the suggestion. Best, Alice

…

On Mon, Aug 14, 2023 at 9:11 AM Simon Pearce ***@***.***> wrote: Hi @SooLee <https://github.com/SooLee> , there has been no activity on github from @sejooning <https://github.com/sejooning> for several years, have you been able to contact them outside of github? Thanks — Reply to this email directly, view it on GitHub <#46 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB67IYXEU6BUWYCLQJ23JWDXVIPYVANCNFSM6AAAAAA2AGBLTQ> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

SPPearce · 2023-08-14T15:36:15Z

Thank you, that is great!

sejooning · 2023-08-15T13:11:19Z

Hi Simon,

Thank you for addressing the bug and offering valuable suggestions for version updates. The mentioned changes have been committed. It would be fantastic if NGSCheckMate could be incorporated into Nextflow. Thank you very much.

Best regards,
Sejoon Lee

SPPearce · 2023-08-15T13:28:11Z

Thank you both! Can you please release a version with the 1.0.1 tag, to make it clear what that version number relates to? Unless you are planning on making any other changes to the tool in the near future?

ImNotaGit · 2023-11-29T04:07:49Z

Hi, @SPPearce, I am a regular user of NGSCheckMate, thanks a lot for the effort! There is still one very minor issue though:

The output_tag prefix is added to the output_corr_matrix.txt file

You made changes corresponding to the point above only in ncm.py but not in vaf_ncm.py, thus make the naming of the output files inconsistent between fastq mode and bam/vcf mode. Besides, the "output" in "output_corr_matrix.txt" is supposed to be the default prefix, and when providing a custom prefix with -N the output should be <prefix>_corr_matrix.txt; however, in your code the output would become <prefix>_output_corr_matrix.txt (also, the default output changed from output_corr_matrix.txt to output_output_corr_matrix.txt.) It happened that I have been using a snakemake pipeline that I wrote which wraps NGSCheckMate, and the above change and esp. inconsistency between fastq and bam/vcf modes broke my pipeline:)

Although I can make tweak in my pipeline to allow fastq mode and bam/vcf mode outputs being handled differently, it would be great if you could help address the above inconsistency issues upstream. Basically, 1. remove the "output" part in the file name in ncm.py; 2. do the same for vaf_ncm.py.

Thanks a lot.

SPPearce · 2023-11-29T07:02:07Z

Hi @ImNotaGit ,
I'm glad it has been helpful to you too.
Yes, I spotted those issues when incorporating the tool into the Nextflow community organisation nf-core. I generally use the bam mode so hadn't noticed that vaf_ncm.py needed to be synced up with the ncm.py.
Do you also find that the last .vaf file is missed when using vaf_ncm.py?

SPPearce · 2023-11-29T07:26:21Z

(to be precise, that the last file is missing from the correlation matrix, but is present in the text file)

ImNotaGit · 2023-11-30T03:05:49Z

Do you also find that the last .vaf file is missed when using vaf_ncm.py?
(to be precise, that the last file is missing from the correlation matrix, but is present in the text file)

Honestly, I did not notice that, since (now that you reminded me) in my pipeline I do not use the correlation matrix (output_corr_matrix.txt), but generate a report using results in the output_all.txt instead -- I would regenerate the correlation matrix manually by reshaping the output_all.txt matrix, with a tiny loss of numerical precision (I noticed that the values in output_corr_matrix.txt have more digits after the decimal point). I do it this way because:

I noticed that the predicted unmatched sample pairs have their raw correlation values replaced with zeros in the output_corr_matrix.txt, but I would want to check the raw correlation matrix.
Specifically in fastq mode the output_corr_matrix.txt is often not made symmetric, that only one of the two off-diagonal values is filled, but in a random way (i.e. sometimes in the upper triangle and sometimes in the lower...)

Oh, BTW, I recall another inconsistency between fastq and bam/vcf modes: in bam/vcf mode a output_matched.txt table is generated from NGSCheckMate, but in fastq mode this file is not automatically generated...

Yeah, so it seems that there are additional things to be fixed:P

SPPearce · 2023-11-30T08:25:58Z

Yes, I'm also aware that the correlation matrix is only half filled in in the fastq mode too, and the lack of the matched file.
I think I have fixed all those bits except for the missing last sample by copying code over from ncm.py, I'll push to a new PR for a new version soon. Can't immediately see why the last sample is missing from the correlation matrix at the moment.
Is there anything else that could do with fixing?

ImNotaGit · 2023-11-30T17:02:01Z

Thanks a lot again for all the efforts! I cannot think of other things that need to be fixed for the moment.

SPPearce · 2023-11-30T20:57:09Z

@ImNotaGit , can you test out this vaf_ncm.py file before I make a PR. I'm struggling to test it outside a container on my laptop.

SPPearce added 2 commits July 6, 2023 10:02

Update ncm.py

116b893

Update README.md

51e2634

SPPearce changed the title ~~Update ncm.py~~ Incorporate bug fixes and update version number Jul 6, 2023

sejooning merged commit ef7a38c into parklab:master Aug 15, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorporate bug fixes and update version number #46

Incorporate bug fixes and update version number #46

SPPearce commented Jul 6, 2023

SPPearce commented Jul 6, 2023

SPPearce commented Jul 14, 2023

SPPearce commented Aug 13, 2023

SooLee commented Aug 13, 2023

SPPearce commented Aug 14, 2023

aleelab commented Aug 14, 2023 via email

SPPearce commented Aug 14, 2023

sejooning commented Aug 15, 2023

SPPearce commented Aug 15, 2023

ImNotaGit commented Nov 29, 2023 •

edited

Loading

SPPearce commented Nov 29, 2023

SPPearce commented Nov 29, 2023

ImNotaGit commented Nov 30, 2023 •

edited

Loading

SPPearce commented Nov 30, 2023

ImNotaGit commented Nov 30, 2023

SPPearce commented Nov 30, 2023 •

edited

Loading

Incorporate bug fixes and update version number #46

Incorporate bug fixes and update version number #46

Conversation

SPPearce commented Jul 6, 2023

SPPearce commented Jul 6, 2023

SPPearce commented Jul 14, 2023

SPPearce commented Aug 13, 2023

SooLee commented Aug 13, 2023

SPPearce commented Aug 14, 2023

aleelab commented Aug 14, 2023 via email

SPPearce commented Aug 14, 2023

sejooning commented Aug 15, 2023

SPPearce commented Aug 15, 2023

ImNotaGit commented Nov 29, 2023 • edited Loading

SPPearce commented Nov 29, 2023

SPPearce commented Nov 29, 2023

ImNotaGit commented Nov 30, 2023 • edited Loading

SPPearce commented Nov 30, 2023

ImNotaGit commented Nov 30, 2023

SPPearce commented Nov 30, 2023 • edited Loading

ImNotaGit commented Nov 29, 2023 •

edited

Loading

ImNotaGit commented Nov 30, 2023 •

edited

Loading

SPPearce commented Nov 30, 2023 •

edited

Loading