Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DMRfind generates thousands of chunk.tsv files #94

Open
MeganAdler opened this issue Aug 26, 2024 · 9 comments
Open

DMRfind generates thousands of chunk.tsv files #94

MeganAdler opened this issue Aug 26, 2024 · 9 comments

Comments

@MeganAdler
Copy link

Hi Yupeng,

I'm running into some issues with the results generated by DMRfind. I have EM-seq reads from two different organisms to use in my DMR analysis. I've run DMRfind (methylpy 1.4.7) on Arabidopsis with the simplified command below with success:

methylpy DMRfind \
	--allc-files /allc_Arabidopsis_1_merged.tsv.gz /allc_Arabidopsis_2_merged.tsv.gz
	--samples A1 A2 \
	--mc-type "CGN" \
	--chroms 1 2 3 4 5 \
	--num-procs 10 \
	--output-prefix /results/CGN_DMR_A1_A2_merged

Results:

  • CGN_DMR_A1_A2_merged_rms_results.tsv.gz
  • CGN_DMR_A1_A2_merged_rms_results_collapsed.tsv
  • CGN_DMR_A1_A2_merged_rms_results_collapsed.tsv.DMR.bed
  • CGN_DMR_A1_A2_merged_rms_results_collapsed.tsv.DMS.bed

However, running this similar command on another organism (with a scaffold genome) led to the generation of thousands of scaffold#_chunk#.tsv files in the results directory:

methylpy DMRfind \
	--allc-files /allc_Sample_1_merged.tsv.gz /allc_Sample_2_merged.tsv.gz
	--samples S1 S2 \
	--mc-type "CGN" \
	--num-procs 10 \
	--output-prefix /results/CGN_DMR_S1_S2_merged

Results:

  • CGN_DMR_S1_S2_merged_rms_results.tsv.gz
  • CGN_DMR_S1_S2_merged_rms_results_collapsed.tsv
  • CGN_DMR_S1_S2_merged_rms_results_collapsed.tsv.DMR.bed
  • CGN_DMR_S1_S2_merged_rms_results_collapsed.tsv.DMS.bed
  • CGN_DMR_S1_S2_merged_rms_results_for_organism_scaffold1_chunk_0.tsv
  • CGN_DMR_S1_S2_merged_rms_results_for_organism_scaffold1_chunk_1.tsv
  • CGN_DMR_S1_S2_merged_rms_results_for_organism_scaffold1_chunk_2.tsv
  • thousands more of the rms chunk files

It's interesting because there was no obvious error in the output file and some of the DMRs seem to have compiled in the top four files.

Thank you for your help, and please let me know if you need any more information for troubleshooting this issue.

@yupenghe
Copy link
Owner

yupenghe commented Sep 4, 2024

Yes the behavior of generating thousands of chunk files is expected. The chunk files are expected to be removed at the end of the run but it does not seem to be the case. I don't know what went wrong. One potential explanation is that the program died without error message. If you run it again, do you still see the chunk files?

@markram4
Copy link

I am experiencing the same issue. I'm re-running DMRFind on allc files we previously used and I'm no longer getting the appropriate output files and the chunk files remain. The rms_results.tsv is empty

Filtering allc files using 6 node(s).
Thu Dec 12 17:57:07 2024

Splitting allc files for chromosome Chr1
Thu Dec 12 17:57:59 2024

Running rms tests for chromosome Chr1
Thu Dec 12 17:58:50 2024

Splitting allc files for chromosome Chr2
Thu Dec 12 17:58:50 2024

Running rms tests for chromosome Chr2
Thu Dec 12 17:59:23 2024

Splitting allc files for chromosome Chr3
Thu Dec 12 17:59:23 2024

Running rms tests for chromosome Chr3
Thu Dec 12 18:00:04 2024

Splitting allc files for chromosome Chr4
Thu Dec 12 18:00:04 2024

Running rms tests for chromosome Chr4
Thu Dec 12 18:00:36 2024

Splitting allc files for chromosome Chr5
Thu Dec 12 18:00:36 2024

Running rms tests for chromosome Chr5
Thu Dec 12 18:01:22 2024

Merging sorted test_rms_results.tsv files.
Thu Dec 12 18:01:22 2024

Begin FDR Correction
Thu Dec 12 18:01:22 2024

m0 estimate for iteration 0: 0
Histogram FDR correction did not converge. Using the smallest p-value 0.0003333333333333333 as cutoff.

Calculating Residual Cutoff
Thu Dec 12 18:01:22 2024

There are no null residuals to calculate resid_cutoff. Using 2 as the cutoff.
Begin Defining Windows
Thu Dec 12 18:01:22 2024

Adding Methylation Levels
Thu Dec 12 18:01:22 2024

Done
Thu Dec 12 18:01:23 2024

@yupenghe
Copy link
Owner

Could you put together a small example to reproduce the error?

@markram4
Copy link

Hi Yupeng, thanks for your help. I'm attaching small subsets of the data set. I'm also attaching the yml for the methylpy conda environment.
allc_group2_rep3.sample.tsv.gz
allc_group2_rep2.sample.tsv.gz
allc_group2_rep1.sample.tsv.gz
allc_group1_rep3.sample.tsv.gz
allc_group1_rep2.sample.tsv.gz
allc_group1_rep1.sample.tsv.gz
env.yml.txt

This is the command I ran:
methylpy DMRfind --mc-type CNN --num-procs 10 --min-cov 3 --sig-cutoff 0.05 --allc-files allc_group1_rep2.sample.tsv.gz allc_group2_rep3.sample.tsv.gz allc_group1_rep3.sample.tsv.gz allc_group2_rep1.sample.tsv.gz allc_group1_rep1.sample.tsv.gz allc_group2_rep2.sample.tsv.gz --dmr-max-dist 100 --chroms Chr1 --output-prefix ~/DMRfind/test.sample
Filtering allc files using 6 node(s).
Fri Dec 13 19:44:10 2024

Splitting allc files for chromosome Chr1
Fri Dec 13 19:44:10 2024

Running rms tests for chromosome Chr1
Fri Dec 13 19:44:10 2024

Merging sorted /opt/afernabio/sequencing_data/mkramer/projects/emseq/data/test_at_data/DMRfind/test.sample_rms_results.tsv files.
Fri Dec 13 19:44:11 2024

Begin FDR Correction
Fri Dec 13 19:44:11 2024

m0 estimate for iteration 0: 0
Histogram FDR correction did not converge. Using the smallest p-value 0.0003333333333333333 as cutoff.

Calculating Residual Cutoff
Fri Dec 13 19:44:11 2024

There are no null residuals to calculate resid_cutoff. Using 2 as the cutoff.
Begin Defining Windows
Fri Dec 13 19:44:11 2024

Adding Methylation Levels
Fri Dec 13 19:44:11 2024

Done
Fri Dec 13 19:44:11 2024

In the output there are many intermediate files, and an empty rms_results.tsv file.

Thanks for your help

@yupenghe
Copy link
Owner

Weird. I cannot reproduce the error.

Here was the log I got.

Filtering allc files using single node.
Sun Dec 15 18:15:21 2024

Splitting allc files for chromosome Chr1
Sun Dec 15 18:15:21 2024

Running rms tests for chromosome Chr1
Sun Dec 15 18:15:21 2024

Ran 1738 root mean square tests. Results in test_output_rms_results_for_Chr1_chunk_0.tsv
Merging sorted test_output_rms_results.tsv files.
Sun Dec 15 18:15:39 2024

Begin FDR Correction
Sun Dec 15 18:15:39 2024

m0 estimate for iteration 0: 1738
m0 estimate for iteration 1: 1721.3173333333334
m0 estimate for iteration 2: 1720.7213173333334
Difference between best and last m0 estimate: 0.5960159999999632
The closest p-value cutoff for your desired FDR is 0.0003333333333333333 which corresponds to an FDR of 0.03687259965714285

Calculating Residual Cutoff
Sun Dec 15 18:15:40 2024

Begin Defining Windows
Sun Dec 15 18:15:40 2024

Adding Methylation Levels
Sun Dec 15 18:15:40 2024

Done
Sun Dec 15 18:15:40 2024

@yupenghe
Copy link
Owner

Could you try one more thing? If you just run methylpy, what is the path in the help info?

For example

You are using methylpy 1.4.7 version
(/AAA/BBB/CCC/DDD/methylpy/)

When you get the path, could you try the below command?

/AAA/BBB/CCC/DDD/methylpy/run_rms_tests.out

Are you able to see the below output?

Usage: ./rms.out <chunk files> <output file> <samples> <min_cov> <num_sims> <num_sig_tests> <seed>

@markram4
Copy link

markram4 commented Dec 16, 2024

How strange... Yes, I get that output.
You are using methylpy 1.4.7 version (/home/ubuntu/miniconda3/envs/methylpy_env/lib/python3.10/site-packages/methylpy/)

/home/ubuntu/miniconda3/envs/methylpy_env/lib/python3.10/site-packages/methylpy/run_rms_tests.out
Usage: ./rms.out <min_cov> <num_sims> <num_sig_tests> < seed>

@markram4
Copy link

I'm trying to create a new conda env to ensure that isn't the issue, but it's taking a long time. Which download method did you do/do you recommend?

@yupenghe
Copy link
Owner

Have you tried mamba? I found it much better than conda,.
https://mamba.readthedocs.io/en/latest/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants