Q: HMMRATAC producing too many peaks #636
Comments
One thing you can try is to increase the cutoff for calling candidate regions in the MACS3 version of Please let me know if this can help.
I saw that, but wasn't sure how to properly use the output to determine the optimal values. Do you have an example documented somewhere?
The cutoff-analysis for And a similar cutoff-analysis for And if you can generate the cutoff analysis report from
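For anyone reading along who wants a starting point for working with a cutoff report, here is a minimal Python sketch. It assumes the report is a tab-separated table with `score`, `npeaks`, `lpeaks` and `avelpeak` columns; that layout is an assumption, so check the header of your own file. The function names and the toy numbers are invented for illustration and are not taken from this thread or from MACS3.

```python
import csv
import io


def read_cutoff_report(handle):
    """Parse a tab-separated cutoff report into rows sorted by ascending score.

    Assumed (hypothetical) columns: score, npeaks, lpeaks, avelpeak.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rows = [
        {
            "score": float(rec["score"]),
            "npeaks": int(rec["npeaks"]),
            "avelpeak": float(rec["avelpeak"]),
        }
        for rec in reader
    ]
    return sorted(rows, key=lambda row: row["score"])


def lowest_cutoff_under(rows, max_peaks):
    """Return the smallest cutoff whose peak count is at most max_peaks.

    Returns None when no cutoff in the report is strict enough.
    """
    for row in rows:  # rows are sorted by ascending score
        if row["npeaks"] <= max_peaks:
            return row["score"]
    return None


# Made-up values, only to show the expected file shape.
TOY_REPORT = (
    "score\tnpeaks\tlpeaks\tavelpeak\n"
    "1.0\t120000\t24000000\t200.0\n"
    "2.0\t40000\t10000000\t250.0\n"
    "3.0\t9000\t2700000\t300.0\n"
    "4.0\t800\t320000\t400.0\n"
)

rows = read_cutoff_report(io.StringIO(TOY_REPORT))
print(lowest_cutoff_under(rows, 10000))  # prints 3.0
```

A peak-count ceiling is only one possible heuristic. Looking at where the peak count stops dropping sharply as the cutoff increases is another, and neither replaces checking the calls in a genome browser.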
I am sharing the cutoff report here (I had to add .txt because curiously GitHub does not allow .tsv): It might even be easier to just paste it in whole:
The fold change values for reasonable peaks are within the lower range. The default parameters for How is the current HMM from
Yes, each sample is auto-scaled independently since peak-calling is independent as well.
Also related to #638, it's highly possible that the hidden Markov model learned from the low-quality data can't capture the expected chromatin structure around accessible sites. This is a common problem with machine learning: a model trained on poor data fits that data poorly. In this case, a general peak-calling process, such as the "callpeak" function, is perhaps more appropriate.
I am not surprised that machine learning is failing for these samples. I was primarily concerned about the large difference between the Java and MACS3 implementations.
We are implementing a different emission model for HMM in MACS3 called the Poisson Emission model. During our internal tests, this simple model (compared with the Gaussian Emission model) generated results that were similar to those of the Java implementation. Stay tuned if you are interested in it :)
I ran HMMRATAC on two samples that should be biological replicates. The quality difference between them is large, although the quality is far from ideal for both. However, the lower-quality sample produced more than 10x as many peaks with the default settings (<10k for the first sample versus >100k for the second).
For comparison, the Java version of HMMRATAC produced <1k peaks for the first sample and no peaks for the second sample (score cutoff of 30).
Here is a genome browser screenshot to illustrate the situation:
The called peaks for the first sample look like peaks. Not so much for the second sample. Is this expected? Are there some guidelines to help avoid this?