Recommend repeat mask? #64
Comments
Hi @sheinasim Pooja
In the NCBI EGAPx pipeline, why isn't the masked.fasta used? I don't understand. Could you explain it in more detail? Thanks!
EGAPx is predominantly an evidence-based predictor: nearly all models are built primarily from RNA-seq and protein alignments. Most aligners, including STAR and miniprot, ignore soft-masking and recommend against hard-masking, so masking isn't needed. lncRNAs and the 3' UTRs of coding genes also often contain repeats that are valid to include in a model.

A carefully vetted masking library can be useful for identifying gene predictions that fall on transposons and other repeats; without curation, however, it can over-filter real genes (e.g. high-copy-number gene families such as histones can get masked). EGAPx includes some alternative logic to identify, based on protein hits, gene predictions that are predominantly transposon-derived, and we pre-filter our protein evidence sets to remove repeat-based proteins. This is an area we've been exploring for improvements, but I think focusing on protein characteristics (e.g. domain analysis) will be more suitable for the purpose.

We've also set up EGAPx to require at least some alignment evidence for every model, whereas RefSeq EGAP includes some models that are based entirely on ab initio prediction. That ab initio path can find a few more real genes, but it is the major source of transposon noise in RefSeq XP models, so the EGAPx settings help improve precision.
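
For readers unfamiliar with the two masking styles mentioned above, here is a minimal Python sketch of the difference, under the usual FASTA convention that soft-masked bases are lowercase and hard-masked bases are replaced with `N`. The function names and the example sequence are hypothetical illustrations and are not part of EGAPx, STAR, or miniprot.

```python
def soft_masked_fraction(seq: str) -> float:
    """Return the fraction of bases that are soft-masked (lowercase)."""
    if not seq:
        return 0.0
    return sum(1 for base in seq if base.islower()) / len(seq)


def unmask(seq: str) -> str:
    """Undo soft-masking by uppercasing every base; no sequence is lost."""
    return seq.upper()


def hard_mask(seq: str) -> str:
    """Convert soft-masked (lowercase) bases to N, discarding the sequence."""
    return "".join("N" if base.islower() else base for base in seq)


# Hypothetical sequence with a soft-masked repeat in the middle.
example = "ACGTacgtacGTAC"

print(soft_masked_fraction(example))
print(unmask(example))
print(hard_mask(example))
```

Soft-masking keeps the underlying bases, so a tool that ignores case sees the same sequence either way. Hard-masking replaces them with `N`, which removes any real gene sequence that overlaps the masked interval.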
Hello there!
I'm running the latest version of EGAPx alpha and I was wondering what the recommendation is for repeat masking the assembly before annotation. Is there a repeat masker that is recommended, or is it necessary?
Thanks!
Sheina