Recommend repeat mask? #64

sheinasim · 2024-12-06T22:16:21Z

Hello there!

I'm running the latest version of EGAPx alpha and I was wondering what the recommendation is for repeat masking the assembly before annotation. Is there a repeat masker that is recommended, or is it necessary?

Thanks!
Sheina

pstrope · 2024-12-09T16:14:50Z

Hi @sheinasim
Masking is not needed, and not used in EGAPx at this time.

Pooja

xinghui-guo · 2024-12-12T05:38:56Z

In the NCBI EGAPx pipeline, why not use the masked FASTA? I don't understand. Can you explain it in more detail? Thanks!

murphyte · 2024-12-13T17:08:41Z

EGAPx is predominantly an evidence-based predictor, using RNA-seq and protein alignments as the primary basis for nearly all models. Most aligners, including STAR and miniprot, ignore soft-masking and recommend against hard-masking, so there's no need for it. lncRNAs and 3' UTRs of coding genes also often include repeats, which are valid to include in the model.

A carefully vetted masking library can be useful for identifying gene predictions on transposons and other repeats; however, without curation, masking can over-filter real genes (e.g. high-copy-number gene families like histones can get masked). EGAPx includes some alternative logic to identify gene predictions that are predominantly transposon, based on protein hits, and we pre-filter our protein evidence sets to remove repeat-based proteins. It is an area that we've been exploring for improvements, but I think focusing on protein characteristics (e.g. domain analysis) will be more suitable for the purpose. We've also set up EGAPx to require at least some alignment evidence for all models, whereas in RefSeq EGAP we include some models that are entirely based on ab initio prediction. That ab initio path can find a few more real genes, but it is the major source of transposon noise in RefSeq XP models, so the EGAPx settings help improve precision.
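
For anyone unsure about the soft- versus hard-masking distinction mentioned above: in FASTA files, soft-masking conventionally writes repeat bases in lowercase, while hard-masking replaces them with N. The following is a minimal Python sketch of that convention only. It is not part of EGAPx, and the function names and the example sequence are made up for illustration. It may be handy if you already have a masked assembly and want to check how much is masked or restore uppercase before annotation.

```python
def unmask_soft(seq: str) -> str:
    """Undo soft-masking by converting lowercase (repeat) bases to uppercase."""
    return seq.upper()


def hard_mask(seq: str) -> str:
    """Turn soft-masked (lowercase) bases into N, i.e. hard-masking.

    Note that this discards sequence, which is why aligners generally
    advise against feeding them hard-masked genomes.
    """
    return "".join("N" if base.islower() else base for base in seq)


def masked_fraction(seq: str) -> float:
    """Fraction of bases that are soft-masked (lowercase) or hard-masked (N)."""
    if not seq:
        return 0.0
    masked = sum(1 for base in seq if base.islower() or base == "N")
    return masked / len(seq)


# Hypothetical toy sequence: the middle 8 bases are soft-masked.
soft = "ACGTacgtacgtACGT"
unmasked = unmask_soft(soft)
hard = hard_mask(soft)
```

Soft-masking is lossless (uppercasing recovers the original bases), whereas hard-masking cannot be reversed without the original assembly.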
