[FEATURE REQUEST] expanded/updated documentation of anvi-compute-functional-enrichment (& friends) + blog post #2380

Open
12 tasks
adw96 opened this issue Dec 17, 2024 · 0 comments

adw96 commented Dec 17, 2024

The need

Functional enrichment is a widely used piece of the pangenomics workflow, but as with any automated statistical procedure, it's important to give clear guidance on its use case and to monitor how it is used in the wild. A quick survey of papers citing Shaiber et al. (by @ivagljiva and @adw96) suggested that most users are doing a great job of using the method appropriately. That said, there are a few things that I (as the original author of the underpinning script) could do to clarify its use case, point out some potential pitfalls, and generally guide people in the right direction.

The solution

I aspire to do the following:

  • documentation
    • clarify that the use case is for pre-determined groups. Groups should not be determined using the pangenome and then tested for differentially enriched functions, because the same data would then both define and test the groups.
    • clarify that the use case is a two-group comparison
    • point people to blog post for more complex designs
  • blog post
    • how to pull out the relevant data and import into R
    • showcase flexibility of general procedure
      • provide clear interpretation of estimated parameters
      • incorporating additional covariates
      • how you could look at time series data
      • how you could do a global test, e.g., if you have >2 groups
    • how to fit a different model or run a different test. Showcase happi as example 🥕
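
To make the ">2 groups" item above concrete, here is a minimal sketch of one possible global test. It is an illustration under stated assumptions, not the implementation in anvi-compute-functional-enrichment: it swaps in a plain likelihood-ratio test of "all groups carry this function at the same rate", uses only the Python standard library, and every name in it (`global_enrichment_test`, `chi2_sf`, the example counts) is hypothetical. The chi-squared reference distribution is a large-sample approximation, so the p-value should be read cautiously when groups contain few genomes.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def _binom_loglik(k, n, p):
    """Binomial log-likelihood (dropping the constant) for k successes in n trials."""
    return _xlogy(k, p) + _xlogy(n - k, 1.0 - p)


def chi2_sf(x, df):
    """Upper tail probability of a chi-squared distribution with integer df >= 1."""
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)              # df = 2 has a closed form
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))   # df = 1 has a closed form
    # Step up to df / 2 with the incomplete-gamma recurrence.
    while a < df / 2.0:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q


def global_enrichment_test(counts):
    """Likelihood-ratio test that a function is equally prevalent in every group.

    counts: list of (genomes_with_function, genomes_in_group), one per group.
    Returns (statistic, degrees_of_freedom, approximate_p_value).
    """
    if len(counts) < 2:
        raise ValueError("need at least two groups")
    if any(n <= 0 or not 0 <= k <= n for k, n in counts):
        raise ValueError("each group needs n > 0 and 0 <= k <= n")
    pooled = sum(k for k, _ in counts) / sum(n for _, n in counts)
    # Full model: one prevalence per group. Null model: one shared prevalence.
    ll_full = sum(_binom_loglik(k, n, k / n) for k, n in counts)
    ll_null = sum(_binom_loglik(k, n, pooled) for k, n in counts)
    stat = max(0.0, 2.0 * (ll_full - ll_null))
    df = len(counts) - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts for one function across three pre-determined groups.
stat, df, p = global_enrichment_test([(9, 10), (4, 12), (1, 8)])
print(f"LRT statistic = {stat:.3f}, df = {df}, approximate p = {p:.4f}")
```

With exactly two groups this reduces to a one-degree-of-freedom comparison, so the same function covers both designs. A regression-based fit would be needed for the covariate and time-series items.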

A challenge will be that, unlike the two-group comparison case, users now need to choose what model is reasonable. While many users have fantastic intuition for this, writing out the "rules" is very difficult, and many people aren't going to get good statistical instruction (especially not from ChatGPT/the internet). So, how do we guide people without writing a textbook? (Could point them to In Press NM paper?)

I aspire to have a draft on a branch by the beginning of February. I will ask @ivagljiva and @tucker4 for feedback.

Beneficiaries

Folx using the pangenomics workflow.

@adw96 adw96 self-assigned this Dec 17, 2024