Where to get detailed description of GFAse's computational methods? #28

Open
xujialupaoli opened this issue Jul 12, 2024 · 3 comments

@xujialupaoli

Hi,

I am very interested in the methods behind GFAse. I have carefully read your paper ‘Phased nanopore assembly with Shasta and modular graph phasing with GFAse’, but I don’t fully understand GFAse's computational methods and would like more detail. Shasta has a page describing its methods at https://paoloshasta.github.io/shasta/ComputationalMethods.html, but I can’t find an equivalent for GFAse. Can you tell me where to find one?

Looking forward to your reply!

@rlorigro
Owner

Hi Rebecca, unfortunately there is no description more thorough than the cited papers. However, I am happy to answer any questions you have. In summary, there is a preprocessing step (to find bubbles and estimate the contact map), an optimization step, and then a chaining step. Is there anything in particular you want to know about?

@xujialupaoli
Author

Thank you very much for your reply!

I saw in your article that "To phase the graph, GFAse first identifies diploid, haplotypic bubbles. Two methods are available in GFAse: assembler annotation and sequence similarity search", but the article did not introduce how GFAse implements the phase process through assembler annotation. Could you tell me about this process?

In addition, in the introduction to GFAse, your article mentioned that "Phases are optimized using a stochastic method that approximates a solution to the optimization variant of the max-cut problem (Selvaraj et al. 2013; Edge et al. 2017; Cheng et al. 2021). The method depends on an objective function that penalizes inconsistent contacts and rewards consistent contacts." Can I know the specific judgment method of the objective function you are talking about?

@rlorigro
Owner

> the article did not introduce how GFAse implements the phase process through assembler annotation. Could you tell me about this process?

This one is very simple: Shasta (Mode 2) already labels its GFA with a naming convention. Two node names in the GFA (S lines) that are members of the same bubble have identical prefixes and differ only in their suffix, which is either .0 or .1. You can see the naming convention in the figure below, for the nodes starting with PR. However, using the node names is no longer recommended with Shasta Mode 3, so we have reverted to using homology search.
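To make the naming convention concrete, here is a minimal Python sketch of how nodes could be paired into bubbles from the S lines alone. It is not GFAse's code: the function name `find_bubbles` and the node names in the example (`PR.7.0`, `utig5`, and so on) are made up for illustration, and the only rule assumed is the one described above (same prefix, final suffix `.0` or `.1`).

```python
from collections import defaultdict


def find_bubbles(gfa_lines):
    """Pair GFA segment names that share a prefix and end in .0 / .1.

    Hypothetical helper for illustration; not part of GFAse or Shasta.
    """
    sides = defaultdict(dict)
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # Only segment (S) lines carry node names; the name is the second field.
        if len(fields) < 2 or fields[0] != "S":
            continue
        name = fields[1]
        prefix, sep, suffix = name.rpartition(".")
        if sep and suffix in ("0", "1"):
            sides[prefix][int(suffix)] = name
    # Keep only prefixes for which both sides of the bubble were seen.
    return {p: (s[0], s[1]) for p, s in sides.items() if 0 in s and 1 in s}


# Toy GFA with made-up node names (assumed, not real Shasta output).
gfa = [
    "H\tVN:Z:1.0",
    "S\tPR.7.0\tACGT",
    "S\tPR.7.1\tACGA",
    "S\tPR.9.0\tTTGA",   # no .1 partner, so not reported as a bubble
    "S\tutig5\tGGCC",    # no .0/.1 suffix, so ignored
    "L\tPR.7.0\t+\tutig5\t+\t0M",
]

print(find_bubbles(gfa))
```

A rule this simple would also pair any unrelated node whose name happens to end in `.0` or `.1`, so a real implementation would presumably check the prefix more strictly.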

> Can I know the specific judgment method of the objective function you are talking about?

The objective that we maximize is the "consistency score", which we define as the sum of the consistent contacts minus the sum of the inconsistent contacts. The code is here. In the figure below, the bubble orientations are shown by the red/blue node fill colors. The consistent weights are along the curved edges in green and the inconsistent weights are in red. In this example the objective/score would be (68+104)-(6+9) = 157. If you flipped one of the bubbles, the consistent and inconsistent contacts would swap and the score would be the opposite.

image
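The score described above can be sketched in a few lines of Python. This is an illustration only, not the linked GFAse code: the node names, the assignment of the four weights to specific node pairs, and the helper names (`consistency_score`, `phase_from_orientation`, `random_flip_search`) are all assumptions. The last function is a generic random-flip hill climb, included only to show how such a score can drive a stochastic search; it is not claimed to be GFAse's optimizer.

```python
import random


def phase_from_orientation(bubbles, orientation):
    """Map each node to haplotype 0 or 1; a flipped bubble swaps its two sides."""
    phase = {}
    for name, (node0, node1) in bubbles.items():
        flipped = orientation[name]
        phase[node0] = 1 if flipped else 0
        phase[node1] = 0 if flipped else 1
    return phase


def consistency_score(contacts, phase):
    """Sum of contacts within a haplotype minus sum of contacts across haplotypes."""
    score = 0
    for (a, b), weight in contacts.items():
        score += weight if phase[a] == phase[b] else -weight
    return score


def random_flip_search(bubbles, contacts, orientation, iterations=100, seed=0):
    """Generic hill climb (assumed, not GFAse's method): keep a flip only if it helps."""
    rng = random.Random(seed)
    orientation = dict(orientation)
    names = sorted(bubbles)
    best = consistency_score(contacts, phase_from_orientation(bubbles, orientation))
    for _ in range(iterations):
        name = rng.choice(names)
        orientation[name] = not orientation[name]
        score = consistency_score(contacts, phase_from_orientation(bubbles, orientation))
        if score > best:
            best = score
        else:
            orientation[name] = not orientation[name]  # undo the flip
    return orientation, best


# Two bubbles with the weights from the example; which pair carries which
# weight is an assumption made for this sketch.
bubbles = {"A": ("A.0", "A.1"), "B": ("B.0", "B.1")}
contacts = {
    ("A.0", "B.0"): 68,
    ("A.1", "B.1"): 104,
    ("A.0", "B.1"): 6,
    ("A.1", "B.0"): 9,
}

as_drawn = phase_from_orientation(bubbles, {"A": False, "B": False})
one_flipped = phase_from_orientation(bubbles, {"A": False, "B": True})
print(consistency_score(contacts, as_drawn))     # 157
print(consistency_score(contacts, one_flipped))  # -157

_, recovered = random_flip_search(bubbles, contacts, {"A": False, "B": True})
print(recovered)  # 157
```

With only two bubbles, flipping either one turns every contact between them from consistent to inconsistent (or back), which is why the score simply changes sign. Flipping both bubbles leaves the score unchanged, since only the relative orientation matters.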
