Mapping from target to model positions does not contain all positions of internal model numbering #213

Open
sacdallago opened this issue Apr 2, 2019 · 18 comments

Comments

@sacdallago
Member

Traceback (most recent call last):
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_backend_develop/lib/python3.5/site-packages/evcouplings/utils/pipeline.py", line 389, in execute_wrapped
    outcfg = execute(**config)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_backend_develop/lib/python3.5/site-packages/evcouplings/utils/pipeline.py", line 185, in execute
    outcfg = runner(**incfg)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_backend_develop/lib/python3.5/site-packages/evcouplings/complex/protocol.py", line 576, in run
    return PROTOCOLS[kwargs["protocol"]](**kwargs)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_backend_develop/lib/python3.5/site-packages/evcouplings/complex/protocol.py", line 454, in best_hit
    kwargs["paralog_identity_threshold"]
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_backend_develop/lib/python3.5/site-packages/evcouplings/complex/protocol.py", line 427, in _load_monomer_info
    annotation_table = read_species_annotation_table(annotations_file)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_backend_develop/lib/python3.5/site-packages/evcouplings/complex/similarity.py", line 61, in read_species_annotation_table
    annotation_file
evcouplings.utils.config.InvalidParameterError: provided annotation file 7a74f5bd34b11f505e287d1fd767fbf9/align_1/7a74f5bd34b11f505e287d1fd767fbf9_annotation.csv has no annotation information
[cd174@login06 runs]$ clearcat 00c7b99d38370a666e9d32e337d25cd0/00c7b99d38370a666e9d32e337d25cd0.failed^C
[cd174@login06 runs]$ cat 00c7b99d38370a666e9d32e337d25cd0/00c7b99d38370a666e9d32e337d25cd0.failed
Traceback (most recent call last):
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_backend_develop/lib/python3.5/site-packages/evcouplings/couplings/mapping.py", line 219, in patch_model
    for pos in model.index_list
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_backend_develop/lib/python3.5/site-packages/evcouplings/couplings/mapping.py", line 219, in <listcomp>
    for pos in model.index_list
KeyError: 4

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_backend_develop/lib/python3.5/site-packages/evcouplings/utils/pipeline.py", line 389, in execute_wrapped
    outcfg = execute(**config)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_backend_develop/lib/python3.5/site-packages/evcouplings/utils/pipeline.py", line 185, in execute
    outcfg = runner(**incfg)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_backend_develop/lib/python3.5/site-packages/evcouplings/mutate/protocol.py", line 326, in run
    return PROTOCOLS[kwargs["protocol"]](**kwargs)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_backend_develop/lib/python3.5/site-packages/evcouplings/mutate/protocol.py", line 202, in complex
    c = MultiSegmentCouplingsModel(kwargs["model_file"], *segment_objects)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_backend_develop/lib/python3.5/site-packages/evcouplings/couplings/mapping.py", line 388, in __init__
    r.patch_model(model=self)
  File "/n/groups/marks/software/anaconda_o2/envs/evcouplings_backend_develop/lib/python3.5/site-packages/evcouplings/couplings/mapping.py", line 223, in patch_model
    "Mapping from target to model positions does "
ValueError: Mapping from target to model positions does not contain all positions of internal model numbering

Runs with this error:

/n/groups/marks/web/backend/runs/

8dbdbb37177af143b6a2f815e805dd05
80600be20c3735f21011334f3b1235e0
9e1b6ed888a8a15aecce724939bca904
90c3840d96414159e4d37765a38f71e2
2ef90e06afc175f200bac8dab09a9ed5
00c7b99d38370a666e9d32e337d25cd0

That's 7 out of 15 failed jobs.

@sacdallago sacdallago changed the title [COMPLEX] Mapping from target to model positions does not contain all positions of internal model numbering Mapping from target to model positions does not contain all positions of internal model numbering Apr 2, 2019
@sacdallago
Member Author

Another job has failed on this. Any update on the cause, or on what we should tell people who encounter this error?

@thomashopf
Contributor

@aggreen any idea why this happens?

@aggreen
Contributor

aggreen commented May 24, 2019

This is a bug on our end. When we define the segments after monomer concatenation, we do so in the following way (line 68 of complex/protocol.py):

    # merge segments - this allows to have more than one segment per
    # "monomer" alignment
    segments_1 = _modify_segments(kwargs["first_segments"], "A")
    segments_2 = _modify_segments(kwargs["second_segments"], "B")
    segments_complex = segments_1 + segments_2
    outcfg["segments"] = [s.to_list() for s in segments_complex]

The problem with this is that it assumes that the same positions that were lowercased in the monomer alignments are lowercased in the concatenated alignment. This is not a valid assumption, because after concatenation we repeat the row and column filtering procedure. This can sometimes result in fewer positions defined in the segments kwarg than actually exist in the model, leading to the error users are seeing.
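
The failure mode can be illustrated with a toy example. This is a minimal sketch, not the evcouplings code: `patch_index`, the mapping layout, and the example positions are all hypothetical and only mimic the lookup pattern visible in the `patch_model` traceback above.

```python
def patch_index(model_index_list, mapping):
    """Translate every internal model position via the mapping.

    Mimics the lookup pattern from the traceback: a missing key means the
    segment definition covers fewer positions than the model contains.
    (Hypothetical helper, not part of evcouplings.)
    """
    try:
        return [mapping[pos] for pos in model_index_list]
    except KeyError as exc:
        raise ValueError(
            "Mapping does not contain all positions of internal model "
            "numbering (first missing position: {})".format(exc.args[0])
        ) from exc


# Made-up data: the model was inferred on five columns (1..5) ...
model_index_list = [1, 2, 3, 4, 5]

# ... but the segments carried over from the monomer alignments only
# describe four of them (column 4 is missing from the mapping).
stale_mapping = {1: ("A", 10), 2: ("A", 11), 3: ("A", 12), 5: ("B", 3)}

try:
    patch_index(model_index_list, stale_mapping)
except ValueError as err:
    message = str(err)
    print(message)
```

With a mapping derived from the concatenated alignment itself, every model position would have an entry and the lookup would succeed.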

The good news is that this only seems to affect the mutate stage (specifically the MultiSegmentCouplingsModel class). The CouplingScores.csv file has a number of positions that corresponds to the actual concatenated alignment, with the correct number of positions per segment.
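
One way to check this on a given run is to count the distinct positions per segment in the EC table. The sketch below assumes the table has columns named `i`, `j`, `segment_i` and `segment_j`; these column names, the segment labels, and the inline example rows are assumptions for illustration, so adjust them to whatever your CouplingScores.csv actually contains.

```python
import csv
import io
from collections import defaultdict


def positions_per_segment(rows):
    """Count the distinct positions seen for each segment in an EC table."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["segment_i"]].add(int(row["i"]))
        seen[row["segment_j"]].add(int(row["j"]))
    return {segment: len(positions) for segment, positions in seen.items()}


# Made-up stand-in for a CouplingScores.csv file (assumed column names).
example_csv = io.StringIO(
    "i,segment_i,j,segment_j,cn\n"
    "10,A_1,11,A_1,0.9\n"
    "10,A_1,3,B_1,0.7\n"
    "11,A_1,4,B_1,0.5\n"
)

counts = positions_per_segment(csv.DictReader(example_csv))
print(counts)
```

For a real run, replace `example_csv` with an opened file handle and compare the counts against the number of positions the `segments` entry in the config claims for each segment.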

TLDR: tell users that this is an issue with the mutate and dock stages that we are trying to address. The results for the couplings and compare stages, which are probably what they care about most anyway, are valid.

@thomashopf
Contributor

Okay, so from that I take it that it makes most sense to disable mutate and dock on EVcomplex server runs for now, right? Otherwise the problem kills the entire run when it occurs.

@aggreen
Contributor

aggreen commented May 24, 2019 via email

@sacdallago
Member Author

@thomashopf @aggreen has that been done, or should I do it? 🤔

@aggreen
Contributor

aggreen commented May 27, 2019 via email

@thomashopf
Contributor

Not done yet - please update if you can

@sacdallago
Member Author

Sorry -- My bad, I got lost.

I've changed:

stages:
    - align_1
    - align_2
    - concatenate
    - couplings
    - compare
    - mutate
    - fold

to

stages:
    - align_1
    - align_2
    - concatenate
    - couplings
    - compare

I.e., I removed mutate and fold. I assumed that "dock" refers to fold in @thomashopf's comment #213 (comment).
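
For anyone applying the same workaround to many configs, the change can be scripted. This is a minimal sketch that works on an already-loaded config dictionary; `drop_stages` is a hypothetical helper, not part of evcouplings, and loading or writing the YAML file is left out.

```python
def drop_stages(config, stages_to_drop):
    """Return a copy of config with the given stages removed from 'stages'.

    Hypothetical helper; the input dictionary is left unmodified.
    """
    updated = dict(config)
    updated["stages"] = [s for s in config["stages"] if s not in stages_to_drop]
    return updated


# Stage list as shown above, before the change.
config = {
    "stages": [
        "align_1", "align_2", "concatenate",
        "couplings", "compare", "mutate", "fold",
    ]
}

trimmed = drop_stages(config, {"mutate", "fold"})
print(trimmed["stages"])
```

The remaining stages keep their original order, which matters because each stage consumes the output of the previous one.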

@thomashopf
Contributor

Thanks!

@cross12tamu

I am getting this error as well in a few of my jobs. Let me know if you'd like any of the specific error logs.

@thomashopf
Contributor

@cross12tamu Can you temporarily work around the error by disabling the mutate and dock stages (see answer by aggreen above)?

@aggreen Is this fixed in your complex update PR?

@aggreen
Contributor

aggreen commented Mar 20, 2020

@thomashopf No, this has not been fixed yet.

@cross12tamu

Commenting out the above steps in the pipeline permits completion with no errors (just FYI, as I realized that I hadn't answered your question).

@cross12tamu

cross12tamu commented Apr 1, 2020

This is related to this issue, since the complex pipeline raises the above error while running. Is there a way to feed EC pairs from a complex run to the monomer pipeline and have the fold step follow those restraints as well?

For example:

Protein_1 (P1), residue=15 is an EC with Protein_2 (P2), residue=30. The fold step for the monomer pipeline would then be restrained by P1_15 and P2_30.

Is there a way to feed that information to the monomer pipeline? Does it make sense to do so as well?

Hope you guys are doing well! Thanks as always for the responses.

@aggreen
Contributor

aggreen commented Apr 3, 2020

Hi Curtis, currently there is no way to do what you ask. The protocol we use internally to find the structure of complexes is:

  1. Get the structure of each monomer, either by finding an example in the PDB or by using EVfold to model the structure, if no experimentally solved example can be found
  2. Generate restraints for inter-molecular docking using the top inter-molecular ECs from the evcomplex run
  3. Dock the two monomers using HADDOCK (https://haddock.science.uu.nl/services/HADDOCK2.2/) or some other molecular docking software
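
Step 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: the dictionary keys (`segment_i`, `i`, `segment_j`, `j`, `score`), the segment labels, and the example values are made up, and the output is a plain list of residue pairs rather than a HADDOCK restraint file.

```python
def top_inter_ecs(ecs, n):
    """Return the n highest-scoring ECs whose positions lie in different segments."""
    inter = [ec for ec in ecs if ec["segment_i"] != ec["segment_j"]]
    return sorted(inter, key=lambda ec: ec["score"], reverse=True)[:n]


def as_restraints(ecs):
    """Turn ECs into ((segment, position), (segment, position)) pairs."""
    return [
        ((ec["segment_i"], ec["i"]), (ec["segment_j"], ec["j"]))
        for ec in ecs
    ]


# Made-up EC records; a real run would read these from the EC table.
ecs = [
    {"segment_i": "A_1", "i": 15, "segment_j": "B_1", "j": 30, "score": 0.8},
    {"segment_i": "A_1", "i": 15, "segment_j": "A_1", "j": 40, "score": 0.95},
    {"segment_i": "A_1", "i": 22, "segment_j": "B_1", "j": 7, "score": 0.6},
    {"segment_i": "A_1", "i": 50, "segment_j": "B_1", "j": 12, "score": 0.3},
]

restraints = as_restraints(top_inter_ecs(ecs, n=2))
print(restraints)
```

The intra-molecular pair (15, 40) is skipped even though it has the highest score, because only inter-molecular ECs are informative for docking. How many ECs to keep and how to encode them for the docking software is up to the user.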

I hope you find this helpful!

@thomashopf
Contributor

thomashopf commented Apr 3, 2020

As aggreen's reply suggests, it's typically preferable to model the monomer structures using EVcouplings monomer pipeline runs, then derive the inter-ECs using a complex pipeline run and use these to dock the monomers.

If you really wanted to use the intra-ECs from the complex pipeline to fold monomer structures, you could do this outside the standard pipeline: run the "standard" folding protocol by calling the respective Python function (def standard(**kwargs):) and feeding it all the relevant inputs. But again, this is not something we would necessarily advise.

@cross12tamu

Thank you guys for the response 👍

4 participants