Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Example notebooks for HydrogenBondAnalysis #133

Merged
merged 24 commits into from
Jun 23, 2021

Conversation

p-j-smith
Copy link
Member

Hi, sorry it has been so long since we spoke about this in #105. I have finally written a couple of notebooks to illustrate how to use HydrogenBondAnalysis and to calculate hydrogen bond lifetimes.

I wasn't sure exactly what to include/how much detail to go into for the examples, but I've add three notebooks:

  1. https://p-j-smith.github.io/UserGuide/examples/analysis/hydrogen_bonds/hbonds.html This shows the basic usage the class, including the helper methods. It also shows how to use the results for further analysis - in this case, plotting the number of hbonds as a function of z. The final thing is how to store data, including saving it in a pandas DataFrame.

  2. https://p-j-smith.github.io/UserGuide/examples/analysis/hydrogen_bonds/hbonds-selections.html This shows some advanced selections, including use of the between argument.

  3. https://p-j-smith.github.io/UserGuide/examples/analysis/hydrogen_bonds/hbonds-lifetimes.html This shows how to calculate hydrogen bond lifetimes.

For now I've used the current protein-only trajectory rather than adding a new system (as was suggested in #105), but am happy to change this if people think it will provide a better example.

mieczyslaw and others added 19 commits September 25, 2020 13:58
* Fixes issue MDAnalysis#100 
* adds examples of selecting residues from each segment, or atoms from each residue
* Fixes MDAnalysis#93 
* Adds tutorials on various ways to do analysis in parallel
- Changed 'selection' keyword to 'select'
- Added radius_cut_q and explained results more
- Added background section
- Added bit more intro to readme
- added `match_atoms` keyword
- clarified some of the `in_memory`/filename reasons
Changes:
- Added links to API docs for each class/function in each tutorial
- Updated links in Quickstart guide
- Fixed link to alignment example notebook
Add Binder badge
… hbonds

Merge latest commits from upstream master
@p-j-smith
Copy link
Member Author

I've just seen that @lilyminium has already created a PR with a notebook showing how to perform the hydrogen bond analysis (PR #125) - sorry again I took so long! I'm not sure what's easiest - perhaps I could remove the first notebook I've written (hbonds.ipynb) and keep the second and third to complement @lilyminium's?

@orbeckst
Copy link
Member

I don't know what the best way forward is but I do know that we desperately need a proper example for HydrogenBondAnalysis. @p-j-smith @lilyminium could you two figure out how to get the work here and/or in PR #125 into the User Guide rather sooner than later?

I was just reminded by a comment on the mailing list https://groups.google.com/g/mdnalysis-discussion/c/p2vFYvXBM78/m/suZUf_fWBAAJ

@p-j-smith
Copy link
Member Author

Tomorrow I will use @lilyminium's notebook as a base and add a couple of things from my first notebook (e.g. how to plot hbonds as a function of ,z and an image showing the geometric criteria for an hbond). Although I'm open to other suggestions too?

Should the notebooks be designed to work with the new develop for 2.0? I mean, should the results be accessed via hbonds.results rather than hbonds.hbonds ?

Also, I think I created a branch off develop rather than off master, so on this PR there are a couple of commits from @lilyminium. Should I create a new branch off master and submit a new PR?

And sorry about how confusing the docs are for people - they clearly need improving. I don't have much experience writing documentation, so I'd be happy to hear about how the docs can be made better.

@orbeckst
Copy link
Member

Thank you! Docs should be for 2.x at this stage.

For anything else @lilyminium will have better answers than I.

@lilyminium
Copy link
Member

Many apologies for the delay on this.

Also, I think I created a branch off develop rather than off master

@p-j-smith that depends a little on what you would like to include in your notebook. Master is at 1.1.1 for now and develop is still for 2.x, so it's up to you if you want to make a PR to one and I can back/forwardport it to the other -- but if you're planning to include the new Results thingy then please make it to develop <3

@lilyminium
Copy link
Member

lilyminium commented Jun 20, 2021

hbonds notebook review (diff is too large to review on GitHub and I haven't worked out how to do this in VS Code yet):

Overall a great introduction to hbond analysis, thank you! I think you need to make this PR to develop as you use f-strings (python 3.6+) and make reference to pickling.

--- We will load a small water-only systems containing 5 water molecules and 10 frames then find the hydrogen bonds present at each frame.
+++ We will load a small water-only system containing 5 water molecules and 10 frames, and find the hydrogen bonds present at each frame.
  • Nit pick: could you please make the DHA angle more obvious in the picture? and label the donor and acceptor? (maybe even the hydrogen). Your r_DA line actually goes through another H so it's not totally clear.

  • Could you please comment that acceptors_sel="name OH2", selects oxygens and that users' oxygens may not be called the same thing? Same with the Hs

  • I would find your note on updating selections confusing if I weren't clear on what updating atom selections are. Could you please link to the updatingatomgroup selection documentation, and break up the sentence into multiple parts? e.g.

  • "For a small performance boost, update_selections can always be set to False unless the given selections will result in an AtomGroup that changes across frames, e.g if acceptors_sel="name OH2 and around 3.0 protein" then update_selections must be True" -->

"For a small performance boost, update_selections can always be set to False. However, always set update_selections to True if you think that your selection will update over time -- for example, the number of oxygens within 3.0 angstrom may update over time if you choose acceptors_sel="name OH2 and around 3.0 protein"".

  • for the hbonds results array:

    • Full stop at the end of "The results are stored as a numpy array in the hbonds.hbonds attribute" please
    • Could you please show the shape of the array
    • Could you please show 1 row of the array and explain what each field is so users can see an example
    • Could you please show that the types are floats and that users will need to convert to int to do index stuff with it
  • plt.title("Number of hydrogeon bonds over time", weight="bold"), hydrogeon -> hydrogen

  • Could you please refer to the example output in your explanation of the output of hbonds.count_by_type(). i.e in "Each row contains a unique hydrogen bond type. The three columns correspond to: the donor resname and donor name" add a little "(here, the OT atom of the TIP3 water residue)"

  • n_frames = hbonds.frames.size does hbonds not have a n_frames attribute?

  • "In a water-only system the average number of hyrogen bonds" hyrogen -> hydrogen

  • u.atoms[most_common[DONOR]], u.atoms[most_common[HYDROGEN]], u.atoms[most_common[ACCEPTOR]] could you please split this up over multiple lines and print "Most common donor: {u.atoms[most_common[DONOR]]}" etc?

  • for ts in u.trajectory[hbonds.frames]: could you please explain what you are doing here (slicing the trajectory)?

  • u.atoms[hbonds.hbonds[hbonds.hbonds[:, FRAME] == ts.frame, DONOR].astype(int)] could you simplify this example?

for frame, donor_indices, *_ in hbonds.hbonds:
	u.trajectory[frame]
	donors = u.atoms[donor_indices.astype(int)]
	zpos = donors.positions[:,2]
    hist, *_ = np.histogram(zpos, bins=bin_edges)
    counts += hist
  • "Number of hydrogen bonds as a funcion of height" funcion -> function
  • mean_xy_area = np.mean([np.product(ts.dimensions[:2]) for ts in u.trajectory[hbonds.frames]]) could you please split this up into multiple lines to follow pep8?
  • interface_zpos could you please explain what interface_zpos is?
  • The pandas dataframe construction looks quite complicated. What do you think of this instead?
df = pd.DataFrame(hbonds.hbonds.T[:4].astype(int),
				  columns=["Frame",
				  		   "Donor_ix",
			      		   "Hydrogen_ix",
			      		   "Acceptor_ix",])
df["Distances"] = hbonds.hbonds.T[5]
df["Angles"] = hbonds.hbonds.T[6]
df["Donor resname"] = u.atoms[df.Donor_ix].resnames
df["Acceptor resname"] = u.atoms[df.Acceptor_ix].resnames
... and so on
  • df.to_csv("data/hbonds.csv", index=False) there's no guarantee the user will have "data", could you just save to hbonds.csv?

@lilyminium
Copy link
Member

hbonds_selections:

  • print(f"{hbonds.hydrogens_sel=}") This is 3.8+, could you please only use code that works for 3.6+?
  • Is it possible to use a universe that has not had water removed, so you could plot some output?

That's all I have to say for this one, short and sweet :) Perhaps you could link to it from the hbonds notebook though, as an "if you want more specific selections please see this notebook"?

@lilyminium
Copy link
Member

hbonds-selections:

Nice notebook, I like all the graphs. Just a few notes:

  • tau_times = tau_frames * u.trajectory.dt does hbonds not have hbonds.times? .frames, .n_frames, .times should be part of all AnalysisBase results now
  • Could you from scipy.optimize import curve_fit in the function where you need it instead of up top? (Just a personal preference, others may disagree -- but I skimmed past the imports and was surprised when I got to that part of the notebook)
  • Could you also print out the curve equation, people might want it?
  • f"{time_constant=:.2f} ps" here as well, python 3.6+ only please.
  • "This allows a hydrogen bond to break for up to a specified number of frames and be" the rest of this sentence seems to be missing!
  • "it can not be considered present at " -> "it will not be considered present at "
  • "Below we see how changing the intermittenct" -> intermittenct -> intermittency
  • "Let’s first find the 10 most prevalent hydrogen bonds" Full stop please :)
  • print(f"{d.resname}-{d.resid}-{d.name}\t{h.name}\t{a.resname}-{a.resid}-{a.name}\t{count=}") python 3.6+ please
  • for hbond in tqdm(counts[:10]): could you comment your code a bit more, e.g. explain what tqdm is doing here and say it's optional?
  • "The shape of these curves indicates we have poor statistics in our lifetime calculations - we used only 100 frames and consider a single hydrogen bond." Could you indicate what kind of curve we should be looking for and maybe link a resource for more reading?

@p-j-smith
Copy link
Member Author

Thank you for the thorough reviews @lilyminium! I'll work on the notebooks later today

@yannclaveau
Copy link

yannclaveau commented Jun 21, 2021

Hi,
I tested to calculate the number of hbonds / oxygen in a 20x20x20 water molecule box using lammps.

Following https://www.lammps.org/tutorials/italy14/HandsonAnalysisScripting.pdf
all properties seem correct. However, the hbonds analysis, gives me
TIP4P:O to TIP4P:O = 1.21
which is far from the ~3.3.

I compare your notebook with mine, they are similar.

My simulation ran for 10ps. As the other properties were converged I thought it was enough. Moreover, when I plot the number of hbonds with respect to time it is converged after the 4th time step.

Do you have any idea why I do not get 3.3 ?

[edit] Here is the code I used. I compare with your notebook, the result is the same.

import matplotlib.pyplot as plt
import numpy as np
import MDAnalysis
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis as HBA

dump      ="wat.lammpsdump"
lammpsData="wat.data"

u = MDAnalysis.Universe(lammpsData,dump,dt=1)

u.add_TopologyAttr('names')
u.select_atoms("type 2").names = "O"
u.select_atoms("type 1").names = "H"

u.add_TopologyAttr('resname')
u.atoms.residues.resnames = 'TIP4P'

hbonds = HBA(universe=u, donors_sel='name O', hydrogens_sel='name H', acceptors_sel='name O', d_h_a_angle_cutoff=150 ,d_a_cutoff=3.0)
h= hbonds.run()

O   = u.select_atoms('name O')
NO  = O.n_atoms
h   = hbonds.count_by_time()
h   = h/NO
t   = hbonds.times

plt.scatter(t,h)
plt.xlabel('time in ps')
plt.ylabel('number of h bonds / H2O')
plt.show()

havg = np.mean(h[3:])
print('average number of h bonds =',havg)

@p-j-smith p-j-smith changed the base branch from master to develop June 22, 2021 16:51
@p-j-smith
Copy link
Member Author

Hi @yannclaveau, generally you should ask questions related to usage of MDAalysis on the Google group (https://groups.google.com/g/mdnalysis-discussion) or on Discord (you can sign up to the MDAnalysis server here: https://discord.gg/fXTSfDJyxE).

But to answer your question, hbonds.count_by_time() counts the total of hydrogen bonds present at each frame. However, each water-water hydrogen bond involves two water molecules, so to get the count per molecule havg should be multiplied by two. That will still only be 2.42 hbonds per molecule. The number you get will also depend on the temperature and pressure of your system, plus you might need more than 10 ps to see the formation of a hydrogen bond network throughout your water box.

@p-j-smith
Copy link
Member Author

I think I've made all the changes you suggested in your review @lilyminium.

For the hbonds intro notebook:

  • u.atoms[hbonds.hbonds[hbonds.hbonds[:, FRAME] == ts.frame, DONOR].astype(int)] could you simplify this example?

for frame, donor_indices, *_ in hbonds.hbonds:
    u.trajectory[frame]
    donors = u.atoms[donor_indices.astype(int)]
    zpos = donors.positions[:,2]
    hist, *_ = np.histogram(zpos, bins=bin_edges)
    counts += hist

Yep! Your way is much more readable. It does involve iterating over every hydrogen bond individually, rather than every frame, but I guess for the tutorial readability takes precedence.

@p-j-smith
Copy link
Member Author

p-j-smith commented Jun 23, 2021

hbonds-selections:

  • Is it possible to use a universe that has not had water removed, so you could plot some output?

I think there is currently no protein-water system in the data files, but there is discussion about it in MDAnalysis/MDAnalysisData#45

Perhaps for now we can use the protein-only system, and when there is a protein-water system available I'll update the notebook?

  • Perhaps you could link to it from the hbonds notebook though, as an "if you want more specific selections please see this notebook"?

At the top of the notebook there are links to the other two hbond notebooks, although I can make them more prominent?

@p-j-smith
Copy link
Member Author

p-j-smith commented Jun 23, 2021

hbonds-lifetimes:

  • tau_times = tau_frames * u.trajectory.dt does hbonds not have hbonds.times? .frames, .n_frames, .times should be part of all AnalysisBase results now

So hbonds.times contains the times used in finding the hydrogen bonds. But tau_frames goes from zero to tau_max, i.e. the autocorrelation is calculated from frame zero up to frame tau_max.

@lilyminium
Copy link
Member

But tau_frames goes from zero to tau_max, i.e. the autocorrelation is calculated from frame zero up to frame tau_max.

Oh I see, my bad, thanks for explaining.

Copy link
Member

@lilyminium lilyminium left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for adding these notebook examples, and your patience in waiting. HBond analysis is a very popular part of MDAnalysis and many, many people will benefit greatly by having these at reference!

@lilyminium lilyminium merged commit f789446 into MDAnalysis:develop Jun 23, 2021
This was referenced Jun 23, 2021
@p-j-smith
Copy link
Member Author

I'm happy to have added them, I hope they're useful for people. And thank you for improving them with your reviews!

@yannclaveau
Copy link

yannclaveau commented Jun 24, 2021

Hi @yannclaveau, generally you should ask questions related to usage of MDAalysis on the Google group (https://groups.google.com/g/mdnalysis-discussion) or on Discord (you can sign up to the MDAnalysis server here: https://discord.gg/fXTSfDJyxE).

But to answer your question, hbonds.count_by_time() counts the total of hydrogen bonds present at each frame. However, each water-water hydrogen bond involves two water molecules, so to get the count per molecule havg should be multiplied by two. That will still only be 2.42 hbonds per molecule. The number you get will also depend on the temperature and pressure of your system, plus you might need more than 10 ps to see the formation of a hydrogen bond network throughout your water box.

Sorry for that, I thought it may be a problem with hbondsanalysis or with the notebook, that's why I asked it here.
Thanks for your answer. However, I did find the same average number following the notebook presented here (using count by type).
with count by time = 1.29
with count by type = 1.29
using donor = O, acceptor = O
regarding the duration of the calculation : the number of H bonds converges after 5 time steps and the temperature and pressure are 298K, 1 atm.
My MDA is v1.1, could it be because of that ?

@p-j-smith
Copy link
Member Author

Sorry for that, I thought it may be a problem with hbondsanalysis or with the notebook, that's why I asked it here.

No need to apologise, and you're right that is was a problem with the initial notebook. count_by_type will count the number hydrogen bonds between each unquie pair of donor and acceptor atoms. So to get the number per molecule, you must multiply the count by two. In the first draft of the notebook, I also forgot to account for this. But if you take a look at the final version (https://userguide.mdanalysis.org/2.0.0-dev0/examples/analysis/hydrogen_bonds/hbonds.html) you'll see there is now a multiplication by 2 in order to get the count per molecule.

My MDA is v1.1, could it be because of that ?

I think it should be okay to use 1.1.

regarding the duration of the calculation : the number of H bonds converges after 5 time steps and the temperature and pressure are 298K, 1 atm.

Is your system a box of pure water?

Have you tried comparing the output to that of e.g. VMD HBonds plugin? If you find a different result, or if you have a simple system where you know the number of hydrogen bonds but HydrogenBondAnalysis gives an incorrect result, then please open a new issue (https://github.com/MDAnalysis/mdanalysis/issues). Or, if you can share your data file and trajectory, I can take a quick look at it.

@yannclaveau
Copy link

@p-j-smith Thanks, that's fine now. Thanks to the final version, I understand why : I thought the distance Rda was between the H and O. (I think it was pictured like that in the previous version?). Choosing Rda = 3.5 A (as staded in https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.76.928 ), instead of 3.0, gives me 1.6, so 3.2 H bonds / H2O.

So everything is fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants