Example notebooks for HydrogenBondAnalysis #133

p-j-smith · 2021-02-03T19:28:40Z

Hi, sorry it has been so long since we spoke about this in #105. I have finally written a couple of notebooks to illustrate how to use HydrogenBondAnalysis and to calculate hydrogen bond lifetimes.

I wasn't sure exactly what to include/how much detail to go into for the examples, but I've add three notebooks:

https://p-j-smith.github.io/UserGuide/examples/analysis/hydrogen_bonds/hbonds.html This shows the basic usage the class, including the helper methods. It also shows how to use the results for further analysis - in this case, plotting the number of hbonds as a function of z. The final thing is how to store data, including saving it in a pandas DataFrame.
https://p-j-smith.github.io/UserGuide/examples/analysis/hydrogen_bonds/hbonds-selections.html This shows some advanced selections, including use of the between argument.
https://p-j-smith.github.io/UserGuide/examples/analysis/hydrogen_bonds/hbonds-lifetimes.html This shows how to calculate hydrogen bond lifetimes.

For now I've used the current protein-only trajectory rather than adding a new system (as was suggested in #105), but am happy to change this if people think it will provide a better example.

* Fixes issue MDAnalysis#100 * adds examples of selecting residues from each segment, or atoms from each residue

* Fixes MDAnalysis#93 * Adds tutorials on various ways to do analysis in parallel

- Changed 'selection' keyword to 'select' - Added radius_cut_q and explained results more - Added background section - Added bit more intro to readme

- added `match_atoms` keyword - clarified some of the `in_memory`/filename reasons

) * From MDAnalysis/mdanalysis#3053

Fixes MDAnalysis#121 Fixes MDAnalysis#122

Changes: - Added links to API docs for each class/function in each tutorial - Updated links in Quickstart guide - Fixed link to alignment example notebook

Add Binder badge

… hbonds Merge latest commits from upstream master

p-j-smith · 2021-02-03T19:47:40Z

I've just seen that @lilyminium has already created a PR with a notebook showing how to perform the hydrogen bond analysis (PR #125) - sorry again I took so long! I'm not sure what's easiest - perhaps I could remove the first notebook I've written (hbonds.ipynb) and keep the second and third to complement @lilyminium's?

orbeckst · 2021-06-18T17:09:08Z

I don't know what the best way forward is but I do know that we desperately need a proper example for HydrogenBondAnalysis. @p-j-smith @lilyminium could you two figure out how to get the work here and/or in PR #125 into the User Guide rather sooner than later?

I was just reminded by a comment on the mailing list https://groups.google.com/g/mdnalysis-discussion/c/p2vFYvXBM78/m/suZUf_fWBAAJ

p-j-smith · 2021-06-20T09:36:22Z

Tomorrow I will use @lilyminium's notebook as a base and add a couple of things from my first notebook (e.g. how to plot hbonds as a function of ,z and an image showing the geometric criteria for an hbond). Although I'm open to other suggestions too?

Should the notebooks be designed to work with the new develop for 2.0? I mean, should the results be accessed via hbonds.results rather than hbonds.hbonds ?

Also, I think I created a branch off develop rather than off master, so on this PR there are a couple of commits from @lilyminium. Should I create a new branch off master and submit a new PR?

And sorry about how confusing the docs are for people - they clearly need improving. I don't have much experience writing documentation, so I'd be happy to hear about how the docs can be made better.

orbeckst · 2021-06-20T17:10:45Z

Thank you! Docs should be for 2.x at this stage.

For anything else @lilyminium will have better answers than I.

lilyminium · 2021-06-20T17:29:41Z

Many apologies for the delay on this.

Also, I think I created a branch off develop rather than off master

@p-j-smith that depends a little on what you would like to include in your notebook. Master is at 1.1.1 for now and develop is still for 2.x, so it's up to you if you want to make a PR to one and I can back/forwardport it to the other -- but if you're planning to include the new Results thingy then please make it to develop <3

lilyminium · 2021-06-20T21:37:07Z

hbonds notebook review (diff is too large to review on GitHub and I haven't worked out how to do this in VS Code yet):

Overall a great introduction to hbond analysis, thank you! I think you need to make this PR to develop as you use f-strings (python 3.6+) and make reference to pickling.

--- We will load a small water-only systems containing 5 water molecules and 10 frames then find the hydrogen bonds present at each frame.
+++ We will load a small water-only system containing 5 water molecules and 10 frames, and find the hydrogen bonds present at each frame.

Nit pick: could you please make the DHA angle more obvious in the picture? and label the donor and acceptor? (maybe even the hydrogen). Your r_DA line actually goes through another H so it's not totally clear.
Could you please comment that acceptors_sel="name OH2", selects oxygens and that users' oxygens may not be called the same thing? Same with the Hs
I would find your note on updating selections confusing if I weren't clear on what updating atom selections are. Could you please link to the updatingatomgroup selection documentation, and break up the sentence into multiple parts? e.g.
"For a small performance boost, update_selections can always be set to False unless the given selections will result in an AtomGroup that changes across frames, e.g if acceptors_sel="name OH2 and around 3.0 protein" then update_selections must be True" -->

"For a small performance boost, update_selections can always be set to False. However, always set update_selections to True if you think that your selection will update over time -- for example, the number of oxygens within 3.0 angstrom may update over time if you choose acceptors_sel="name OH2 and around 3.0 protein"".

for the hbonds results array:
- Full stop at the end of "The results are stored as a numpy array in the hbonds.hbonds attribute" please
- Could you please show the shape of the array
- Could you please show 1 row of the array and explain what each field is so users can see an example
- Could you please show that the types are floats and that users will need to convert to int to do index stuff with it
plt.title("Number of hydrogeon bonds over time", weight="bold"), hydrogeon -> hydrogen
Could you please refer to the example output in your explanation of the output of hbonds.count_by_type(). i.e in "Each row contains a unique hydrogen bond type. The three columns correspond to: the donor resname and donor name" add a little "(here, the OT atom of the TIP3 water residue)"
n_frames = hbonds.frames.size does hbonds not have a n_frames attribute?
"In a water-only system the average number of hyrogen bonds" hyrogen -> hydrogen
u.atoms[most_common[DONOR]], u.atoms[most_common[HYDROGEN]], u.atoms[most_common[ACCEPTOR]] could you please split this up over multiple lines and print "Most common donor: {u.atoms[most_common[DONOR]]}" etc?
for ts in u.trajectory[hbonds.frames]: could you please explain what you are doing here (slicing the trajectory)?
u.atoms[hbonds.hbonds[hbonds.hbonds[:, FRAME] == ts.frame, DONOR].astype(int)] could you simplify this example?

for frame, donor_indices, *_ in hbonds.hbonds:
	u.trajectory[frame]
	donors = u.atoms[donor_indices.astype(int)]
	zpos = donors.positions[:,2]
    hist, *_ = np.histogram(zpos, bins=bin_edges)
    counts += hist

"Number of hydrogen bonds as a funcion of height" funcion -> function
mean_xy_area = np.mean([np.product(ts.dimensions[:2]) for ts in u.trajectory[hbonds.frames]]) could you please split this up into multiple lines to follow pep8?
interface_zpos could you please explain what interface_zpos is?
The pandas dataframe construction looks quite complicated. What do you think of this instead?

df = pd.DataFrame(hbonds.hbonds.T[:4].astype(int),
				  columns=["Frame",
				  		   "Donor_ix",
			      		   "Hydrogen_ix",
			      		   "Acceptor_ix",])
df["Distances"] = hbonds.hbonds.T[5]
df["Angles"] = hbonds.hbonds.T[6]
df["Donor resname"] = u.atoms[df.Donor_ix].resnames
df["Acceptor resname"] = u.atoms[df.Acceptor_ix].resnames
... and so on

df.to_csv("data/hbonds.csv", index=False) there's no guarantee the user will have "data", could you just save to hbonds.csv?

lilyminium · 2021-06-20T21:44:48Z

hbonds_selections:

print(f"{hbonds.hydrogens_sel=}") This is 3.8+, could you please only use code that works for 3.6+?
Is it possible to use a universe that has not had water removed, so you could plot some output?

That's all I have to say for this one, short and sweet :) Perhaps you could link to it from the hbonds notebook though, as an "if you want more specific selections please see this notebook"?

lilyminium · 2021-06-20T21:57:10Z

hbonds-selections:

Nice notebook, I like all the graphs. Just a few notes:

tau_times = tau_frames * u.trajectory.dt does hbonds not have hbonds.times? .frames, .n_frames, .times should be part of all AnalysisBase results now
Could you from scipy.optimize import curve_fit in the function where you need it instead of up top? (Just a personal preference, others may disagree -- but I skimmed past the imports and was surprised when I got to that part of the notebook)
Could you also print out the curve equation, people might want it?
f"{time_constant=:.2f} ps" here as well, python 3.6+ only please.
"This allows a hydrogen bond to break for up to a specified number of frames and be" the rest of this sentence seems to be missing!
"it can not be considered present at " -> "it will not be considered present at "
"Below we see how changing the intermittenct" -> intermittenct -> intermittency
"Let’s first find the 10 most prevalent hydrogen bonds" Full stop please :)
print(f"{d.resname}-{d.resid}-{d.name}\t{h.name}\t{a.resname}-{a.resid}-{a.name}\t{count=}") python 3.6+ please
for hbond in tqdm(counts[:10]): could you comment your code a bit more, e.g. explain what tqdm is doing here and say it's optional?
"The shape of these curves indicates we have poor statistics in our lifetime calculations - we used only 100 frames and consider a single hydrogen bond." Could you indicate what kind of curve we should be looking for and maybe link a resource for more reading?

p-j-smith · 2021-06-21T10:11:34Z

Thank you for the thorough reviews @lilyminium! I'll work on the notebooks later today

yannclaveau · 2021-06-21T14:41:52Z

Hi,
I tested to calculate the number of hbonds / oxygen in a 20x20x20 water molecule box using lammps.

Following https://www.lammps.org/tutorials/italy14/HandsonAnalysisScripting.pdf
all properties seem correct. However, the hbonds analysis, gives me
TIP4P:O to TIP4P:O = 1.21
which is far from the ~3.3.

I compare your notebook with mine, they are similar.

My simulation ran for 10ps. As the other properties were converged I thought it was enough. Moreover, when I plot the number of hbonds with respect to time it is converged after the 4th time step.

Do you have any idea why I do not get 3.3 ?

[edit] Here is the code I used. I compare with your notebook, the result is the same.

import matplotlib.pyplot as plt
import numpy as np
import MDAnalysis
from MDAnalysis.analysis.hydrogenbonds.hbond_analysis import HydrogenBondAnalysis as HBA

dump      ="wat.lammpsdump"
lammpsData="wat.data"

u = MDAnalysis.Universe(lammpsData,dump,dt=1)

u.add_TopologyAttr('names')
u.select_atoms("type 2").names = "O"
u.select_atoms("type 1").names = "H"

u.add_TopologyAttr('resname')
u.atoms.residues.resnames = 'TIP4P'

hbonds = HBA(universe=u, donors_sel='name O', hydrogens_sel='name H', acceptors_sel='name O', d_h_a_angle_cutoff=150 ,d_a_cutoff=3.0)
h= hbonds.run()

O   = u.select_atoms('name O')
NO  = O.n_atoms
h   = hbonds.count_by_time()
h   = h/NO
t   = hbonds.times

plt.scatter(t,h)
plt.xlabel('time in ps')
plt.ylabel('number of h bonds / H2O')
plt.show()

havg = np.mean(h[3:])
print('average number of h bonds =',havg)

p-j-smith · 2021-06-22T17:16:03Z

Hi @yannclaveau, generally you should ask questions related to usage of MDAalysis on the Google group (https://groups.google.com/g/mdnalysis-discussion) or on Discord (you can sign up to the MDAnalysis server here: https://discord.gg/fXTSfDJyxE).

But to answer your question, hbonds.count_by_time() counts the total of hydrogen bonds present at each frame. However, each water-water hydrogen bond involves two water molecules, so to get the count per molecule havg should be multiplied by two. That will still only be 2.42 hbonds per molecule. The number you get will also depend on the temperature and pressure of your system, plus you might need more than 10 ps to see the formation of a hydrogen bond network throughout your water box.

p-j-smith · 2021-06-23T15:16:25Z

I think I've made all the changes you suggested in your review @lilyminium.

For the hbonds intro notebook:

u.atoms[hbonds.hbonds[hbonds.hbonds[:, FRAME] == ts.frame, DONOR].astype(int)] could you simplify this example?

for frame, donor_indices, *_ in hbonds.hbonds:
    u.trajectory[frame]
    donors = u.atoms[donor_indices.astype(int)]
    zpos = donors.positions[:,2]
    hist, *_ = np.histogram(zpos, bins=bin_edges)
    counts += hist

Yep! Your way is much more readable. It does involve iterating over every hydrogen bond individually, rather than every frame, but I guess for the tutorial readability takes precedence.

p-j-smith · 2021-06-23T15:19:28Z

hbonds-selections:

Is it possible to use a universe that has not had water removed, so you could plot some output?

I think there is currently no protein-water system in the data files, but there is discussion about it in MDAnalysis/MDAnalysisData#45

Perhaps for now we can use the protein-only system, and when there is a protein-water system available I'll update the notebook?

Perhaps you could link to it from the hbonds notebook though, as an "if you want more specific selections please see this notebook"?

At the top of the notebook there are links to the other two hbond notebooks, although I can make them more prominent?

p-j-smith · 2021-06-23T15:23:42Z

hbonds-lifetimes:

tau_times = tau_frames * u.trajectory.dt does hbonds not have hbonds.times? .frames, .n_frames, .times should be part of all AnalysisBase results now

So hbonds.times contains the times used in finding the hydrogen bonds. But tau_frames goes from zero to tau_max, i.e. the autocorrelation is calculated from frame zero up to frame tau_max.

lilyminium · 2021-06-23T18:13:08Z

But tau_frames goes from zero to tau_max, i.e. the autocorrelation is calculated from frame zero up to frame tau_max.

Oh I see, my bad, thanks for explaining.

lilyminium

Thank you so much for adding these notebook examples, and your patience in waiting. HBond analysis is a very popular part of MDAnalysis and many, many people will benefit greatly by having these at reference!

p-j-smith · 2021-06-23T18:45:22Z

I'm happy to have added them, I hope they're useful for people. And thank you for improving them with your reviews!

yannclaveau · 2021-06-24T12:20:52Z

Hi @yannclaveau, generally you should ask questions related to usage of MDAalysis on the Google group (https://groups.google.com/g/mdnalysis-discussion) or on Discord (you can sign up to the MDAnalysis server here: https://discord.gg/fXTSfDJyxE).

But to answer your question, hbonds.count_by_time() counts the total of hydrogen bonds present at each frame. However, each water-water hydrogen bond involves two water molecules, so to get the count per molecule havg should be multiplied by two. That will still only be 2.42 hbonds per molecule. The number you get will also depend on the temperature and pressure of your system, plus you might need more than 10 ps to see the formation of a hydrogen bond network throughout your water box.

Sorry for that, I thought it may be a problem with hbondsanalysis or with the notebook, that's why I asked it here.
Thanks for your answer. However, I did find the same average number following the notebook presented here (using count by type).
with count by time = 1.29
with count by type = 1.29
using donor = O, acceptor = O
regarding the duration of the calculation : the number of H bonds converges after 5 time steps and the temperature and pressure are 298K, 1 atm.
My MDA is v1.1, could it be because of that ?

p-j-smith · 2021-06-24T13:07:01Z

Sorry for that, I thought it may be a problem with hbondsanalysis or with the notebook, that's why I asked it here.

No need to apologise, and you're right that is was a problem with the initial notebook. count_by_type will count the number hydrogen bonds between each unquie pair of donor and acceptor atoms. So to get the number per molecule, you must multiply the count by two. In the first draft of the notebook, I also forgot to account for this. But if you take a look at the final version (https://userguide.mdanalysis.org/2.0.0-dev0/examples/analysis/hydrogen_bonds/hbonds.html) you'll see there is now a multiplication by 2 in order to get the count per molecule.

My MDA is v1.1, could it be because of that ?

I think it should be okay to use 1.1.

regarding the duration of the calculation : the number of H bonds converges after 5 time steps and the temperature and pressure are 298K, 1 atm.

Is your system a box of pure water?

Have you tried comparing the output to that of e.g. VMD HBonds plugin? If you find a different result, or if you have a simple system where you know the number of hydrogen bonds but HydrogenBondAnalysis gives an incorrect result, then please open a new issue (https://github.com/MDAnalysis/mdanalysis/issues). Or, if you can share your data file and trajectory, I can take a quick look at it.

yannclaveau · 2021-06-24T14:33:54Z

@p-j-smith Thanks, that's fine now. Thanks to the final version, I understand why : I thought the distance Rda was between the H and O. (I think it was pictured like that in the previous version?). Choosing Rda = 3.5 A (as staded in https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.76.928 ), instead of 3.0, gives me 1.6, so 3.2 H bonds / H2O.

So everything is fine.

mieczyslaw and others added 19 commits September 25, 2020 13:58

Add examples of selecting groups from other groups

6fbcdd6

* Fixes issue MDAnalysis#100 * adds examples of selecting residues from each segment, or atoms from each residue

Remove selection keyword for select (MDAnalysis#108)

2479c57

Documents for parallel analysis (MDAnalysis#102)

6be2e24

* Fixes MDAnalysis#93 * Adds tutorials on various ways to do analysis in parallel

Change intersphinx link to latest objects.inv (MDAnalysis#107)

fb70da8

Contacts (MDAnalysis#84)

5332b11

- Changed 'selection' keyword to 'select' - Added radius_cut_q and explained results more - Added background section - Added bit more intro to readme

updated align (MDAnalysis#73)

4b0f8ee

- added `match_atoms` keyword - clarified some of the `in_memory`/filename reasons

Add GH actions docs deployment and stable/dev versioning (MDAnalysis#114

7e7be69

) * From MDAnalysis/mdanalysis#3053

Only deploy docs on merge (MDAnalysis#116)

2a8b594

Build CI docs from environment.yml (MDAnalysis#117)

be8147c

Activate mda-user-guide environment in conda (MDAnalysis#118)

95a8783

Move quickstart up to top level (MDAnalysis#113)

6e91444

Add readthedocs config (MDAnalysis#68)

ac1683c

Add windows installation note (MDAnalysis#123)

1de770b

Fixes MDAnalysis#121 Fixes MDAnalysis#122

Add and fix links to API docs and Quickstart (MDAnalysis#129)

797b9af

Changes: - Added links to API docs for each class/function in each tutorial - Updated links in Quickstart guide - Fixed link to alignment example notebook

Update general notebooks (MDAnalysis#124)

0128cd6

Update README.rst

c1fa991

Add Binder badge

Merge branch 'master' of https://github.com/MDAnalysis/UserGuide into…

2d1f01a

… hbonds Merge latest commits from upstream master

Add HydrogenBondAnalysis example notebooks

1e73eaf

Center image and math

72ba913

p-j-smith mentioned this pull request Mar 15, 2021

HydrogenBondAutoCorrel uses deprecated features of numpy MDAnalysis/mdanalysis#2987

Closed

p-j-smith changed the base branch from master to develop June 22, 2021 16:51

Paul added 5 commits June 22, 2021 19:28

Merge remote-tracking branch 'upstream/develop' into hbonds

81525ae

Update hbonds notebook

094cd3b

Update hbonds-selections notebook

b3c8882

Update hbonds-lifeimtes notebook

d14dd75

split mean_xy_area calculation over multiple lines

5e14d33

lilyminium approved these changes Jun 23, 2021

View reviewed changes

lilyminium merged commit f789446 into MDAnalysis:develop Jun 23, 2021

This was referenced Jun 23, 2021

hydrogenbond analysis notebook #105

Closed

Tutorial notebook wishlist #29

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Example notebooks for HydrogenBondAnalysis #133

Example notebooks for HydrogenBondAnalysis #133

p-j-smith commented Feb 3, 2021

p-j-smith commented Feb 3, 2021

orbeckst commented Jun 18, 2021

p-j-smith commented Jun 20, 2021

orbeckst commented Jun 20, 2021

lilyminium commented Jun 20, 2021

lilyminium commented Jun 20, 2021 •

edited

Loading

lilyminium commented Jun 20, 2021

lilyminium commented Jun 20, 2021

p-j-smith commented Jun 21, 2021

yannclaveau commented Jun 21, 2021 •

edited

Loading

p-j-smith commented Jun 22, 2021

p-j-smith commented Jun 23, 2021

p-j-smith commented Jun 23, 2021 •

edited

Loading

p-j-smith commented Jun 23, 2021 •

edited

Loading

lilyminium commented Jun 23, 2021

lilyminium left a comment

p-j-smith commented Jun 23, 2021

yannclaveau commented Jun 24, 2021 •

edited

Loading

p-j-smith commented Jun 24, 2021

yannclaveau commented Jun 24, 2021

Example notebooks for HydrogenBondAnalysis #133

Example notebooks for HydrogenBondAnalysis #133

Conversation

p-j-smith commented Feb 3, 2021

p-j-smith commented Feb 3, 2021

orbeckst commented Jun 18, 2021

p-j-smith commented Jun 20, 2021

orbeckst commented Jun 20, 2021

lilyminium commented Jun 20, 2021

lilyminium commented Jun 20, 2021 • edited Loading

lilyminium commented Jun 20, 2021

lilyminium commented Jun 20, 2021

p-j-smith commented Jun 21, 2021

yannclaveau commented Jun 21, 2021 • edited Loading

p-j-smith commented Jun 22, 2021

p-j-smith commented Jun 23, 2021

p-j-smith commented Jun 23, 2021 • edited Loading

p-j-smith commented Jun 23, 2021 • edited Loading

lilyminium commented Jun 23, 2021

lilyminium left a comment

Choose a reason for hiding this comment

p-j-smith commented Jun 23, 2021

yannclaveau commented Jun 24, 2021 • edited Loading

p-j-smith commented Jun 24, 2021

yannclaveau commented Jun 24, 2021

lilyminium commented Jun 20, 2021 •

edited

Loading

yannclaveau commented Jun 21, 2021 •

edited

Loading

p-j-smith commented Jun 23, 2021 •

edited

Loading

p-j-smith commented Jun 23, 2021 •

edited

Loading

yannclaveau commented Jun 24, 2021 •

edited

Loading