Refactor RMSD analyses into MDAnalysis AnaysisBase classes#90
Refactor RMSD analyses into MDAnalysis AnaysisBase classes#90hannahbaumann wants to merge 12 commits into
AnaysisBase classes#90Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #90 +/- ##
==========================================
+ Coverage 96.09% 96.37% +0.28%
==========================================
Files 6 6
Lines 333 359 +26
==========================================
+ Hits 320 346 +26
Misses 13 13 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@talagayev and @jthorton : This is a first go at the refactor into the individual RMSD classes, based on the MDAnalysis |
AnaysisBase classes
| class LigandRMSD(AnalysisBase): | ||
| """ | ||
| 1D RMSD time series for a ligand AtomGroup. | ||
| """ | ||
|
|
||
| def __init__(self, atomgroup, **kwargs): | ||
| super(LigandRMSD, self).__init__(atomgroup.universe.trajectory, **kwargs) | ||
|
|
||
| self._ag = atomgroup | ||
|
|
||
| def _prepare(self): | ||
| self.results.rmsd = [] | ||
| self._reference = self._ag.positions | ||
| self._weights = self._ag.masses / np.mean(self._ag.masses) | ||
|
|
||
| def _single_frame(self): | ||
| rmsd = rms.rmsd( | ||
| self._ag.positions, | ||
| self._reference, | ||
| self._weights, | ||
| center=False, | ||
| superposition=False, | ||
| ) | ||
| self.results.rmsd.append(rmsd) | ||
|
|
||
| def _conclude(self): | ||
| self.results.rmsd = np.asarray(self.results.rmsd) |
There was a problem hiding this comment.
2 initial thoughts:
- Can we make a more general RMSD class that can be reused for protein and ligand analysis, basically it should work on any atom group and does this not already exist in MDAnalysis?
- Do we not want to switch to use the symmetry RMSD (spyrmsd I think its called)?
There was a problem hiding this comment.
@jthorton : There's a difference between the RMSD class in MDAnalysis in what we've been doing, and I'm not sure yet what we want. In MDAnalysis, by default they are doing a superposition (rotational and translational), while we didn't do that. Now I'm not sure, what are we actually interested in, esp. for the ligand RMSD. Should this be the RMSD of the internal conformation of the ligand, or the RMSD of the ligand in the binding pocket/ligand pose? I think in the solvent we would definitely be more interested in the internal conformation of the ligand. In the binding pocket, I'm not sure. We also calculate the ligand COM displacement as a measure of stability in the pocket, but I'm not sure what we would want for the RMSD. What do you think?
talagayev
left a comment
There was a problem hiding this comment.
@hannahbaumann I like it, looks very good to me :) also the structure is like for the MDAnalysis AnalysisBase classes, which is I think @IAlibay wanted to have and also adressing it that it takes any atom group adressing @jthorton comment is good.
Overall Looks good to me :)
| prot_rmsd = RMSDAnalysis(prot).run(step=skip) | ||
| output["protein_RMSD"].append(prot_rmsd.results.rmsd) | ||
| # # Using the MDAnalysis RMSD class instead | ||
| # gs = ["protein and name CA", "protein"] |
There was a problem hiding this comment.
Here is an example of how we could do the same analysis using the MDAnalysis RMSD class.
| self, atomgroup, reference=None, mass_weighted=False, superposition=False, **kwargs | ||
| ): | ||
| super(RMSDAnalysis, self).__init__(atomgroup.universe.trajectory, **kwargs) | ||
|
|
There was a problem hiding this comment.
_analysis_algorithm_is_parallelizable
|
| prot_rmsd2d = Protein2DRMSD(prot).run(step=skip) | ||
| output["protein_2D_RMSD"].append(prot_rmsd2d.results.rmsd2d) | ||
| # # Using the MDAnalysis DistanceMatrix class | ||
| # prot_rmsd2d = diffusionmap.DistanceMatrix(u, select="protein and name CA") |
There was a problem hiding this comment.
This MDA code is much slower, on the test data 10s vs. 0.4s.
| output["protein_RMSD"].append(prot_rmsd.results.rmsd) | ||
| # # Using the MDAnalysis RMSD class instead | ||
| # gs = ["protein and name CA"] | ||
| # prot_rmsd = rms.RMSD( |
There was a problem hiding this comment.
The two RMSD classes are approximately equal in timing (on the test data)
| # output["protein_2D_RMSD"].append(flattened) | ||
|
|
||
| if ligand.n_atoms > 0: | ||
| lig_rmsd = RMSDAnalysis(ligand, mass_weighted=True).run(step=skip) |
There was a problem hiding this comment.
Ligand RMSD is currently calculated on the hybrid topology, which may not be what we want long term.
There was a problem hiding this comment.
For a separate PR - the atom selection (or atomgroup) should really be user defined rather than defaulting to UNK.
This might be a good argument for letting Protocols deal with this rather than making it uniform.
| For all unique frame pairs ``(i, j)`` with ``i < j``, this function | ||
| computes the RMSD between atomic coordinates after optimal alignment. | ||
| """ | ||
|
|
There was a problem hiding this comment.
Can you explicitly define _analysis_algorithm_is_parallelizable = False (it's inherited by default, but it would be good to have it explicitly defined here) in these classes and then raise an issue about looking into parallism?
| self.results.rmsd2d = [] | ||
|
|
||
| def _single_frame(self): | ||
| self._coords.append(self._ag.positions.copy()) |
There was a problem hiding this comment.
Note that this is effectively the same as putting the whole trajectory into memory. I'm saying this as an FYI that it will be a possible failure mode when someone runs a really long simulation and then tries to use this class.
Might be good to document in the docstring notes.
| self.results.rmsd2d = [] | ||
|
|
||
| def _single_frame(self): | ||
| self._coords.append(self._ag.positions.copy()) |
| Flattened 2D RMSD matrix | ||
|
|
||
| For all unique frame pairs ``(i, j)`` with ``i < j``, this function | ||
| computes the RMSD between atomic coordinates after optimal alignment. |
There was a problem hiding this comment.
Can you maybe expand this to mention you're doing a center of geometry fit as well as a rotational and translational superposition usingg QCP?
| self, atomgroup, reference=None, mass_weighted=False, superposition=False, **kwargs | ||
| ): | ||
| super(RMSDAnalysis, self).__init__(atomgroup.universe.trajectory, **kwargs) | ||
|
|
|
|
||
| def _single_frame(self): | ||
| # distance between start and current ligand position | ||
| # ignores PBC, but we've already centered the traj |
There was a problem hiding this comment.
Why ignore PBC? Could you not just pass the box kwarg argument along? Or is the box distorted because of the transformation?
Please document this in the docstring.
| self.results.com_drift.append(drift) | ||
|
|
||
| def _conclude(self): | ||
| self.results.com_drift = np.asarray(self.results.com_drift) |
There was a problem hiding this comment.
Nit: it may be more ever so slightly efficient to just pre-allocate the array ahead of time in _prepare by defining a numpy array of length self.n_frames. This also has the nice side effect of not needing a _conclude definition.
| output.append(rmsd) | ||
| prot_rmsd = RMSDAnalysis(prot).run(step=skip) | ||
| output["protein_RMSD"].append(prot_rmsd.results.rmsd) | ||
| # # Using the MDAnalysis RMSD class instead |
There was a problem hiding this comment.
Please remember to remove the commented out regions.
| # output["protein_2D_RMSD"].append(flattened) | ||
|
|
||
| if ligand.n_atoms > 0: | ||
| lig_rmsd = RMSDAnalysis(ligand, mass_weighted=True).run(step=skip) |
There was a problem hiding this comment.
For a separate PR - the atom selection (or atomgroup) should really be user defined rather than defaulting to UNK.
This might be a good argument for letting Protocols deal with this rather than making it uniform.
| # lig_rmsd.run(step=skip) | ||
| # output["ligand_RMSD"].append(lig_rmsd.results.rmsd.T[3]) | ||
| lig_com_drift = LigandCOMDrift(ligand).run(step=skip) | ||
| output["ligand_wander"].append(lig_com_drift.results.com_drift) |
There was a problem hiding this comment.
I know this is historical, so it doesn't have to be here, but can we please renamed this to ligand_com_drift or anything else? wander is such an unspecific name 😅
Refactor the rmsd analysis using this example: https://docs.mdanalysis.org/2.7.0/documentation_pages/analysis/base.html
gather_rms_datafunction but access the individual analysis classes directly in the openfeProtocol