Pyloos subset#132
Conversation
|
@tromo : I thought we had a version of this implemented in the C++, but poking around in a couple of the implementations, it doesn't look like it. The closest I could find is In any event, I have some misgivings about this feature, because the main AtomicGroup (and thus any other AtomicGroups created by selecting from it) will be stale, which could lead to surprising outcomes, since in general we make a big deal about not having to worry about it. However, @lgsmith put LOTS of warnings in there, so it's probably ok. My bigger concern is with the call to For a test, I'd suggest opening 2 copies of the same trajectory, one using subset update, one using full update, then comparing the coordinates for the subset. Moreover, you'll need to test it with a couple of different trajectory formats, because they each have their own implementation of |
|
Yeah I've tested it on that workflow exactly. The subset does in fact pull
the correct atoms, in my testing. And as I said it's a bit faster for
systems where the buffer has way more atoms in it than are going to be
interacted with.
…On Thu, May 14, 2026 at 4:21 PM Grossfield Lab ***@***.***> wrote:
*agrossfield* left a comment (GrossfieldLab/loos#132)
<#132 (comment)>
@tromo <https://github.com/tromo> : I thought we had a version of this
implemented in the C++, but poking around in a couple of the
implementations, it doesn't look like it. The closest I could find is
DCD::mappedCoords, but that's not called anywhere and theres a comment
elsewhere saying it's slated for removal. Am I crazy?
In any event, I have some misgivings about this feature, because the main
AtomicGroup (and thus any other AtomicGroups created by selecting from it)
will be stale, which could lead to surprising outcomes, since in general we
make a big deal about not having to worry about it. However, @lgsmith
<https://github.com/lgsmith> put LOTS of warnings in there, so it's
probably ok.
My bigger concern is with the call to updateGroupCoords. I think in
general that just does a straight for-loop over atoms, so it's not clear to
me how that will work if the subset selection isn't a contiguous block at
the beginning of the full model. @lgsmith <https://github.com/lgsmith>:
have you tested this with a selection for eg alpha carbons? I'd need to
look more carefully, but I suspect you won't get the same thing.
For a test, I'd suggest opening 2 copies of the same trajectory, one using
subset update, one using full update, then comparing the coordinates for
the subset.
—
Reply to this email directly, view it on GitHub
<#132 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABAZVYNJWREWVKZXB6J5AHD42YTFVAVCNFSM6AAAAACY6IATYOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHM2DINJUGQYTOOJWGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
Actually hang on a minute. The warnings are irritating, so I'm going to give the user a flag to suppress them that will be more typing once but won't fill stderr with nonsense about your potentially foolhardy selection choices if you read many trajs in one script. |
|
Did you check with multiple trajectory file formats? From offline conversation, I think you tried it with DCD. Maybe mdtraj's hdf5 format? or Amber NETCDF? They don't share an implementation of updateGroupCoords(), so I want to make sure this works everywhere before we merge. |
I don't think it's actually necessary have the warnings get printed -- the point is to warn developers, not end users. Maybe a giant box comment above the implementation? (I recognize you've already marked it as dangerous in the options) In addition, it would be wonderful if you prepped a howto for the wiki, explaining the use case and benefits as well as the risks. (that's not a barrier to merging, just something that would make me happy) |
get warnings about the risk they're taking.
|
OK, this should be working the way that I want it to now. Here's the test I did to check for the coordinate match: |
|
The test looks good, but I want to see it for other trajectory file formats as well. |
|
Here's an updated benchmark: Here's the output from running it with the xtc extension (note the performance improvement is pretty modest here; I think this trajectory is so short that dealing with the xtc compression is still dominating the cost of reading it) And if I change the file name back to the dcd (I just converted it by having loos write an xtc copy): Make sense? |
|
Regarding the comment about the warnings: I am happy to have fewer warnings (warnings about unused fields from the beginnings of pdbs litter my outfiles and I don't always write a |
name: Pyloos Subset
about: adding an optional subset functionality that deploys update group coords on the subset kwarg AtomicGroup.
title: 'Pyloos Subset'
labels: 'feature'
assignees: ''
Changes proposed in this pull request
PyLOOS.Trajectoryobject, setfull_update=Falseto set theself._targetatomic group to theself._subsetatomic group already defined. Otherwise_modelis bound to target.updateGroupCoordsis always applied toself._targetrefreshModel()that calls updateGroupCoords on theself._modelAG instead.Comment
This was done to enhance performance on reading subsets of trajectories where the difference in atom counts beween the full frame and the subset is really significant. For example, when analyzing a protein in a large water box, the fraction of C$\alpha$ s to all atoms might be 0.2%. At that kind of ratio, I was seeing a 4.4x speedup from using this subsetting for just reading through the coordinates I wanted.