-
Notifications
You must be signed in to change notification settings - Fork 673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
groups.unique does not sort the group #3364
Comments
The expectation for selections has been that they come back sorted. Right now that's a defect, at least for 1.x. |
Would you know the expectation for
|
Just to say I think you got the wrong @coredevs |
Oops, sorry! I did mean @MDAnalysis/coredevs. |
I might have prematurely slapped the "defect" tag on here; it would have been more appropriate on an issue directly referencing
Regarding your #3364 (comment)
No, I am not sure. But I don't like the current behavior that will produce erratic results. It would be cleaner if
There might have been speed considerations in the current logic somewhere, so I can see that you might not want to sort unless you have to. I can also see that your proposal of decoupling sorting and uniqueness gives more flexibility. IIRC, sorted arrays can be much faster de-duplicated than unsorted ones, though, so that was probably why the two went hand-in-hand. On balance, however, I like re-affirming user expectations, namely that MDAnalysis unique works very similar to the numpy unique, especially as we tend to treat AtomGroups like "numpy arrays of atoms". |
At which point of the selection process is I think the most important point is that if an atomgroup is already not redundant, then calling |
Somewhat repetitively, at the end of each logical operation selection class (and, or, all, not, global) and at the end of many if not all of the keyword classes. It should probably just be another method on
Yeah, the
Yeah, but in the order of the input atom group -- so in the example where the input is
Just to be clear, in your opinion then |
This is the behaviours I expect. But I may not be representative. My issue with sorting here, is that I can sort afterward, but I cannot go back to the weird order I may have spent time building.
Indeed. |
I'm happy to have
We could add a keyword |
Could we just get a brief consensus please, thumbs up 👍 for sorting and thumbs down 👎 for not sorting Pros for sorting
Cons for sorting
(Please feel free to edit this comment with more points) |
So my understanding here is that if it's not unique it sorts, if it is unique it doesn't sort? Personally I think that for 2.0 the behaviour should be aligned to the original intended behaviour. Then if we want to re-works this and do an API break then fine, but that should be handled in 3.0. |
Yes. So one way or another we should make it consistent. |
So for context, group.unique was implemented in #1922 by @zemanj - to avoid doing a copy operation where possible. If we tag on a sort we'll probably have a performance regression. That being said, it looks like the target for performance improvement here was That's not the case for Same thing for We'd have to change the behaviour of |
I am in favor of making unique behave consistently so that users are not surprised (EDIT: and making it behave as documented and similar to np.unique, i.e., sort). Behind the scene we can special case for performance. |
EDIT: I thought of Would it be useful to have a kwarg to I think we have to distinguish between internal uses where we want to optimize for speed (e.g. reduce copies) and external uses, where consistency by default (and avoiding surprises) is important. |
How about asunique() for returning atomgroups that are unique but unsorted? Parallels with asarray(), ascontiguous() etc. Edit: and I would be in favour of asunique() taking a sort argument but defaulting to not sorting |
Might be me being naive, but do we actually use unique enough that lots of optmization is required here?
👍🏽 For 2.0 at least, I'm in favour of not changing |
I like the suggestion of adding asunique() and make unique always sort.
… Am 7/25/21 um 10:12 schrieb Irfan Alibay ***@***.***>:
Might be me being naive, but do we actually use unique enough that lots of optmization is required here?
How about asunique() for returning atomgroups that are unique but unsorted? Parallels with asarray(), ascontiguous() etc.
Edit: and I would be in favour of asunique() taking a sort argument but defaulting to not sorting
👍🏽 For 2.0 at least, I'm in favour of not changing unique from being a property, so asunique sounds like a good compromise to me.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
Looks like by fixing the duplicate handling in #1922 I broke the behavior of What really surprises me, though, is the fact that it took so long for this behavior-breaking change to even being noticed. I think the behavior of |
To comment on @IAlibay's question about the necessity of optimizing: I guess it depends. If you process very large systems, avoiding full copies of atom groups will certainly have an impact. However, my guess is that the most relevant speedup is achieved by caching, and that will also be effective if the |
Fixes #3364 #2977 ## Work done in this PR - Add .asunique. This function (through ._asunique) now does all the grunt work in making unique groups and caching - Change .unique to always return a copy of the sorted unique group - Add .issorted - .unique is no longer cached. _cache['unique'] no longer exists - .sorted_unique and .unsorted_unique are now cached - .unsorted_unique and .sorted_unique can be the same group, which can be self - Added .cache_properties - uniqueness and sorted properties now propagate to children. They did not use to, the new function is called `_set_unique_caches_from` - Add ability to selectively sort `select_atoms` calls through `sorted` argument
Expected behavior
Related to #2977 (comment)
The documentation implies that
atomgroup.unique
will return a sorted atomgroup, so I expect it to. However, there is a confusing contradiction in the documentation.Actual behavior
If the atomgroup is already unique, as per docs, the original unsorted atomgroup is returned. @coredevs is this the desired behaviour? Should we change it or add a new
sorted
attribute?The
.unique
attribute is primarily used inselect_atoms
and wrapping-related methods, i.e.bsphere
,wrap
,unwrap
,translate
, androtate
. I don't think being ordered is important in any of those -- but being sorted is probably important toselect_atoms
? Are any users relying onunique
being sorted, or conversely not being sorted?Code to reproduce the behavior
Current version of MDAnalysis
python -c "import MDAnalysis as mda; print(mda.__version__)"
) 2.0.0-dev0python -V
)?The text was updated successfully, but these errors were encountered: