-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add array order processes #65
feat: add array order processes #65
Conversation
Codecov Report
@@ Coverage Diff @@
## main #65 +/- ##
==========================================
+ Coverage 73.23% 73.77% +0.53%
==========================================
Files 22 22
Lines 766 816 +50
==========================================
+ Hits 561 602 +41
- Misses 205 214 +9
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
permutation_idxs = np.argsort(data, kind="mergesort", axis=axis) | ||
else: # [::-1] not possible | ||
permutation_idxs = np.argsort( | ||
-data, kind="mergesort", axis=axis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only works with numerical types, but will fail if trying to sort string or datetime arrays!
Why not just use np.flip
on the found indices?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had used flip before, but switched to this approach because of the following side effect:
Using the example in the documentation, order(data = [6,-1,2,null,7,4,null,8,3,9,9], asc = false, nodata = false) should result in [3, 6, 9, 10, 7, 4, 0, 5, 8, 2, 1].
With np.flip our result becomes [6, 3, 10, 9, 7, 4, 0, 5, 8, 2, 1], so the values that are the same are in the opposite order. Switching these after flip was used does not seem too convenient to me, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for this - hmm, what we could do would be to count the number of nans before, only flip the array[index_last_nan:] and stitch together again? Then we should have strings/datetimes covered too!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you are referring to would result in an array like [3, 6, 10, 9, 7, 4, 0, 5, 8, 2, 1], so the nan values would be handled. If there are values in the data twice, (like the "9" in the example), the order of these is also flipped. (The result in the example should be [3, 6, 9, 10, 7, 4, 0, 5, 8, 2, 1])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. @m-mohr what do you think about relaxing the Ties will be left in their original ordering
requirement in the process spec for order
? Is this strictly necessary for some usecase? Seems like implementation on our end would be simpler if this didn't have to be guaranteed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To get comparable and reproducible results, you can't let this undefined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that's fair, so we cannot drop this requirement completely - what about changing it such that ties will be in the original ordering if asc=True
and flipped otherwise?
Otherwise I currently fail to see a way to do this such that arrays of strings or datetimes can also be ordered, we'd have to throw an error in these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LukeWeidenwalker That's a good point, I think we can clarify that as I think that was the original intention anyway. Please open an issue in openeo-processes and I'll prepare a PR for 2.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened Open-EO/openeo-processes#409
… github.com:eodcgmbh/openeo-processes-dask into EODCAPI-235-add-array-processes-for-apply-dimension
if isinstance(order, list): | ||
order = np.asarray(order) | ||
|
||
if len(data.shape) != len(order.shape): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it make sense to also allow len(order.shape) == 1 here? From the examples in the definition https://processes.openeo.org/#rearrange this could also be valid and I think it makes sense to always take the same element of an array over a specific dimension... this could be handle with np.take automatically :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm - the spec seems ambiguous about that to me, since all the examples have an equal number of axes. We could add extra axes and broadcast the order
array such that it fits the data
array. I agree that it's a possible enhancement, but adding this would need more tests for higher-dimensional arrays. I think we can simplify our lives here and just throw an error saying that this isn't supported until someone actually needs exactly this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless of what type the elements are, the array is just re-arranged according to the order on the first level.
So if for example you have rearrange(data = [[2,3], [4,3]], order=[1,0])
, you'd get [[4,3], [2,3]]
. Hope that helps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless of what type the elements are, the array is just re-arranged according to the order on the first level.
So if for example you have
rearrange(data = [[2,3], [4,3]], order=[1,0])
, you'd get[[4,3], [2,3]]
. Hope that helps.
Thanks for this comment - so the order
array is always expected to only have a single axis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed. It is of type array<integer>
so can't be multi-dimensional.
@ValentinaHutter Update from Open-EO/openeo-processes#409: ordering of ties is now implementation-dependent, so we can go ahead with the |
No description provided.