Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add array order processes #65

Merged

Conversation

ValentinaHutter
Copy link
Collaborator

No description provided.

@codecov
Copy link

codecov bot commented Feb 24, 2023

Codecov Report

Merging #65 (8b36d14) into main (dd88c1d) will increase coverage by 0.53%.
The diff coverage is 82.35%.

@@            Coverage Diff             @@
##             main      #65      +/-   ##
==========================================
+ Coverage   73.23%   73.77%   +0.53%     
==========================================
  Files          22       22              
  Lines         766      816      +50     
==========================================
+ Hits          561      602      +41     
- Misses        205      214       +9     
Impacted Files Coverage Δ
...cesses_dask/process_implementations/cubes/apply.py 96.15% <ø> (ø)
...o_processes_dask/process_implementations/arrays.py 90.74% <82.00%> (-3.91%) ⬇️
...neo_processes_dask/process_implementations/core.py 97.61% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

permutation_idxs = np.argsort(data, kind="mergesort", axis=axis)
else: # [::-1] not possible
permutation_idxs = np.argsort(
-data, kind="mergesort", axis=axis
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only works with numerical types, but will fail if trying to sort string or datetime arrays!
Why not just use np.flip on the found indices?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had used flip before, but switched to this approach because of the following side effect:
Using the example in the documentation, order(data = [6,-1,2,null,7,4,null,8,3,9,9], asc = false, nodata = false) should result in [3, 6, 9, 10, 7, 4, 0, 5, 8, 2, 1].
With np.flip our result becomes [6, 3, 10, 9, 7, 4, 0, 5, 8, 2, 1], so the values that are the same are in the opposite order. Switching these after flip was used does not seem too convenient to me, what do you think?

Copy link
Contributor

@LukeWeidenwalker LukeWeidenwalker Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for this - hmm, what we could do would be to count the number of nans before, only flip the array[index_last_nan:] and stitch together again? Then we should have strings/datetimes covered too!

Copy link
Collaborator Author

@ValentinaHutter ValentinaHutter Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you are referring to would result in an array like [3, 6, 10, 9, 7, 4, 0, 5, 8, 2, 1], so the nan values would be handled. If there are values in the data twice, (like the "9" in the example), the order of these is also flipped. (The result in the example should be [3, 6, 9, 10, 7, 4, 0, 5, 8, 2, 1])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. @m-mohr what do you think about relaxing the Ties will be left in their original ordering requirement in the process spec for order? Is this strictly necessary for some usecase? Seems like implementation on our end would be simpler if this didn't have to be guaranteed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To get comparable and reproducible results, you can't let this undefined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that's fair, so we cannot drop this requirement completely - what about changing it such that ties will be in the original ordering if asc=True and flipped otherwise?
Otherwise I currently fail to see a way to do this such that arrays of strings or datetimes can also be ordered, we'd have to throw an error in these cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LukeWeidenwalker That's a good point, I think we can clarify that as I think that was the original intention anyway. Please open an issue in openeo-processes and I'll prepare a PR for 2.0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LukeWeidenwalker LukeWeidenwalker changed the title Eodcapi 235 add array processes for apply dimension feat: add array order processes Feb 24, 2023
if isinstance(order, list):
order = np.asarray(order)

if len(data.shape) != len(order.shape):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it make sense to also allow len(order.shape) == 1 here? From the examples in the definition https://processes.openeo.org/#rearrange this could also be valid and I think it makes sense to always take the same element of an array over a specific dimension... this could be handle with np.take automatically :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - the spec seems ambiguous about that to me, since all the examples have an equal number of axes. We could add extra axes and broadcast the order array such that it fits the data array. I agree that it's a possible enhancement, but adding this would need more tests for higher-dimensional arrays. I think we can simplify our lives here and just throw an error saying that this isn't supported until someone actually needs exactly this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of what type the elements are, the array is just re-arranged according to the order on the first level.

So if for example you have rearrange(data = [[2,3], [4,3]], order=[1,0]), you'd get [[4,3], [2,3]]. Hope that helps.

Copy link
Contributor

@LukeWeidenwalker LukeWeidenwalker Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of what type the elements are, the array is just re-arranged according to the order on the first level.

So if for example you have rearrange(data = [[2,3], [4,3]], order=[1,0]), you'd get [[4,3], [2,3]]. Hope that helps.

Thanks for this comment - so the order array is always expected to only have a single axis?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed. It is of type array<integer> so can't be multi-dimensional.

@LukeWeidenwalker
Copy link
Contributor

@ValentinaHutter Update from Open-EO/openeo-processes#409: ordering of ties is now implementation-dependent, so we can go ahead with the np.flip implementation! :)

@ValentinaHutter ValentinaHutter merged commit 8838311 into main May 3, 2023
@ValentinaHutter ValentinaHutter deleted the EODCAPI-235-add-array-processes-for-apply-dimension branch September 19, 2024 06:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants