test_asarray_copy[dask.array] fails on CI with dask==2024.12 #209
The failing test is `test_asarray_copy[dask.array]`. A prime suspect is dask/dask#11524.
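For context, the property the test asserts can be reduced to a few lines. This is a hypothetical sketch (the real test lives in this project's test suite and runs against `dask.array` among other backends), using `np.shares_memory` to detect whether a buffer was reused:

```python
import numpy as np

def check_asarray_copy(asarray):
    """Hypothetical reduction of the property under test."""
    x = np.arange(5)

    y = asarray(x, copy=True)       # must be a fresh buffer
    assert not np.shares_memory(x, y)

    z = asarray(x, copy=False)      # should reuse x's buffer
    assert np.shares_memory(x, z)

# Exercising it with NumPy's own constructor:
check_asarray_copy(lambda a, copy: np.array(a, copy=copy))
print("ok")
```

With dask 2024.12, the `copy=False` branch is what starts failing, because the resulting dask array no longer shares memory with the input.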
This is indeed a breaking change in dask 2024.12: `from_array` now takes the `if is_arraylike(x) and hasattr(x, "copy"):` branch and copies the input, so the meaning of passing in an existing array has changed. Does this sound right @asmeurer?
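Paraphrasing the guard quoted above, the new behaviour amounts to something like the following sketch. This is not dask's actual source; `is_arraylike` here is a simplified stand-in for `dask.base.is_arraylike`:

```python
import numpy as np

def is_arraylike(x):
    # Simplified stand-in for dask.base.is_arraylike
    return hasattr(x, "shape") and hasattr(x, "dtype")

def from_array_sketch(x):
    # Rough paraphrase of the dask 2024.12 behaviour under discussion:
    # array-like inputs that expose .copy() are defensively copied
    # before being embedded in the graph (NOT dask's actual source).
    if is_arraylike(x) and hasattr(x, "copy"):
        x = x.copy()
    return x

a = np.arange(4)
b = from_array_sketch(a)
assert not np.shares_memory(a, b)   # the input is always copied now
```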
Also ping @lithomas1: would you be able to weigh in on the dask.array changes?
Yeah, that's part of the reason we wrap it. I think your analysis is right that dask is always doing a copy now.
cc @phofl for awareness
So I suppose the main question for this project is whether there's a supported way, going forward, to convert a numpy array (or anything that supports the buffer protocol) into a dask array without a copy.
I disagree. "if possible" should be left to the library developer's discretion, so a change of behaviour there should not be treated as a breaking change. I believe instead that the test is over-eager in expecting a specific outcome. As @phofl and @fjetter correctly observe on dask/dask#11524, it is extremely unhealthy to store large arrays inside the dask graph.
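For reference, the array API standard's `asarray` takes a three-valued `copy` flag, and `copy=False` is specified to raise rather than silently copy, which is where "the library's discretion" ends. A minimal sketch of those semantics (a hypothetical wrapper, not this project's implementation):

```python
import numpy as np

def asarray(obj, copy=None):
    # Sketch of the standard's three-valued copy flag:
    #   copy=True  -> always copy
    #   copy=False -> never copy; raise if a copy cannot be avoided
    #   copy=None  -> copy only if needed (the library's discretion)
    if copy:
        return np.array(obj, copy=True)
    arr = np.asarray(obj)
    if copy is False and not np.shares_memory(arr, obj):
        raise ValueError("unable to avoid a copy while copy=False")
    return arr

x = np.arange(3)
assert asarray(x, copy=False) is x                     # zero-copy path
assert not np.shares_memory(x, asarray(x, copy=True))  # forced copy
```

A backend that unconditionally copies, as dask now does, can therefore only ever raise for `copy=False`; `copy=None` is the case where the discretion argument applies.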
Hm, I'm not sure I completely understand why this can't be possible. A comment in the linked issue suggests that dask has an internal API somewhere that can create a dask array from an existing buffer without copying it. That would be pretty useful in the case where I know I won't use the original array anymore after passing it in.
It's not impossible; it's just very, very unhealthy to let something at the root of your dask graph point to an array that the user could update later on. Using from_array/asarray to embed large arrays in the graph is an antipattern to begin with: it carries all sorts of performance issues and will very likely kill off, or at least hamstring, the scheduler and/or the worker processes. So the wastefulness of always deep-copying an array that must be small (<10 MB) becomes inconsequential.
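The mutation hazard described above can be illustrated without dask at all: a lazily evaluated computation that merely references the caller's buffer observes later mutations, while a copy taken at construction time pins the value. Here a toy closure stands in for a dask graph node:

```python
import numpy as np

# A "graph node" that keeps a reference to the user's buffer, no copy:
x = np.arange(5)
lazy_sum = lambda: x.sum()

x[:] = 0                        # user mutates the buffer afterwards...
assert lazy_sum() == 0          # ...and the lazy result silently changed

# Deep-copying at construction time (what dask 2024.12 does) pins it:
y = np.arange(5)
y_pinned = y.copy()
lazy_sum2 = lambda: y_pinned.sum()
y[:] = 0
assert lazy_sum2() == 10        # stable, regardless of later mutation
```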
Observed in #208:

Note that it…