Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document map behavior on sparse data structures #44233

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

ParadaCarleton
Copy link
Contributor

@ParadaCarleton ParadaCarleton commented Feb 17, 2022

This pull request clarifies that map's behavior assumes purity when working with sparse data structures, and that such behavior is not considered a bug, as discussed in this thread. If the core devs clarify that this behavior is intentional and map requires purity, I can also add an apply function which iterates through a given collection and applies a function element-by-element.

@ParadaCarleton
Copy link
Contributor Author

Bump -- @JeffBezanson @StefanKarpinski am I correct in assuming that map's purity assumption is intended and should be documented?

@StefanKarpinski
Copy link
Member

I don't think either of us are the ones to address that but yes, the assumption that the mapper is consistent in mapping values to zero is intended and should be documented.

@ParadaCarleton
Copy link
Contributor Author

I don't think either of us are the ones to address that but yes, the assumption that the mapper is consistent in mapping values to zero is intended and should be documented.

Perfect; do you know who I should mention (if I should mention anyone)?

@StefanKarpinski
Copy link
Member

I believe @andreasnoack may be able to help at least by recommending someone who could help more.

Note that `f` makes no guarantees about the order in which `f` is called on elements. In
addition, `f` is assumed to be [pure](https://en.wikipedia.org/wiki/Pure_function). Using
an impure function together with `f` can cause bugs when working with some data structures,
e.g. sparse or diagonal arrays, as `f` will only be called once on duplicated elements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be called once on default 0 but if you have duplicates that are non-default zeros it will be called multiple times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be called once on default 0 but if you have duplicates that are non-default zeros it will be called multiple times.

I've edited to say that it "May" call it only once per duplicate element, and clarify that calling map with an impure function is undefined behavior.

@bkamins
Copy link
Member

bkamins commented Apr 3, 2022

The performance rationale behind the proposal of @ParadaCarleton is clear.

However, when we accept it we should update docstrings of map!, mapreduce, mapslices, and Iterators.map accordingly.

In general the issue is that users might have assumed that it is OK to use these functions with non-pure argument. E.g. Iterators.map contract says:

This is another syntax for writing (f(args...) for args in zip(iterators...)).

and generator equivalent for should not assume purity of f (at least in my opinion).

@ParadaCarleton
Copy link
Contributor Author

The performance rationale behind the proposal of @ParadaCarleton is clear.
However, when we accept it we should update docstrings of map!, mapreduce, mapslices, and Iterators.map accordingly.
In general the issue is that users might have assumed that it is OK to use these functions with non-pure argument. E.g. Iterators.map contract says:

This is another syntax for writing (f(args...) for args in zip(iterators...)).
and generator equivalent for should not assume purity of f (at least in my opinion).

While I think I'd agree we should have some alternative to map that doesn't assume purity, I'd say Iterators.map is already assuming purity implicitly because of its laziness. While in theory it's possible to have a lazily-called impure function, laziness+statefulness is a recipe for disaster.

I agree that we should avoid making any changes to behavior for now to avoid breaking old code, though.

@bkamins
Copy link
Member

bkamins commented Apr 4, 2022

While in theory it's possible to have a lazily-called impure function, laziness+statefulness is a recipe for disaster.

100% agreed. The only problem is exactly what you have commented on later - old code might be affected (fortunately we are not changing implementation of anything - only the contract). As you know in DataFrames.jl we identified this issue because users reported to us problems when they used not-pure functions with map.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants