Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Traverse fixes #2365

Merged
merged 4 commits into from
Apr 6, 2021
Merged

Traverse fixes #2365

merged 4 commits into from
Apr 6, 2021

Conversation

1Jajen1
Copy link
Member

@1Jajen1 1Jajen1 commented Apr 5, 2021

Ignore the branch name, I initially wanted to do the foldRight bits here as well.

So traverse is broken in a few ways atm:

  • The iteration order is not always left to right (as it should be)
  • Short circuit did not actually short circuit in most cases.

This pr fixes both of these issues without api changes and adds tests to verify that.

Also I changed the traverse implementations to accumulate in a mutable buffer. Using List.plus is a heavy perf hit since it quite literally copies the entire list (no sharing) and with ArrayList backed implementations that means a lot of array copies and thus gc strain. The mutable result is always casted to an immutable one implicitly so it should not leak to users (unless they manually cast again, but thats possible either way).

@kyay10
Copy link
Collaborator

kyay10 commented Apr 5, 2021

The mutable result is always casted to an immutable one implicitly so it should not leak to users (unless they manually cast again, but thats possible either way).

In this case, you can use buildList which is an experimental API in the stdlib that basically locks the list right after you build it, and so it ensures that even if a user casts the list afterwards that they can't actually modify it. I can make a PR on your jo-fold-right-order if that'll help :)

Edit: and actually looking at your implementation there's a tiny bit of a performance improvement that could be had with intializing the mutableList with the size of the Iterable/Nel so that it doesn't have to resize itself (this is also possible with buildList and so that's a plus)

@1Jajen1
Copy link
Member Author

1Jajen1 commented Apr 5, 2021

In this case, you can use buildList which is an experimental API in the stdlib that basically locks the list right after you build it, and so it ensures that even if a user casts the list afterwards that they can't actually modify it

I actually don't mind if anyone wants to do this, since the List<A> is so atrocious in regards to modify perf that I don't really want to lock anyone out of doing so. Especially since arrow does not keep a hold of the list in any way, its back to the user and thus imo we should not restrict it. A restricted api would be neat for libs that involve concurrency and/or keep a reference of the lists after the operation finished. Either way the api needs to be safe on the surface level, typecasts, or even worse: reflection, can always subvert it I guess 🤷
Edit: I just looked up how buildList works, it copies the mutable list once its done building.

Also what is the opinion of everyone else regarding this: Should we make our return values extra safe against future type casts or not care?

Edit: and actually looking at your implementation there's a tiny bit of a performance improvement that could be had with intializing the mutableList with the size of the Iterable/Nel so that it doesn't have to resize itself (this is also possible with buildList and so that's a plus)

This sounds nice, for the Iterable it is only possible if we first check that its a List, otherwise we can't efficiently know the size, but for Nel this should certainly be there.

@kyay10
Copy link
Collaborator

kyay10 commented Apr 5, 2021

Edit: I just looked up how buildList works, it copies the mutable list once its done building.

I'm not sure where you got this information from because looking at the implementation of buildListInternal on the JVM at least it uses an internal class called ListBuilder which is backed by an array and its build function (that gets called after the builderAction is applied to it) is defined as so:

    fun build(): List<E> {
        if (backing != null) throw IllegalStateException() // just in case somebody casts subList to ListBuilder
        checkIsMutable()
        isReadOnly = true
        return this
    }

as you can see it just sets one single boolean flag called isReadOnly that then causes any modifying functions to throw when called on the resulting list.
(Note: buildList calls buildListInternal which calls createListBuilder which just returns a ListBuilder and then the provided builderAction gets applied to that ListBuilder and then build gets called on it which just calls the member .build() and so it seems like maybe that information was from an older implementation of buildList because the current one is definitely as performant as a good old ArrayList)
There is definitely something to be said for the user casting back the list to a mutable one for copying, but is that really that much of a performance concern since like if we're already going the unsafe route here with unsafe casts one can just use reflection to set that one single isReadOnly field to true and get on with their lives.

for the Iterable it is only possible if we first check that its a List,

Thankfully the arrow source code already has an extension on Iterable collectionSizeOrDefault which does exactly that lol. I think I'm just gonna make 2 PRs maybe, one that uses an ArrayList with the collection's size and another that uses buildList

@1Jajen1
Copy link
Member Author

1Jajen1 commented Apr 5, 2021

Welp I got to a different builder that just calls toList looking through code. Fun^^ Thanks for correcting and looking this up 😅

Thankfully the arrow source code already has an extension on Iterable collectionSizeOrDefault which does exactly that lol

I actually really like that we provide all these nice bits, but tbh I think the whole api is somewhat flawed if you have to^^ Either way, good!

Thinking about this for a bit using buildList is probably fine. Not sure about introducing new things with @ExperimentalApi but we'll see.

I think I'm just gonna make 2 PRs maybe, one that uses an ArrayList with the collection's size and another that uses buildList

I can do this really quick, thanks for the offer, but it'll probably just be faster if I do this on my branch with buildList directly.

Edit: Actually nevermind, I'll wait for whatever the other maintainers think about this^^

@kyay10
Copy link
Collaborator

kyay10 commented Apr 5, 2021

I just saw this right now after making a PR, whoops 😅!

Well yeah if the other maintainers are fine with using buildList (especially because it's an experimental API) then this should be good to go. Just make sure to merge the buildList PR first into the jo-fold-right-order before merging this PR into main

@dberg
Copy link
Contributor

dberg commented Apr 5, 2021

lgtm!

@nomisRev
Copy link
Member

nomisRev commented Apr 6, 2021

@1Jajen1 @kyay10 I don't think we should add @Experimental anywhere in our APIs, unless we add them as overloads and have the same APIs available without @Experimental like we do for kotlin.time.

So should we move ahead with this PR, and leave buildList as a future enhancement once it goes stable?

Copy link
Member

@nomisRev nomisRev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking care of this @1Jajen1! Looks great.

Copy link
Member

@raulraja raulraja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @1Jajen1 !

@raulraja raulraja merged commit 6cecbc8 into main Apr 6, 2021
@raulraja raulraja deleted the jo-fold-right-order branch April 6, 2021 09:19
nomisRev added a commit that referenced this pull request Apr 23, 2021
* FoldRight order

* Test and fix traverse implementations

* Undo foldRight changes

Co-authored-by: Simon Vergauwen <nomisRev@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants