Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterator could chain indetermined number of iterators now #6570

Merged
merged 7 commits into from
Sep 10, 2018

Conversation

xqyww123
Copy link
Contributor

Fixes #6532

@@ -59,6 +59,55 @@ describe Iterator do
iter.rewind
iter.to_a.should eq([1, 2, 'a', 'b'])
end
describe "chain indeterminate number of iterators" do
it "chains all together" do
n = 10 + rand 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no reason to have random size. Tests should be reproducible.

Anyway, it's probably better to just use a fixed test value instead of generating it on the fly. This leads to simpler code and more easily understandable spec.

Also, there doesn't need to be a lot of iterators and the iterator items can just be simple values.

arrays = [[0, 0], [1, 1], [2, 2]]
iterator = Iterator(Int32).chain arrays
3.times do |i|
  iterator.next.should eq i
  iterator.next.should eq i
end

Don't make test cases more complicated than they need to be. Keep it simple.

Ditto below.

}
end
it "chains empty" do
arrs = [] of Array(Int32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could just be [] of Iterator(Int32).

it "chains array of empty" do
n = 10 + rand 10
arrs = [] of Array(Int32)
n.times { |a| arrs << (a % 3 == 0 ? [] of Int32 : [a, a*a, a + 3]) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto. Using a literal value is much easier to understand than this generator.

src/iterator.cr Outdated
@@ -200,6 +200,60 @@ module Iterator(T)
end
end

# Like `#chain` but could chains multiple iterators.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire doc comment needs linguistic improvements. Feel free to ask for help if you have trouble with the language.

@straight-shoota
Copy link
Member

Did you consider merging ChainsAll with the existing Chain type? It could probably just be replaced with the new implementation which could be used by both .chain and #chain.

@RX14
Copy link
Contributor

RX14 commented Aug 20, 2018

https://carc.in/#/r/4s6x

  1. The standard chain type needs to be "de-specialized" and have the I1 and I2 types taken out of it's type arguments. This means there's a lot more dispatches in Chain2 than Chain though. So it might be worth keeping the old type around for those where a union dispatch is too much of a performance penalty.
  2. The iterators need to be cast to the more generic type Iterator(Int32) otherwise the reduce block changes the type of memo, this isn't a problem specific to this kind of code: https://carc.in/#/r/4s71 vs https://carc.in/#/r/4s72

The root of the problem is simply that the iterator types try really hard to keep the iterator types as specific as possible, to enable LLVM to do inlining and other performance optimizations. This makes it really easy for iterator types to grow out of control (actually recursive) when iterator chains are constructed in a loop. This problem isn't specific to Chain, but really to performing a reduce with any iterators.

@xqyww123
Copy link
Contributor Author

xqyww123 commented Aug 21, 2018

As @RX14 said, although they both implement the same function, the performance is different. Theoretically, Chain would be faster than ChainAll.

However, in the original implementation, if one tries to concat N iterators, N if-branch and N bool variables will be used. I'm not sure whether compiler could optimize it but I guess it's hard and would influence performance deeply.

And more, if de-specialized the chain type, recursive calling could occur. In case that the number of iterators to concat is indeterminate, a loop is necessary to implement the function. But actually, de-specialized Chain use the recursion to simulate the loop. Recursion cannot be inlined and cost more time, stack overflow may also be a problem.

If all iterators have the same type, ChainAll could be inlined, or in the worse case, the cost of virtual callings is not too much.

If we really care performance, #chain may be RX14 style and .chain for dozens of iterators uses a real loop. But I prefer to Keep It Simple and Stupid.

@RX14
Copy link
Contributor

RX14 commented Aug 21, 2018

Yes, i'm fine with this implementation, I'm still very concerned that this problem isn't unique to Chain though.

Finally, I remain `#chain` in the same, for the purpose of performance, but added some comments to instruct this feature.
Maybe I had tried my best to improve linguistics, but I'm sorry I'm not good at foreign languages. If it's still not good enough, helps is really really appreciated, thanks.
@xqyww123
Copy link
Contributor Author

Finally, I remain #chain in the same, for the purpose of performance, but added some comments to instruct this feature.
Maybe I had tried my best to improve linguistics, but I'm sorry I'm not good at foreign languages. If it's still not good enough, helps is really really appreciated, thanks.

src/iterator.cr Outdated
@@ -160,6 +160,11 @@ module Iterator(T)

# Returns an iterator that returns elements from the original iterator until
# it is exhausted and then returns the elements of the second iterator.
# Compared to `.chain`, it has better performance when the quantity of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compared to #chain(Iterator(Iter))

src/iterator.cr Outdated
@@ -160,6 +160,11 @@ module Iterator(T)

# Returns an iterator that returns elements from the original iterator until
# it is exhausted and then returns the elements of the second iterator.
# Compared to `.chain`, it has better performance when the quantity of
# iterators to chain is small (usually less than 4).
# One could also face a problem trying to chain several iterators in loop(like `Enumerable#reduce`),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method also cannot chain iterators in a loop, for that see #chain(Iterator(Iter)).

@@ -200,6 +205,58 @@ module Iterator(T)
end
end

# The same as `#chain`, but have better performance when the quantity of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but has better performance

src/iterator.cr Outdated
@@ -200,6 +205,58 @@ module Iterator(T)
end
end

# The same as `#chain`, but have better performance when the quantity of
# iterators to chain is large(usually greater than 4) or undetermined.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

large (usually

src/iterator.cr Outdated
# iterators to chain is large(usually greater than 4) or undetermined.
# It uses lesser type specification for more freedom to avoid the failure (at a cost of performance),
# and a loop rather than recursive calling to achieve better performance for numbers of iterators
# (but when the quantity is small, it's not really efficiency).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't tend to discuss implementation details in docs, only user-visible effects, like performance and ability to chain in a loop.

@@ -59,6 +59,30 @@ describe Iterator do
iter.rewind
iter.to_a.should eq([1, 2, 'a', 'b'])
end
describe "chain indeterminate number of iterators" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing extra line above this line.

iter = Iterator(Int32).chain iters
7.times { |i| iter.next.should eq i }
end
it "chains empty" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing extra line above this line.

iter = Iterator(Int32).chain arrs.map(&.each)
iter.next.should be_a Iterator::Stop
end
it "chains array of empty" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing extra line above this line.

iter = Iterator(Int32).chain iters
7.times { |i| iter.next.should eq i }
end
it "rewinds" do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing extra line above this line.

@asterite
Copy link
Member

Does this work when chaining iterators of different types? For example chaining all of Iterator(Int32), Iterator(String) and Iterator(Bool)?

@xqyww123
Copy link
Contributor Author

xqyww123 commented Sep 7, 2018

Yes, it could and I pushed new commit with the spec.

@asterite
Copy link
Member

asterite commented Sep 7, 2018

Please change the implementation of self.chain to:

  def self.chain(iters : Iterator(Iter)) forall Iter
    ChainsAll(Iter, typeof(iters.first.first)).new iters
  end

That way you can do:

Iterator.chain(iterators)

instead of:

Iterator(YouHaveToWriteTheTypeHereWhenTheCompilerCanFigureThisOut).chain(iterators)

And please add that newline I requested :-)

Thank you!

@xqyww123
Copy link
Contributor Author

xqyww123 commented Sep 9, 2018

Yeah, I realized the problem but didn't find the solution. This way is brilliant and I updated the code.

Copy link
Member

@asterite asterite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can merge this and if we later find that the performance of merging Chain with ChainAll is mostly the same we can unify their implementation while keeping the API interface the same.

@RX14 RX14 merged commit c772db8 into crystal-lang:master Sep 10, 2018
@RX14 RX14 added this to the 0.27.0 milestone Sep 10, 2018
@xqyww123 xqyww123 deleted the feature/iterator_chains_all branch September 13, 2018 09:52
ezrast pushed a commit to ezrast/crystal that referenced this pull request Oct 2, 2018
…ng#6570)

* Iterator could chain indetermined number of iterators now

Fixes crystal-lang#6532

* Formatted the code

* Simplified the spec & Improved comments.

Finally, I remain `#chain` in the same, for the purpose of performance, but added some comments to instruct this feature.
Maybe I had tried my best to improve linguistics, but I'm sorry I'm not good at foreign languages. If it's still not good enough, helps is really really appreciated, thanks.

* Improved comment linguistic, agian...

* Iterator could chain iterators of different type

* Type of ChainAll could be inferred by compiler
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants