Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use builders to implement yield expressions #951

Closed
charlesroddie opened this issue Jan 13, 2021 · 9 comments
Closed

Use builders to implement yield expressions #951

charlesroddie opened this issue Jan 13, 2021 · 9 comments

Comments

@charlesroddie
Copy link

charlesroddie commented Jan 13, 2021

Yield expressions are implemented using a GeneratedSequenceBase<int> class with e.g. SeqModule.ToArray afterwards: sharplab

[<Benchmark>]
member _.Yield() =
    [|
        for i = 0 to 1000 do
            yield i
    |]

Using a builder would perform much better, and might also allow removing syntactical restrictions, if yield can be used wherever b.Add can be used. E.g. allowing Option.iter yield, which addresses the main example in #705 .

The above code would then compile down to:

[<Benchmark>]
member _.Builder() =
    let b = System.Collections.Generic.List<int>()
    for i = 0 to 1000 do
        b.Add i
    b.ToArray()

The benchmark gives for Arrays:

Method Mean Error StdDev Gen 0 Gen 1 Gen 2 Allocated
Yield 21.963 us 0.2451 us 0.2292 us 2.9907 - - 12.34 KB
Builder 3.843 us 0.0766 us 0.0752 us 2.9755 - - 12.16 KB

For FSharpList, using b.ToArray() |> List.ofArray:

Method Mean Error StdDev Median Gen 0 Gen 1 Gen 2 Allocated
Yield 28.78 us 0.617 us 1.771 us 28.22 us 7.6904 0.1221 - 31.45 KB
Builder 13.12 us 0.260 us 0.676 us 12.96 us 10.6354 - - 43.45 KB

Builders should apply to all non-lazy sequence types (FSharpList, Array, ImmutableArray).

@RealmPlume
Copy link

RealmPlume commented Jan 14, 2021

You can use Array.init 1000 id in this case.

Consider another case:

   let arrx s = [|
        for i in s do
            yield i
   |]

It is very cumbersome to generate ResizeArray here.
I think the crux of the problem is that the compiler does not recognize
0 to 1000 or 0 .. 1000 as a range, but as Seq.

@charlesroddie
Copy link
Author

@greatim An optimization where a minimum length can be detected before creating the builder (by counting definite future yields, calling .Length on Array/ImmutableArray instances that are about to be yield!ed, etc.) would be useful, but as the simple benchmarks show, calculating an initial builder length efficiently is not necessary to give a significant speedup.

@RealmPlume
Copy link

RealmPlume commented Jan 17, 2021

@greatim An optimization where a minimum length can be detected before creating the builder (by counting definite future yields, calling .Length on Array/ImmutableArray instances that are about to be yield!ed, etc.) would be useful, but as the simple benchmarks show, calculating an initial builder length efficiently is not necessary to give a significant speedup.

@charlesroddie
I know what you are talking about, because I also did a similar calculation expression to generate a byte array. But I think it cannot be universal, for example, there are one or more Seq of unknown length. So it is reasonable to generate a Seq first.

There are actually two problems here:

  1. As mentioned earlier, the compiler recognizes Seq more than a to b, so that special optimization is required in each case. This is unreasonable. When our function uses for, this The disadvantage is obvious, F# only optimizes a few scenarios.

  2. After .Net Core, the ToArray method of the basic library will directly generate an array of the corresponding length when generating a Seq of known length. When generating a Seq of unknown length, it uses the LargeArrayBuilder , F# still uses .Net 4.x for unknown reasons.

I agree that it needs improvement. But the direction of improvement is inconsistent.

Your code is actually reflected in SeqModule.ToArray, and even more detailed. But for the reasons mentioned above, it can only generate Seq first.

@charlesroddie
Copy link
Author

I agree that it needs improvement. But the direction of improvement is inconsistent.

If I understand your comments right (and I'm not sure I do) you are examining alternatives to the builder idea, which would incrementally improve the existing method, and the conclusion is that there are may be ways to improve it in some special cases subject to some technicalities but not in a general way. So that would support going with builders I think.

@RealmPlume
Copy link

@charlesroddie
I am sorry for my bad English and thank you for your patience in reading what I said.

Excuse me, I didn't ask before now, does the builder you mean by Computation Expression?
what I said before is that, in Computation expressions we are no able to check the length of expression 0..1000.
When we write the code for i in 0 .. 1000', we get a seqasxinmember this.For(x, f),it's frustrating. So, even if aComputation expression` is used, its final result is no different from the original.
Just from:

SeqModule.ToArray TheSeq

To:

    let b = System.Collections.Generic.List<int>()
    for x in TheSeq do
        b.Add i
    b.ToArray()

This is not what we want.

And what needs to be added is the Computation Expression also cannot make the [| yield 0; yield 1 |] to [| 0; 1 |] directly, this is what I wish.

It seems that the compiler may need to do more optimization work.

And I don’t think it’s a good habit to optimize for x in a .. b do separately one time by one time. Expressions like for x in 0 .. 2 .. 1000 do are always ignored. Something must change, in compiler.

@charlesroddie
Copy link
Author

charlesroddie commented Jan 18, 2021

@greatim No those ideas are on different lines to this issue. The suggestion in this issue is that when code is seen with a yield expression - a collection with an explicit or implicit yield inside, e.g. [| for i = 0 to 1000 do yield i |] -, this should be compiled to a code block with a builder (the code inside member _.Builder()). I've edited the OP to make this clearer.

The suggestion is not to replace just SeqModule.toArray with a builder, but the whole current compiled code, including both GeneratedSequenceBase<int> and SeqModule.toArray, with a builder.

@hvester
Copy link

hvester commented Jan 23, 2021

I have understood that something like this is enabled by resumable state machines. See Example: low-allocation list and array builders

@charlesroddie
Copy link
Author

Fixed by dotnet/fsharp#6811

@dsyme
Copy link
Collaborator

dsyme commented Jul 19, 2021

Actually was done in dotnet/fsharp#11592

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants