Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[10.x] Refactored LazyCollection::except() to save memory #48535

Conversation

fuwasegu
Copy link
Contributor

Overview

I've reduced the memory usage by reimplementing the except() method of LazyCollection::class to use lazy evaluation.

Motivation

LazyCollection is a sibling of the Collection class that utilizes Generators for lazy evaluation.
I believe that methods implemented by LazyCollection should be lazily evaluated using yield wherever possible.

However, while it's often unavoidable to use passthru() to delegate to the Collection class, doing so defeats the purpose of using LazyCollection. This is because all the data gets loaded into memory, negating the benefits of lazy evaluation.

Tests

As this is a refactoring, the fact that no test modifications were needed indicates that the functionality remains unbroken.

References

I previously refactored LazyCollection::take() for similar reasons in PR #48382.

Performance Metrics

I measured the performance before and after the refactoring using the following test code, focusing on both execution time and memory usage. Please use these metrics to evaluate the effectiveness of this refactoring.

Code

<?php

declare(strict_types=1);

use Illuminate\Support\Enumerable;
use Illuminate\Support\Facades\Artisan;
use Illuminate\Support\LazyCollection;

Artisan::command('except:before {size}', function () {
    $factory = function (): Generator {
        $size = $this->argument('size');

        for ($index = 0; $index < $size; $index++) {
            yield "key-{$index}" => "value-{$index}";
        }
    };

    $collection = new LazyCollection($factory);
    $value = $collection->except(array_map(fn (int $i) => 'key-' . (10 ** $i), range(0, 8)))->first();

    dump(
        \memory_get_usage(true),
        \microtime(true) - \LARAVEL_START,
    );
});

Artisan::command('except:after {size}', function () {
    $factory = function (): Generator {
        $size = $this->argument('size');

        for ($index = 0; $index < $size; $index++) {
            yield "key-{$index}" => "value-{$index}";
        }
    };

    $collection = new class ($factory) extends LazyCollection {
        public function except($keys)
        {
            $keys = match (true) {
                $keys instanceof Enumerable => $keys->all(),
                is_array($keys) => $keys,
                is_null($keys) => [],
                default => func_get_args(),
            };

            if (count($keys) === 0) {
                return new static(function () {
                    yield from $this;
                });
            }

            return new static(function () use ($keys) {
                foreach ($this as $key => $value) {
                    if (!in_array($key, $keys)) {
                        yield $key => $value;
                    }
                }
            });
        }
    };
    $value = $collection->except(array_map(fn (int $i) => 'key-' . (10 ** $i), range(0, 8)))->first();

    dump(
        \memory_get_usage(true),
        \microtime(true) - \LARAVEL_START,
    );
});

Results

About memory usage:

100 10000 1000000 100000000
before 23068672 23068672 Fatal error Fatal error
after 23068672 23068672 23068672 23068672

About execution time:

100 10000 1000000 100000000
before 0.058702945709229 0.062233924865723 Fatal error Fatal error
after 0.059131860733032 0.058793067932129 0.05843710899353 0.059433937072754

Copy link
Contributor

@JosephSilber JosephSilber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test ensuring it's actually lazy. Update this testExceptIsLazy method:

public function testExceptIsLazy()
{
$this->assertDoesNotEnumerate(function ($collection) {
$collection->except([1, 2]);
});
$this->assertEnumeratesOnce(function ($collection) {
$collection->except([1, 2])->all();
});
}

Add something like this between those 2 tests:

$this->assertEnumerates(4, function ($collection) {
    $collection->except([1, 2])->take(2)->all();
});

src/Illuminate/Collections/LazyCollection.php Outdated Show resolved Hide resolved

return new static(function () use ($keys) {
foreach ($this as $key => $value) {
if (! in_array($key, $keys)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in_array needs to loop through the whole array for each iteration.

Instead, flip the array first, so that the list of keys are actual keys in the array, and can be used as a dictionary for efficient O(1) lookup.

See here for example:

$keys = array_flip(is_array($key) ? $key : func_get_args());
$count = count($keys);
foreach ($this as $key => $value) {
if (array_key_exists($key, $keys) && --$count == 0) {
return true;
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see.
I had not been able to take into account the performance of PHP down to the function level. Thanks for the advice!
I will try to fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@driesvints
Copy link
Member

@fuwasegu tests are fixed on 10.x, can you rebase?

@fuwasegu fuwasegu force-pushed the refactor/lazy-collection-except-save-memory branch from 9abf042 to e6b0dd0 Compare September 26, 2023 15:55
@fuwasegu
Copy link
Contributor Author

@driesvints
thank you for fixing tests.
i rebased and all tests were passed !

@taylorotwell
Copy link
Member

I think this may be a breaking change? Normal collections use Arr::forget under the hood for except, which supports "dot" notation.

@taylorotwell taylorotwell marked this pull request as draft September 26, 2023 21:27
@JosephSilber
Copy link
Contributor

I think this may be a breaking change? Normal collections use Arr::forget under the hood for except, which supports "dot" notation.

Oh right!

When this PR came in, I was racking my brain trying to remember why I didn't do it originally.

Now that you've mentioned it, I remember that this was indeed the reason. Didn't think at the time that it's worth reimplementing it all here.


If we do want to do it now, we have to decide whether we want to support dot notation for when the children (or other descendants) are also lazy collections 🤔

@fuwasegu
Copy link
Contributor Author

I've thought about it a lot, but I couldn't come up with an implementation of except() that supports dot nesting in LazyCollection...

For example, if the keys are ['foo.bar', 'bar', 'foo.baz'], etc., LazyCollection cannot run backwards, so it will have to go around once for each key verification

@morloderex
Copy link
Contributor

morloderex commented Sep 27, 2023

@fuwasegu Would it make sense to undot the values within the input array before running it through the except implementation then?

This way we can maybe make it work with the dot notation?

@fuwasegu
Copy link
Contributor Author

@morloderex nice idea !
thanks!

@fuwasegu
Copy link
Contributor Author

How about this one ?

public function except($keys)
{
    $keys = match (true) {
        $keys instanceof Enumerable => $keys->all(),
        is_array($keys) => $keys,
        is_null($keys) => [],
        default => func_get_args(),
    };

    if (count($keys) === 0) {
        return $this;
    }

    $keys = Arr::undot(array_fill_keys($keys, true));

    $forget = function (LazyCollection|ArrayAccess|array &$data, array $keys) use (&$forget) {
        if ($data instanceof LazyCollection) {
            return new static(function () use ($data, $keys, $forget) {
                foreach ($data as $key => $value) {
                    $shouldYield = true;
                    foreach ($keys as $k => $v) {
                        if (
                            $k === $key
                            && is_array($v)
                            && (
                                $value instanceof LazyCollection
                                || is_array($value)
                                || $value instanceof ArrayAccess
                            )
                        ) {
                            yield $key => $forget($value, $v);
                            $shouldYield = false;
                            break;
                        } elseif ($k === $key) {
                            $shouldYield = false;
                            break;
                        }
                    }
                    if ($shouldYield) {
                        yield $key => $value;
                    }
                }
            });
        } else {
            foreach ($keys as $k => $v) {
                if (is_array($v)) {
                    return $forget($data, $v);
                } else {
                    if (isset($data[$k])) {
                        unset($data[$k]);
                        return $data;
                    }
                }
            }
        }
    };

    return new static(function () use ($keys, $forget) {
        yield from $forget($this, $keys);
    });
}

@fuwasegu
Copy link
Contributor Author

fuwasegu commented Sep 27, 2023

these test cases was passed.

<?php

namespace Tests\Unit;

use Illuminate\Support\Collection;
use Illuminate\Support\LazyCollection;
use PHPUnit\Framework\TestCase;

class ExceptTest extends TestCase
{

    public function test_1(): void
    {
        $collection = TestLazyCollection::make([1, 2, 3, 4]);
        $collection = $collection->except([0, 2]);
        $this->assertSame([1 => 2, 3 => 4], $collection->toArray());
    }

    public function test_2(): void
    {
        $collection = TestLazyCollection::make([
            'foo' => 'FOO',
            'bar' => [
                'one' => 'ONE',
                'two' => 'TWO',
            ],
            'baz' => 'BAZ',
        ]);
        $collection = $collection->except(['foo', 'bar.two']);
        $this->assertSame([
            'bar' => [
                'one' => 'ONE',
            ],
            'baz' => 'BAZ',
        ], $collection->toArray());
    }

    public function test_3(): void
    {
        $collection = TestLazyCollection::make([
            'foo' => 'FOO',
            'bar' => Collection::make([
                'one' => 'ONE',
                'two' => 'TWO',
            ]),
            'baz' => 'BAZ',
        ]);
        $collection = $collection->except(['foo', 'bar.two']);

        $this->assertEquals([
            'bar' => Collection::make([
                'one' => 'ONE',
            ]),
            'baz' => 'BAZ',
        ], iterator_to_array($collection));
    }

    public function test_4(): void
    {
        $collection = TestLazyCollection::make(function () {
            yield 'foo' => LazyCollection::make(function () {
                yield 'one' => 'ONE';
                yield 'two' => 'TWO';
                yield 'three' => [
                    'first' => 'FIRST',
                    'second' => 'SECOND',
                ];
            });
            yield 'bar' => LazyCollection::make(function () {
                yield 'one' => 'ONE';
                yield 'two' => 'TWO';
                yield 'three' => [
                    'first' => 'FIRST',
                    'second' => 'SECOND',
                ];
            });
            yield 'baz' => LazyCollection::make(function () {
                yield 'one' => 'ONE';
                yield 'two' => 'TWO';
                yield 'three' => [
                    'first' => 'FIRST',
                    'second' => 'SECOND',
                ];
            });
        });
        $collection = $collection->except(['foo', 'bar.two', 'baz.three.second']);

        $this->assertEquals([
            'bar' => [
                'one' => 'ONE',
                'three' => [
                    'first' => 'FIRST',
                    'second' => 'SECOND',
                ],
            ],
            'baz' => [
                'one' => 'ONE',
                'two' => 'TWO',
                'three' => [
                    'first' => 'FIRST',
                ],
            ],
        ], $collection->toArray());
    }

}

@mpyw
Copy link
Contributor

mpyw commented Sep 28, 2023

@fuwasegu Refactored and fixed some bugs

public function except($keys)
{
    $keys = match (true) {
        $keys instanceof Enumerable => $keys->all(),
        is_array($keys) => $keys,
        is_null($keys) => [],
        default => func_get_args(),
    };

    if (count($keys) === 0) {
        return $this;
    }

    $keys = Arr::undot(array_fill_keys($keys, true));

    $forget = function (LazyCollection|ArrayAccess|array $data, array $keys) use (&$forget) {
        if ($data instanceof LazyCollection) {
            return new static(function () use ($data, $keys, $forget) {
                foreach ($data as $key => $value) {
                    foreach ($keys as $k => $v) {
                        if (
                            $k === $key
                            && is_array($v)
                            && (
                                $value instanceof LazyCollection
                                || is_array($value)
                                || $value instanceof ArrayAccess
                            )
                        ) {
                            yield $key => $forget($value, $v);
                            continue 2;
                        }
                        if ($k === $key) {
                            continue 2;
                        }
                    }
                    yield $key => $value;
                }
            });
        }
        foreach ($keys as $k => $v) {
            if (is_array($v)) {
                $forget($data, $v);
            } else {
                unset($data[$k]);
            }
        }
        return $data;
    };

    return new static(function () use ($keys, $forget) {
        yield from $forget($this, $keys);
    });
}

@fuwasegu
Copy link
Contributor Author

For some reason, my previous comment was lost, so I'm writing again.

As taylor and joseph have said, this PR is destructive.
I didn't realize at first that except supports dot nesting and implemented it under the assumption of a flat LazyCollection.

Now that joseph says "we have to decide what to do if the child of a LazyCollection is a LazyCollection", I think we have three choices.

  • Allow destructive changes and do not support nesting in the case of a LazyCollection. I would include this change starting with 11.x.
  • Make LazyCollection::except() support nesting by dots
  • LazyCollection::except() remains as before, allowing all data to be in memory with the call (giving up on being lazy)

Personally, I don't want to give up on the third one.
And I implemented except() for dots.
However, it has become a bit complicated, so I leave it to the owner to decide whether to include it as part of the framework implementation.

@fuwasegu
Copy link
Contributor Author

@taylorotwell @JosephSilber

To test the changes accommodating dot notation, I ran tests in SupportCollectionTest::testExcept() using the following data:

$data = new $collection([
    'social' => [
        'twitter' => '@taylorotwell',
        'github' => 'taylorotwell',
    ],
    'projects' => new $collection([
        'laravel' => [
            'stars' => 30000,
            'forks' => 5000,
        ],
        'spark' => [
            'stars' => 2000,
            'forks' => 500,
        ],
    ]),
]);

$this->assertEquals([
    'social' => [
        'github' => 'taylorotwell',
    ],
    'projects' => [
        'laravel' => [],
        'spark' => [
            'stars' => 2000,
        ],
    ],
], $data->except(['social.twitter', 'projects.laravel.stars', 'projects.laravel.forks', 'projects.spark.forks'])->toArray());

For LazyCollection, the test passed. However, for the standard Collection, I encountered the following warning, which caused the test to fail:

while (count($parts) > 1) {
$part = array_shift($parts);
if (isset($array[$part]) && static::accessible($array[$part])) {
$array = &$array[$part];
} else {
continue 2;
}
}

Is this because Collection::except() is not designed to accommodate types like Collection<TKey, Collection<TKey, array>>?

If so, the first step is to decide whether to allow such types. Should I consider refactoring Collection::except() with its own implementation, rather than relying on Arr::except()?

@fuwasegu fuwasegu marked this pull request as ready for review September 29, 2023 08:20
@Angel5a
Copy link
Contributor

Angel5a commented Oct 2, 2023

@fuwasegu What about a compromise:

  • Step1. If no dot notation involved, then use your simple lazy implementation, otherwise give up and fallback to old implementation (10.x branch).
  • Step 2. Add support for dot notation (perhaps in 11.x branch).

Lazy collections that are not lazy is a bad thing. It would be nice to see it fixed.

@fuwasegu
Copy link
Contributor Author

fuwasegu commented Oct 2, 2023

@Angel5a Thank you for your excellent suggestions!

Indeed.

  • If $keys contains a string with a dot in it, passthru, and
  • Otherwise, use my first implementation and process lazily

is a very clear-cut compromise!

Dot support is already implemented, so it is possible to incorporate it starting with 11.x.

@taylorotwell @JosephSilber What do you think of this idea?

@JosephSilber
Copy link
Contributor

Not sure about that. Leads to unintuitive swings in memory usage. Adding a dot now completely changes the memory footprint.

By selecting "fewer" values, I'll be increasing my memory footprint. Not what I'd expect.

@Angel5a
Copy link
Contributor

Angel5a commented Oct 2, 2023

@JosephSilber Yes, that's the problem, but

  1. We are already have the same issue.
  2. Refactoring is a step by step process.

For now we get closer for desired behavior (negotiate specific use-case). For 11.x we can completely resolve the problem, even by introducing BC change.

Penalty is in dot presence check.

@taylorotwell
Copy link
Member

I honestly just can't commit to maintaining that much code for this use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants