-
Notifications
You must be signed in to change notification settings - Fork 11.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[10.x] Refactored LazyCollection::except() to save memory #48535
[10.x] Refactored LazyCollection::except() to save memory #48535
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a test ensuring it's actually lazy. Update this testExceptIsLazy
method:
framework/tests/Support/SupportLazyCollectionIsLazyTest.php
Lines 345 to 354 in ac91af7
public function testExceptIsLazy() | |
{ | |
$this->assertDoesNotEnumerate(function ($collection) { | |
$collection->except([1, 2]); | |
}); | |
$this->assertEnumeratesOnce(function ($collection) { | |
$collection->except([1, 2])->all(); | |
}); | |
} |
Add something like this between those 2 tests:
$this->assertEnumerates(4, function ($collection) {
$collection->except([1, 2])->take(2)->all();
});
|
||
return new static(function () use ($keys) { | ||
foreach ($this as $key => $value) { | ||
if (! in_array($key, $keys)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in_array
needs to loop through the whole array for each iteration.
Instead, flip
the array first, so that the list of keys are actual keys in the array, and can be used as a dictionary for efficient O(1) lookup.
See here for example:
framework/src/Illuminate/Collections/LazyCollection.php
Lines 580 to 587 in 2a4d848
$keys = array_flip(is_array($key) ? $key : func_get_args()); | |
$count = count($keys); | |
foreach ($this as $key => $value) { | |
if (array_key_exists($key, $keys) && --$count == 0) { | |
return true; | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see.
I had not been able to take into account the performance of PHP down to the function level. Thanks for the advice!
I will try to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
@fuwasegu tests are fixed on 10.x, can you rebase? |
Co-authored-by: Joseph Silber <contact@josephsilber.com>
9abf042
to
e6b0dd0
Compare
@driesvints |
I think this may be a breaking change? Normal collections use |
Oh right! When this PR came in, I was racking my brain trying to remember why I didn't do it originally. Now that you've mentioned it, I remember that this was indeed the reason. Didn't think at the time that it's worth reimplementing it all here. If we do want to do it now, we have to decide whether we want to support dot notation for when the children (or other descendants) are also lazy collections 🤔 |
I've thought about it a lot, but I couldn't come up with an implementation of except() that supports dot nesting in LazyCollection... For example, if the keys are ['foo.bar', 'bar', 'foo.baz'], etc., LazyCollection cannot run backwards, so it will have to go around once for each key verification |
@fuwasegu Would it make sense to undot the values within the input array before running it through the except implementation then? This way we can maybe make it work with the dot notation? |
@morloderex nice idea ! |
How about this one ? public function except($keys)
{
$keys = match (true) {
$keys instanceof Enumerable => $keys->all(),
is_array($keys) => $keys,
is_null($keys) => [],
default => func_get_args(),
};
if (count($keys) === 0) {
return $this;
}
$keys = Arr::undot(array_fill_keys($keys, true));
$forget = function (LazyCollection|ArrayAccess|array &$data, array $keys) use (&$forget) {
if ($data instanceof LazyCollection) {
return new static(function () use ($data, $keys, $forget) {
foreach ($data as $key => $value) {
$shouldYield = true;
foreach ($keys as $k => $v) {
if (
$k === $key
&& is_array($v)
&& (
$value instanceof LazyCollection
|| is_array($value)
|| $value instanceof ArrayAccess
)
) {
yield $key => $forget($value, $v);
$shouldYield = false;
break;
} elseif ($k === $key) {
$shouldYield = false;
break;
}
}
if ($shouldYield) {
yield $key => $value;
}
}
});
} else {
foreach ($keys as $k => $v) {
if (is_array($v)) {
return $forget($data, $v);
} else {
if (isset($data[$k])) {
unset($data[$k]);
return $data;
}
}
}
}
};
return new static(function () use ($keys, $forget) {
yield from $forget($this, $keys);
});
} |
these test cases was passed. <?php
namespace Tests\Unit;
use Illuminate\Support\Collection;
use Illuminate\Support\LazyCollection;
use PHPUnit\Framework\TestCase;
class ExceptTest extends TestCase
{
public function test_1(): void
{
$collection = TestLazyCollection::make([1, 2, 3, 4]);
$collection = $collection->except([0, 2]);
$this->assertSame([1 => 2, 3 => 4], $collection->toArray());
}
public function test_2(): void
{
$collection = TestLazyCollection::make([
'foo' => 'FOO',
'bar' => [
'one' => 'ONE',
'two' => 'TWO',
],
'baz' => 'BAZ',
]);
$collection = $collection->except(['foo', 'bar.two']);
$this->assertSame([
'bar' => [
'one' => 'ONE',
],
'baz' => 'BAZ',
], $collection->toArray());
}
public function test_3(): void
{
$collection = TestLazyCollection::make([
'foo' => 'FOO',
'bar' => Collection::make([
'one' => 'ONE',
'two' => 'TWO',
]),
'baz' => 'BAZ',
]);
$collection = $collection->except(['foo', 'bar.two']);
$this->assertEquals([
'bar' => Collection::make([
'one' => 'ONE',
]),
'baz' => 'BAZ',
], iterator_to_array($collection));
}
public function test_4(): void
{
$collection = TestLazyCollection::make(function () {
yield 'foo' => LazyCollection::make(function () {
yield 'one' => 'ONE';
yield 'two' => 'TWO';
yield 'three' => [
'first' => 'FIRST',
'second' => 'SECOND',
];
});
yield 'bar' => LazyCollection::make(function () {
yield 'one' => 'ONE';
yield 'two' => 'TWO';
yield 'three' => [
'first' => 'FIRST',
'second' => 'SECOND',
];
});
yield 'baz' => LazyCollection::make(function () {
yield 'one' => 'ONE';
yield 'two' => 'TWO';
yield 'three' => [
'first' => 'FIRST',
'second' => 'SECOND',
];
});
});
$collection = $collection->except(['foo', 'bar.two', 'baz.three.second']);
$this->assertEquals([
'bar' => [
'one' => 'ONE',
'three' => [
'first' => 'FIRST',
'second' => 'SECOND',
],
],
'baz' => [
'one' => 'ONE',
'two' => 'TWO',
'three' => [
'first' => 'FIRST',
],
],
], $collection->toArray());
}
} |
@fuwasegu Refactored and fixed some bugs public function except($keys)
{
$keys = match (true) {
$keys instanceof Enumerable => $keys->all(),
is_array($keys) => $keys,
is_null($keys) => [],
default => func_get_args(),
};
if (count($keys) === 0) {
return $this;
}
$keys = Arr::undot(array_fill_keys($keys, true));
$forget = function (LazyCollection|ArrayAccess|array $data, array $keys) use (&$forget) {
if ($data instanceof LazyCollection) {
return new static(function () use ($data, $keys, $forget) {
foreach ($data as $key => $value) {
foreach ($keys as $k => $v) {
if (
$k === $key
&& is_array($v)
&& (
$value instanceof LazyCollection
|| is_array($value)
|| $value instanceof ArrayAccess
)
) {
yield $key => $forget($value, $v);
continue 2;
}
if ($k === $key) {
continue 2;
}
}
yield $key => $value;
}
});
}
foreach ($keys as $k => $v) {
if (is_array($v)) {
$forget($data, $v);
} else {
unset($data[$k]);
}
}
return $data;
};
return new static(function () use ($keys, $forget) {
yield from $forget($this, $keys);
});
} |
For some reason, my previous comment was lost, so I'm writing again. As taylor and joseph have said, this PR is destructive. Now that joseph says "we have to decide what to do if the child of a LazyCollection is a LazyCollection", I think we have three choices.
Personally, I don't want to give up on the third one. |
To test the changes accommodating dot notation, I ran tests in SupportCollectionTest::testExcept() using the following data: $data = new $collection([
'social' => [
'twitter' => '@taylorotwell',
'github' => 'taylorotwell',
],
'projects' => new $collection([
'laravel' => [
'stars' => 30000,
'forks' => 5000,
],
'spark' => [
'stars' => 2000,
'forks' => 500,
],
]),
]);
$this->assertEquals([
'social' => [
'github' => 'taylorotwell',
],
'projects' => [
'laravel' => [],
'spark' => [
'stars' => 2000,
],
],
], $data->except(['social.twitter', 'projects.laravel.stars', 'projects.laravel.forks', 'projects.spark.forks'])->toArray()); For LazyCollection, the test passed. However, for the standard Collection, I encountered the following warning, which caused the test to fail: framework/src/Illuminate/Collections/Arr.php Lines 286 to 294 in 4b22e39
Is this because Collection::except() is not designed to accommodate types like Collection<TKey, Collection<TKey, array>>? If so, the first step is to decide whether to allow such types. Should I consider refactoring Collection::except() with its own implementation, rather than relying on Arr::except()? |
@fuwasegu What about a compromise:
Lazy collections that are not lazy is a bad thing. It would be nice to see it fixed. |
@Angel5a Thank you for your excellent suggestions! Indeed.
is a very clear-cut compromise! Dot support is already implemented, so it is possible to incorporate it starting with 11.x. @taylorotwell @JosephSilber What do you think of this idea? |
Not sure about that. Leads to unintuitive swings in memory usage. Adding a dot now completely changes the memory footprint. By selecting "fewer" values, I'll be increasing my memory footprint. Not what I'd expect. |
@JosephSilber Yes, that's the problem, but
For now we get closer for desired behavior (negotiate specific use-case). For 11.x we can completely resolve the problem, even by introducing BC change. Penalty is in dot presence check. |
I honestly just can't commit to maintaining that much code for this use case. |
Overview
I've reduced the memory usage by reimplementing the except() method of LazyCollection::class to use lazy evaluation.
Motivation
LazyCollection is a sibling of the Collection class that utilizes Generators for lazy evaluation.
I believe that methods implemented by LazyCollection should be lazily evaluated using yield wherever possible.
However, while it's often unavoidable to use passthru() to delegate to the Collection class, doing so defeats the purpose of using LazyCollection. This is because all the data gets loaded into memory, negating the benefits of lazy evaluation.
Tests
As this is a refactoring, the fact that no test modifications were needed indicates that the functionality remains unbroken.
References
I previously refactored LazyCollection::take() for similar reasons in PR #48382.
Performance Metrics
I measured the performance before and after the refactoring using the following test code, focusing on both execution time and memory usage. Please use these metrics to evaluate the effectiveness of this refactoring.
Code
Results
About memory usage:
About execution time: