-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add iterators for non pdo drivers. #2718
Conversation
You could use a generator instead of creating classes for the iterators, it would simplify the code needed for the feature. |
@jvasseur agreed that would be easier...as |
The |
@jvasseur wow, I did not know that! Thanks, will update PR, this makes it way easier. |
@jvasseur updated PR, thanks! Will likely give it some time for feedback before rebasing so's we can all forget that whole iterator thing ever happened 😆 Have manually tested the SQL server implementation and it works. |
While the idea is good, using generators is a BC break when doing the following: $result = $statement->getIterator();
foreach ($result as $row) {
}
// will error because generators cannot be reused
foreach ($result as $row) {
} Not sure if we should care though. /cc @Ocramius |
We could provide a rewindable iterator somehow: tricky but feasible
…On 17 May 2017 5:30 p.m., "Steve Müller" ***@***.***> wrote:
While the idea is good, using generators is a BC break when doing the
following:
$result = $statement->getIterator();foreach ($result as $row) {}// will error because generators cannot be reusedforeach ($result as $row) {}
Not sure if we should care though. /cc @Ocramius
<https://github.com/ocramius>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2718 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAJakIgNx32rdEVeOh3V6FTDZid3Ns2tks5r6xKMgaJpZM4NTuoD>
.
|
@Ocramius could you elaborate a bit more, besides the technical possibility and BC, what could it be needed for? Generators do not allow to be rewound on purpose, and I think the same is applicable to the statements: as a developer, I'd better be informed that I'm re-fetching data from the same statement twice and redesign my logic instead of just fetching the same data again. |
@morozov we could accumulate the data that was already iterated upon |
@morozov I tend to agree with your argumentation but TBH I still find it confusing that generators implement @Ocramius do you mean "caching" fetched data in the object while iterating? What is wrong with simply executing the statement again on rewind? |
Side effects |
@deeky666 for example, a query that selects values from a sequence will allocate new sequence values, instead of retrieving the same ones over multiple calls. |
@Ocramius good point. That's reason enough for me. So then iterators should probably really be forward only here. I don't see how we could make them rewindable then without storing fetched rows in memory. |
I've added a failing test case for the re-usable iterator problem. |
I added an example of a |
$data = $this->fetchAll(); | ||
|
||
return new \ArrayIterator($data); | ||
$that = $this; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use $this
inside the closure instead of doing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Been working on a legacy codebase for too long...thanks
return $g(); | ||
} | ||
|
||
public function count() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to implement the countable interface here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, you're right, will remove 👍
57522c0
to
790a7ec
Compare
|
||
public function getIterator() | ||
{ | ||
$g = $this->generator; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return ($this->generator)();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would simply call the generator callable at each foreach () {}
(because each foreach
will cause a ->getIterator()
call).
That's most likely what we do NOT want, as a query will be repeated.
The test case is as following:
$result = someQuery();
foreach ($result as $value) {
}
foreach ($result as $value) {
}
self::assertCount(1, $executedQueries);
$j++; | ||
} | ||
|
||
$this->assertEquals($i, $j); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The amount of executed queries is to be checked here - you can use something like SELECT CURRENT_TIME_IN_MICROSECONDS()
(or equivalent - I made this SQL up) to verify that the result doesn't change over multiple calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a way of counting the number of executed queries using DebugStack
take a look, if that doesn't quite get what we need I will do the microseconds thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, I just checked: the logger is only used outside statements, not inside them :-(
This means that the logger will only log the creation of the statement and its first execution.
A stateful query would most likely be better. Sorry for not noticing this before!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your reply and sorry for my delay. I've pushed something which I'm pretty sure is what you mean. However, I don't think this works. I can't seem to get microseconds. Only thing I can do to get around this is to add a sleep call 😞 . Any ideas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is likely because of the MySQL version. 5.6 and up seems to support it. Travis is on a pre 5.6 version of MySQL IIRC although I think the new trusty boxes may be newer. So I guess we could update travis to build against the newer versions if we want (if this is indeed the issue!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do that, yes (using trusty on travis-ci). Ideally, we'd test this feature with SQLite in-memory though, unless we want something mysql-specific
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
afaict, travis is using mariadb which I think supports microseconds, so think it should be fine?
On another note, I noticed the build is failing and it's because pdo doesn't actually support rewinding the iterator (well not by default is my understanding, may be able to with unbuffered queries) so made me questions if we even needed to support this for non pdo drivers, or if we should, should we wrap the pdo statements so they can be rewound too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Requires testing against stateful queries
If an iterator cannot be rewinded, we'll just need to build an accumulator and rewind ourselves (at the cost of memory): providing inconsistent features is worse than providing features at all :-) |
b078876
to
926c94c
Compare
@Ocramius to clarify, if we're going to provider rewindable iterators for non pdo drivers we should add something to do the same for pdo drivers? If so, I can take a look at doing that. |
At least looking at what is feasible first. Rewindable iterator with emulation means memory usage. |
@Ocramius I've committed an example of a PDO iterator where we store a cache and can be rewound. Just wanted to remind ourselves of what the PR intentions were and where we're at now. Initial: Non PDO drivers were running out of memory when iterating over large data sets due to Iterator couldn't be reiterated or couldn't be rewound. So a PDO couldn't be reiterated noticed after adding an iterator that could be reiterated (rewound) for non PDO drivers that PDO drivers could not actually be reiterated, not sure if this is a bug or by design. Latest I've added an example of a So...we're basically gone from non PDO drivers running out of memory when iterating over large data to PDO drivers now running out of memory when iterating over large data sets 😆 Anyway, let me know how you wish to proceed 😄 |
@jenkoian the alternative is that we make them ALL non-rewindable, and that's it. That means that the user is left the decision to call Thoughts? |
Options as I can see it are: a) Consistent, BC Break b) Consistent, doesn't break BC c) Inconsistent, doesn't break BC If you're looking for consistency I would say option b would be the best choice. IMO I think option c is the best and inconsistency here can be tolerated due to shortcoming of PDO. |
Fairly sure we need this scenario then: can't afford OOM |
…ons. Only added to mysqli for now, can add to other drivers (assuming they support reseting the pointer in some way) after feedback.
statements in a rewindable iterator. Skipped the test for iterating multiple times with a message regarding that iterators are non rewindable.
21e1740
to
581511e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a few minor issues. Looks good overall.
public function testGettingIteratorDoesNotCallFetch() | ||
{ | ||
$stmt = $this->createMock(Statement::class); | ||
$stmt->expects($this->never())->method('fetch'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also add all other fetch*
methods here (fetchAll()
, fetchColumn()
).
@@ -20,6 +20,8 @@ | |||
namespace Doctrine\DBAL\Driver\OCI8; | |||
|
|||
use Doctrine\DBAL\Driver\Statement; | |||
use Doctrine\DBAL\Driver\StatementIterator; | |||
use \IteratorAggregate; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The leading namespace separator is redundant.
|
||
public function testIterationCallsFetchOncePerStep() | ||
{ | ||
$values = ['foo', '', 'bar', '0', 'baz', 0, 'qux', null, 'quz', false]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to add one more value after false
to prove that fetching false
terminates iteration, while other empty values don't (see original example).
581511e
to
078d32e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢 |
The methods has more limitations and caveats than provides real use: 1. It fetches all data in memory which is often inefficient (see doctrine#2718). 2. It fetches rows in memory one by one instead of using `fetchAll()`. 4. It doesn't allow to specify the statement fetch mode since it's instantiated internally. 5. It can be easily replaced with: ```php foreach ($conn->executeQuery($query, $params, $types) as $value) { yield $function($value); } ```
The methods has more limitations and caveats than provides real use: 1. It fetches all data in memory which is often inefficient (see doctrine#2718). 2. It fetches rows in memory one by one instead of using `fetchAll()`. 4. It doesn't allow to specify the statement fetch mode since it's instantiated internally. 5. It can be easily replaced with: ```php foreach ($conn->executeQuery($query, $params, $types) as $value) { yield $function($value); } ```
The methods has more limitations and caveats than provides real use: 1. It fetches all data in memory which is often inefficient (see doctrine#2718). 2. It fetches rows in memory one by one instead of using `fetchAll()`. 4. It doesn't allow to specify the statement fetch mode since it's instantiated internally. 5. It can be easily replaced with: ```php foreach ($conn->executeQuery($query, $params, $types) as $value) { yield $function($value); } ```
Allows result sets to be iterated over without running out of memory on
large data sets.
--
Before: Iterating (e.g. foreaching) over a large data set would often run out of memory.
After: Iterating over a large data set uses iterators using the cursor so that memory usage remains low.
That's the aim anyway, tested (manually) the SQLSrv driver and it has the desired effect, could do with some help testing the other drivers if anyone is able? I would have liked to add some unit tests but none of the existing stuff is unit tested and I'm not really sure where I would even start, again any help here appreciated also.
Small update: I have reverted the changes for the mysqli driver as I don't quite understand that driver enough, it seems to behave unlike any of the others, again any help here much appreciated. Also came across the DataAccessTest which should cover most of this, it certainly brought out the issues with the mysqli driver, so that's good.
Another update: I have removed all the stuff with iterators, favouring generators. This makes things a lot easier as we can just re-use the existing fetch functionality. Generators are not without their issues though as they cannot be re-used, see the discussion below for more info, in particular this comment.
Update (20170724): Due to concerns over consistency both PDO and non PDO can now be iterated but not rewindable. A rewindable generator has been included if users wish to use this (possible this will be removed though?).
Update (20170914): StatementIterator class created which is now used by all non PDO statements. Extracting to own class makes it easier to test.