fix(datastore): query limit more than 300 #5592
modified: tests/System/QueryResultPaginationTest.php
While researching this I came across a similar issue: googleapis/google-cloud-python#1763. It looks like the solution they landed on is to update the limit on subsequent calls in a paged set and utilize offsets. This approach could provide some benefit over using WDYT about using that approach here, too? I did some quick testing and it seems like we should be able to support it.
@dwsupplee our doc suggests that we should rather move away from offsets to save costs, so we should continue with cursors (current approach is similar to Ruby, etc). To prevent over-fetching of records in last page, I switched to managing limits via a new I am keeping
@vishwarajanand Just a heads up this is on my radar! I'll be reviewing shortly.
@dwsupplee Sure, thanks for the heads-up. I have merged the latest main to pull in all changes from
@vishwarajanand and I discussed the PR offline a bit, looking good for the most part just a few minor updates to be expected.
    if (isset($res['query']['limit'])) {
        $remainingLimit = $res['query']['limit'];
    }
    if (isset($remainingLimit['value'])) {
This looks redundant - ['value'] could be coming from $res['query']['limit'], or from $query->queryObject()['limit']. So why set it above just to set it again here? It seems to me like what you'd want to do is set $remainingLimit to null, above, and then simply make this:
    if (isset($res['query']['limit']['value'])) {
        $remainingLimit = $res['query']['limit']['value'];
    }
But maybe there's something I'm missing.
There are several cases here:
- In a standard Query (either GRPC or REST), we don't get a query key in the response. So, we set $remainingLimit using $query->queryObject()['limit']:
  google-cloud-php/Datastore/src/Operation.php, Line 472 in 71ac41c
      $remainingLimit = $query->queryObject()['limit'];
- In GqlQuery, we don't have limit from the query. So while using REST, we get limit in $res['query']['limit']:
  google-cloud-php/Datastore/src/Operation.php, Line 507 in 71ac41c
      $remainingLimit = $res['query']['limit'];
- In GqlQuery while using GRPC, we get limit in $res['query']['limit']['value']:
  google-cloud-php/Datastore/src/Operation.php, Lines 509 to 510 in 71ac41c
      if (isset($remainingLimit['value'])) {
          $remainingLimit = $remainingLimit['value'];
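For readers following along, the three cases above can be folded into one small function. This is only a sketch: resolveRemainingLimit() is a hypothetical helper, not something in google-cloud-php, and the array shapes are taken from the comment above rather than from the library source.

```php
<?php
// Hypothetical helper (not part of google-cloud-php): resolves the limit
// from the three shapes described in the comment above.
function resolveRemainingLimit(array $res, array $queryObject): ?int
{
    // Case 1: standard Query, the limit comes from the client-side query object.
    $remainingLimit = $queryObject['limit'] ?? null;

    // Case 2: GqlQuery over REST, the parsed limit is returned in the response.
    if (isset($res['query']['limit'])) {
        $remainingLimit = $res['query']['limit'];
    }

    // Case 3: GqlQuery over gRPC, the limit arrives wrapped as ['value' => N].
    if (is_array($remainingLimit)) {
        $remainingLimit = $remainingLimit['value'] ?? null;
    }

    return $remainingLimit === null ? null : (int) $remainingLimit;
}
```

The is_array() guard is a choice made for this sketch; the snippet under review calls isset($remainingLimit['value']) directly.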
@@ -242,7 +242,7 @@ public function runQuery(array $args)
             $query['filter'] = $this->convertFilterProps($query['filter']);
         }

-        if (isset($query['limit'])) {
+        if (isset($query['limit']) && is_int($query['limit'])) {
@vishwarajanand why was this added? This seems like it could break existing implementations if they supplied something like $query['limit'] = '1'
@bshaffer I mentioned this in point #1 in PR description > Changes:
Under Grpc mode, when a Gql query string goes through runQuery, and a parsed query object is returned with correct limit array, it doesn't need to be modified in subsequent fetches. Added a check to skip that.
In Grpc mode, the non-Gql query provides $query['limit'] as an int which needs to be parsed into an array, but your comment suggests we support int-like string also.
If so, we should change the is_int to !is_array
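A minimal sketch of the difference between the two checks discussed here. normalizeLimit() is a hypothetical stand-in for the relevant part of runQuery, and the ['value' => N] wrapper shape is an assumption based on this thread, not a copy of the library code.

```php
<?php
// Hypothetical stand-in for the limit handling inside runQuery.
// Assumption: the gRPC layer expects the limit wrapped as ['value' => N].
function normalizeLimit(array $query): array
{
    // !is_array() wraps ints and int-like strings such as '1', while an
    // already-parsed limit array (the Gql case) passes through untouched.
    // An is_int() check would skip the string '1' and leave it unwrapped.
    if (isset($query['limit']) && !is_array($query['limit'])) {
        $query['limit'] = ['value' => (int) $query['limit']];
    }

    return $query;
}
```

With this check, both normalizeLimit(['limit' => 1]) and normalizeLimit(['limit' => '1']) yield ['limit' => ['value' => 1]].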
Fixes: #5039
Changes
Datastore loops through all pages via EntityPageIterator which uses PageIteratorTrait.
The trait expects resultLimit to be set while doing the subsequent calls, which was missing so any limit value of more than 300 was leading to all results getting fetched. Following bugs were fixed:
- Under Grpc mode, when a Gql query string goes through runQuery, and a parsed query object is returned with correct limit array, it doesn't need to be modified in subsequent fetches. Added a check to skip that.
- resultLimit in Operation->runQuery(...) via the EntityPageIterator instance, so that the iterator can loop through all pages.
- EntityPageIterator, when we fetch the current page only then we need to save the limit value because in case of GqlQuery we do not have the limit parsed in client library.
Testing
Notes
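As a rough illustration of the behaviour described under Changes, the sketch below simulates a paged fetch in which the remaining limit is sent with every subsequent call. Everything here is hypothetical: fetchWithLimit() and $fetchPage are not google-cloud-php APIs, the integer cursor stands in for a real Datastore cursor, and the 300-row batch size is an assumption taken from the PR title.

```php
<?php
// Hypothetical simulation, not library code. $fetchPage stands in for a
// single runQuery call that takes a cursor and a limit.
function fetchWithLimit(callable $fetchPage, int $limit): array
{
    $results = [];
    $cursor = 0;
    $remaining = $limit;

    while ($remaining > 0) {
        // Send the remaining limit on each call, so the last page is not
        // over-fetched.
        $page = $fetchPage($cursor, $remaining);
        if ($page === []) {
            break; // the backend has no more rows
        }
        $results = array_merge($results, $page);
        $cursor += count($page);
        $remaining -= count($page);
    }

    return $results;
}

// Fake backend: 1000 rows, at most 300 returned per call (assumed batch size).
$rows = range(1, 1000);
$fetchPage = function (int $cursor, int $limit) use ($rows): array {
    return array_slice($rows, $cursor, min($limit, 300));
};

echo count(fetchWithLimit($fetchPage, 450)), "\n"; // prints 450
```

Without the decreasing $remaining value, the loop would have no way to stop at the requested limit and would keep paging until the backend ran out of rows, which matches the symptom this PR describes.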