
Datastore query limits are silently truncated by the v1beta3 API #1763

Closed
Bogdanp opened this issue Apr 28, 2016 · 19 comments
Assignees
Labels
api: datastore Issues related to the Datastore API.

Comments


Bogdanp commented Apr 28, 2016

This bug manifests itself in a couple of "fun" ways:

len(list(client.query(kind="Foo").fetch(limit=300)))

Will yield 300, assuming there are 300 or more entities of that kind.

len(list(client.query(kind="Foo").fetch(limit=500)))

Assuming there are exactly 500 entities of that kind, this will yield 500. But if there are n entities where n > 500, it will yield n, silently exceeding the requested limit.

This all seems to be caused by the v1beta3 API truncating the limit to 300:

entities, more, cursor = client.query(kind="Foo").fetch(limit=500).next_page()

Here, len(entities) will be 300, not 500, even if there are 500 or more entities of that kind.

We're currently using the following snippet to work around this issue:

def _all_entities(iterator):
    """Drain a fetch() iterator page by page while honoring its limit.

    Works around the 300-entity batch truncation by decrementing the
    iterator's (private) _limit after every page.
    """
    entities, cursor = [], None
    while iterator._limit > 0:
        xs, more, cursor = iterator.next_page()
        iterator._limit -= len(xs)
        entities.extend(xs)
        if not more:
            break

    return entities, cursor
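
For reference, a minimal usage sketch of that workaround (assuming the 2016-era gcloud-python API, where fetch() returns an iterator exposing next_page() and a private _limit attribute):

from gcloud import datastore  # package name at the time; "google.cloud" in later releases

client = datastore.Client()

# Cap the query at 500 results, then drain it page by page so the
# ~300-entity batch truncation no longer matters.
iterator = client.query(kind="Foo").fetch(limit=500)
entities, cursor = _all_entities(iterator)
assert len(entities) <= 500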
@tseaver tseaver added the api: datastore Issues related to the Datastore API. label Apr 28, 2016

dhermes commented Apr 28, 2016

@Bogdanp This is a "feature", not a bug. The API doesn't promise to return your full fetch limit; it only promises not to exceed it. The backend typically stops a request after a preset amount of time. Is the more value not useful? (The actual value returned by the backend is a bit different.)

@pcostell can say more, though I am preemptively closing this since there isn't anything we can do about it in the library.


Bogdanp commented Apr 28, 2016

it just promises not to exceed it

The library does exceed it in the second example I gave, though: requesting 500 entities ends up returning more than the limit whenever more entities than that limit exist.


dhermes commented Apr 28, 2016

Oh wow, my bad! Good thing I forgot to click the close button. Reproducing now on my end.


Bogdanp commented Apr 28, 2016

No worries! Regarding the "feature", is it possible to get that bumped up somehow or is that set in stone? The old API used to allow much higher values for limits.


Bogdanp commented Apr 28, 2016

The issue is with the way __iter__ works, btw: it naively calls next_page without adjusting its limit based on the results it has already received, so it eventually exceeds the limit if there are enough entities. This used to work fine because, like I mentioned above, the old API didn't truncate limits (or, if it did, it did so at a much higher number than 300). The fix would probably look a lot like my _all_entities function above.
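
To make the failure mode concrete, here is a rough sketch (not the library's actual code) of the naive loop, assuming next_page() returns (entities, more, cursor) and the backend caps each batch at roughly 300 results:

# Naive iteration: the original limit (e.g. 500) is re-sent with every
# request, each batch comes back truncated to ~300 entities, and the
# loop only stops once the backend reports no more results -- so the
# caller can end up with far more than 500 entities in total.
def _iter_naive(iterator):
    while True:
        entities, more, cursor = iterator.next_page()
        for entity in entities:
            yield entity
        if not more:
            break

The fix is to decrement the remaining limit after every page and stop once it reaches zero, exactly as the _all_entities workaround above does.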


dhermes commented Apr 28, 2016

OK, this is totally a bug in our implementation, essentially identical to #1467. It occurs because limit=500 is sent on every request without taking into account the results already returned. Bugs like this are one reason we are hesitant to implement an __iter__ that makes more than one API call.

As for 300 vs. 500, that's completely out of gcloud-python's control, though maybe @pcostell can weigh in. You could also file an issue on https://github.com/GoogleCloudPlatform/google-cloud-datastore


dhermes commented Apr 28, 2016

Jinx!

@pcostell

The Cloud Datastore API has to reserve the right not to return all the results -- we need to protect both our servers and other users on them. This is why we expose our API with support for batching -- but the client libraries need to take advantage of this in order to expose a nicer API to our users.

I think __iter__ needs to make more than one API call -- it should return exactly as many items as the user asks for (or fewer if that many don't exist). Also, I think next_page should probably be implemented in terms of __iter__, not the other way around. This is very similar to gcloud-node#1260; I'm just going to quote something I said there:

The way I see it is that our desired behavior is a streaming API. If a user asks for a limit of N, we will give them a stream of up to N entities. We can then add some helpers on top of that to convert the stream to a list.

Cloud Datastore then offers a paging feature, which provides the user a cursor they can use to get extra pages. On top of that, Cloud Datastore offers best-effort end-condition tracking (more_results). This is very useful for developers showing pages to their users, since it tells them whether or not to show a "Next Page" button (this is MORE_RESULTS_AFTER_LIMIT). This feature is conservative (MORE_RESULTS_AFTER_LIMIT may be returned even if the correct result is NO_MORE_RESULTS).

Importantly, Cloud Datastore's paging feature is completely separate from the batching API, which is really just an implementation detail because prior to gRPC we couldn't offer a streaming API.
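
Sketched in Python, that streaming model might look roughly like this (a hand-rolled illustration, not the client library's implementation; it assumes the next_page()-style batch call used elsewhere in this thread and a fetch() that accepts a start_cursor):

def stream_entities(query, limit=None):
    # Yield up to `limit` entities, issuing as many batched RPCs as needed.
    remaining = limit
    cursor = None
    while remaining is None or remaining > 0:
        # Ask for at most `remaining` results, resuming from the last
        # cursor; the backend may return fewer per batch (e.g. ~300).
        iterator = query.fetch(limit=remaining, start_cursor=cursor)
        entities, more, cursor = iterator.next_page()
        for entity in entities:
            yield entity
        if remaining is not None:
            remaining -= len(entities)
        if not more:
            break

A list is then just list(stream_entities(query, limit=500)), and paging falls out of exposing the cursor between calls.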


dhermes commented Apr 29, 2016

Thanks @pcostell.


pcostell commented May 4, 2016

@dhermes can we get this bug assigned to someone? The Cloud Datastore batch size shouldn't be something client library users have to worry about.

@dhermes dhermes self-assigned this May 4, 2016
dhermes added a commit to dhermes/google-cloud-python that referenced this issue May 11, 2016
dhermes added a commit to dhermes/google-cloud-python that referenced this issue May 12, 2016
dhermes added a commit to dhermes/google-cloud-python that referenced this issue May 17, 2016
dhermes added a commit to dhermes/google-cloud-python that referenced this issue May 17, 2016
dhermes added a commit to dhermes/google-cloud-python that referenced this issue May 17, 2016
dhermes added a commit to dhermes/google-cloud-python that referenced this issue May 17, 2016
In the process, also keeping track of the number of skipped results (so that we can update the offset).

Fixes googleapis#1763.

Rafff commented Jun 6, 2017

Hey.
So there is no way to get more than 300 results at once?
I can see that this issue has been closed for a long time, but it seems this "problem" (or feature?) still exists.


dhermes commented Jun 6, 2017

@Rafff How do you mean? The API has limits based on payload size, and there isn't anything we can do about that in the client. However, we do provide an iterator so that you don't have to worry about it.
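
For illustration, with the fix referenced above the iterator re-issues batched requests under the hood, so plain iteration is enough (a sketch assuming the post-fix google-cloud-datastore client; the kind name is made up):

from google.cloud import datastore

client = datastore.Client()
query = client.query(kind="Foo")

# Iterating drains every batch the backend returns; the per-batch size
# is an implementation detail the caller never sees.
all_entities = list(query.fetch())           # everything that matches
first_1000 = list(query.fetch(limit=1000))   # at most 1000, across batches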


ibrahim-abuelalaa commented Dec 11, 2017

Hi @dhermes,
I have an issue that I believe is somehow related to this one. My datastore has more than 8000 entities, but when I try to use offset > 350 with limit=2000 it returns fewer than 1900 entities, and if I increase the offset to 1000 it returns 0 entities.

Hint: all my entities match the query


dhermes commented Dec 11, 2017

@ibrahim-abuelalaa Would you mind sharing some code so we can try to reproduce?


ibrahim-abuelalaa commented Dec 11, 2017

@dhermes

from google.cloud import datastore

client = datastore.Client()
query = client.query(kind=kind)  # `kind`, `projectKey`, `limit`, and `offset` are defined elsewhere
query.add_filter('projectKey', '=', projectKey)
len(list(query.fetch(limit=limit, offset=offset)))


dhermes commented Dec 11, 2017

So you have something like:

>>> query_iter1 = query.fetch()
>>> len(list(query_iter1))
8001
>>>
>>> query_iter2 = query.fetch(offset=351, limit=2000)
>>> len(list(query_iter2))
1899
>>>
>>> query_iter3 = query.fetch(offset=1000, limit=2000)
>>> len(list(query_iter3))
0


dhermes commented Dec 11, 2017

@ibrahim-abuelalaa Is the distinction relevant? (I.e. is "almost" good enough?) I just want to be able to reproduce this issue.

@ibrahim-abuelalaa

@dhermes yes, it is exactly like that; sorry for the confusion.
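
A related note, not raised in the thread but consistent with Datastore's documented behavior: large offsets are applied by skipping results on the server, so cursor-based paging is usually the safer way to walk deep into a result set. A sketch with the same query, assuming the later client where the fetch() iterator exposes pages and next_page_token:

from google.cloud import datastore

client = datastore.Client()
query = client.query(kind=kind)
query.add_filter('projectKey', '=', projectKey)

# Page through results with cursors instead of a large offset.
cursor = None
while True:
    iterator = query.fetch(limit=1000, start_cursor=cursor)
    page = next(iterator.pages)          # consume one batch
    entities = list(page)
    if not entities:
        break
    # ... process entities ...
    cursor = iterator.next_page_token    # resume point for the next batch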

@edwardfung123

Hello. Is there any way to work around the 300-entity limit? Is there any documentation about it? I searched the library source code and found nothing related to this 300 limit. Is it enforced on the server side? What is the best practice for iterating over all the entities: should we set the limit to None, or use cursors?
