Queries currently fail on large datasets #242
Comments
Tagging myself since I submitted this through support. |
@erichiggins Are you referring to cloud support? @feczo Can you try to make this clear when filing future issues? Thanks |
@erichiggins Why do you need to fetch this many at once? Why can't you save the cursor and start a new query? Also, a raw query with no filters is asking for the whole firehose of the dataset. Are you doing some kind of large-scale transformation? |
FWIW, #204 added the ability to use the cursor on a query. |
@dhermes I can certainly use cursors in cases where large datasets are required, but there doesn't appear to be any documentation around cursor support for this library, or any documented limitations regarding how many entities is "too many". Regarding the raw query, there are cases where we need to fetch all entities of a given type when syncing with other systems for a variety of purposes. This is quite intentional and we look forward to this functionality through the Datastore API since it's fairly painful to do with the Remote API or having GAE send the data outward. |
Some thoughts:
|
Cursors should be documented (and recommended as the idiomatic way to iterate over large result sets), and developers should have some control over the /cc @pcostell who can comment on the implementation in NDB. |
I agree. If a certain cursor size results in 500s (even if they are flaky), is it worth disallowing it in user land? Or are you suggesting that the UPDATE: @proppy which docs are you referring to? The official Cloud Datastore API query docs don't mention batching or max fetch size or anything it seems. I suppose any query to any webserver that begins to exceed 60s will cause problems. Is there some part of the documentation regarding max payload size or request time-outs? |
@dhermes Thanks for the follow-up. Just a few quick responses to some of the points you made:
Thanks again for the quick support on this. Looking forward to the improvements! |
@erichiggins first of all, thanks for using the library and providing us with extremely useful feedback! |
@silvolu Awesome. Thank you! |
Right now Cloud Datastore does have the limitation that RPCs will fail if they exceed 60 seconds. For now we recommend issuing queries for smaller batches and using cursors to get all the results. Eventually, if an RPC is taking too long, Cloud Datastore should respond with whatever progress has been made, along with a cursor to continue the work. |
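For anyone landing here, a minimal sketch of the "smaller batches plus cursors" pattern described above. This does not use the library's real API: `run_query`, `FAKE_ENTITIES`, and the integer cursor are hypothetical stand-ins so the loop can run on its own. Only the shape of the loop is the point.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical in-memory stand-in for all entities of one kind.
FAKE_ENTITIES = [{"id": i} for i in range(25)]


def run_query(limit: int, start_cursor: Optional[int] = None) -> Tuple[List[dict], Optional[int]]:
    """Hypothetical stand-in for a single query RPC.

    Returns one batch of at most `limit` entities and the cursor to resume
    from, or None as the cursor once the result set is exhausted.
    """
    start = start_cursor or 0
    batch = FAKE_ENTITIES[start:start + limit]
    end = start + len(batch)
    next_cursor = end if end < len(FAKE_ENTITIES) else None
    return batch, next_cursor


def iterate_all(batch_size: int = 10) -> Iterator[dict]:
    """Yield every entity by issuing small queries and chaining cursors."""
    cursor = None
    while True:
        batch, cursor = run_query(limit=batch_size, start_cursor=cursor)
        yield from batch
        if cursor is None:  # nothing left to fetch
            break


entities = list(iterate_all(batch_size=10))
```

Each RPC stays small, so no single request should get close to the 60 second limit, and the cursor can be saved between requests if the work needs to be resumed later.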
Ok great.
|
@dhermes Thanks for the updates! Regarding versioning, there are a couple of pieces that coincide. In both development and production environments we rely on a specific package version for tools like pip and virtualenv. It's possible to use a commit hash to lock down to a point in time in order to ensure consistency, but then it becomes a problem of knowing when/if to "upgrade". That's why I also asked for a Changelog in this repo (#243) which, in addition to versioning, makes it clear to developers what has changed. If you started using Releases on GitHub, then you would effectively get a Changelog without adding a specific file that also needs to be maintained. I usually cite the Python Requests library as a good example to follow. |
@erichiggins for versioning the plan is effectively to use releases on GitHub. Stay tuned :) |
Can we close this out? We are inches from our first tag / release and it would be more appropriate to open an issue about documenting limits of the API and potentially putting a cap on query fetch size under the hood and stringing queries together if the number of entities requested is too large. @pcostell do you think 1000 entities is a reasonable cap? Do you think it even makes sense for a library to have such a cap? |
The servlet has actually been updated to handle timing issues. So if you do hit a time-based or size-based limit it should return partial results. In this case, the server should return query results with more_results set. In general I think it makes sense for client libraries not to impose caps. The server should have proper handling for these cases. That way when the server changes what it can handle, things will continue to work for users. |
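A rough sketch of what that looks like from the client side, assuming the server may return fewer entities than requested and signals this via `more_results`. Everything here is hypothetical: `server_run_query`, `QueryResponse`, the `SERVER_MAX_PER_RPC` value, and the boolean `more_results` are simplified stand-ins, not the actual API surface.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical data set and server-side limit, for illustration only.
DATA = list(range(7000))
SERVER_MAX_PER_RPC = 2500


@dataclass
class QueryResponse:
    entities: List[int]
    end_cursor: int
    more_results: bool  # simplified to a boolean for this sketch


def server_run_query(limit: int, start_cursor: int = 0) -> QueryResponse:
    """Hypothetical server: returns partial results when the limit is too large."""
    n = min(limit, SERVER_MAX_PER_RPC)
    chunk = DATA[start_cursor:start_cursor + n]
    end = start_cursor + len(chunk)
    return QueryResponse(chunk, end, more_results=end < len(DATA))


def fetch(limit: int) -> List[int]:
    """Keep issuing queries from the returned cursor until `limit` is met
    or the server reports that nothing is left."""
    results: List[int] = []
    cursor = 0
    while len(results) < limit:
        resp = server_run_query(limit - len(results), cursor)
        results.extend(resp.entities)
        cursor = resp.end_cursor
        if not resp.more_results:
            break
    return results


got = fetch(6000)
```

The client never hard-codes a cap of its own. It only reacts to what the server returns, so if the server's limits change later the loop keeps working.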
@pcostell I was under the impression from googleapis/google-cloud-node#5 that Is this no longer the case? Provided it is implemented, this is super exciting! |
Whatever the limit and type (time-based or number of entities), please ensure that it's well-documented so developers don't end up debugging. Looking forward to the release! |
@erichiggins We hope to cut a release soon. LMK if you are still having issues and I'm happy to re-open. |
Queries currently fail on large datasets (>6000 entities). [1] We
have many cases where we have well over 30K entities of a given Model that
we'd like the ability to iterate over.
[1] Code snippet & traceback: