Datastore: Using new datastore aggregation queries for count #1781
Comments
@kolea2 Would you mind giving us some feedback on this? This looks like a very reasonable and seamless improvement to the way we do |
@meltsufin I'm unfamiliar with |
@kolea2 Thanks for the feedback. This is how we did the count before: Lines 865 to 872 in a478264
Indeed, the current implementation triggers multiple calls to fetch all the entities in a paginated way; the link given below points to the code responsible for making the subsequent calls. I would like to focus on the cost aspect of a keys-only query versus an aggregation query. There is a small (though negligible) cost associated with running the aggregation query, whereas the keys-only query is categorised under Small operations, which are free. Regarding performance, I agree that the aggregation query should be faster: it offloads the heavy lifting of calculating the count to the backend, and the egress traffic (the response of the aggregation query) is the bare minimum, just an aggregated value. It would be interesting to back this up with numbers from code profiling. Also, the memory footprint of realizing the iterator in the last statement, `StreamSupport.stream(keysFound.spliterator(), false).toArray(Key[]::new);`, would be O(n) on the client side (where n is the total number of entities of that kind). 😨 |
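To make the O(n) memory point concrete, here is a minimal, stdlib-only sketch. It does not call Datastore: `KeyCountSketch` and `fakeKeys` are hypothetical names, and `String` stands in for `Key`. It contrasts materializing every key into an array (the pattern quoted above) with counting while iterating.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.stream.StreamSupport;

final class KeyCountSketch {
    // Hypothetical stand-in for a lazy keys-only result: yields n keys on demand.
    static Iterable<String> fakeKeys(int n) {
        return () -> new Iterator<String>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < n;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return "key-" + i++;
            }
        };
    }

    // Same pattern as the quoted statement: every key is held in an array
    // before the length is read, so client memory grows with n.
    static long countByMaterializing(Iterable<String> keys) {
        String[] all = StreamSupport.stream(keys.spliterator(), false).toArray(String[]::new);
        return all.length;
    }

    // Counts while iterating and retains nothing, so extra memory stays constant.
    // All n keys are still produced and transferred; only retention changes.
    static long countByStreaming(Iterable<String> keys) {
        return StreamSupport.stream(keys.spliterator(), false).count();
    }

    public static void main(String[] args) {
        System.out.println(countByMaterializing(fakeKeys(2500)));
        System.out.println(countByStreaming(fakeKeys(2500)));
    }
}
```

Counting while iterating only removes the array; a server-side COUNT would also avoid sending the keys at all, which is the point made in the comment above.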
Hi @meltsufin, @kolea2 and @jainsahab, I've tried to measure the performance of both implementations by timing how long each takes to return the count value, using this simple Java program. Here are the results.
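The linked program is not reproduced in this thread. As a rough illustration only, a timing harness for this kind of comparison might look like the sketch below. `CountTimingSketch` and `medianNanos` are hypothetical names, and the `LongSupplier` is a placeholder for whichever count implementation is being measured.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

final class CountTimingSketch {
    // Runs `warmups` untimed calls, then `runs` timed calls, and returns
    // the median elapsed wall-clock time in nanoseconds.
    static long medianNanos(LongSupplier countCall, int warmups, int runs) {
        if (warmups < 0 || runs <= 0) {
            throw new IllegalArgumentException("warmups must be >= 0 and runs must be > 0");
        }
        for (int i = 0; i < warmups; i++) {
            countCall.getAsLong();
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            countCall.getAsLong();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload; in a real comparison each implementation's
        // count call would be passed in here instead.
        long nanos = medianNanos(() -> 42L, 3, 11);
        System.out.println("median ns: " + nanos);
    }
}
```

Taking the median rather than the mean keeps a single slow network round trip from dominating the comparison.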
I think the performance advantage is pretty striking. @jainsahab Thanks for pointing out the cost aspect, but I don't actually see a difference. Both methods seem to incur a cost of one entity read, for which there is a free tier as well. Can you clarify the cost difference? |
This is what the doc says: But yeah, the cost should actually be the same: a keys-only query gets 1000 results per response and is charged 1 entity read, and the underlying client library makes multiple requests to satisfy the query, which is how the pricing ends up being the same. In a nutshell, since aggregation queries do not have any cost implications, they are definitely better than keys-only queries. |
Modifying the implementation of DatastoreTemplate#count to use the recently introduced [COUNT aggregation and aggregation queries in Datastore](https://cloud.google.com/datastore/docs/aggregation-queries). Fixes GoogleCloudPlatform#1781
In the current implementation, the library performs the aggregation on the client side by fetching all the keys and then counting them. In theory, the new aggregation queries with the COUNT aggregation should be faster, because the current implementation relies on the lazy iterator from the underlying client library, which initiates multiple backend calls whenever the number of keys exceeds the specified page size. On top of that, fetching all the keys to the client has a higher egress cost than fetching a single count value.
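The difference in backend calls can be put in rough numbers with a small model. This is a sketch under stated assumptions, not the library's behaviour: it assumes a fixed page size (1000 is the figure mentioned in the discussion), ignores any extra request a real client might issue to confirm the end of the results, and treats the aggregation query as one request returning one value. `RoundTripSketch` and its members are hypothetical names.

```java
final class RoundTripSketch {
    // Assumption: an aggregation query is a single request returning a single value.
    static final long AGGREGATION_REQUESTS = 1;
    static final long AGGREGATION_ITEMS_RETURNED = 1;

    // Requests needed to page through n keys at a fixed page size.
    // At least one request is made even when there are no keys.
    static long keysOnlyRequests(long n, int pageSize) {
        if (n < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("n must be >= 0 and pageSize must be > 0");
        }
        return Math.max(1, (n + pageSize - 1) / pageSize);
    }

    // Every key travels to the client in the keys-only approach.
    static long keysOnlyItemsReturned(long n) {
        return n;
    }

    public static void main(String[] args) {
        long n = 2_500_000L;
        int pageSize = 1000; // assumed page size, taken from the discussion
        System.out.println("keys-only requests:   " + keysOnlyRequests(n, pageSize));
        System.out.println("keys-only items:      " + keysOnlyItemsReturned(n));
        System.out.println("aggregation requests: " + AGGREGATION_REQUESTS);
        System.out.println("aggregation items:    " + AGGREGATION_ITEMS_RETURNED);
    }
}
```

Under this model the keys-only count grows linearly in both requests and returned items, while the aggregation query stays constant.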
Reference: COUNT aggregation and Aggregation queries in datastore