Support sized pages in Page Streaming #86
I'm confused by this request. The GAX implementation of page streaming already constructs the result lazily as a Python generator, so there's no performance benefit to this (except perhaps that we can use the batch size to set the page_size field) -- the difference seems to be only in the depth of nesting the result in the object we're returning. What is the use case for this? |
There is a request to allow the page streaming iterator to yield batches of results instead of individual items. @anthmgoogle can provide more details. |
Is this feature (a) an abstraction on top of paging, where each element of the response contains a fixed number of instances of an API resource, and where each element may not correspond to a single request, or (b) a way to hide paging concepts like page_token, next_page_token, and page_size, while still exposing the contents of the page itself? The examples you gave seem more like (a), but I don't see the benefit over the current version of page streaming that we have. (b) seems like a nice way not to hide the API requests under a Python generator as we currently do, and it doesn't require us to define additional concepts. By (b), I mean something like:
where each tuple corresponds to exactly one API response, and contains a maximum of, but not necessarily exactly, four elements, depending on what the response was. |
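Interpretation (b) could look like the following pure-Python sketch. The fake server, function names, and page size of four are all illustrative, not part of GAX: the point is only that each yielded tuple maps to exactly one API response.

```python
# Hypothetical sketch of option (b): the call returns an iterator whose
# elements are tuples, each corresponding to exactly one API response.
# _fake_api_call and iter_pages are made-up names for illustration.

def _fake_api_call(page_token):
    """Simulate a server that returns at most 4 items per response."""
    data = ['sub-%d' % i for i in range(10)]
    start = page_token or 0
    items = data[start:start + 4]
    next_token = start + 4 if start + 4 < len(data) else None
    return items, next_token

def iter_pages():
    """Yield one tuple per API response (up to, but not always, 4 elements)."""
    token = None
    while True:
        items, token = _fake_api_call(token)
        yield tuple(items)
        if token is None:
            return

pages = list(iter_pages())
# Three tuples: two full pages of four, then a final partial page of two.
```

The last tuple is shorter than four because the server ran out of data, which matches "a maximum of, but not necessarily exactly, four elements."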
Correct me if I'm wrong, but the request here is to allow accessing the page data underlying the data stream, so that users can see the page token (or remember it for later use), for example. If so, grouping items by a fixed count is not sufficient, and could even be wrong, since the server may not fill a page with as many elements as the requested page size. |
It's definitely (b): a way of making going page by page also have a nice idiomatic surface, rather than mucking around with tokens. The terms "Pagination" and "Batched" are both problematic descriptions of this. We already rejected "Pagination" because of the disconnect with the English meaning, and "Batching" is a very different feature in gRPC. I think "Paging" or "Paged" is clearer. |
@jmuk Users can just save a Python generator that maintains the page token in its state; they shouldn't need to use the token itself. If they do need to see the token, you can always turn off the feature as |
@anthmgoogle: I just updated the issue name, please adjust if necessary |
Agree this is a good name for the feature. |
If it's (b), I still think that specifying the size of the batch (or page, or whatever) isn't sufficient, because it still doesn't correspond to the actual page data behind the scenes. What we need is rather a boolean flag choosing whether to iterate over resources or over pages.
Also -- I think this is similar to files (or string streams) in a sense: someone wants to look through individual characters, while someone else wants to iterate over lines. In Python, files are iterable but have some utility methods to support other types of iteration. Thus I suggest a different idea: list_topic_subscriptions() will return an iterable object -- which iterates over resources by default -- but with a method, pages(), for iterating over pages, as follows:
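A minimal, self-contained sketch of that surface (the class name, the fake fetch function, and the page size of two are assumptions for illustration, not the actual GAX implementation):

```python
# Sketch of the proposed surface: iterating the object yields individual
# resources; calling .pages() yields one list per API response.

class PageIterator(object):
    """Iterates over resources by default; .pages() iterates over pages."""

    def __init__(self, fetch_page):
        # fetch_page: callable taking a token, returning (items, next_token)
        self._fetch_page = fetch_page

    def pages(self):
        token = None
        while True:
            items, token = self._fetch_page(token)
            yield items
            if token is None:
                return

    def __iter__(self):
        for page in self.pages():
            for item in page:
                yield item

def fake_fetch(token):
    """Simulate a server returning at most 2 items per response."""
    data = ['topic-%d' % i for i in range(5)]
    start = token or 0
    nxt = start + 2 if start + 2 < len(data) else None
    return data[start:start + 2], nxt

resources = list(PageIterator(fake_fetch))
page_sizes = [len(p) for p in PageIterator(fake_fetch).pages()]
```

The same object serves both audiences, just as a file object supports both character-level and line-level iteration.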
|
For reference, gcloud-java does something very much like what jmuk@ described. If that is also idiomatic to Python it sounds like a good approach. I was expecting this would present a nicer surface over the paging of the underlying API, not present a new paging view separate and on top of that. I think this may be important for cases of actual UI paging or map-reduce to optimize round-trips. It also seems more in keeping with the principle of making the calls nicer while keeping the relationship between client and server calls simple and clear. @jgeewax to confirm if this has been the case for paging implementations in the other gcloud languages. |
It's not. Generators in Python do not have extra methods embedded in them like this.
We could modify CallOptions to change the "is_page_streaming" flag into a "pagination" flag that takes three values: page streaming, paging, or None. Alternatively, we could keep CallOptions the same and, when is_page_streaming=False, return a generator over pages rather than the raw proto response.
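A rough sketch of how the three-valued dispatch could behave. The function, constants, and fake fetcher below are made up for illustration and are not the real CallOptions surface:

```python
# Hypothetical three-valued "pagination" dispatch; not actual GAX code.

PAGE_STREAMING, PAGING = 'page_streaming', 'paging'

def paged_call(fetch_page, pagination=PAGE_STREAMING):
    """fetch_page: callable taking a token, returning (items, next_token)."""
    def over_pages():
        token = None
        while True:
            items, token = fetch_page(token)
            yield items
            if token is None:
                return
    if pagination == PAGE_STREAMING:
        # Flatten pages into a stream of individual resources.
        return (item for page in over_pages() for item in page)
    if pagination == PAGING:
        # Yield one list of items per API response.
        return over_pages()
    # pagination is None: return the first raw response unchanged.
    return fetch_page(None)

def fake_fetch(token):
    """Simulate a server returning at most 3 of 7 items per response."""
    data = list(range(7))
    start = token or 0
    nxt = start + 3 if start + 3 < len(data) else None
    return data[start:start + 3], nxt

resources = list(paged_call(fake_fetch))
page_lens = [len(p) for p in paged_call(fake_fetch, PAGING)]
```

Both behaviors share one paging loop; only the wrapping around it changes, which is why the alternative (reusing is_page_streaming=False) needs no new concepts.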
Please begin implementing the option you just described:
I think this issue has fulfilled its role in outlining alternatives. This option is best because it is most consistent with the existing design, and it should require only documentation changes to the generated code. Let's continue the design discussion on the implementation PR; in particular, the question as to whether to return the server pages or a view on them can continue there. I will update the starting comment to note this decision.
PTAL @tseaver |
Please see #94. |
Early feedback suggests that, in addition to simple iteration, a batch mode of operation should also be supported for page-streamed calls, in which the user can iterate over batches of results.
E.g., currently for page streaming the following per-resource iteration works:
There should also be a way to support iterating batch by batch:
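One shape the requested batch mode could take is a generic helper that regroups the existing flat iterator into fixed-size batches. This is an itertools-based sketch; `batched` is a hypothetical name, not an existing google.gax API:

```python
import itertools

def batched(iterable, batch_size):
    """Yield lists of up to batch_size items drawn from iterable."""
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch

# Stand-in for a page-streaming iterator of individual resources.
batches = list(batched(range(10), 4))
# → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Note that batches produced this way need not align with the server's actual pages, which is one concern raised in the discussion: the server may not fill pages to the requested size.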
Proposals
Add a helper to google.gax to support batch iteration
Pros
Cons
Add a helper to google.gax, also add batch_size to CallOptions
Pros
Cons
@jmuk, @geigerj, @bjwatson, @anthmgoogle PTAL and discuss
Decision
There's an open question that is OK to resolve after implementation begins