-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pagination: scans require an extra request to fetch an empty page of items #20
Comments
I vote for letting the consumer explicitly indicate the end of results. |
@davepacheco are you imagining that let next_page = if has_more {
items
.last()
.map(|last_item| {
let selector = get_page_selector(last_item, scan_params);
serialize_page_token(selector)
})
.transpose()?
} else {
None
}; Alternatively it seems like Less: the signature could be like this: pub fn new<F, ScanParams, PageSelector>(
items: Vec<ItemType>,
scan_params: &ScanParams,
next_page: Option<PageSelector>,
) -> Result<ResultsPage<ItemType>, HttpError>
PageSelector: Serialize, i.e. the consumer could be required to do all the work of figuring out the next page Or more: the signature could be like this: pub fn new<F, ScanParams, PageSelector>(
items: dyn Interator<Item=ItemType>,
pag_params: &PaginationParams<ScanParams, PageSelector>,
scan_params: &ScanParams,
get_page_selector: F,
) -> Result<ResultsPage<ItemType>, HttpError>
where
F: Fn(&ItemType, &ScanParams) -> PageSelector,
PageSelector: Serialize, In this case the consumer would look like this: let items = iter.map(|p| (*p).clone());
Ok(HttpResponseOk(ResultsPage::new(items, &pag_params, &scan_params, page_selector)?)) and let some_items = items.limit(limit).collect(); // <- I guess we'd need the request context to derive the limit
let next_page = items.next().map(|_| some_items.last().map(...)); i.e. have We could potentially use a |
Yes, that is what I was envisioning, but I don't love it because of how easy it is to get wrong.
Specifically, they're doing the part that today looks at the last item and constructs a page selector for it using the given scan params? I was a little worried it would be easy to do the wrong thing here (e.g., make a page selector for the first item, or to just look at the interface and not understand which item you were supposed to use).
Is this an Iterator over all the items in the scan, or just the one page of results that the client is getting back? With pagination, I usually think of the scan being backed by a traditional relational database, where you'll want to include the limit in the SQL query and fetch only one page of results. (This can affect the query plan and avoid having database do a bunch of extra throwaway work. I think it's also a safer programming pattern: if you don't do this, and somebody forgets to call The simplest efficient way I can think for an RDBMS-based consumer to know whether there are more results is to apply a limit of This is all easier if you're willing to accept incorrect behavior when the last page of results contains |
Another idea discussed in today's tactical huddle: instead of using a boolean to communicate to dropshot that there are no more results, it could be an enum of |
This is an extremely nitty nit, but IMO |
Sure. Maybe even a two-state |
This came back up in the 5/27 control plane demo. I'm a little bit wary of the Is there any updated thoughts on this? |
On the ternary model, is it correct to say that if an endpoint implements the query in such a way that it checks if there are more, the middle "maybe" option would not have to be exposed to the external consumer of that endpoint? (because that endpoint always knows whether it has more or not.) And therefore the maybe option is more about preserving the option for the Dropshot consumer to avoid implementing that check if necessary. |
Can you say more about this? You mean dropshot consumers are made more complicated by the proposed NoMore/MightHaveMore enum? Is there a standard way to express this? |
To be clear, in no case is the enum ever exposed to the external (endpoint) consumer. I think I initially suggested three variants because of the three cases where an endpoint implementor (1) went out of their way to check and knows there are more results, (2) went out of their way to check and knows there aren't more results, and (3) didn't go out of their way and doesn't know. The behavior for (1) and (3) is the same, so it could just be |
I think it's worth considering what we would want to do in our implementations. I think it's most standard that when somebody asks for a page of 500 results, you fetch 500 items from the database, in which case your answer here will be DontKnow. There are also standard patterns (that I think don't scale) where when somebody asks for a page of 500 results, you fetch 500 items and count the number of items that match the query, but that doesn't tell you whether there's another page of results without more information. We could built a standard pattern internally (and at one point had one, I think) where when somebody asks for a page of 500 items, we ask the database for 501 items, check how many we got, and return the first 500. I haven't seen this before but it seems correct, scalable, and adds almost no runtime cost. I'm not sure how easy it is to implement in the current code -- depends how many layers need to know about this distinction between the requested limit and the actual limit. |
I agree the fetch N+1 version is only practical middle ground. Knowing precisely how many things there are in total is rarely useful if N is greater than some small humane number like 20 or 30, and I expect most of our lists will be bigger than that. |
Currently, the
next_page
token onResultsPage
is populated whenever the page that we're returning to the client has any items in it. This works, but it means that for any scan through a collection, the client needs to make an extra request at the end to fetch an empty page in order to see that it's the end of the scan. Ideally, the last page with any results on it would indicate that there were no more results and we could skip this extra request. The problem is that right now, Dropshot doesn't know this information. One simple interface would be for the consumer to provide the limit that they used, and Dropshot could say that if they provided fewer items than the limit, then it's the end of the collection. (This would still require a theoretically request that's theoretically unnecessary if the scan ends with a full page of items, but this seems a lot less likely.) We could also ask the consumer to provide an explicit boolean indicating whether there are more items.Another consideration: I've worked with systems that would fetched N records from the database, fail to process 1-2 of them, and send the results back to the client. If a consumer worked this way, using the interface I described above, we would erroneously conclude that because the page was non-full, we've finished the scan -- having omitted an arbitrarily large number of results after that. Obviously that's just a bug in the consumer, but it'd be nice to make that impossible if we can.
The text was updated successfully, but these errors were encountered: