
Proposed changes to search results retrieval api #187

Closed
philvarner opened this issue May 27, 2022 · 4 comments
philvarner commented May 27, 2022

Motivation:

The motivation behind these changes is to provide a better balance in the API between the desire to hide search pagination from a user and the real-world latency in executing these methods.

For example, a user who naively executes this call:

pystac_client.Client.from_file("https://planetarycomputer.microsoft.com/api/stac/v1").search().get_all_items()

will attempt to load 50 million Items into memory. Fortunately, this will likely never happen, as they won't wait for the 50,000 sequential requests required to pull them page by page, 1,000 at a time.

Proposed Changes:

  • set the default of max_items to the same value as limit, so that (ideally) only one page of results will be fetched, instead of an unlimited number. (limit is only a suggestion to the server, so it may take multiple requests to get max_items even if it's set to the same value as limit.)
  • rename get_item_collections to item_collections and get_items to items
  • deprecate ItemSearch.get_all_items - this has the potential to consume a large amount of memory, and has no benefit over the iterable get_items method.
  • deprecate ItemSearch.get_all_items_as_dict - the point of this was to return the dict structure instead of going through the (expensive) pydantic unmarshalling to get objects, but it suffers from the same problem as get_all_items of potentially pulling a large number of objects into memory
  • add ItemSearch.get_items_as_dicts() -> Iterable[Dict[str, Any]] - this will operate the same as get_items() but will return the (never unmarshalled) dict representation of the Item rather than unmarshalled pydantic Item objects
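The proposed get_items_as_dicts could be sketched roughly as below. This is a hypothetical illustration, not pystac-client code: the page-fetching machinery is stubbed out as an iterable of pages, and the function name and max_items semantics are taken from the proposal above.

```python
from typing import Any, Dict, Iterable, List, Optional

def get_items_as_dicts(
    pages: Iterable[List[Dict[str, Any]]],
    max_items: Optional[int] = None,
) -> Iterable[Dict[str, Any]]:
    """Yield raw Item dicts lazily, page by page, stopping once max_items
    have been produced -- no pydantic unmarshalling, no full list in memory."""
    yielded = 0
    for page in pages:
        for item in page:
            yield item
            yielded += 1
            if max_items is not None and yielded >= max_items:
                return  # stop before requesting another page

# With max_items equal to the page size (the proposed default), only the
# first page of this lazy 1000-page sequence is ever consumed:
fake_pages = ([{"id": f"item-{p}-{i}"} for i in range(3)] for p in range(1000))
first = list(get_items_as_dicts(fake_pages, max_items=3))
```

Because the generator returns as soon as the quota is met, no request for the second page is ever issued, which is the latency behavior the proposal is after.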
@matthewhanson (Member)

These changes all seem reasonable to me.

Changing max_items = limit is fine, but I think we should make sure that, when using the CLI, it indicates it only fetched X of numberMatched (if available).

It makes sense to not have any function that actually gets all items, since a user can do the same thing by casting the iterator to a list.
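The point about casting can be shown in a couple of lines. The items() generator here is a hypothetical stand-in for the lazy iterator the proposal renames get_items to items:

```python
from typing import Any, Dict, Iterator

def items() -> Iterator[Dict[str, Any]]:
    # Hypothetical stand-in for ItemSearch.items(): a lazy iterator of Items.
    for i in range(5):
        yield {"id": f"item-{i}"}

# A user who really wants everything in memory opts in explicitly,
# rather than relying on a dedicated get_all_items():
all_items = list(items())
```

The memory cost is the same as get_all_items, but the user has written the materialization themselves, so the trade-off is visible at the call site.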

@gadomski (Member)

I'm 👍🏽 for all proposed solutions (and Matt's amendments), except that I'm not sure I agree with

> an ItemCollection is a proper name and shouldn't be split

I'm used to PascalCase mapping to pascal_case. To me, get_item_collections scans fine -- I would expect something like get_items_and_collections or get_items_or_collections if you were getting both somehow.

@philvarner (Collaborator, Author)

In that example, the term "Pascal case" (two words, with a space) is being stylized as PascalCase or pascal_case. I feel like there's a difference between words that are stylized like this in specs vs. multiple words that are stylized to distinguish them when concatenated in code.

How would you pascal_case the geojson geometries? MultiPolygon to multi_polygon or multipolygon, LineString to line_string or linestring, MultiLineString to multi_line_string or multilinestring?
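A purely mechanical split-on-uppercase rule, applied to the names in question, shows what the "stylized in specs" side of the argument produces. This is an illustrative sketch, not anything from pystac-client:

```python
import re

def pascal_to_snake(name: str) -> str:
    # Insert an underscore before each interior uppercase letter, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

examples = ["MultiPolygon", "LineString", "MultiLineString", "ItemCollection"]
converted = [pascal_to_snake(n) for n in examples]
```

The mechanical rule yields multi_polygon, line_string, multi_line_string, and item_collection; treating any of these as a single proper name is a judgment call the rule can't make.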

@gadomski (Member)

Fair. I'll convert to :+0: on the name change, curious if anyone else has thoughts.
