-
Notifications
You must be signed in to change notification settings - Fork 605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bounds for embedded lists in QP #5202
Changes from all commits
f7bc02e
aeb9433
edc4ddc
46ab13f
c36ed04
0f9b3ca
750694b
f478784
f085b74
249f48e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,6 +9,7 @@ | |
from bson import ObjectId | ||
from dataclasses import asdict, dataclass | ||
from datetime import date, datetime | ||
import math | ||
import typing as t | ||
|
||
import asyncio | ||
|
@@ -139,11 +140,11 @@ async def lightning_resolver( | |
for item in sublist | ||
] | ||
|
||
filter = ( | ||
{f"{dataset.group_field}.name": input.slice} | ||
if dataset.group_field and input.slice | ||
else None | ||
) | ||
if dataset.group_field and input.slice: | ||
filter = {f"{dataset.group_field}.name": input.slice} | ||
dataset.group_slice = input.slice | ||
else: | ||
filter = {} | ||
result = await _do_async_pooled_queries(dataset, flattened, filter) | ||
|
||
results = [] | ||
|
@@ -316,13 +317,15 @@ async def _do_async_query( | |
filter: t.Optional[t.Mapping[str, str]], | ||
): | ||
if isinstance(query, DistinctQuery): | ||
if query.has_list and not query.filters: | ||
if query.has_list: | ||
return await _do_distinct_query(collection, query, filter) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not sure about mongo, but distinct scans in most dbs are really expensive and slow... have you confirmed that relying more on distinct improves performance? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A distinct scan with high cardinality is not good, yes. I am not aware of an aggregation pipeline that can return the first N sorted results using a multikey index, though. But we should investigate more |
||
|
||
return await _do_distinct_pipeline(dataset, collection, query, filter) | ||
|
||
if filter: | ||
query.insert(0, {"$match": filter}) | ||
for k, v in filter.items(): | ||
query.insert(0, {"$match": {k: v}}) | ||
query.insert(0, {"$sort": {k: 1}}) | ||
|
||
return [i async for i in collection.aggregate(query)] | ||
|
||
|
@@ -420,29 +423,19 @@ def _first( | |
): | ||
pipeline = [{"$sort": {path: sort}}] | ||
|
||
if floats: | ||
pipeline.extend(_handle_nonfinites(path, sort)) | ||
|
||
if sort: | ||
pipeline.append({"$match": {path: {"$ne": None}}}) | ||
|
||
matched_arrays = _match_arrays(dataset, path, is_frame_field) | ||
if matched_arrays: | ||
pipeline += matched_arrays | ||
elif floats: | ||
pipeline.extend(_handle_nonfinites(path, sort)) | ||
|
||
pipeline.append({"$limit": 1}) | ||
|
||
pipeline.extend([{"$match": {path: {"$exists": True}}}, {"$limit": 1}]) | ||
unwound = _unwind(dataset, path, is_frame_field) | ||
if unwound: | ||
pipeline += unwound | ||
if floats: | ||
pipeline.extend(_handle_nonfinites(path, sort)) | ||
|
||
if sort: | ||
pipeline.append({"$match": {path: {"$ne": None}}}) | ||
|
||
pipeline.append({"$sort": {path: sort}}) | ||
|
||
return pipeline + [ | ||
{ | ||
"$group": { | ||
|
@@ -513,8 +506,13 @@ def _match_arrays(dataset: fo.Dataset, path: str, is_frame_field: bool): | |
def _parse_result(data): | ||
if data and data[0]: | ||
value = data[0] | ||
if value.get("value", None) is not None: | ||
return value["value"] | ||
if "value" in value: | ||
value = value["value"] | ||
return ( | ||
value | ||
if not isinstance(value, float) or math.isfinite(value) | ||
else None | ||
) | ||
|
||
return value.get("_id", None) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if str is not a number (null?), we just show it as null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The formatter (
numeral().format
) returns anNaN
string when it can't meaningfully format given the precision provided ("0.00a"
). Punting on a better solution for now, this at least shows the value as opposed to NaN