Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add synthetic_source support to aggregate_metric_double fields #88909

Merged
merged 14 commits into from
Aug 1, 2022

Conversation

csoulios
Copy link
Contributor

@csoulios csoulios commented Jul 28, 2022

This PR implements synthetic_source support to the aggregate_metric_double field type

Relates to #86603

@elasticsearchmachine
Copy link
Collaborator

Hi @csoulios, I've created a changelog YAML for you.

@csoulios csoulios added :Search Foundations/Mapping Index mappings, including merging and defining field types and removed :StorageEngine/TSDB You know, for Metrics labels Jul 28, 2022
@csoulios csoulios marked this pull request as ready for review July 28, 2022 16:38
@csoulios csoulios requested a review from nik9000 July 28, 2022 16:38
@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Jul 28, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

Comment on lines 712 to 726
public Leaf leaf(LeafReader reader, int[] docIdsInLeaf) throws IOException {
Map<Metric, SortedNumericDocValues> metricDocValues = new EnumMap<>(Metric.class);
for (Metric m : metrics) {
SortedNumericDocValues dv = dv(reader, m);
if (dv != null) {
metricDocValues.put(m, dv);
}
}

if (metricDocValues.isEmpty()) {
return SourceLoader.SyntheticFieldLoader.NOTHING_LEAF;
}

return new AggregateMetricSyntheticFieldLoader.ImmediateLeaf(metricDocValues);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find an easy way to leverage the NumericSyntheticFieldLoader class, so I had to rewrite the same implementation so that it uses one SortedNumericDocValues instance per metric.

I skipped the singleton optimization because having mutliple docs in leaf requires an array of values. This combined with multiple metrics per doc (a Map<Metric, SortedNumericDocValues> will make the code extremely complex. I can do it if we think this optimization is worth the complexity

Comment on lines 793 to 797
NumericDocValues single = reader.getNumericDocValues(fieldName);
if (single != null) {
return DocValues.singleton(single);
}
return null;
Copy link
Contributor Author

@csoulios csoulios Jul 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all metric subfields are stored using the SortedNumericDocValuesField, all doc values will be stored as SortedNumericDocValues. So, I am not sure reader.getNumericDocValues(fieldName) will ever return non-null values

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try removing it! I believe if you store single values it'll flip to NumericDocValues anyway.

If you do have to keep loading from both of there it's probably worth making this method static and public in NumericFieldMapper or something like that.

return;
}
b.startObject(simpleName);
for (Map.Entry<Metric, SortedNumericDocValues> entry : metricDocValues.entrySet()) {
Copy link
Contributor Author

@csoulios csoulios Jul 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metricDocValues is an EnumMap instance. This means that the order that sub fields will be returned in the source, will be the order in the Metric enum. Order ismin,max, sum, value_count. It is not alphabetical, but it will always be consistent. Does it make sense to change this so subfields are retuned in an alphabetical order?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's cool. Consistency is lovely for reading these docs. Alphabetical just gets the consistency in an easy way. No need to do it here.

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few questions. I think this works fine though! I'm just poking around the edges, suggesting some maybe silly ideas. Either way, I think it's right.

@@ -292,6 +296,7 @@ private NumberFieldMapper.NumberFieldType delegateFieldType(Metric metric) {

/**
* Return a delegate field type for the default metric sub-field
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does your formatter do this? I think it's more standard, but I don't tend to do it myself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I undertand the question well, this has nothing to do with formatting the field in the output.

This has to do with the following:

  1. Querying the field: When running a term or a range query, the query is delegated to the subfield metric that has been marked as default_metric in the field mapping.
  2. When retrieving the field using the fields api, the value of the default_metric is returned (this will change when Support aggregate_double_metric fields in the Field API #88534 is merged)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, no, my question was about the extra line inserted into the javadoc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry! I wasn't clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it was me messing with IntelliJ formatting. Reverted it now.

Comment on lines 793 to 797
NumericDocValues single = reader.getNumericDocValues(fieldName);
if (single != null) {
return DocValues.singleton(single);
}
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try removing it! I believe if you store single values it'll flip to NumericDocValues anyway.

If you do have to keep loading from both of there it's probably worth making this method static and public in NumericFieldMapper or something like that.

return;
}
b.startObject(simpleName);
for (Map.Entry<Metric, SortedNumericDocValues> entry : metricDocValues.entrySet()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's cool. Consistency is lovely for reading these docs. Alphabetical just gets the consistency in an easy way. No need to do it here.

@csoulios
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/packaging-tests-unix-sample

@csoulios
Copy link
Contributor Author

csoulios commented Aug 1, 2022

@elasticmachine update branch

@csoulios csoulios mentioned this pull request Aug 1, 2022
50 tasks
@csoulios csoulios requested a review from nik9000 August 1, 2022 09:40
Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a tiny thing about switching types on the tracked map. Otherwise LGTM.


@Override
public boolean empty() {
return metricDocValues.isEmpty();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is never empty, right? You could maybe just return false here. And In the ctor you could assert metricDocValues.empty == false

// it is enough to check for the first docValue. However, in the future
// we may relax the requirement of all metrics existing. In this case
// we should check the doc value for each metric separately
dvHasValue = new EnumMap<>(Metric.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use an EnumSet instead? And maybe clear it instead of recreate it?

dvHasValue.put(e.getKey(), leafHasValues);
}

return hasValues;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe dvHasValue.isEmpty() == false is enough here?

@csoulios csoulios merged commit ad2dc83 into elastic:main Aug 1, 2022
@csoulios csoulios deleted the agg-metric-synthetic-src branch August 1, 2022 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants