Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add synthetic_source support to aggregate_metric_double fields #88909

Merged
merged 14 commits into from
Aug 1, 2022
Merged
5 changes: 5 additions & 0 deletions docs/changelog/88909.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
pr: 88909
summary: Add `synthetic_source` support to `aggregate_metric_double` fields
area: Mapping
type: enhancement
issues: []
1 change: 1 addition & 0 deletions docs/reference/mapping/fields/synthetic-source.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ space. There are a couple of restrictions to be aware of:
* Synthetic `_source` can be used with indices that contain only these field
types:

** <<aggregate-metric-double-synthetic-source, `aggregate_metric_double`>>
** <<boolean-synthetic-source,`boolean`>>
** <<numeric-synthetic-source,`byte`>>
** <<numeric-synthetic-source,`double`>>
Expand Down
52 changes: 52 additions & 0 deletions docs/reference/mapping/types/aggregate-metric-double.asciidoc
Original file line number Diff line number Diff line change
Expand Up @@ -251,3 +251,55 @@ The search returns the following hit. The value of the `default_metric` field,
}
----
// TESTRESPONSE[s/\.\.\./"took": $body.took,"timed_out": false,"_shards": $body._shards,/]

ifeval::["{release-state}"=="unreleased"]
[[aggregate-metric-double-synthetic-source]]
==== Synthetic source
`aggregate_metric-double` fields support <<synthetic-source,synthetic `_source`>> in their default
configuration. Synthetic `_source` cannot be used together with <<ignore-malformed,`ignore_malformed`>>.

For example:
[source,console,id=synthetic-source-aggregate-metric-double-example]
----
PUT idx
{
"mappings": {
"_source": { "mode": "synthetic" },
"properties": {
"agg_metric": {
"type": "aggregate_metric_double",
"metrics": [ "min", "max", "sum", "value_count" ],
"default_metric": "max"
}
}
}
}

PUT idx/_doc/1
{
"agg_metric": {
"min": -302.50,
"max": 702.30,
"sum": 200.0,
"value_count": 25
}
}
----
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]

Will become:

[source,console-result]
----
{
"agg_metric": {
"min": -302.50,
"max": 702.30,
"sum": 200.0,
"value_count": 25
}
}
----
// TEST[s/^/{"_source":/ s/\n$/}/]

endif::[]
Original file line number Diff line number Diff line change
Expand Up @@ -253,6 +253,13 @@ protected boolean supportsMeta() {
return true;
}

/**
* Override to disable testing {@code copy_to} in fields that don't support it.
*/
protected boolean supportsCopyTo() {
return true;
}

protected void metaMapping(XContentBuilder b) throws IOException {
minimalMapping(b);
}
Expand Down Expand Up @@ -886,15 +893,17 @@ public final void testSyntheticEmptyList() throws IOException {

public final void testSyntheticSourceInvalid() throws IOException {
List<SyntheticSourceInvalidExample> examples = new ArrayList<>(syntheticSourceSupport().invalidExample());
examples.add(
new SyntheticSourceInvalidExample(
matchesPattern("field \\[field] of type \\[.+] doesn't support synthetic source because it declares copy_to"),
b -> {
syntheticSourceSupport().example(5).mapping().accept(b);
b.field("copy_to", "bar");
}
)
);
if (supportsCopyTo()) {
examples.add(
new SyntheticSourceInvalidExample(
matchesPattern("field \\[field] of type \\[.+] doesn't support synthetic source because it declares copy_to"),
b -> {
syntheticSourceSupport().example(5).mapping().accept(b);
b.field("copy_to", "bar");
}
)
);
}
for (SyntheticSourceInvalidExample example : examples) {
Exception e = expectThrows(
IllegalArgumentException.class,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,9 @@
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.index.SortedNumericDocValues;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SortField;
Expand All @@ -32,6 +34,7 @@
import org.elasticsearch.index.mapper.MapperBuilderContext;
import org.elasticsearch.index.mapper.NumberFieldMapper;
import org.elasticsearch.index.mapper.SimpleMappedFieldType;
import org.elasticsearch.index.mapper.SourceLoader;
import org.elasticsearch.index.mapper.SourceValueFetcher;
import org.elasticsearch.index.mapper.TextSearchInfo;
import org.elasticsearch.index.mapper.TimeSeriesParams;
Expand Down Expand Up @@ -284,6 +287,7 @@ public AggregateDoubleMetricFieldType(String name, Map<String, String> meta, Met

/**
* Return a delegate field type for a given metric sub-field
*
* @return a field type
*/
private NumberFieldMapper.NumberFieldType delegateFieldType(Metric metric) {
Expand All @@ -292,6 +296,7 @@ private NumberFieldMapper.NumberFieldType delegateFieldType(Metric metric) {

/**
* Return a delegate field type for the default metric sub-field
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does your formatter do this? I think it's more standard, but I don't tend to do it myself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I undertand the question well, this has nothing to do with formatting the field in the output.

This has to do with the following:

  1. Querying the field: When running a term or a range query, the query is delegated to the subfield metric that has been marked as default_metric in the field mapping.
  2. When retrieving the field using the fields api, the value of the default_metric is returned (this will change when Support aggregate_double_metric fields in the Field API #88534 is merged)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, no, my question was about the extra line inserted into the javadoc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry! I wasn't clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it was me messing with IntelliJ formatting. Reverted it now.

* @return a field type
*/
private NumberFieldMapper.NumberFieldType delegateFieldType() {
Expand Down Expand Up @@ -509,6 +514,7 @@ protected Object parseSourceValue(Object value) {

/**
* If field is a time series metric field, returns its metric type
*
* @return the metric type or null
*/
public MetricType getMetricType() {
Expand All @@ -524,13 +530,19 @@ public MetricType getMetricType() {

private final Version indexCreatedVersion;

/** A set of metrics supported */
/**
* A set of metrics supported
*/
private final EnumSet<Metric> metrics;

/** The default metric to be when querying this field type */
/**
* The default metric to be when querying this field type
*/
protected Metric defaultMetric;

/** The metric type (gauge, counter, summary) if field is a time series metric */
/**
* The metric type (gauge, counter, summary) if field is a time series metric
*/
private final TimeSeriesParams.MetricType metricType;

private AggregateDoubleMetricFieldMapper(
Expand Down Expand Up @@ -574,7 +586,6 @@ public Iterator<Mapper> iterator() {

@Override
protected void parseCreateField(DocumentParserContext context) throws IOException {

context.path().add(simpleName());
XContentParser.Token token;
XContentSubParser subParser = null;
Expand Down Expand Up @@ -675,4 +686,115 @@ protected void parseCreateField(DocumentParserContext context) throws IOExceptio
public FieldMapper.Builder getMergeBuilder() {
return new Builder(simpleName(), ignoreMalformedByDefault, indexCreatedVersion).metric(metricType).init(this);
}

@Override
public SourceLoader.SyntheticFieldLoader syntheticFieldLoader() {
if (ignoreMalformed) {
throw new IllegalArgumentException(
"field [" + name() + "] of type [" + typeName() + "] doesn't support synthetic source because it ignores malformed numbers"
);
}
return new AggregateMetricSyntheticFieldLoader(name(), simpleName(), metrics);
}

public static class AggregateMetricSyntheticFieldLoader implements SourceLoader.SyntheticFieldLoader {
private final String name;
private final String simpleName;
private final EnumSet<Metric> metrics;

protected AggregateMetricSyntheticFieldLoader(String name, String simpleName, EnumSet<Metric> metrics) {
this.name = name;
this.simpleName = simpleName;
this.metrics = metrics;
}

@Override
public Leaf leaf(LeafReader reader, int[] docIdsInLeaf) throws IOException {
Map<Metric, SortedNumericDocValues> metricDocValues = new EnumMap<>(Metric.class);
for (Metric m : metrics) {
SortedNumericDocValues dv = dv(reader, m);
if (dv != null) {
metricDocValues.put(m, dv);
}
}

if (metricDocValues.isEmpty()) {
return SourceLoader.SyntheticFieldLoader.NOTHING_LEAF;
}

return new AggregateMetricSyntheticFieldLoader.ImmediateLeaf(metricDocValues);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find an easy way to leverage the NumericSyntheticFieldLoader class, so I had to rewrite the same implementation so that it uses one SortedNumericDocValues instance per metric.

I skipped the singleton optimization because having mutliple docs in leaf requires an array of values. This combined with multiple metrics per doc (a Map<Metric, SortedNumericDocValues> will make the code extremely complex. I can do it if we think this optimization is worth the complexity


private class ImmediateLeaf implements Leaf {
private final Map<Metric, SortedNumericDocValues> metricDocValues;
private Map<Metric, Boolean> dvHasValue;

private boolean hasValues;

ImmediateLeaf(Map<Metric, SortedNumericDocValues> metricDocValues) {
this.metricDocValues = metricDocValues;
}

@Override
public boolean empty() {
return false;
}

@Override
public boolean advanceToDoc(int docId) throws IOException {
// It is required that all defined metrics must exist. In this case
// it is enough to check for the first docValue. However, in the future
// we may relax the requirement of all metrics existing. In this case
// we should check the doc value for each metric separately
dvHasValue = new EnumMap<>(Metric.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use an EnumSet instead? And maybe clear it instead of recreate it?

hasValues = false;
for (Map.Entry<Metric, SortedNumericDocValues> e : metricDocValues.entrySet()) {
boolean b = e.getValue().advanceExact(docId);
hasValues = hasValues || b;
dvHasValue.put(e.getKey(), b);
}

return hasValues;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe dvHasValue.isEmpty() == false is enough here?

}

@Override
public void write(XContentBuilder b) throws IOException {
if (false == hasValues) {
return;
}
b.startObject(simpleName);
for (Map.Entry<Metric, SortedNumericDocValues> entry : metricDocValues.entrySet()) {
Copy link
Contributor Author

@csoulios csoulios Jul 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metricDocValues is an EnumMap instance. This means that the order that sub fields will be returned in the source, will be the order in the Metric enum. Order ismin,max, sum, value_count. It is not alphabetical, but it will always be consistent. Does it make sense to change this so subfields are retuned in an alphabetical order?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's cool. Consistency is lovely for reading these docs. Alphabetical just gets the consistency in an easy way. No need to do it here.

if (dvHasValue.get(entry.getKey())) {
String metricName = entry.getKey().name();
long value = entry.getValue().nextValue();
if (entry.getKey() == Metric.value_count) {
b.field(metricName, value);
} else {
b.field(metricName, NumericUtils.sortableLongToDouble(value));
}
}
}
b.endObject();
}
}

/**
* Returns a {@link SortedNumericDocValues} or null if it doesn't have any doc values.
* See {@link DocValues#getSortedNumeric} which is *nearly* the same, but it returns
* an "empty" implementation if there aren't any doc values. We need to be able to
* tell if there aren't any and return our empty leaf source loader.
*/
private SortedNumericDocValues dv(LeafReader reader, Metric metric) throws IOException {
String fieldName = subfieldName(name, metric);
SortedNumericDocValues dv = reader.getSortedNumericDocValues(fieldName);
if (dv != null) {
return dv;
}
NumericDocValues single = reader.getNumericDocValues(fieldName);
if (single != null) {
return DocValues.singleton(single);
}
return null;
Copy link
Contributor Author

@csoulios csoulios Jul 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all metric subfields are stored using the SortedNumericDocValuesField, all doc values will be stored as SortedNumericDocValues. So, I am not sure reader.getNumericDocValues(fieldName) will ever return non-null values

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try removing it! I believe if you store single values it'll flip to NumericDocValues anyway.

If you do have to keep loading from both of there it's probably worth making this method static and public in NumericFieldMapper or something like that.

}
}
}
Loading