Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TSDB] Add support for downsampling aggregate_metric_double fields #90029

Merged
merged 29 commits into from
Sep 20, 2022

Conversation

csoulios
Copy link
Contributor

This PR adds support for downsampling metric fields of type aggregate_metric_double, enabling the rollup-of-rollups functionality.

@csoulios csoulios added :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data >non-issue :StorageEngine/TSDB You know, for Metrics labels Sep 13, 2022
@csoulios csoulios marked this pull request as ready for review September 13, 2022 17:08
@csoulios csoulios requested a review from nik9000 September 13, 2022 17:08
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 13, 2022
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

assert fieldType != null : "Unknown field type for field: [" + field + "]";
Map<String, FieldValueFetcher> fetchers = new HashMap<>();

if (fieldType instanceof AggregateDoubleMetricFieldMapper.AggregateDoubleMetricFieldType aggMetricFieldType) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this one of the threads you'd pull on to fix the other instanceof checks? If the field type built the fetcher - or if we had a Map<Class<FieldType>, FetcherFactory> or something?

}
}
} else {
if (context.fieldExistsInIndex(field)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include only the fields that exist in the index, instead of all mapped field.

This should improve downsampling performance for cases such as ECS schema.

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

java code looks fine to me. I'd love to plumb the instanceof checks through another way, but that's fine for a follow up change. I wish you luck fighting your gradle demons.

}

public TimeseriesFieldTypeHelper build(final String timestampField) throws IOException {
final MapperService mapperService = indicesService.createIndexMapperServiceForValidation(indexMetadata);
final CompressedXContent sourceIndexCompressedXContent = new CompressedXContent(indexMapping);
mapperService.merge(MapperService.SINGLE_MAPPING_NAME, sourceIndexCompressedXContent, MapperService.MergeReason.INDEX_TEMPLATE);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this important?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. It moved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I moved it one level up so that I can reuse the MapperService instance

rollup module extends mapper-aggregate-metric instead of
importing it via `api`. This solves some Classloading
issues
from rollup build.gradle file
@csoulios
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/docs

Moved FieldValueFetcher in the mappers package so
that it can be retrieved through the MappedFieldType
and remove the instanceof clause
@csoulios
Copy link
Contributor Author

@nik9000 I have sorted out the classloading issue. I had to modify the rollup x-pack plugin so that it extends the x-pack-aggregate-metric plugin. This solved the problem. For more details see b39ee34

Finally, I made an effort to move the FieldValueFetchers inside the server module so that it is returned by MappedFieldType. As discussed, this removes the instanceof condtional, but spills the downsmpling logic in the mappers code. To be honest I am not sure what's best (or what's worse). Also, it does not remove the instanceof conditionals when creating MetricFieldProducer and LabelFieldProducer instances.
The change can be seen in 69664b0. If you don't like it, I am happy to revert it.

With all those changes, I think the PR is ready for one more review.
Thanks for all the iterations

@csoulios csoulios requested a review from nik9000 September 20, 2022 12:10
/**
* Create a helper class to fetch value field data.
*/
public FieldValueFetcher fieldValueFetcher(SearchExecutionContext context) {
Copy link
Contributor Author

@csoulios csoulios Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that naming this fieldValueFetcher looks confusing, especially when the method is right below a method named valueFetcher, but I did not have any better idea (maybe fieldDataFetcher, but again there's fieldDataBuilder). I am open to suggestions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to keep this in the rollup module, but this works for now. I can put together a followup PR with some ideas once this is merged. Or I can try. And if I fail then, well, I'll know.

/**
* Utility class used for fetching field values by reading field data
*/
class FieldValueFetcher {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class became an interface and moved into server module in the org.elasticsearch.index.mapper package.

The static methods were moved in the RollupShardIndexer class

final Map<String, FormattedDocValues> docValuesFetchers = new LinkedHashMap<>(fieldValueFetchers.size());
for (FieldValueFetcher fetcher : fieldValueFetchers) {
for (Map.Entry<String, FormattedDocValues> e : fetcher.getLeaves(ctx).entrySet()) {
if (searchContext.fieldExistsInIndex(e.getKey())) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If field does not exist in the index, it will be ignored even if it exists in the mapping. This should improve performance for use cases such as ECS

@nik9000
Copy link
Member

nik9000 commented Sep 20, 2022

@nik9000 I have sorted out the classloading issue. I had to modify the rollup x-pack plugin so that it extends the x-pack-aggregate-metric plugin. This solved the problem. For more details see b39ee34

got it. I don't really know if there is a "right" way to do this, but I'm just happy this works.

Copy link
Member

@nik9000 nik9000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I do think it's be nice to keep the fetchers in the rollup module and I think I know how. But let's get this in and I can make a PR and we can see if it's a reasonable way.

/**
* Create a helper class to fetch value field data.
*/
public FieldValueFetcher fieldValueFetcher(SearchExecutionContext context) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to keep this in the rollup module, but this works for now. I can put together a followup PR with some ideas once this is merged. Or I can try. And if I fail then, well, I'll know.

/**
* Interface for retrieving field values by reading field data
*/
public interface FieldValueFetcher {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's in server it's nice to have a very good name. Its a FormattedDocValuesFetcher or something. I dunno. Probably not worth changing it now. If I can't figure out a good way to move it back to the module then we should rename it.

@csoulios
Copy link
Contributor Author

I reverted the last change. I moved the FieldValueFetcher class back to the rollup module.I am going to merge this PR as is. Let's work on a following PR on how to remove the instanceof conditionals

@csoulios
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/bwc

@csoulios
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/part-1

@csoulios csoulios merged commit b9f20e9 into elastic:main Sep 20, 2022
@csoulios csoulios deleted the tsdb-rollup-of-rollups branch September 20, 2022 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants