TSDB: Add _tsid field to time_series indices #80276

Merged
merged 47 commits into from
Nov 29, 2021

@csoulios csoulios commented Nov 3, 2021

This PR adds support for a field named _tsid that uniquely identifies the time series a document belongs to.

When a document is indexed in a time series index (IndexMode.TIME_SERIES), the _tsid field is generated from the values of all dimension fields.
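
As a rough illustration of the idea, and not the code in this PR, an identifier can be derived by visiting the dimension fields in a stable order and concatenating their names and values. The class `TsidSketch`, its method names, and the length-prefixed byte layout are all assumptions made for this example; the actual encoding used by TimeSeriesIdFieldMapper may differ.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only: this is not the TimeSeriesIdFieldMapper implementation.
class TsidSketch {

    // Writes a 4-byte length prefix followed by the UTF-8 bytes of the string,
    // so that adjacent names and values cannot run into each other.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] length = ByteBuffer.allocate(Integer.BYTES).putInt(utf8.length).array();
        out.write(length, 0, length.length);
        out.write(utf8, 0, utf8.length);
    }

    // Builds an identifier from all dimension name/value pairs. Names are visited
    // in sorted order (TreeMap), so the same dimensions always give the same bytes.
    static byte[] buildTsid(Map<String, String> dimensions) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> entry : new TreeMap<>(dimensions).entrySet()) {
            writeString(out, entry.getKey());
            writeString(out, entry.getValue());
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] a = buildTsid(Map.of("host", "web-1", "region", "eu"));
        byte[] b = buildTsid(Map.of("region", "eu", "host", "web-1"));
        byte[] c = buildTsid(Map.of("host", "web-2", "region", "eu"));
        System.out.println(Arrays.equals(a, b)); // same dimensions, different insertion order
        System.out.println(Arrays.equals(a, c)); // different host value
    }
}
```

Sorting the names before encoding is what makes the result independent of the order in which fields appear in the document, which is the property an identifier for a time series needs.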

@csoulios csoulios added :StorageEngine/TSDB You know, for Metrics v8.1.0 labels Nov 3, 2021
@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Nov 3, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@csoulios csoulios marked this pull request as draft November 3, 2021 13:19
Member

@nik9000 nik9000 left a comment

We talked on video a bit but I had already started typing stuff. This makes public what we talked about.

@weizijun
Contributor

weizijun commented Nov 8, 2021

Will the index sorting settings for _tsid be added in this PR, or will a separate PR add that feature?

@csoulios
Contributor Author

csoulios commented Nov 8, 2021

@weizijun this PR is a work in progress. I need to add more tests and complete the index sorting part before it is ready to be merged.

to IndexMode so that it is only created in time_series mode
Member

@nik9000 nik9000 left a comment

I realized a thing about encoding these even if we're not using them. I think it's rare that a field will be a dimension outside of time series mode but it's possible. We should probably come up with a scheme to skip those fields....

new FieldSortSpec(TimeSeriesIdFieldMapper.NAME),
new FieldSortSpec(DataStreamTimestampFieldMapper.DEFAULT_PATH) };
return;
}
Member

This could probably use the same "move it into a method on IndexMode" treatment that the builder used. But that can wait for a follow up I think.

if (dimension) {
// Extract the tsid part of the dimension field
BytesReference bytes = TimeSeriesIdFieldMapper.extractTsidValue(NetworkAddress.format(address));
context.doc().addDimensionBytes(fieldType().name(), bytes);
Member

Now that I think about it... Can't you declare something a dimension without enabling the tsid field? I wonder if we should avoid encoding it if the field isn't present.

Contributor Author

For ip and numeric field types, we can skip the encodeTsidValue part if _tsid field is not going to be generated.

However, for keyword fields we must still do the encoding, because we must validate the string length.
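
A minimal sketch of that distinction, assuming a hypothetical `tsidEnabled` flag and a made-up byte limit (the thread does not state the real limit, and `DimensionEncodingSketch` is not a class in this PR): the numeric path returns early when no _tsid will be generated, while the keyword path always encodes the value first so that the length check still runs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical sketch only: names, flag, and limit are assumptions for illustration.
class DimensionEncodingSketch {

    // Assumed limit; the real value is not given in this conversation.
    static final int MAX_KEYWORD_BYTES = 1024;

    // Keyword dimensions: always encode to UTF-8, because the length validation
    // needs the encoded bytes. The bytes are only kept when _tsid is generated.
    static Optional<byte[]> encodeKeyword(String value, boolean tsidEnabled) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > MAX_KEYWORD_BYTES) {
            throw new IllegalArgumentException(
                "dimension value is " + utf8.length + " bytes, limit is " + MAX_KEYWORD_BYTES
            );
        }
        return tsidEnabled ? Optional.of(utf8) : Optional.empty();
    }

    // Numeric dimensions: nothing to validate, so skip the encoding work
    // entirely when _tsid is not going to be generated.
    static Optional<byte[]> encodeLong(long value, boolean tsidEnabled) {
        if (!tsidEnabled) {
            return Optional.empty();
        }
        return Optional.of(ByteBuffer.allocate(Long.BYTES).putLong(value).array());
    }
}
```

An ip dimension would follow the same early-return pattern as `encodeLong` under these assumptions.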

@nik9000
Member

nik9000 commented Nov 25, 2021 via email

@csoulios
Contributor Author

@elasticmachine run elasticsearch-ci/eql-correctness

@csoulios
Contributor Author

@elasticmachine run elasticsearch-ci/part-2

@csoulios csoulios merged commit c0b4b60 into elastic:master Nov 29, 2021
@csoulios csoulios deleted the tsid-gen branch November 29, 2021 10:44
csoulios added a commit that referenced this pull request Dec 9, 2021
This is a follow up to #80276
Relates to #74660

Co-authored-by: Nik Everett <nik9000@gmail.com>
csoulios added a commit that referenced this pull request Dec 9, 2021
This PR builds on the work added in #80276 that generates the _tsid field for keyword, ip and number dimension fields.

It adds support for unsigned_long dimension fields.
@wchaparro wchaparro assigned csoulios and unassigned csoulios Dec 16, 2021
csoulios added a commit that referenced this pull request Jan 26, 2022
Since _tsid cannot be a multi-value field, this PR modifies the TimeSeriesIdFieldMapper
so that _tsid is added as a SortedDocValuesField (instead of a SortedSetDocValuesField).

Relates to #80276
Labels
>enhancement :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.1.0