Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TSDB: Change _tsid field to SortedDocValuesField #83045

Merged
merged 7 commits into from
Jan 26, 2022

Conversation

csoulios
Copy link
Contributor

@csoulios csoulios commented Jan 25, 2022

Since _tsid cannot be a multi-value field, this PR modifies the TimeSeriesIdFieldMapper so that _tsid is added as a SortedDocValuesField (instead of a SortedSetDocValuesField)

Relates to #80276

@csoulios csoulios added :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/TSDB You know, for Metrics v8.1.0 labels Jan 25, 2022
@csoulios csoulios requested a review from romseygeek January 25, 2022 13:46

@Override
public SortField sortField(@Nullable Object missingValue, MultiValueMode sortMode, Nested nested, boolean reverse) {
SortField sortField = new SortField(getFieldName(), SortField.Type.STRING, reverse);
Copy link
Contributor Author

@csoulios csoulios Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if setting this to SortField.Type.STRING by default is ok.

If I leave it to the default SortField.Type.CUSTOM method IndexSortConfig#validateIndexSortField fails

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type.STRING is correct here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type.CUSTOM means that it's expecting you to give it a custom FieldComparator, which we don't need. It is a horrible API from about 2004 and I keep meaning to hack it out and replace it with something less awful...

@csoulios csoulios requested a review from jpountz January 25, 2022 19:04
@csoulios csoulios marked this pull request as ready for review January 25, 2022 19:04
@elasticmachine elasticmachine added Team:Search Meta label for search team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Jan 25, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left a couple of suggestions.


@Override
public SortField sortField(@Nullable Object missingValue, MultiValueMode sortMode, Nested nested, boolean reverse) {
SortField sortField = new SortField(getFieldName(), SortField.Type.STRING, reverse);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type.STRING is correct here.

SortField sortField = new SortField(getFieldName(), SortField.Type.STRING, reverse);
sortField.setMissingValue(
sortMissingLast(missingValue) ^ reverse ? SortedSetSortField.STRING_LAST : SortedSetSortField.STRING_FIRST
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we expect anything to be missing a _tsid field? If not we can just ignore setMissingValue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In time series indices _tsid field must always be present. I only left it because SortedOrdinalsIndexFieldData may possibly be used for other fields too.

@@ -93,7 +93,7 @@ protected boolean lessThan(LeafWalker a, LeafWalker b) {
this.collector.setScorer(scorer);
iterator = scorer.iterator();
docBase = context.docBase;
tsids = context.reader().getSortedSetDocValues(TimeSeriesIdFieldMapper.NAME);
tsids = context.reader().getSortedDocValues(TimeSeriesIdFieldMapper.NAME);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use DocValues.getSorted(context.reader(), TimeSeriesIdFieldMapper.NAME) here as it will protect you from NPEs and also automatically convert between SortedSet and Sorted if we need it.

@@ -106,7 +106,7 @@ boolean next() throws IOException {
docId = iterator.nextDoc();
if (docId != DocIdSetIterator.NO_MORE_DOCS && (liveDocs == null || liveDocs.get(docId))) {
if (tsids.advanceExact(docId)) {
BytesRef tsid = tsids.lookupOrd(tsids.nextOrd());
BytesRef tsid = tsids.lookupOrd(tsids.ordValue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can replace this with an ord comparison now that we know there's a single value, but let's do that in a follow-up.


@Override
public LeafOrdinalsFieldData load(LeafReaderContext context) {
return new SortedSetBytesLeafFieldData(context.reader(), getFieldName(), toScriptField);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth a comment here that SortedSetBytesLeafFieldData will happily load SortedDocValues as well, so there's no need for a single-valued implementation.


@Override
public SortField sortField(@Nullable Object missingValue, MultiValueMode sortMode, Nested nested, boolean reverse) {
SortField sortField = new SortField(getFieldName(), SortField.Type.STRING, reverse);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Type.CUSTOM means that it's expecting you to give it a custom FieldComparator, which we don't need. It is a horrible API from about 2004 and I keep meaning to hack it out and replace it with something less awful...

@csoulios csoulios requested a review from romseygeek January 26, 2022 12:06
Copy link
Contributor

@romseygeek romseygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@csoulios csoulios merged commit 40cab17 into elastic:master Jan 26, 2022
@csoulios csoulios deleted the tsid-dv-field branch January 26, 2022 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :Search Foundations/Mapping Index mappings, including merging and defining field types :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Meta label for search team v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants