TSDB: Support GET and DELETE and doc versioning #82633

Merged 91 commits into elastic:master on Mar 10, 2022

Conversation

@nik9000 (Member) commented Jan 14, 2022

This adds support for GET and DELETE and the ids query and
Elasticsearch's standard document versioning to TSDB. So you can do
things like:

```
POST /tsdb_idx/_doc?filter_path=_id
{
  "@timestamp": "2021-12-29T19:25:05Z", "uid": "adsfadf", "v": 1.2
}
```

That'll return {"_id" : "BsYQJjqS3TnsUlF3aDKnB34BAAA"} which you can turn
around and fetch with

GET /tsdb_idx/_doc/BsYQJjqS3TnsUlF3aDKnB34BAAA

just like any other document in any other index. You can delete it too!
Or fetch it.

The ID comes from the dimensions and the @timestamp. So you can
overwrite the document:

```
POST /tsdb_idx/_bulk
{"index": {}}
{"@timestamp": "2021-12-29T19:25:05Z", "uid": "adsfadf", "v": 1.2}
```

Or you can write only if it doesn't already exist:

```
POST /tsdb_idx/_bulk
{"create": {}}
{"@timestamp": "2021-12-29T19:25:05Z", "uid": "adsfadf", "v": 1.2}
```

This works by generating an id from the dimensions and the @timestamp
when parsing the document. The id looks like:

* 4 bytes of hash from the routing calculated from the routing_path fields
* 8 bytes of hash from the dimensions
* 8 bytes of timestamp

All of that is base 64 encoded so that Uid can chew on it fairly efficiently.

When it comes time to fetch or delete documents we base 64 decode the id
and grab the routing from the first four bytes. We use that hash to pick
the shard. Then we use the entire ID to perform the fetch or delete.
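
For illustration only, here's a minimal sketch of that encode/decode round trip under the layout described above. The class and method names are hypothetical and the hash functions and base 64 flavor differ from what Elasticsearch actually uses; it just shows the byte layout and how the first four bytes drive shard selection:

```java
import java.nio.ByteBuffer;
import java.util.Base64;

/**
 * Hypothetical sketch of the id layout described above: 4 bytes of routing
 * hash, 8 bytes of dimension hash, and 8 bytes of timestamp, base 64 encoded.
 */
class TsdbIdSketch {
    static String encode(int routingHash, long dimensionHash, long timestampMillis) {
        byte[] bytes = ByteBuffer.allocate(20)
            .putInt(routingHash)       // used to pick the shard on GET/DELETE
            .putLong(dimensionHash)    // shared by every document in a time series
            .putLong(timestampMillis)
            .array();
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    /** On GET/DELETE, decode the id and use only the first four bytes to pick a shard. */
    static int shardFromId(String id, int numberOfShards) {
        int routingHash = ByteBuffer.wrap(Base64.getUrlDecoder().decode(id)).getInt();
        return Math.floorMod(routingHash, numberOfShards);
    }
}
```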

We don't implement update actions because we haven't written the
infrastructure to make sure the dimensions don't change. It's possible
to do, but feels like more than we need now.

There are a ton of compromises with this. The long term sad thing is that it
locks us into indexing the id of the sample. It'll index fairly
efficiently because each time series will have the same first eight
bytes. It's also possible we'd share many of the first few bytes in the
timestamp as well. In our tsdb rally track this costs 8.75 bytes per
document. It's substantial, but not overwhelming.

In the short term there are lots of problems that I'd like to save for a
follow up change:

  1. We still generate the automatic _id for the document but we don't use
    it. We should stop generating it.
    Included in this PR based on review comments.
  2. We generate the time series _id on each shard and when replaying
    the translog. It'd be the good kind of paranoid to generate it once
    on the primary and then keep it forever.
  3. We have to encode the _id as a string to pass it around
    Elasticsearch internally. And Elasticsearch assumes that when an id
    is loaded we always store it as bytes encoded with Uid - which does
    have a nice encoding for base 64 bytes. But this whole thing requires
    us to make the bytes, base 64 encode them, and then hand them back to
    Uid to base 64 decode them into bytes. It's a bit hacky. And, it's
    a small thing, but if the first byte of the routing hash encodes to
    254 or 255 then Uid spends an extra byte to encode it. One that'll
    always be a common prefix for tsdb indices, but still, it hurts my
    heart. It's just hard to fix.
  4. We store the _id in Lucene stored fields for tsdb indices. Now
    that we're building it from the dimensions and the @timestamp we
    really don't need to store it. We could recalculate it when fetching
    documents. In the tsdb rally track this'd save us 6 bytes per document
    at the cost of marginally slower fetches. Which is fine.
  5. There are several error messages that try to use _id right now
    during parsing but the _id isn't available until after the parsing
    is complete. And, if parsing fails, it may not be possible to know
    the id at all. All of these error messages will have to change,
    at least in tsdb mode.
  6. If you specify an _id on the request right now we just overwrite
    it. We should send you an error.
    Included in this PR after review comments.
  7. We have to entirely disable the append-only optimization that allows
    Elasticsearch to skip looking up the ids in lucene. This halves
    indexing speed. It's substantial. We have to claw that optimization
    back somehow. Something like sliding bloom filters or relying on
    the increasing timestamps.
  8. We re-parse the source json when building the routing hash while
    parsing fields. We should just build it from the parsed field values.
    It looks like that'd improve indexing speed by about 20%.
  9. Right now we write the @timestamp little endian. This is likely bad for
    the prefix-encoded inverted index, which would prefer big endian. Switching might shrink it.
  10. Improve error message on version conflict to include tsid and timestamp.
  11. Improve error message when modifying dimensions or timestamp in update_by_query.
  12. Make it possible to modify dimension or timestamp in reindex.
  13. Test TSDB's _id in RecoverySourceHandlerTests.java and EngineTests.java.

I've had to make some changes as part of this that don't feel super
expected. The biggest one is changing Engine.Result to include the
id. When the id comes from the dimensions it is calculated by the
document parsing infrastructure, which happens in
IndexShard#prepareIndex, which returns an Engine.IndexResult. To make
everything clean I made it so id is available on all Engine.Results
and I made all of the "outer results classes" read from
Engine.Results#id. I'm not excited by it. But it works and it's what
we're going with.

I've opted to create two subclasses of IdFieldMapper, one for standard
indices and one for tsdb indices. This feels like the right way to
introduce the distinction, especially if we don't want tsdb to carry
around its old fielddata support. Honestly if we need to aggregate on
_id in tsdb mode we have doc values for the _tsid and the
@timestamp - we could build doc values for _id on the fly. But I'm
not expecting folks will need to do this. Also! I'd like to stop storing
tsdb's _id field (see number 4 above) and the new subclass feels like
a good place to put that too.

@nik9000 marked this pull request as draft on January 14, 2022 19:38
This adds support for GET and DELETE and the ids query and
Elasticsearch's standard document versioning to TSDB. So you can do
things like:
```
POST /tsdb_idx/_doc?filter_path=_id
{
  "@timestamp": "2021-12-29T19:25:05Z", "uid": "adsfadf", "v": 1.2
}
```

That'll return `{"_id" : "22d7YQAAAABoMqcHfgEAAA"}` which you can turn
around and fetch with
```
GET /tsdb_idx/_doc/22d7YQAAAABoMqcHfgEAAA
```
just like any other document in any other index. You can delete it too!
Or fetch it.

The ID comes from the dimensions and the `@timestamp`. So you can
overwrite the document:
```
POST /tsdb_idx/_bulk
{"index": {}}
{"@timestamp": "2021-12-29T19:25:05Z", "uid": "adsfadf", "v": 1.2}
```

Or you can write only if it doesn't already exist:
```
POST /tsdb_idx/_bulk
{"create": {}}
{"@timestamp": "2021-12-29T19:25:05Z", "uid": "adsfadf", "v": 1.2}
```

This works by generating an id from the dimensions and the `@timestamp`
when parsing the document. The id looks like:
* 4 bytes of hash from the routing dimensions
* 4 bytes of hash from the non-routing dimensions
* 8 bytes of timestamp
All that's base 64 encoded so that `Uid` can chew on it fairly
efficiently.

When it comes time to fetch or delete documents we base 64 decode the id
and grab the hash from the routing dimensions. We use that hash to pick
the shard. Then we use the entire ID to perform the fetch or delete.

We don't implement update actions because we haven't written the
infrastructure to make sure the dimensions don't change. It's possible
to do, but feels like more than we need now.

There are a *ton* of compromises with this. The long term sad thing is that it
locks us into *indexing* the id of the sample. It'll index fairly
efficiently because each time series will have the same first eight
bytes. It's also possible we'd share many of the first few bytes in the
timestamp as well. So, if we're lucky, we're really only paying, say,
six bytes per document for this. But that's six bytes we can't give up
easily.

In the short term there are lots of problems that I'd like to save for a
follow up change:
1. We still generate the automatic `_id` for the document but we don't use
   it. We should stop generating it.
2. We generate the time series `_id` on each shard and when replaying
   the translog. It'd be the good kind of paranoid to generate it once
   on the primary and then keep it forever.
3. We have to encode the `_id` as a string to pass it around
   Elasticsearch internally. And Elasticsearch assumes that when an id
   is loaded we always store it as bytes encoded with `Uid` - which *does*
   have a nice encoding for base 64 bytes. But this whole thing requires
   us to make the bytes, base 64 encode them, and then hand them back to
   `Uid` to base 64 decode them into bytes. It's a bit hacky. And, it's
   a small thing, but if the first byte of the routing hash encodes to
   254 or 255 then `Uid` spends an extra byte to encode it. One that'll
   always be a common prefix for tsdb indices, but still, it hurts my
   heart. It's just hard to fix.
4. We store the `_id` in tsdb indices. Now that we're building it from
   the dimensions and the `@timestamp` we really don't *need* to store
   it. We could recalculate it when fetching documents. This could save
   us a few bytes of storage per document. 6? 10? I dunno, it depends
   how well the compression of stored fields manages.
5. There are several error messages that try to use `_id` right now
   during parsing but the `_id` isn't available until after the parsing
   is complete. And, if parsing fails, it may not be possible to know
   the id at all. All of these error messages will have to change,
   at least in tsdb mode.

I've had to make some changes as part of this that don't feel super
expected. The biggest one is changing `Engine.Result` to include the
`id`. When the `id` comes from the dimensions it is calculated by the
document parsing infrastructure, which happens in
`IndexShard#prepareIndex`, which returns an `Engine.IndexResult`. To make
everything clean I made it so `id` is available on all `Engine.Result`s
and I made all of the "outer results classes" read from
`Engine.Results#id`. Another option, I think, would have been to change
the results objects produced by `IndexShard` into new objects that have the
`id` in them. This may very well be the right thing to do. I'm not sure.
Another option would have been to do a pass over the data to get the `id`
first and then another to get the data. That feels like overkill though.

I've had to change the routing calculation for tsdb indices from
something clever to something a little simpler to calculate from the
parsed values. It's *possible* to keep the old routing algorithm, but
it'd be complex and, frankly, the old algorithm felt a little too clever
on reread and I didn't want to try to back into it. Another option would
have been to store the routing as calculated on the coordinating node
and read it on the primary when making the id. This felt pretty heavy.
We'd have to add it to the `IndexRequest` or fake it into the `routing` field
of the index request. Both just feel silly when the bytes are already
available, already parsed; we just have to hash them.
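
For what it's worth, here's a rough sketch of that "hash the already-parsed values" idea. The names are hypothetical and this is not the `IndexRouting` code in this PR; it only illustrates deriving a shard from parsed routing_path values instead of re-reading the source:

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Rough sketch: compute a routing hash from already-parsed routing_path values
 * instead of re-parsing the source json. Hypothetical; not the real IndexRouting code.
 */
class ParsedValueRoutingSketch {
    static int shard(Map<String, String> routingPathValues, int numberOfShards) {
        int hash = 0;
        // Iterate in sorted field order so the hash doesn't depend on document field order.
        for (Map.Entry<String, String> dim : new TreeMap<>(routingPathValues).entrySet()) {
            hash = 31 * hash + dim.getKey().hashCode();
            hash = 31 * hash + dim.getValue().hashCode();
        }
        return Math.floorMod(hash, numberOfShards);
    }
}
```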

I've opted to create two subclasses of `IdFieldMapper`, one for standard
indices and one for tsdb indices. This feels like the right way to
introduce the distinction, especially if we don't want tsdb to carry
around its old fielddata support. Honestly if we *need* to aggregate on
`_id` in tsdb mode we have doc values for the `_tsid` and the
`@timestamp` - we could build doc values for `_id` on the fly. But I'm
not expecting folks will need to do this. Also! I'd like to stop storing
tsdb's `_id` field (see number 4 above) and the new subclass feels like
a good place to put that too.
@nik9000 (Member Author) commented Jan 18, 2022

run elasticsearch-ci/part-1

@nik9000 marked this pull request as ready for review on January 18, 2022 20:06
@nik9000 added the :StorageEngine/TSDB and >feature labels on Jan 18, 2022
@elasticmachine added the Team:Analytics label on Jan 18, 2022
@elasticmachine (Collaborator) commented:
Pinging @elastic/es-analytics-geo (Team:Analytics)

```
if (entry.getValue().isRoutingDimension()) {
    routingHash = 31 * routingHash + thisHash;
} else {
    nonRoutingHash = 31 * nonRoutingHash + thisHash;
```
nik9000 (Member Author):

I talked with @henningandersen about this one and he linked me to https://preshing.com/20110504/hash-collision-probabilities/ . We had some brainstorming. In the worst case the nonRoutingHash and the routingHash are entirely correlated - imagine we route on data_center and use ip as the only other dimension. It's not great, but it could happen. In that case the only things saving us from unexpected _id collisions are the nonRoutingHash and the timer. And some folks are going to have garbage resolution timers.

So in the worst case the odds of an _id collision are entirely based on the odds of a collision on nonRoutingHash. And that's a birthday problem. Or, so says the link Henning shared. I buy that. That link has a handy table. The odds of any two hashes colliding, assuming this hashing algorithm is perfect, depend on the number of unique tsids and the number of bits in the hash. For the 32 bit hash I'm using here it takes about 30,000 unique tsids to get about 1:10 odds of any two colliding. That seems bad.

So I'll have to think more here. A 64 bit hash would be a lot better - 1:10 odds takes 1.97 billion unique tsids. Maybe that'd be enough.
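
As a back-of-the-envelope check of those numbers, the usual birthday-problem approximation gives roughly the same figures (my own sketch, not code from the PR):

```java
/**
 * Birthday-problem approximation: the probability that at least two of n random
 * b-bit hashes collide is roughly 1 - exp(-n * (n - 1) / 2^(b + 1)).
 */
class CollisionOddsSketch {
    static double collisionProbability(double uniqueTsids, int hashBits) {
        double buckets = Math.pow(2, hashBits);
        return 1 - Math.exp(-uniqueTsids * (uniqueTsids - 1) / (2 * buckets));
    }

    public static void main(String[] args) {
        // ~10% chance of a collision with a 32 bit hash at around 30,000 tsids...
        System.out.println(collisionProbability(30_000, 32));           // ~0.10
        // ...while a 64 bit hash only reaches that around 1.97 billion tsids.
        System.out.println(collisionProbability(1_970_000_000d, 64));   // ~0.10
    }
}
```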

nik9000 (Member Author):

Instead of a hash, it'd be nice if we could use a lookup table to sequentially assign the tsids. Lucene already has a lookup table for global ordinals, but it's global ordinals, an expensive query time thing. It isn't really for this. It's in the wrong place and has the wrong constraints. We'd have to build something else. And that'd need replication semantics and all that jazz. So, possible, but I think the path of least resistance is to go with a 64 bit hash and live with the rare chance of collisions for a while. I'd love to replace it with something fancier, but that's a bigger project I think.

```
 * stored, but we need to keep it so that its FieldType can be used to generate
 * queries.
 */
public class TsdbIdFieldMapper extends IdFieldMapper {
```
Contributor:

Maybe we should call this TimeSeriesIdFieldMapper to keep it consistent?

nik9000 (Member Author):

Sure.

Contributor:

TimeSeriesIdFieldMapper class already exists. It generates the _tsid

Naming is hard :(

nik9000 (Member Author):

Imagine my confusion when I tried to rename the class to that and it didn't work. I tried like three times before reading the error message. So I did rename the class, but I'm not super happy with the names. I'm happy to take suggestions.

Contributor:

I like the TimeSeriesModeIdFieldMapper approach. Maybe I would rename StandardIdFieldMapper to StandardModeIdFieldMapper. I know it looks too long, but it's more consistent.

Alternatively, I'd also like TimeSeriesIndexIdFieldMapper vs DefaultIdFieldMapper (I prefer default to standard).

@weizijun (Contributor) commented:

I see that _id is constructed from the non-routing dimensions, the routing dimensions, and the timestamp.
It will use the non-routing dimensions to route the doc to the dest shard.
KeywordFieldMapper's PARSER uses the Regex.simpleMatch method to filter the non-routing dimension fields.
But Regex.simpleMatch does not behave the same as the XContent filter when dealing with escape characters.
Here is a test case:

```
public void testTsdbIdException() throws IOException {
        Version version = VersionUtils.randomIndexCompatibleVersion(random());
        Settings settings = Settings.builder()
            .put(IndexSettings.MODE.getKey(), "time_series")
            .put(IndexMetadata.SETTING_VERSION_CREATED, version)
            .put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, between(1, 100))
            .put(IndexSettings.TIME_SERIES_START_TIME.getKey(), "-9999-01-01T00:00:00Z")
            .put(IndexSettings.TIME_SERIES_END_TIME.getKey(), "9999-01-01T00:00:00Z")
            .put(IndexMetadata.INDEX_ROUTING_PATH.getKey(), "o.*")
            .build();
        MapperService mapperService = createMapperService(settings, mapping(b -> {
            b.startObject("o.f1").field("type", "keyword").field("time_series_dimension", true).endObject();
            b.startObject("o").startObject("properties");
            {
                b.startObject("f2").field("type", "keyword").field("time_series_dimension", true).endObject();
            }
            b.endObject().endObject();
        }));
        String f1 = randomAlphaOfLength(12);
        String f2 = randomAlphaOfLength(12);
        parse(mapperService, b -> {
            b.field("@timestamp", "2022-01-01T01:00:00Z");
            b.field("o.f1", f1);
            b.startObject("o");
            {
                b.field("f2", f2);
            }
            b.endObject();
        }).id();
    }
```

It fails on the line below in the TsdbIdFieldMapper.postParse method:

```
        assert shardFromSource(context.indexSettings().getIndexMetadata(), context.sourceToParse()) == shardFromRoutingHash(
            context.indexSettings().getIndexMetadata(),
            routingHash
        );
```

I've been thinking deeply about the index.routing_path setting.
At first there was no index.routing_path setting, and I found that the routing could differ for identical docs: the first doc hit the dynamic mapping rule and added a new field to the dimensions, so the second, identical doc generated a different routing. The test case is in nik9000#9.

So you added a new setting, index.routing_path, which separates the routing from generating the _tsid.
It's ok to separate routing and _tsid. But when the _id is generated from the non-routing dimensions and the routing dimensions, the taste starts to change.

Is it necessary for the index.routing_path fields to be different from the dimension fields?
If id and routing need to be different, I'd prefer to separate the routing and id logic. The _id would be an internal value built from the _tsid and the timestamp.
And the routing would be a custom action, where the user passes a routing value to route to the dest shard.

As for how to deal with the initial dynamic mapping problem, I have an initial idea.
The _tsid and _id can be generated on the coordinating node. We'd add the mapping properties configured with time_series_dimension, plus the dynamic_templates rules configured with time_series_dimension, to a list, and that list would serve the same function as index.routing_path.
Since existing fields always hit a property entry, the XContent filter can match the non-wildcard fields at O(1) cost.

And if the user has custom routing, they'd use index.routing_path to configure the routing fields. The routing fields must be configured as time_series_dimension, and the rules can be like what I mentioned in #82511.

To support custom routing search:

* if the user sets only one field in routing_path, they can search with: GET xxx/_search?routing=xx
* if the user sets multiple fields in routing_path, they can search with: GET xxx/_search?routing=field1:xxx|field2:xxx|field3:xxx
* if the user sets wildcard fields in routing_path, the routing parameter can be rejected

@nik9000 (Member Author) commented Jan 20, 2022

Here is a test case:

Ooof. I see it, yeah. Our dots thing. And it lines up with the concerns you had earlier. Got it. I'll have a think.....

I'm in deep thinking of the index.routing_path setting.

I think what you are proposing sounds a lot like my initial try. Like you said, the issue was dynamic mappings. And I think what you are proposing isn't something we want on the coordinating node because it's too heavy. That's most of the parsing stage. I really would like to replace routing_path, but I couldn't think of anything better. I agree it's kind of janky.

I don't feel too bad about mixing the _id and routing data. That's how Elasticsearch works by default - if you don't specify _routing it just uses the ID for the routing. This PR is in that spirit - it uses a part of the _id for the routing.

Now! I see you are concerned about supporting explicit routing for tsdb data - that's something I'd mostly hoped we wouldn't have to do; that whatever folks were doing with custom routing before, they'd use a dimension field for in the future. I think we should talk more about that. Would it be ok to use a dimension to get similar behavior? Or are you relying on the grouping aspect more strongly?

@weizijun (Contributor) commented:

Would it be ok to use a dimension to get similar behavior? Or are you relying on the grouping aspect more strongly?

I'm a little confused about this question. Does it mean that the routing should be one dimension and not many dimensions?
Yes, in our internal use case only one dimension is ok.
We collect all Alibaba Cloud Elasticsearch monitoring data in a data stream index, with number_of_shards set to around 100 or more.
Since users always search the monitoring indices by tenant_id or cluster_id, we use the tenant_id as the routing value. Unfortunately we also use the index.routing_partition_size setting, in order to avoid any large tenant_id's data being too large. That causes the time series data to not all be in one shard, but it works fine with no mechanism problem.

tsdb not supporting index.routing_partition_size is not a problem for us, as we can increase the frequency of rollover to make number_of_shards smaller.
But not supporting routing may lead to a performance decrease. Maybe many users are facing the same question.

I think what you are proposing sounds a lot like my initial try. Like you said, the issue was dynamic mappings. And I think what you are proposing isn't something we want on the coordinating node because it's too heavy. That's most of the parsing stage. I really would like to replace routing_path, but I couldn't think of anything better. I agree it's kind of janky.

I think the routing_path setting could be an internal setting, built from the mapping rather than set by the user. The routing_path would be built from the mapping's dynamic_templates and properties fields that are configured as time_series_dimension; if a mapping update changes the time_series_dimension fields, the routing_path is also updated to the new value.

How to handle custom routing requirements? I thought of two ways:

  1. Users can pass a routing value, just as with a standard index, but it may cause docs with the same _tsid to not be in the same shard. The choice is in the hands of the users. It can work fine, with no mechanism problem, although it's not efficient.
  2. Add a new setting, just like routing_path, that limits routing to only one dimension field (or consider supporting multiple dimension fields), and users pass the routing value to search/get/delete data, like I mentioned above.

@nik9000 (Member Author) commented Mar 7, 2022

@elasticmachine, update branch

@nik9000 (Member Author) commented Mar 7, 2022

@elasticmachine update branch

@nik9000 requested a review from imotov on March 7, 2022 15:35
@nik9000 (Member Author) commented Mar 7, 2022

The failed release tests are #84698 - we'll ignore them for now.

@nik9000 (Member Author) commented Mar 8, 2022

@elasticmachine update branch

@imotov (Contributor) left a comment:

Sorry for the delay. LGTM. Left a few naming suggestions to clarify intent of some methods and classes for future readers.

```
assert autoGeneratedTimestamp == UNSET_AUTO_GENERATED_TIMESTAMP : "timestamp has already been generated!";
assert ifSeqNo == UNASSIGNED_SEQ_NO;
assert ifPrimaryTerm == UNASSIGNED_PRIMARY_TERM;
autoGeneratedTimestamp = Math.max(0, System.currentTimeMillis()); // extra paranoia
```
Contributor:

Huh?

nik9000 (Member Author):

It's copied from above. I think the paranoia is about never setting it to a negative number. I'll dig a little and leave a better message.

```
        String uid = UUIDs.base64UUID();
        id(uid);
    }

    public void autoGenerateId() {
```
Contributor:

This is a bit spooky. I think it might be cleaner if we moved id generation into IndexRouting with clear assignment, and generated the timestamp in a separate method called accordingly.

nik9000 (Member Author):

Yeah. I feel like it's better as a method on IndexRequest because it's mutating the guts of the IndexRequest. The whole process chain is a little spooky to be honest though.

```
 * stored, but we need to keep it so that its FieldType can be used to generate
 * queries.
 */
public class StandardIdFieldMapper extends IdFieldMapper {
```
Contributor:

Maybe we should call it NoopIdFieldMapper to better reflect what it does (nothing) instead of where it is currently used (standard index mode).

nik9000 (Member Author):

It exposes the _id for fetching and querying and sometimes provides field data for it. It isn't really a noop.

Contributor:

NonGeneratingIdFieldMapper? 😄

nik9000 (Member Author):

Now you are trolling! But, yeah, I don't like the name either. I'm saving this for a follow up.

Contributor:

Sorry, I wasn't trying to troll, I am just trying to brainstorm here. How about ExistingIdFieldMapper , ProvidedIdFieldMapper or SuppliedIdFieldMapper?

nik9000 (Member Author):

I was sort of trolling. I know it's just brainstorming. ProvidedIdFieldMapper seems better. But, like, it has the fielddata stuff too.

nik9000 (Member Author):

The big difference is that it expects the _id to be generated by the coordinating node or handed off to us. And it can make fielddata.

nik9000 (Member Author):

We went with ProvidedIdFieldMapper. It isn't the best name, but it's better than "standard". It gives more of a hint about what makes it unique. Naming is hard.

```
 * stored, but we need to keep it so that its FieldType can be used to generate
 * queries.
 */
public class TimeSeriesModeIdFieldMapper extends IdFieldMapper {
```
Contributor:

Same here. Maybe we can rename this into something like TsidExtractingIdFieldMapper.

nik9000 (Member Author):

This I can get behind.

@nik9000 (Member Author) commented Mar 10, 2022

run elasticsearch-ci/release-tests

@nik9000 merged commit 37ea6a8 into elastic:master on Mar 10, 2022
@weizijun (Contributor) commented:

Congratulations! It finally merged!

@nik9000 (Member Author) commented Mar 11, 2022

Congratulations! It finally merged!

Thanks! Getting something like this in does feel like an accomplishment! Now to do all the follow up changes I promised!

Labels: >feature, :StorageEngine/TSDB, Team:Analytics, test-release, v8.2.0