[Feature] Support Data Streams in OpenSearch #675

Closed
ketanv3 opened this issue May 8, 2021 · 0 comments · Fixed by #690
Labels
enhancement Enhancement or improvement to existing feature or request

ketanv3 commented May 8, 2021

Is your feature request related to a problem? Please describe.

Data streams let users store time-series data across multiple indices while exposing a single named resource for requests. They are well suited for logs, events, metrics, and other continuously generated data where documents are seldom updated and searches generally target the most recent documents.

The creation of a data stream requires a matching index template containing the mappings and settings used to configure the data stream’s backing indices. The data_stream field indicates that the template creates a data stream instead of a regular index.

PUT /_index_template/my-data-stream-template
{
    "index_patterns": [ "logs-haproxy", "logs-nginx", "logs-redis" ],
    "data_stream": { }
}
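
To illustrate how the data_stream field separates a data stream template from a regular index template, here is a minimal Python sketch. The function name `creates_data_stream` is hypothetical, and wildcard matching via `fnmatch` is an assumption made for illustration; it ignores template priority and is not OpenSearch's actual template resolution logic.

```python
from fnmatch import fnmatchcase


def creates_data_stream(template: dict, name: str) -> bool:
    """Return True if `name` matches one of the template's index patterns
    and the template declares a data stream (hypothetical helper)."""
    patterns = template.get("index_patterns", [])
    matches = any(fnmatchcase(name, pattern) for pattern in patterns)
    # An empty "data_stream" object still counts, so test for key presence.
    return matches and "data_stream" in template


template = {
    "index_patterns": ["logs-haproxy", "logs-nginx", "logs-redis"],
    "data_stream": {},
}
```

With this template, `creates_data_stream(template, "logs-nginx")` is true, while a name outside the listed patterns, or the same template without the data_stream field, yields false.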

Though OpenSearch already has the APIs to interact with data streams (create/read/delete/get stats), the creation of a new data stream is still not possible due to the lack of a metadata field mapper. This mapper is necessary to parse the data_stream field in the index template, and to perform timestamp field validation on the ingested documents.

Without this metadata field mapper, the creation of an index template fails with this error:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "composable template [my-data-stream-template] template after composition is invalid"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "composable template [my-data-stream-template] template after composition is invalid",
    "caused_by" : {
      "type" : "x_content_parse_exception",
      "reason" : "[index_template] unknown field [data_stream]"
    }
  },
  "status" : 400
}

Describe the solution you'd like

We need to create a MetadataFieldMapper to parse the _data_stream_timestamp metadata field mapping used to create data streams. This mapper also overrides the postParse method to ensure that each indexed document has the timestamp field present.
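
The presence check described above could look roughly like the following Python sketch. It only illustrates the validation idea; the real mapper would be written in Java against OpenSearch's MetadataFieldMapper, and the function name `validate_timestamp` and the error message are hypothetical.

```python
def validate_timestamp(document: dict, timestamp_field: str = "@timestamp") -> None:
    """Raise ValueError when the document has no value for the timestamp field.

    Illustrative stand-in for the check a postParse override would perform.
    """
    if document.get(timestamp_field) is None:
        raise ValueError(
            f"document is missing the timestamp field [{timestamp_field}]"
        )


# A document carrying the timestamp field passes silently.
validate_timestamp({"@timestamp": "2021-05-08T00:00:00Z", "message": "ok"})
```

A document without the field, such as `{"message": "no timestamp"}`, would be rejected with a `ValueError`.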

Question: Should the timestamp field name be standardized or made configurable?
A data stream currently only allows "@timestamp" as the timestamp field name for each ingested document. We can remove this restriction and allow users to change the default timestamp field name as required using an index template.

# "@timestamp" will be used as the default timestamp field name.
PUT /_index_template/my-data-stream-template
{
    "index_patterns": [ "logs-haproxy", "logs-nginx", "logs-redis" ],
    "data_stream": { }
}

# Users can also manually configure the timestamp field name.
PUT /_index_template/my-data-stream-template
{
    "index_patterns": [ "logs-haproxy", "logs-nginx", "logs-redis" ],
    "data_stream": { "timestamp_field": { "name": "created_at" } }
}
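
A minimal Python sketch of how the proposed default could be resolved from either template body shown above. The helper name `timestamp_field_name` is hypothetical, and the `timestamp_field.name` layout is taken from the proposal in this issue rather than from any released API.

```python
DEFAULT_TIMESTAMP_FIELD = "@timestamp"


def timestamp_field_name(template: dict) -> str:
    """Return the configured timestamp field name, or the default when the
    template's data_stream object does not set one (hypothetical helper)."""
    data_stream = template.get("data_stream") or {}
    timestamp_field = data_stream.get("timestamp_field") or {}
    return timestamp_field.get("name", DEFAULT_TIMESTAMP_FIELD)


default_template = {"index_patterns": ["logs-nginx"], "data_stream": {}}
custom_template = {
    "index_patterns": ["logs-nginx"],
    "data_stream": {"timestamp_field": {"name": "created_at"}},
}
```

Here `timestamp_field_name(default_template)` falls back to "@timestamp", and `timestamp_field_name(custom_template)` returns "created_at".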

Additional context

As data streams can be queried just like regular indices/aliases, plugins like SQL, PPL, and Asynchronous Search will work seamlessly with data streams. Integration with other OpenSearch plugins such as the following can be improved to further extend the functionality of data streams. These will be tracked as separate issues.

  1. Index Management plugin – An ISM policy can be associated with a data stream to manage its backing indices. Once rolled over, a backing index can be moved to a different state, deleted after some time, or rolled up into a summarized index.
  2. Index Management Dashboards plugin – We will update the Index Management user interface to include the ability to view data streams and their underlying backing indices, and assign or edit a policy. Creating index patterns can also be made simpler as the timestamp field is known for a data stream.
  3. Security plugin – Similar to regular indices, user access can be limited to the entire data stream, part of the backing indices of the data stream, as well as at a document or field level.