Add documentation for star tree index feature #8598
Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged. Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer. When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review. |
Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
@Naarcha-AWS Final comments/changes. I'd like to read lines 59-62 in star-tree-index.md before approving. Thanks!
@Naarcha-AWS LGTM!
|
||
Define star-tree index mappings in the `composite` section in `mappings`.
|
||
The following example API request creates a corresponding star-tree index for all `request_aggs`. To compute metric aggregations for `request_size` and `latency` fields with queries on `port` and `status` fields, configure the following mappings: |
How about the following: "The following example API request creates a corresponding star-tree index configuration under `request_aggs`."
The phrase "all `request_aggs`" sounds a bit confusing to me.
|
||
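For reference while reviewing the wording above, here is a minimal Python sketch of what a request body with a star-tree configuration under `request_aggs` might look like. The names `composite`, `request_aggs`, `config`, `ordered_dimensions`, `metrics`, `max_leaf_docs`, and the four field names come from this PR. The `"type": "star_tree"` entry, the `stats` lists, and the field types under `properties` are assumptions to verify against the merged page, and any index settings the feature requires are omitted. The helper `missing_fields` is hypothetical and only checks the rule that every configured field must also be present in `properties`.

```python
import json

# Sketch of a create-index request body with a star-tree configuration
# named "request_aggs". Entries marked "assumed" are not taken from this PR.
star_tree_mapping = {
    "mappings": {
        "composite": {
            "request_aggs": {
                "type": "star_tree",  # assumed type name
                "config": {
                    "max_leaf_docs": 10000,  # default stated in the docs under review
                    "ordered_dimensions": [
                        {"name": "status"},
                        {"name": "port"},
                    ],
                    "metrics": [
                        {"name": "request_size", "stats": ["sum", "value_count"]},  # stats assumed
                        {"name": "latency", "stats": ["sum", "value_count"]},  # stats assumed
                    ],
                },
            }
        },
        # Every dimension and metric field must also appear under "properties".
        "properties": {
            "status": {"type": "integer"},  # field types assumed
            "port": {"type": "integer"},
            "request_size": {"type": "integer"},
            "latency": {"type": "float"},
        },
    }
}


def missing_fields(body, star_tree_name):
    """Return configured field names that are absent from mappings.properties."""
    mappings = body["mappings"]
    config = mappings["composite"][star_tree_name]["config"]
    names = [dimension["name"] for dimension in config["ordered_dimensions"]]
    names += [metric["name"] for metric in config["metrics"]]
    return [name for name in names if name not in mappings["properties"]]


print(json.dumps(star_tree_mapping, indent=2))
print("missing fields:", missing_fields(star_tree_mapping, "request_aggs"))
```

Keeping the dimension and metric names in one place makes it easy to confirm that the prose ("under `request_aggs`") matches the structure the reader will actually type.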
| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that the `doc_values` setting is `enabled` for any associated fields. |
I'm confused by this: `config` itself doesn't have a `name` property. Can we remove it?
Under `config`, the user can specify `ordered_dimensions`, `metrics`, `max_leaf_docs`, and `skip_star_node_creation_for_dimensions`.
I'll remove this. We have the definitions for `max_leaf_docs` and `skip_star_node_creation_for_dimensions` on line 193.
|
||
| Parameter | Description |
| :--- | :--- |
| `max_leaf_docs` | The maximum number of star-tree documents that a leaf node can point to. After the maximum number of documents is reached, the nodes will be split based on the value of the next dimension. Default is `10000`. A lower value will use more storage but result in faster query performance. Inversely, a higher value will use less storage but result in slower query performance. For more information, see [Star-tree indexing structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). |
"the nodes will be split based on the value of the next dimension."
How about: "Once a node crosses the `max_leaf_docs` threshold, child nodes will be created based on the unique values of the next dimension," or something similar?
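To illustrate the behavior this suggestion describes, here is a small Python sketch of threshold-based splitting. It is a toy model under stated assumptions, not the OpenSearch implementation: `build_node` and the dictionary layout are hypothetical, and a node is split only when it holds more than `max_leaf_docs` documents and a further dimension is available.

```python
from collections import defaultdict


def build_node(docs, dimensions, max_leaf_docs, depth=0):
    """Build a toy tree: split a node on the next dimension only when it
    holds more than max_leaf_docs documents (hypothetical model)."""
    if depth == len(dimensions) or len(docs) <= max_leaf_docs:
        return {"docs": docs}  # leaf node pointing at its documents

    dimension = dimensions[depth]
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[dimension]].append(doc)  # one child per unique value

    return {
        "split_on": dimension,
        "children": {
            value: build_node(group, dimensions, max_leaf_docs, depth + 1)
            for value, group in sorted(groups.items())
        },
    }


docs = [
    {"status": 200, "port": 8443},
    {"status": 200, "port": 8443},
    {"status": 200, "port": 5600},
    {"status": 500, "port": 8443},
]

# With a low threshold, the root and the status=200 node are both split.
small = build_node(docs, ["status", "port"], max_leaf_docs=2)
# With a high threshold, everything stays in a single leaf.
large = build_node(docs, ["status", "port"], max_leaf_docs=10)

print(small["split_on"], sorted(small["children"]))
print("children" in large)
```

The lower threshold produces a deeper tree with smaller leaves, which matches the storage and latency trade-off described in the `max_leaf_docs` row.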
|
||
<img src="{{site.url}}{{site.baseurl}}/images/star-tree-index.png" alt="A star-tree index containing two dimensions and two metrics" width="700">
|
||
Sorted and aggregated star-tree documents are backed by `doc_values` in an index. `doc_values` use the following pattern: |
How about "Values stored in `doc_values` use the following pattern"?
`doc_values` itself is a single column, and end users generally might not understand it well either, so we need to enhance it a bit, similar to the above.
|
||
- The values are sorted based on the order of their `ordered_dimension`. In the preceding image, the dimensions are determined by the `status` setting and then by the `port` for each status. |
The values across fields are sorted primarily by the first field in `ordered_dimensions` and then by each subsequent field listed in `ordered_dimensions`.
This isn't the exact wording, but we need to call out that the sort is based on the fields specified in `ordered_dimensions`.
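As a concrete illustration of the sort order being discussed, here is a short Python sketch that sorts documents by the fields in `ordered_dimensions` (first by `status`, then by `port`) and merges documents whose dimension values are identical. The sample documents and the metric names `sum_size` and `value_count` are made up for this example.

```python
from itertools import groupby

ordered_dimensions = ["status", "port"]

# Made-up input documents.
docs = [
    {"status": 500, "port": 8443, "size": 10},
    {"status": 200, "port": 8443, "size": 30},
    {"status": 200, "port": 5600, "size": 20},
    {"status": 200, "port": 8443, "size": 50},
]


def sort_key(doc):
    """Primary sort on the first dimension, then on each following one."""
    return tuple(doc[name] for name in ordered_dimensions)


star_tree_docs = []
for key, group in groupby(sorted(docs, key=sort_key), key=sort_key):
    sizes = [doc["size"] for doc in group]
    row = dict(zip(ordered_dimensions, key))
    # Documents with identical dimension values collapse into one row.
    row.update(sum_size=sum(sizes), value_count=len(sizes))
    star_tree_docs.append(row)

for row in star_tree_docs:
    print(row)
```

Swapping the order of the names in `ordered_dimensions` changes the sort order of the rows, which is the point the wording needs to make explicit.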
|
||
### Star nodes
|
||
Star nodes are children of non-leaf nodes that contain preaggregated records for data split after dimension removal, aggregating metrics for rows containing dimensions with identical values. These aggregated documents are then appended to the end of star-tree documents. If a document does contain a dimension with identical values, it traverses through the star node. |
This is inaccurate, as star nodes can be leaf or non-leaf nodes. Let's reword as below:
"""
Star nodes are special nodes that contain the aggregated data of all the other nodes in the same dimension.
- This helps when we need to query the aggregated value of a particular field without traversing through all the nodes of that field (dimension) in the star tree.
- This also helps in skipping a dimension that is not part of the query.
"""
The original wording itself looks good to me:
"""
There are special nodes called star nodes (*) that help in skipping non-competitive nodes and also in fetching aggregated documents wherever applicable during query time.
"""
|
||
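The star-node description discussed above ("aggregated data of all the other nodes in the same dimension") can be illustrated with a small Python sketch. This is a simplified model, not the OpenSearch implementation: `star_rows` is a hypothetical helper, and the rows and metric names are made up. It collapses one dimension into `*` and sums the metrics of rows that agree on the remaining dimensions.

```python
from collections import defaultdict

STAR = "*"


def star_rows(rows, starred_dimension, metrics):
    """Aggregate rows across every value of starred_dimension."""
    totals = defaultdict(lambda: dict.fromkeys(metrics, 0))
    for row in rows:
        # Key on the dimensions that are kept (everything except the
        # starred dimension and the metric columns).
        remaining = tuple(
            sorted(
                (name, value)
                for name, value in row.items()
                if name != starred_dimension and name not in metrics
            )
        )
        for metric in metrics:
            totals[remaining][metric] += row[metric]
    return [
        {starred_dimension: STAR, **dict(remaining), **sums}
        for remaining, sums in totals.items()
    ]


# Made-up pre-aggregated rows, sorted by status and then port.
rows = [
    {"status": 200, "port": 5600, "sum_size": 20, "value_count": 1},
    {"status": 200, "port": 8443, "sum_size": 80, "value_count": 2},
    {"status": 500, "port": 8443, "sum_size": 10, "value_count": 1},
]

for row in star_rows(rows, "status", ["sum_size", "value_count"]):
    print(row)
```

A query that filters only on `port` can read one of these starred rows instead of visiting every `status` value, which is the "skipping a dimension" benefit mentioned in the comments.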
The star-tree index structure diagram contains the following three examples demonstrating how a document does or does not traverse star-tree nodes (indicated by the `*` symbol in the diagram) during a `Term` query, based on the average request size of the query and whether the document contains matching dimensions: |
We should be talking about the query algorithm here:
"""
The star-tree index structure diagram contains the following three examples demonstrating how the query algorithm traverses the star tree to get the results
"""
|
||
- When the port equals `8443` and the status equals `200`. Because the status equals `200`, the query does not traverse through a star node, and the aggregated metric is stored at the end of a star-tree document. |
Let's not talk about how a star node is or is not traversed. Let's keep the wording focused on how the query algorithm traverses the star tree as a whole for a particular query. The image clearly shows the path taken by the query.
Something similar to the original wording:
- Compute the average request size aggregation with a `Terms` query where port equals 8443 and status equals 200: the query visits the actual nodes for the 8443 and 200 values. (Support for the `Terms` query will be added in an upcoming release; see https://github.com/opensearch-project/OpenSearch/issues/15257.)
- Compute the count of requests aggregation with a `Term` query where status equals 200 (the query traverses through the `*` node of the port dimension because port is not present as part of the query).
- Compute the average request size aggregation with a `Term` query where port equals 5600 (the query traverses through the `*` node of the status dimension because status is not present as part of the query).
The second and third examples use star nodes.
- When the status equals `200`. The query traverses through a star node in the `port` dimension because `port` is not present as part of the query.
- When the port equals `5600`. The query traverses through a star node in the `status` dimension because `status` is not present as part of the query.
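The three traversal examples above can be sketched in Python as follows. This is a toy model under stated assumptions: the nested dictionary stands in for the tree in the diagram, the metric values are made up, `lookup` is a hypothetical helper, and leaf splitting by `max_leaf_docs` is ignored. For each dimension in order, the lookup follows the filtered value when the query names that dimension and follows the `*` node otherwise.

```python
STAR = "*"

# Hypothetical tree: first level keyed by status, second level by port.
# Leaves hold pre-aggregated metrics; the "*" key is the star node.
tree = {
    200: {
        8443: {"sum_size": 80, "value_count": 2},
        5600: {"sum_size": 20, "value_count": 1},
        STAR: {"sum_size": 100, "value_count": 3},
    },
    500: {
        8443: {"sum_size": 10, "value_count": 1},
        STAR: {"sum_size": 10, "value_count": 1},
    },
    STAR: {
        8443: {"sum_size": 90, "value_count": 3},
        5600: {"sum_size": 20, "value_count": 1},
        STAR: {"sum_size": 110, "value_count": 4},
    },
}


def lookup(tree, ordered_dimensions, term_filters):
    """Follow the filtered value per dimension, or "*" when unfiltered."""
    node = tree
    path = []
    for dimension in ordered_dimensions:
        key = term_filters.get(dimension, STAR)
        node = node[key]
        path.append(key)
    return path, node


dimensions = ["status", "port"]

# Example 1: both dimensions are filtered, so no star node is visited.
path1, metrics1 = lookup(tree, dimensions, {"port": 8443, "status": 200})
# Example 2: only status is filtered, so the port star node is used.
path2, metrics2 = lookup(tree, dimensions, {"status": 200})
# Example 3: only port is filtered, so the status star node is used.
path3, metrics3 = lookup(tree, dimensions, {"port": 5600})

print(path1, metrics1["sum_size"] / metrics1["value_count"])  # average size
print(path2, metrics2["value_count"])                         # request count
print(path3, metrics3["sum_size"] / metrics3["value_count"])  # average size
```

In each case the answer comes from a single pre-aggregated entry, so the path printed for each example is the whole story of how the query was resolved.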
|
||
Support for the `Term` query will be added in a future version. For more information, see [GitHub issue #15257](https://github.com/opensearch-project/OpenSearch/issues/15257). |
No, this should be the `Terms` query. The `Term` query is already supported.
Maybe we can say that the individual `term` query is already supported and that support for the multi-term `terms` query is still to be added, to avoid confusing the user.
We can just switch to `terms` here.
|
||
## Example mapping
|
||
In the following example, index mappings define the star-tree configuration. This star-tree index precomputes aggregations in the `log` index. The aggregations are calculated using the `size` and `latency` fields for all the combinations of values indexed in the `port` and `status` fields: |
nit: `logs` index
|
||
### Aggregation example
|
||
The following example gets the sum of the `size` field for all error logs with `status=500`, using the [example mapping](#example-mapping): |
Can we reword to something similar:
"The following example gets the sum of all the values in the `size` field for all error logs with `status=500`, using the example mapping:"
|
||
With the star-tree index, the result will be retrieved from a single aggregated document as it traverses to the `status=500` node, as opposed to scanning through all of the matching documents. This results in lower query latency. | ||
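For context, the request this paragraph describes could look like the following Python sketch of a search body. It combines a `term` query on `status` with a `sum` aggregation on `size`, both standard query DSL constructs. The aggregation label `sum_size` is arbitrary, and the `logs` index name follows the reviewer's note on the example mapping; treat both as assumptions.

```python
import json

# Search body: filter to status=500 and sum the "size" field.
# "size": 0 suppresses hits because only the aggregation result is needed.
search_body = {
    "size": 0,
    "query": {"term": {"status": 500}},
    "aggs": {
        "sum_size": {  # arbitrary aggregation label
            "sum": {"field": "size"}
        }
    },
}

# The index name "logs" is an assumption based on the example mapping.
print("POST /logs/_search")
print(json.dumps(search_body, indent=2))
```

Nothing in the request refers to the star-tree index itself; the request shape is the same with or without it, and only the way the result is computed differs.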
|
||
## Using queries with a star-tree index
Should this be "without a star-tree index"?
|
||
Star-tree indexes can be used to optimize queries and aggregations.
|
||
### Supported queries
This section gives the impression that these query shapes are supported independently.
Ideally, this should be a subsection within `Supported aggregations` itself.
Basically, with the supported aggregations below, one can also add a term query to a search request.
* Adding documentation for star tree index feature Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* addressing comments Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* addressing comments Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* fixes and addressing comments Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* addressing comments Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* addressing comments Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* addressing comments Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* fixing json Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* fixing json Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* addressing comments Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* addressing comments Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
* Add edits for star tree field page Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Add index edit Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Update improving-search-performance.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Update star-tree-index.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Update star-tree.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Update _field-types/supported-field-types/star-tree.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Apply suggestions from code review Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
* Update star-tree-index.md Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---------
Signed-off-by: Bharathwaj G <bharath78910@gmail.com>
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: Eric Pugh <epugh@opensourceconnections.com>
Description
This PR adds documentation for star tree index feature. OpenSearch RFC / Meta
Issues Resolved
Closes #8131