Add documentation for star tree index feature #8598

Merged
merged 34 commits into from
Nov 1, 2024
Changes from 28 commits
95a47ac
Adding documentation for star tree index feature
bharath-techie Oct 22, 2024
78b4c41
addressing comments
bharath-techie Oct 23, 2024
0e0483a
addressing comments
bharath-techie Oct 23, 2024
dcf47cf
fixes and addressing comments
bharath-techie Oct 23, 2024
8ecd473
addressing comments
bharath-techie Oct 23, 2024
d8357ae
addressing comments
bharath-techie Oct 24, 2024
ffdc6dc
addressing comments
bharath-techie Oct 24, 2024
b3b5783
fixing json
bharath-techie Oct 24, 2024
05edca0
fixing json
bharath-techie Oct 24, 2024
69387a2
Merge branch 'main' into startree
Naarcha-AWS Oct 28, 2024
06848eb
addressing comments
bharath-techie Oct 29, 2024
5f51c3a
addressing comments
bharath-techie Oct 29, 2024
e5cf72d
Merge branch 'main' into startree
Naarcha-AWS Oct 30, 2024
47de351
Merge branch 'main' into startree
Naarcha-AWS Oct 31, 2024
759a258
Add edits for star tree field page
Naarcha-AWS Oct 31, 2024
db0e127
Add index edit
Naarcha-AWS Oct 31, 2024
f4d3a79
Update improving-search-performance.md
Naarcha-AWS Oct 31, 2024
b4205dd
Update star-tree-index.md
Naarcha-AWS Oct 31, 2024
4aea8bf
Update star-tree.md
Naarcha-AWS Oct 31, 2024
1dd9302
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
f7ef88f
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
37c6f11
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
704212a
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
fe891e6
Update _field-types/supported-field-types/star-tree.md
Naarcha-AWS Nov 1, 2024
01d1eef
Merge branch 'main' into startree
Naarcha-AWS Nov 1, 2024
521fbb0
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
6a5d89e
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
d249946
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
6ce9d22
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
c0c5ec0
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
e8bdea5
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
3e372f7
Apply suggestions from code review
Naarcha-AWS Nov 1, 2024
19eaad0
Update star-tree-index.md
Naarcha-AWS Nov 1, 2024
f98e02d
Merge branch 'main' into startree
Naarcha-AWS Nov 1, 2024
1 change: 1 addition & 0 deletions _field-types/supported-field-types/index.md
Expand Up @@ -30,6 +30,7 @@
k-NN vector | [`knn_vector`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/knn-vector/): Allows indexing a k-NN vector into OpenSearch and performing different kinds of k-NN search.
Percolator | [`percolator`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/percolator/): Specifies to treat this field as a query.
Derived | [`derived`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/derived/): Creates new fields dynamically by executing scripts on existing fields.
Star-tree | [`star_tree`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/star-tree/): Precomputes aggregations and stores them in a [star-tree index](https://docs.pinot.apache.org/basics/indexing/star-tree-index), accelerating the performance of aggregation queries.

Check failure on line 33 in _field-types/supported-field-types/index.md

GitHub Actions / style-job

[vale] reported by reviewdog 🐶 [OpenSearch.Spelling] Error: Precomputes. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks. Raw Output: {"message": "[OpenSearch.Spelling] Error: Precomputes. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.", "location": {"path": "_field-types/supported-field-types/index.md", "range": {"start": {"line": 33, "column": 103}}}, "severity": "ERROR"}

## Arrays

Expand Down
199 changes: 199 additions & 0 deletions _field-types/supported-field-types/star-tree.md
@@ -0,0 +1,199 @@
---
layout: default
title: Star-tree
nav_order: 61
parent: Supported field types
---

# Star-tree field type

This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the [OpenSearch forum](https://forum.opensearch.org/).
{: .warning}

A [star-tree index](https://docs.pinot.apache.org/basics/indexing/star-tree-index) precomputes aggregations, accelerating the performance of aggregation queries.

Check failure on line 13 in _field-types/supported-field-types/star-tree.md

GitHub Actions / style-job

[vale] reported by reviewdog 🐶 [OpenSearch.Spelling] Error: precomputes. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks. Raw Output: {"message": "[OpenSearch.Spelling] Error: precomputes. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks.", "location": {"path": "_field-types/supported-field-types/star-tree.md", "range": {"start": {"line": 13, "column": 84}}}, "severity": "ERROR"}

If a star-tree index is configured as part of an index mapping, the star-tree index is created and maintained as data is ingested in real time.

OpenSearch will automatically use the star-tree index to optimize aggregations if the queried fields are part of star-tree index dimension fields and the aggregations are on star-tree index metric fields. No changes are required in the query syntax or the request parameters.
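
The rule in the preceding paragraph can be pictured as a simple set check. The following Python sketch is illustrative only: `can_use_star_tree` is a hypothetical helper, not an OpenSearch API, and `client_ip` is a made-up field name. The resolution logic inside OpenSearch may apply additional conditions.

```python
def can_use_star_tree(query_fields, aggregation_fields, dimensions, metrics):
    """Return True if every queried field is a dimension and every
    aggregated field is a metric (hypothetical helper, not an OpenSearch API)."""
    return (
        set(query_fields) <= set(dimensions)
        and set(aggregation_fields) <= set(metrics)
    )


# Dimension and metric names match the example mapping later on this page.
dimensions = ["status", "port"]
metrics = ["request_size", "latency"]

print(can_use_star_tree(["status"], ["request_size"], dimensions, metrics))  # True
print(can_use_star_tree(["client_ip"], ["latency"], dimensions, metrics))    # False
```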

For more information, see [Star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).

## Prerequisites

To use a star-tree index, follow the instructions in [Enabling a star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index#enabling-the-star-tree-index).

## Limitations

The star-tree index feature has the following limitations:

- A star-tree index should only be enabled on indexes whose data is not updated or deleted because standard updates and deletions are not accounted for in a star-tree index.
- Currently, only one star-tree index can be created per index. Support for multiple star-tree indexes will be added in a future version.

## Examples

The following examples show how to use a star-tree index.

### Star-tree index mapping

Define the star-tree index mapping in the `composite` section of `mappings`.

The following example API request creates a star-tree index configuration named `request_aggs`. To compute metric aggregations for the `request_size` and `latency` fields with queries on the `port` and `status` fields, configure the following mappings:
Contributor Author

@bharath-techie bharath-techie Nov 4, 2024


How about following :

The following example API request creates a corresponding star-tree index configuration under request_aggs

"all request_aggs " for me sounds a bit confusing


```json
PUT logs
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 0,
    "index.composite_index": true
  },
  "mappings": {
    "composite": {
      "request_aggs": {
        "type": "star_tree",
        "config": {
          "max_leaf_docs": 10000,
          "skip_star_node_creation_for_dimensions": [
            "port"
          ],
          "ordered_dimensions": [
            {
              "name": "status"
            },
            {
              "name": "port"
            }
          ],
          "metrics": [
            {
              "name": "request_size",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            },
            {
              "name": "latency",
              "stats": [
                "sum",
                "value_count",
                "min",
                "max"
              ]
            }
          ]
        }
      }
    },
    "properties": {
      "status": {
        "type": "integer"
      },
      "port": {
        "type": "integer"
      },
      "request_size": {
        "type": "integer"
      },
      "latency": {
        "type": "scaled_float",
        "scaling_factor": 10
      }
    }
  }
}
```



## Star-tree mapping parameters

Specify any star-tree configuration mapping options in the `config` section. Parameters cannot be modified without reindexing documents.

The star-tree `config` section supports the following property.

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that the `doc_values` setting is `enabled` for any associated fields. |
Contributor Author


Confused on this - config itself doesn't have a name property. Can we remove this ?

Under config , user can specify ordered_dimensions, metrics, max_leaf_docs and skip_star_node_creation_for_dimensions.

Collaborator


I'll remove this. We have the definitions for max_leaf_docs and skip_star_node_creation_for_dimensions on line 193.


### Ordered dimensions

The `ordered_dimensions` parameter lists the fields by which metrics are aggregated in a star-tree index. The star-tree index is used for a query only if all the fields in the query are part of `ordered_dimensions`.

When using the `ordered_dimensions` parameter, follow these best practices:

- The order of dimensions matters. Define the dimensions in order from the highest cardinality to the lowest cardinality for efficient storage and query pruning.
- Avoid using high-cardinality fields as dimensions. High-cardinality fields adversely affect storage space, indexing throughput, and query performance.
- Currently, fields supported by the `ordered_dimensions` parameter are all [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`. For more information, see [GitHub issue #15231](https://github.com/opensearch-project/OpenSearch/issues/15231).
- Support for other field types, such as `keyword` and `ip`, will be added in future versions. For more information, see [GitHub issue #16232](https://github.com/opensearch-project/OpenSearch/issues/16232).
- A minimum of `2` and a maximum of `10` dimensions are supported per star-tree index.
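
The ordering and count guidelines above can be sketched in a few lines of Python. This is a minimal illustration, not part of OpenSearch: `order_dimensions` is a hypothetical helper, and the field names and cardinality estimates are invented for the example.

```python
def order_dimensions(cardinalities, min_dims=2, max_dims=10):
    """Build an `ordered_dimensions` list sorted from highest to lowest
    cardinality, enforcing the 2-10 dimension limit described above.

    `cardinalities` maps a field name to an estimated number of distinct values.
    """
    if not min_dims <= len(cardinalities) <= max_dims:
        raise ValueError(
            f"expected {min_dims}-{max_dims} dimensions, got {len(cardinalities)}"
        )
    ordered = sorted(cardinalities, key=cardinalities.get, reverse=True)
    return [{"name": name} for name in ordered]


# Hypothetical fields and cardinality estimates.
print(order_dimensions({"dim_a": 10, "dim_b": 500, "dim_c": 40}))
# [{'name': 'dim_b'}, {'name': 'dim_c'}, {'name': 'dim_a'}]
```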

The `ordered_dimensions` parameter supports the following property.

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that the `doc_values` setting is `enabled` for any associated fields. |
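
Before creating the index, the `name` requirement can be checked with a small script. The following Python sketch is an illustration under stated assumptions: `invalid_star_tree_fields` is a hypothetical helper, and it assumes that `doc_values` is treated as enabled when the setting is omitted from a field definition.

```python
def invalid_star_tree_fields(mappings):
    """Return star-tree field names that are missing from `properties`
    or that explicitly disable doc_values (illustrative check only)."""
    properties = mappings.get("properties", {})
    problems = []
    for index_config in mappings.get("composite", {}).values():
        config = index_config.get("config", {})
        referenced = config.get("ordered_dimensions", []) + config.get("metrics", [])
        for entry in referenced:
            field = properties.get(entry["name"])
            # Assumption: an omitted doc_values setting counts as enabled.
            if field is None or field.get("doc_values", True) is False:
                problems.append(entry["name"])
    return problems


mappings = {
    "composite": {
        "request_aggs": {
            "type": "star_tree",
            "config": {
                "ordered_dimensions": [{"name": "status"}, {"name": "port"}],
                "metrics": [{"name": "request_size"}],
            },
        }
    },
    "properties": {
        "status": {"type": "integer"},
        "port": {"type": "integer", "doc_values": False},
    },
}
print(invalid_star_tree_fields(mappings))  # ['port', 'request_size']
```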


### Metrics

Configure any metric fields on which you need to perform aggregations. The `metrics` parameter is required as part of a star-tree configuration.

When using `metrics`, follow these best practices:

- Currently, fields supported by `metrics` are all [numeric field types](https://opensearch.org/docs/latest/field-types/supported-field-types/numeric/), with the exception of `unsigned_long`. For more information, see [GitHub issue #15231](https://github.com/opensearch-project/OpenSearch/issues/15231).
- Supported metric aggregations include `min`, `max`, `sum`, `avg`, and `value_count`.
- `avg` is a derived metric based on `sum` and `value_count`. It is not indexed and is instead derived when a query is run. The remaining base metrics are indexed.
- A maximum of `100` base metrics are supported per star-tree index.

If `min`, `max`, `sum`, and `value_count` are defined as `metrics` for each field, then each field uses 4 of the 100 available base metrics, so up to 25 such fields can be configured, as shown in the following example:

```json
{
  "metrics": [
    {
      "name": "field1",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    },
    ...,
    ...,
    {
      "name": "field25",
      "stats": [
        "sum",
        "value_count",
        "min",
        "max"
      ]
    }
  ]
}
```
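
The arithmetic behind the 25-field figure can be checked with a short Python sketch. It is a rough illustration, not OpenSearch code: `count_base_metrics` is a hypothetical helper, and it assumes that `avg` does not count toward the limit because it is derived, and that a metric without `stats` uses the documented default of `sum` and `value_count`.

```python
MAX_BASE_METRICS = 100       # limit stated in the list above
DERIVED_STATS = {"avg"}      # assumption: derived stats are not base metrics
DEFAULT_STATS = ["sum", "value_count"]


def count_base_metrics(metrics):
    """Count the base metrics that a `metrics` configuration would index."""
    return sum(
        1
        for metric in metrics
        for stat in metric.get("stats", DEFAULT_STATS)
        if stat not in DERIVED_STATS
    )


# 25 fields with 4 base metrics each, as in the preceding JSON example.
metrics = [
    {"name": f"field{i}", "stats": ["sum", "value_count", "min", "max"]}
    for i in range(1, 26)
]
total = count_base_metrics(metrics)
print(total, total <= MAX_BASE_METRICS)  # 100 True
```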


#### Properties

The `metrics` parameter supports the following properties.

| Parameter | Required/Optional | Description |
| :--- | :--- | :--- |
| `name` | Required | The name of the field. The field name should be present in the `properties` section as part of the index `mapping`. Ensure that the `doc_values` setting is `enabled` for any associated fields. |
| `stats` | Optional | A list of metric aggregations computed for each field. You can choose among `min`, `max`, `sum`, `avg`, and `value_count`.<br/>Default is `sum` and `value_count`.<br/>`avg` is a derived metric statistic that is automatically supported in queries if `sum` and `value_count` are present as part of metric `stats`. |
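
To see why `avg` needs only `sum` and `value_count`, consider the following Python sketch. It is a simplified model rather than the actual storage format: the records, the `latency_sum` and `latency_value_count` key names, and the `avg_latency` helper are all invented for illustration.

```python
# Hypothetical pre-aggregated records, one per (status, port) combination.
star_tree_docs = [
    {"status": 200, "port": 443, "latency_sum": 120.0, "latency_value_count": 10},
    {"status": 200, "port": 80, "latency_sum": 60.0, "latency_value_count": 5},
    {"status": 500, "port": 443, "latency_sum": 90.0, "latency_value_count": 2},
]


def avg_latency(docs, status):
    """Derive the average from the stored sum and value_count base metrics."""
    matching = [doc for doc in docs if doc["status"] == status]
    total = sum(doc["latency_sum"] for doc in matching)
    count = sum(doc["latency_value_count"] for doc in matching)
    return total / count if count else None


print(avg_latency(star_tree_docs, 200))  # 12.0
```

Because the sums and counts are already stored, the average over 15 original values is obtained from 2 pre-aggregated records.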

### Star-tree configuration parameters

The following parameters are optional and cannot be modified following index creation.

| Parameter | Description |
| :--- | :--- |
| `max_leaf_docs` | The maximum number of star-tree documents that a leaf node can point to. After the maximum number of documents is reached, the nodes will be split based on the value of the next dimension. Default is `10000`. A lower value will use more storage but result in faster query performance. Inversely, a higher value will use less storage but result in slower query performance. For more information, see [Star-tree indexing structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). |
Contributor Author


the nodes will be split based on the value of the next dimension.

How about once a node crosses threshold of max_leaf_docs , children nodes will be created based on the unique values or something similar.

| `skip_star_node_creation_for_dimensions` | A list of dimensions for which a star-tree index will skip star node creation. Skipping star node creation for a dimension reduces storage size at the expense of query performance. By default, star nodes are created for all dimensions. For more information about star nodes, see [Star-tree indexing structure]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#star-tree-index-structure). |
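
The effect of `max_leaf_docs` can be pictured with a toy tree builder. The following Python sketch is a deliberately simplified model and not the OpenSearch implementation: `build_node` is a hypothetical function, and it omits star nodes and metric aggregation entirely. It shows only that a node holding more than `max_leaf_docs` documents is split on the next dimension.

```python
from collections import defaultdict


def build_node(docs, dimensions, max_leaf_docs, depth=0):
    """Split docs by successive dimensions until each leaf holds at most
    max_leaf_docs documents or no dimensions remain (simplified model)."""
    if len(docs) <= max_leaf_docs or depth == len(dimensions):
        return {"leaf": True, "docs": docs}
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[dimensions[depth]]].append(doc)
    children = {
        value: build_node(group, dimensions, max_leaf_docs, depth + 1)
        for value, group in groups.items()
    }
    return {"leaf": False, "split_on": dimensions[depth], "children": children}


docs = [
    {"status": 200, "port": 443},
    {"status": 200, "port": 80},
    {"status": 200, "port": 443},
    {"status": 500, "port": 443},
]
tree = build_node(docs, ["status", "port"], max_leaf_docs=2)
print(tree["split_on"])                   # status
print(tree["children"][500]["leaf"])      # True
print(tree["children"][200]["split_on"])  # port
```

In this model, a lower `max_leaf_docs` value produces more nodes, which is consistent with the storage and query performance trade-off described in the table.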

## Supported queries and aggregations

For more information about supported queries and aggregations, see [Supported queries and aggregations for a star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/#supported-queries-and-aggregations).

4 changes: 3 additions & 1 deletion _search-plugins/improving-search-performance.md
Expand Up @@ -11,4 +11,6 @@ OpenSearch offers several ways to improve search performance:

- Run resource-intensive queries asynchronously with [asynchronous search]({{site.url}}{{site.baseurl}}/search-plugins/async/).

- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).
- Search segments concurrently using [concurrent segment search]({{site.url}}{{site.baseurl}}/search-plugins/concurrent-segment-search/).

- Improve aggregation performance using a [star-tree index]({{site.url}}{{site.baseurl}}/search-plugins/star-tree-index/).