-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ClickHouse Logs table sub-optimal primary key (SeverityNumber vs SeverityText) #32215
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Hey @cwegener, Thank you for submitting this issue. I am actively reviewing the logs table schema with the ClickHouse core team. I hope to submit a revised table schema soon with optimizations similar to the one you described here. I will forward this to them so we can consider it in the updated schema. Thanks! |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
still valid |
**Description:** Closes #32215 This PR updates the default `otel_logs` table schema. The intent of this change is to provide a good starting point for users to [***create their own schema***](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/clickhouseexporter#schema-management). This schema works as-is, but should be tuned to fit the final workflow when querying (specifically the indices, `TTL`, and `ORDER BY`). There are no breaking changes, this schema is compatible with the previous `INSERT` statement. This does not affect existing users, only new users that have `create_schema` enabled in the config. Notable changes: - 2 new Timestamp columns have been added for `Date` and `DateTime` types. Nanoseconds `Timestamp` is preserved. This does not affect the `INSERT` since these are derived by `DEFAULT` from `Timestamp`. - Partitioning has been optimized to use a larger time range, based on the new `TimestampDate` time - `TTL` is now based on `TimestampTime` - Data types for `TraceFlags` and `SeverityNumber` have been reduced to a `UInt8`, since this is all that is required by the OTel specification. - String-like columns have been changed to `LowCardinality` where appropriate - `ORDER BY` has been reduced to only `TimestampDate` and `TimestampTime`. This offers good performance as-is, but if you have a specific workflow that depends on `ServiceName` for example, feel free to add it at as the first column. - Updated the TTL expression generator, dependent functions will need to convert to DateTime type manually (dependent code has been updated for this) **Testing:** - Updated integration tests to include new timestamp columns Integration tests are disabled, but I fixed them locally to test that this change works as intended. Unit tests are passing as well. **Documentation:** - Updated README quick-start queries to use the more efficient `TimestampTime` column - Example DDL has been updated
Component(s)
exporter/clickhouse
What happened?
Description
The
otel_logs
table definition in the ClickHouse exporter usesSeverityText
as part of the primary key.And
SeverityNumber
is not even listed at all in the primary key.opentelemetry-collector-contrib/exporter/clickhouseexporter/exporter_logs.go
Line 158 in 25e300a
Steps to Reproduce
N/A. Primarily raising this as a design issue for now. There might be implications to insert and query performance.
Expected Result
SeverityNumber
should be part of the primary key, since it is the field that the source system's severities are mapped to when translating from an external log model into OTEL Logs data model. AndSeverityText
comes from the external log source and is not a representation of severity as it is defined in OTEL's Log data model.Actual Result
SeverityText
is part of the primary key.Collector version
v0.97.0
Environment information
Environment
OS: "Ubuntu 22.04"
OpenTelemetry Collector configuration
No response
Log output
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: