Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ClickHouse Logs table sub-optimal primary key (SeverityNumber vs SeverityText) #32215

Closed
cwegener opened this issue Apr 8, 2024 · 4 comments · Fixed by #33611
Closed

ClickHouse Logs table sub-optimal primary key (SeverityNumber vs SeverityText) #32215

cwegener opened this issue Apr 8, 2024 · 4 comments · Fixed by #33611
Labels
bug Something isn't working exporter/clickhouse

Comments

@cwegener
Copy link
Contributor

cwegener commented Apr 8, 2024

Component(s)

exporter/clickhouse

What happened?

Description

The otel_logs table definition in the ClickHouse exporter uses SeverityText as part of the primary key.

And SeverityNumber is not even listed at all in the primary key.

ORDER BY (ServiceName, SeverityText, toUnixTimestamp(Timestamp), TraceId)

Steps to Reproduce

N/A. Primarily raising this as a design issue for now. There might be implications to insert and query performance.

Expected Result

SeverityNumber should be part of the primary key, since it is the field that the source system's severities are mapped to when translating from an external log model into OTEL Logs data model. And SeverityText comes from the external log source and is not a representation of severity as it is defined in OTEL's Log data model.

Actual Result

SeverityText is part of the primary key.

Collector version

v0.97.0

Environment information

Environment

OS: "Ubuntu 22.04"

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

@cwegener cwegener added bug Something isn't working needs triage New item requiring triage labels Apr 8, 2024
Copy link
Contributor

github-actions bot commented Apr 8, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@SpencerTorres
Copy link
Member

Hey @cwegener,

Thank you for submitting this issue. I am actively reviewing the logs table schema with the ClickHouse core team. I hope to submit a revised table schema soon with optimizations similar to the one you described here. I will forward this to them so we can consider it in the updated schema.

Thanks!

@JaredTan95 JaredTan95 removed the needs triage New item requiring triage label Apr 8, 2024
Copy link
Contributor

github-actions bot commented Jun 7, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Jun 7, 2024
@hanjm
Copy link
Member

hanjm commented Jun 7, 2024

still valid

@github-actions github-actions bot removed the Stale label Jun 8, 2024
dmitryax pushed a commit that referenced this issue Jun 19, 2024
**Description:**

Closes #32215 

This PR updates the default `otel_logs` table schema.
The intent of this change is to provide a good starting point for users
to [***create their own
schema***](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/clickhouseexporter#schema-management).
This schema works as-is, but should be tuned to fit the final workflow
when querying (specifically the indices, `TTL`, and `ORDER BY`).

There are no breaking changes, this schema is compatible with the
previous `INSERT` statement. This does not affect existing users, only
new users that have `create_schema` enabled in the config.

Notable changes:
- 2 new Timestamp columns have been added for `Date` and `DateTime`
types. Nanoseconds `Timestamp` is preserved. This does not affect the
`INSERT` since these are derived by `DEFAULT` from `Timestamp`.
- Partitioning has been optimized to use a larger time range, based on
the new `TimestampDate` time
- `TTL` is now based on `TimestampTime`
- Data types for `TraceFlags` and `SeverityNumber` have been reduced to
a `UInt8`, since this is all that is required by the OTel specification.
- String-like columns have been changed to `LowCardinality` where
appropriate
- `ORDER BY` has been reduced to only `TimestampDate` and
`TimestampTime`. This offers good performance as-is, but if you have a
specific workflow that depends on `ServiceName` for example, feel free
to add it at as the first column.
- Updated the TTL expression generator, dependent functions will need to
convert to DateTime type manually (dependent code has been updated for
this)

**Testing:**
- Updated integration tests to include new timestamp columns
Integration tests are disabled, but I fixed them locally to test that
this change works as intended.
Unit tests are passing as well.

**Documentation:**
- Updated README quick-start queries to use the more efficient
`TimestampTime` column
- Example DDL has been updated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working exporter/clickhouse
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants