[exporter/clickhouse] Update default logs table schema (2) (#34203)
**Description:**

The default logs table was previously updated in #33611. I am opening this PR to start a discussion on further improvements that can be made to the table.

Notable changes:
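- Changed from monthly to daily partitions. With
`ttl_only_drop_parts=1`, ClickHouse drops a part only once every row in it has expired, and a part never spans partitions, so daily partitions let expired data be dropped much sooner for TTLs shorter than one month (for example, a 7-day log retention).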
- Changed the `idx_body` granularity to `8`, so each index block covers 8 granules instead of 1. This should reduce the index size (especially beneficial for cloud services with separate storage).
- Removed the `TimestampDate` column.
- Simplified the primary key to use a single time column, `TimestampTime`; it is now `(ServiceName, TimestampTime)`. The performance difference is negligible, if not an improvement. This also makes queries easier to write: the existing schema requires that you provide both `TimestampDate` and `TimestampTime` for optimal sorting performance.
- Separated `ORDER BY` from `PRIMARY KEY` and updated it. It now matches the primary key with `Timestamp` appended, so rows are sorted with nanosecond precision by default.

Let me know if you have any more suggestions.

**Link to tracking Issue:** <Issue number if applicable>

**Testing:** <Describe what testing was performed and which tests were
added.>

**Documentation:**
SpencerTorres authored Aug 6, 2024
1 parent 1035b3b commit 44f6861
Showing 3 changed files with 35 additions and 8 deletions.
27 changes: 27 additions & 0 deletions .chloggen/clickhouseexporter_update_default_logs_table.yaml
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: clickhouseexporter

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Updated the default logs table to a more optimized schema

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [34203]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: Improved partitioning and time range queries.

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: []
8 changes: 4 additions & 4 deletions exporter/clickhouseexporter/example/default_ddl/logs.sql
@@ -2,7 +2,6 @@

CREATE TABLE IF NOT EXISTS otel_logs (
Timestamp DateTime64(9) CODEC(Delta(8), ZSTD(1)),
-TimestampDate Date DEFAULT toDate(Timestamp),
TimestampTime DateTime DEFAULT toDateTime(Timestamp),
TraceId String CODEC(ZSTD(1)),
SpanId String CODEC(ZSTD(1)),
@@ -26,9 +25,10 @@ CREATE TABLE IF NOT EXISTS otel_logs (
INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_log_attr_key mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_log_attr_value mapValues(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
-INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 1
+INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8
) ENGINE = MergeTree()
-PARTITION BY toYYYYMM(TimestampDate)
-ORDER BY (ServiceName, TimestampDate, TimestampTime)
+PARTITION BY toDate(TimestampTime)
+PRIMARY KEY (ServiceName, TimestampTime)
+ORDER BY (ServiceName, TimestampTime, Timestamp)
TTL TimestampTime + toIntervalDay(180)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1;
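The effect of the `idx_body` change from `GRANULARITY 1` to `GRANULARITY 8` can be estimated with simple arithmetic. A skip index stores one block per `GRANULARITY` granules, and `tokenbf_v1(32768, 3, 0)` allocates a 32768-byte bloom filter per block, so with `index_granularity = 8192` the index shrinks by roughly 8x. The Go sketch below assumes full 8192-row granules in a single part and ignores compression; `bodyIndexBytes` is a hypothetical helper. The trade-off is that each filter now summarizes 8x as many rows, so skipping is coarser.

```go
package main

import "fmt"

const (
	indexGranularity = 8192  // SETTINGS index_granularity: rows per granule
	bloomFilterBytes = 32768 // first tokenbf_v1 argument: filter size in bytes
)

// bodyIndexBytes is a hypothetical estimate of the uncompressed idx_body
// size for the given row count at a skip-index GRANULARITY. It assumes
// full granules and a single part, and ignores compression.
func bodyIndexBytes(rows, skipGranularity int) int {
	granules := (rows + indexGranularity - 1) / indexGranularity // ceil
	blocks := (granules + skipGranularity - 1) / skipGranularity  // ceil
	return blocks * bloomFilterBytes
}

func main() {
	const rows = 100_000_000
	for _, g := range []int{1, 8} {
		fmt.Printf("GRANULARITY %d: %d bytes\n", g, bodyIndexBytes(rows, g))
	}
}
```

For 100 million rows this gives 400,031,744 bytes (about 381 MiB) at `GRANULARITY 1` versus 50,003,968 bytes (about 48 MiB) at `GRANULARITY 8`.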
8 changes: 4 additions & 4 deletions exporter/clickhouseexporter/exporter_logs.go
@@ -134,7 +134,6 @@ const (
createLogsTableSQL = `
CREATE TABLE IF NOT EXISTS %s %s (
Timestamp DateTime64(9) CODEC(Delta(8), ZSTD(1)),
-TimestampDate Date DEFAULT toDate(Timestamp),
TimestampTime DateTime DEFAULT toDateTime(Timestamp),
TraceId String CODEC(ZSTD(1)),
SpanId String CODEC(ZSTD(1)),
@@ -158,10 +157,11 @@ CREATE TABLE IF NOT EXISTS %s %s (
INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_log_attr_key mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_log_attr_value mapValues(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
-INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 1
+INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 8
) ENGINE = %s
-PARTITION BY toYYYYMM(TimestampDate)
-ORDER BY (ServiceName, TimestampDate, TimestampTime)
+PARTITION BY toDate(TimestampTime)
+PRIMARY KEY (ServiceName, TimestampTime)
+ORDER BY (ServiceName, TimestampTime, Timestamp)
%s
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1;
`