Revise portions of the docs LogQL section #4012

Merged: 2 commits, Jul 22, 2021
214 changes: 139 additions & 75 deletions docs/sources/logql/_index.md
@@ -4,29 +4,27 @@ weight: 700
---
# LogQL: Log Query Language

LogQL is Loki's PromQL-inspired query language.
Queries act as if they are a distributed `grep` to aggregate log sources.
LogQL uses labels and operators for filtering.

There are two types of LogQL queries:

- *Log queries* return the contents of log lines.
- *Metric queries* extend log queries to calculate values
based on query results.

## Log Queries

All LogQL queries contain a **log stream selector**.

![parts of a query](query_components.png)

The log stream selector determines how many log streams (unique sources of log content, such as files) will be searched.
A more granular log stream selector then reduces the number of searched streams to a manageable volume.
This means that the labels passed to the log stream selector will affect the relative performance of the query's execution.

Optionally, the log stream selector can be followed by a **log pipeline**. A log pipeline is a set of stage expressions that are chained together and applied to the selected log streams. Each expression can filter out, parse, or mutate log lines and their respective labels.

The following example shows a full log query in action:

@@ -45,95 +43,153 @@ This is specially useful when writing a regular expression which contains multip

### Log Stream Selector

The stream selector determines which log streams to include in a query's results.
The stream selector is specified by one or more comma-separated key-value pairs. Each key is a log label and each value is that label's value.
Curly braces (`{` and `}`) delimit the stream selector.

Consider this stream selector:

```logql
{app="mysql",name="mysql-backup"}
```

All log streams that have both a label of `app` whose value is `mysql`
and a label of `name` whose value is `mysql-backup` will be included in
the query results.
A stream may contain other pairs of labels and values,
but only the specified pairs within the stream selector are used to determine
which streams will be included within the query results.
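
As an illustration of that rule, consider a stream carrying an extra label (the `instance` label and its value here are hypothetical, added only for this sketch):

```logql
# A stream labeled {app="mysql", name="mysql-backup", instance="db-01"}
# is still matched by this selector; the extra `instance` label is ignored.
{app="mysql",name="mysql-backup"}
```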

The same rules that apply for [Prometheus Label Selectors](https://prometheus.io/docs/prometheus/latest/querying/basics/#instant-vector-selectors) apply for Loki log stream selectors.

The `=` operator after the label name is a **label matching operator**.
The following label matching operators are supported:

- `=`: exactly equal
- `!=`: not equal
- `=~`: regex matches
- `!~`: regex does not match

Regex log stream examples:

- `{name =~ "mysql.+"}`
- `{name !~ "mysql.+"}`
- `` {name !~ `mysql-\d+`} ``

**Note:** The `=~` regex operator is fully anchored, meaning the regex must match against the *entire* string, including newlines. The regex `.` character does not match newlines by default. If you want the regex dot character to match newlines, you can use the single-line flag, like so: `(?s)search_term.+` matches `search_term\n`.

#### Offset modifier

The offset modifier allows changing the time offset for individual range vectors in a query.

For example, the following expression counts all the logs within the last ten minutes to five minutes rather than the last five minutes for the MySQL job. Note that the `offset` modifier always needs to immediately follow the range vector selector.

```logql
count_over_time({job="mysql"}[5m] offset 5m) // GOOD
count_over_time({job="mysql"}[5m]) offset 5m // INVALID
```

### Log Pipeline

A log pipeline can be appended to a log stream selector to further process and filter log streams. It is usually composed of one or more expressions, and each expression is executed in sequence for each log line. If an expression filters out a log line, the pipeline stops at that point and starts processing the next line.

Some expressions can mutate the log content and respective labels.
For example,

```
| line_format "{{.status_code}}"
```

makes the formatted result available for further filtering and processing in subsequent expressions or metric queries.

A log pipeline can be composed of:

- [Line Filter Expression](#line-filter-expression)
- [Parser Expression](#parser-expression)
- [Label Filter Expression](#label-filter-expression)
- [Line Format Expression](#line-format-expression)
- [Labels Format Expression](#labels-format-expression)
- [Unwrap Expression](#unwrapped-range-aggregations). An unwrapped expression is only used within metric queries.

#### Line Filter Expression

The line filter expression does a distributed `grep`
over the aggregated logs from the matching log streams.
It searches the contents of the log line,
discarding those lines that do not match the case-sensitive expression.

Each line filter expression has a **filter operator**
followed by text or a regular expression.
These filter operators are supported:

- `|=`: Log line contains string
- `!=`: Log line does not contain string
- `|~`: Log line contains a match to the regular expression
- `!~`: Log line does not contain a match to the regular expression

Line filter expression examples:

- Keep log lines that have the substring "error":
```
|= "error"
```

A complete query using this example:

```
{job="mysql"} |= "error"
```

- Discard log lines that have the substring "kafka.server:type=ReplicaManager":

```
!= "kafka.server:type=ReplicaManager"
```

A complete query using this example:

```
{instance=~"kafka-[23]",name="kafka"} != "kafka.server:type=ReplicaManager"
```

- Keep log lines that contain a substring that starts with `tsdb-ops` and ends with `io:2003`. A complete query with a regular expression:

```
{name="kafka"} |~ "tsdb-ops.*io:2003"
```

- Keep log lines that contain a substring that starts with `error=`,
followed by one or more word characters. A complete query with a regular expression:

```
{name="cassandra"} |~ `error=\w+`
```

Filter operators can be chained.
Filters are applied sequentially;
query results must satisfy every filter.
This complete query example gives results that include the string `error`,
and do not include the string `timeout`.

```logql
{job="mysql"} |= "error" != "timeout"
```

When using `|~` and `!~`, Go (as in [Golang](https://golang.org/)) [RE2 syntax](https://github.com/google/re2/wiki/Syntax) regex may be used.
The matching is case-sensitive by default.
Switch to case-insensitive matching by prefixing the regular expression
with `(?i)`.
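
For example, a sketch of case-insensitive matching (the `job` value is illustrative): this query keeps lines containing `error`, `Error`, `ERROR`, and so on.

```logql
{job="mysql"} |~ "(?i)error"
```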

While line filter expressions could be placed anywhere within a log pipeline,
it is almost always better to have them at the beginning.
Placing them at the beginning improves the performance of the query,
as it only does further processing when a line matches.
For example,
while the results will be the same,
the query specified with


```
{job="mysql"} |= "error" | json | line_format "{{.err}}"
```

will always run faster than

```
{job="mysql"} | json | line_format "{{.message}}" |= "error"
```

Line filter expressions are the fastest way to filter logs once the
log stream selectors have been applied.

#### Parser Expression

@@ -159,7 +215,7 @@ Loki supports [JSON](#json), [logfmt](#logfmt), [pattern](#pattern), [regexp](#
It's easier to use the predefined parsers `json` and `logfmt` when you can. If you can't, the `pattern` and `regexp` parsers can be used for log lines with an unusual structure. The `pattern` parser is easier and faster to write; it also outperforms the `regexp` parser.
Multiple parsers can be used by a single log pipeline. This is useful for parsing complex logs. There are examples in [Multiple parsers](#multiple-parsers).
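
As a sketch of chaining parsers (the `job` label and the `msg`, `method`, `path`, and `status` names are assumptions for illustration, not taken from this page): a `logfmt` parse extracts fields, `line_format` rewrites the line down to one field, and a `pattern` parse splits that field further.

```logql
{job="query-frontend"} | logfmt | line_format "{{.msg}}" | pattern `<method> <path> <status>`
```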

##### JSON

The **json** parser operates in two modes:

@@ -531,45 +587,53 @@ The result would be:

## Metric Queries

LogQL supports applying a function to log query results.
This powerful feature creates metrics from logs.

Metric queries can be used to calculate things such as the rate of error messages, or the top N log sources with the most amount of logs over the last 3 hours.
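
The "top N log sources" case could be sketched like this (the `job` and `host` labels are assumptions for illustration):

```logql
# the ten hosts with the most MySQL log lines over the last three hours
topk(10, sum by (host) (count_over_time({job="mysql"}[3h])))
```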

Combined with log [parsers](#parser-expression), metrics queries can also be used to calculate metrics from a sample value within the log line, such as latency or request size.
All labels, including extracted ones, will be available for aggregations and generation of new series.
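
A sketch of a sample-value metric (the `job`, `host`, and `latency` names are assumptions; `unwrap` is described under unwrapped range aggregations):

```logql
# 99th percentile of a JSON-extracted `latency` field, per host, over five minutes
quantile_over_time(0.99, {job="api"} | json | unwrap latency [5m]) by (host)
```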

### Range Vector aggregation

LogQL shares the [range vector](https://prometheus.io/docs/prometheus/latest/querying/basics/#range-vector-selectors) concept of Prometheus.
In Loki, the selected range of samples is a range of selected log or label values.

The aggregation is applied over a time duration.
Loki defines [Time Durations](https://prometheus.io/docs/prometheus/latest/querying/basics/#time-durations) with the same syntax as Prometheus.
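
Durations can combine units, as in Prometheus; for example (the `job` value is illustrative):

```logql
# count entries over the last one and a half hours
count_over_time({job="mysql"}[1h30m])
```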

Loki supports two types of range vector aggregations: log range aggregations and unwrapped range aggregations.

#### Log Range Aggregations

A log range aggregation is a query followed by a duration.
A function is applied to aggregate the query over the duration.
The duration can be placed
after the log stream selector or at the end of the log pipeline.

The functions:

- `rate(log-range)`: calculates the number of entries per second
- `count_over_time(log-range)`: counts the entries for each log stream within the given range
- `bytes_rate(log-range)`: calculates the number of bytes per second for each stream
- `bytes_over_time(log-range)`: counts the number of bytes used by each log stream in a given range
- `absent_over_time(log-range)`: returns an empty vector if the range vector passed to it has any elements, and a 1-element vector with the value 1 if the range vector passed to it has no elements. (`absent_over_time` is useful for alerting on when no time series and log streams exist for a label combination for a certain amount of time.)

Examples:

- Count all the log lines within the last five minutes for the MySQL job.

```logql
count_over_time({job="mysql"}[5m])
```

- This aggregation includes filters and parsers.
It returns the per-second rate of all non-timeout errors within the last minute per host for the MySQL job, and only includes errors whose duration is above ten seconds.

```logql
sum by (host) (rate({job="mysql"} |= "error" != "timeout" | json | duration > 10s [1m]))
```
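
The byte-oriented functions follow the same shape; a sketch with an assumed `job` label:

```logql
# per-second rate of bytes ingested for each stream of the job
bytes_rate({job="nginx"}[5m])
```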

#### Unwrapped Range Aggregations

Binary file added docs/sources/logql/query_components.png