
Labels vanishing after 30 minutes of inactivity #453

Closed
candlerb opened this issue Apr 6, 2019 · 17 comments · Fixed by #521
Labels
component/loki type/bug Something is not working as expected

Comments

@candlerb
Contributor

candlerb commented Apr 6, 2019

Describe the bug
Test loki instance seemed to lose its labels after a while (30+ minutes of inactivity)

To Reproduce
Steps initially used to reproduce the behavior:

  1. Started loki (c994823) in a screen session
  2. Started promtail (c994823) to tail '/var/logs' using default config, in another screen session
  3. Show labels and log entries
  4. Go away for a while
  5. Show labels and log entries

EDIT: A better way to reproduce this is shown below. The behaviour above depends on an Ubuntu container running cron.hourly at 17 minutes past the hour and otherwise not logging anything. From 47 minutes past the hour until 17 minutes past the next hour, the labels were missing.

Expected behavior

At startup I was able to read labels, and query logs via the API:

curl 'localhost:3100/api/prom/label'
curl 'localhost:3100/api/prom/label/job/values'
curl 'localhost:3100/api/prom/query?query=%7bjob="varlogs"%7d' | jq .
... saw a bunch of logs

But after going away and coming back again, no labels were shown:

# curl 'localhost:3100/api/prom/label'
{}
# curl 'localhost:3100/api/prom/label/job/values'
{}

And yet the chunks are still there:

# ls /tmp/loki/chunks/
 ZmFrZS83YTI1N2M5ZWI2YTYyMDkwOjE2OWYzMTZiODE2OjE2OWYzMTZiODYyOjdkNzMyNGQy
 ZmFrZS83YTI1N2M5ZWI2YTYyMDkwOjE2OWYzMzk2ZWExOjE2OWYzMzk2ZWExOjE1MGVkZDA2
 ZmFrZS83YTI1N2M5ZWI2YTYyMDkwOjE2OWYzNzA1Y2YzOjE2OWYzNzA1Y2YzOmY3YzNhNjk3
 ZmFrZS83ZTAyYmFmNDA1NmYyN2Q0OjE2OWYzMTZiODE5OjE2OWYzMWE2MjU0OjE0ZDliNGE4
 ZmFrZS83ZTAyYmFmNDA1NmYyN2Q0OjE2OWYzMjExYmEyOjE2OWYzMjI5ZmU0OjNkMzM2M2I3
 ZmFrZS84ZTk0MDgzMzdiZDZlYzA0OjE2OWYzMTZiODFhOjE2OWYzMTZiODJmOjIyYmFhYWI2
'ZmFrZS85N2RiMjQ0MDM5ZWFiOGY6MTY5ZjM3MDVjZjM6MTY5ZjM3MDVjZjM6OGNiNmIwMjY='
'ZmFrZS85N2RiMjQ0MDM5ZWFiOGY6MTY5ZjMxNmI4MTc6MTY5ZjMxNmI4MjI6YWM3NWFkZmI='
'ZmFrZS85N2RiMjQ0MDM5ZWFiOGY6MTY5ZjMzOTZlYTE6MTY5ZjMzOTZlYTE6YTc3MjQ0MDA='
 ZmFrZS85NDY0MWIwNjQ0MGRkNGE0OjE2OWYzMTZiODFiOjE2OWYzMTZiOGVmOjI0ODBjYWVh
 ZmFrZS85ZjY0NjJiM2YxZmM5ZmZlOjE2OWYzMTZiODE5OjE2OWYzMTZiODFmOjE3NjA0ODFk

After a few minutes, some logs reappeared:

# curl 'localhost:3100/api/prom/label'
{"values":["__filename__","job"]}
# curl 'localhost:3100/api/prom/query?query=%7bjob="varlogs"%7d' | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1084  100  1084    0     0   132k      0 --:--:-- --:--:-- --:--:--  132k
{
  "streams": [
    {
      "labels": "{__filename__=\"/var/log/syslog\", job=\"varlogs\"}",
      "entries": [
        {
          "ts": "2019-04-06T17:02:01.225904931Z",
          "line": "Apr  6 17:02:01 loki CRON[13227]: (root) CMD (   test -x /etc/cron.daily/popularity-contest && /etc/cron.daily/popularity-contest --crond)"
        },
        {
          "ts": "2019-04-06T16:17:01.17174724Z",
          "line": "Apr  6 16:17:01 loki CRON[13204]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)"
        }
      ]
    },
    {
      "labels": "{__filename__=\"/var/log/auth.log\", job=\"varlogs\"}",
      "entries": [
        {
          "ts": "2019-04-06T17:02:01.22562594Z",
          "line": "Apr  6 17:02:01 loki CRON[13226]: pam_unix(cron:session): session closed for user root"
        },
        {
          "ts": "2019-04-06T17:02:01.22561571Z",
          "line": "Apr  6 17:02:01 loki CRON[13226]: pam_unix(cron:session): session opened for user root by (uid=0)"
        },
        {
          "ts": "2019-04-06T16:17:01.171625148Z",
          "line": "Apr  6 16:17:01 loki CRON[13203]: pam_unix(cron:session): session closed for user root"
        },
        {
          "ts": "2019-04-06T16:17:01.171614933Z",
          "line": "Apr  6 16:17:01 loki CRON[13203]: pam_unix(cron:session): session opened for user root by (uid=0)"
        }
      ]
    }
  ]
}

It's as if the API doesn't always return all available labels and log records. However, the API documentation doesn't mention this.

In fact, the API documentation doesn't give any defaults for the query parameters. I would expect 'end' to default to the current time; it's unclear whether there's a default for 'limit', or whether 'start' defaults to some fixed offset before the end time.
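
To sidestep the ambiguity the parameters can be passed explicitly; for example (the limit and the Unix-epoch-nanosecond timestamps below are purely illustrative):

# curl 'localhost:3100/api/prom/query?query=%7bjob="varlogs"%7d&limit=1000&start=1554552000000000000&end=1554573600000000000' | jq .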

Environment:

  • Ubuntu 18.04 lxd container running on Ubuntu 18.04 host
  • Loki built from source with go 1.12.1

Additional
During all this time, no messages were logged to the screen where loki was running, except for the messages which were output when it started up.

# cd ~/go/src/github.com/grafana/loki; ./loki -config.file=./cmd/loki/loki-local-config.yaml
level=info ts=2019-04-06T14:44:43.408338242Z caller=loki.go:122 msg=initialising module=server
level=info ts=2019-04-06T14:44:43.408965936Z caller=gokit.go:36 http=[::]:3100 grpc=[::]:9095 msg="server listening on addresses"
level=info ts=2019-04-06T14:44:43.409629113Z caller=loki.go:122 msg=initialising module=overrides
level=info ts=2019-04-06T14:44:43.409891026Z caller=override.go:33 msg="per-tenant overides disabled"
level=info ts=2019-04-06T14:44:43.410067021Z caller=loki.go:122 msg=initialising module=store
level=info ts=2019-04-06T14:44:43.410367709Z caller=loki.go:122 msg=initialising module=ingester
level=info ts=2019-04-06T14:44:43.410842862Z caller=loki.go:122 msg=initialising module=ring
level=info ts=2019-04-06T14:44:43.411310724Z caller=loki.go:122 msg=initialising module=distributor
level=info ts=2019-04-06T14:44:43.411437991Z caller=loki.go:122 msg=initialising module=querier
level=info ts=2019-04-06T14:44:43.413546771Z caller=loki.go:122 msg=initialising module=all
level=info ts=2019-04-06T14:44:43.413740112Z caller=main.go:56 msg="Starting Loki" version="(version=, branch=, revision=)"
level=info ts=2019-04-06T14:44:43.416685505Z caller=lifecycler.go:366 msg="entry not found in ring, adding with no tokens"
level=error ts=2019-04-06T14:44:43.416927349Z caller=consul_client.go:182 msg="error getting path" key=ring err=null
level=info ts=2019-04-06T14:44:43.417484986Z caller=lifecycler.go:296 msg="auto-joining cluster after timeout"
@candlerb
Contributor Author

candlerb commented Apr 6, 2019

OK, it does seem to be something to do with the start time; I am able to get all the logs I saw previously by using:

# curl 'localhost:3100/api/prom/query?query=%7bjob="varlogs"%7d&start=1554556352000000000' | jq .

where that start time is date +%s - 14400, multiplied up to nanoseconds.
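
In shell, that computation looks something like this (a sketch; the 14400-second lookback is arbitrary):

# start=$(( $(date +%s) - 14400 ))000000000
# curl "localhost:3100/api/prom/query?query=%7bjob=\"varlogs\"%7d&start=${start}" | jq .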

(Aside: it's a bit annoying that the JSON output gives "ts" in ISO datetime format, while the query string uses Unix epoch nanoseconds. It would be nice to stick with one or the other.)

But I still don't understand why the labels vanished. If I query even an empty time range, with start=end, I now get labels:

# curl 'localhost:3100/api/prom/label?start=1554567417000000000&end=1554567417000000000'
{"values":["__filename__","job"]}
# curl 'localhost:3100/api/prom/label?start=0&end=1'
{"values":["__filename__","job"]}

But I definitely wasn't getting any earlier, when writing up this ticket.

@candlerb
Contributor Author

candlerb commented Apr 6, 2019

I was able to reproduce, just leaving the system alone.

# while [ 1 ]; do curl 'localhost:3100/api/prom/label'; sleep 300; done
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{}
{}
{}
{}
{}
{}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{}
{}
{}
{}
{}
{}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{}
{}
{}
{}
{}
{}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{}
{}
{}
{}
{}
{}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}
{"values":["__filename__","job"]}

This suggests that labels are only shown for log entries from the past half hour? Is this documented anywhere?

# date
Sat Apr  6 21:32:29 UTC 2019
# curl 'localhost:3100/api/prom/query?query=%7bjob="varlogs"%7d' | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   606  100   606    0     0  60600      0 --:--:-- --:--:-- --:--:-- 60600
{
  "streams": [
    {
      "labels": "{__filename__=\"/var/log/auth.log\", job=\"varlogs\"}",
      "entries": [
        {
          "ts": "2019-04-06T21:17:01.407876399Z",
          "line": "Apr  6 21:17:01 loki CRON[13472]: pam_unix(cron:session): session closed for user root"
        },
        {
          "ts": "2019-04-06T21:17:01.407867827Z",
          "line": "Apr  6 21:17:01 loki CRON[13472]: pam_unix(cron:session): session opened for user root by (uid=0)"
        }
      ]
    },
    {
      "labels": "{__filename__=\"/var/log/syslog\", job=\"varlogs\"}",
      "entries": [
        {
          "ts": "2019-04-06T21:17:01.401544901Z",
          "line": "Apr  6 21:17:01 loki CRON[13473]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)"
        }
      ]
    }
  ]
}

@candlerb
Contributor Author

candlerb commented Apr 7, 2019

The label API endpoint doesn't appear to honour start and end. Testing during one of the times when the labels are missing:

# curl "localhost:3100/api/prom/label"
{}
# echo "localhost:3100/api/prom/label?start=$(($(date +%s)-3600))000000000&end=$(($(date +%s)))000000000"
localhost:3100/api/prom/label?start=1554621389000000000&end=1554624989000000000
# curl "localhost:3100/api/prom/label?start=$(($(date +%s)-3600))000000000&end=$(($(date +%s)))000000000"
{}
# curl "localhost:3100/api/prom/label?start=$(($(date +%s)-14400))000000000&end=$(($(date +%s)))000000000"
{}

The last query covers the time up to 4 hours ago, but still no labels are seen.

I also notice that logcli labels doesn't appear to include a start or end time in the request it sends, whereas logcli query does.

It's now pretty clear that the labels are visible for 30 minutes after HH:17, which is the time that cron.hourly runs.

# grep hourly /etc/crontab
17 *	* * *	root    cd / && run-parts --report /etc/cron.hourly

@daixiang0
Contributor

Currently logcli cannot query with a timestamp.
Did you restart Loki? There is no WAL support yet (no write-ahead log is implemented), so data held in memory is lost across a restart.

@candlerb
Contributor Author

candlerb commented Apr 8, 2019

Currently logcli cannot query with a timestamp.

I'm using curl, not logcli, and I'm following the API documentation. It says that the /api/prom/label endpoint is "For retrieving the names of the labels one can query on". It does not say that it only returns labels which have been active in the last 30 minutes. Hence I was surprised that labels vanished from the list and later reappeared.

This could be a question of fixing the documentation (if this is intentional behaviour), or it could be a question of fixing the implementation.

One thing is for sure: I would be surprised to go into Grafana and query for available labels, and not find any. (This is what actually happened when I first tried Grafana+Loki; I switched to using the API directly to try to narrow down why logs were randomly disappearing).

Did you restart Loki? There is no WAL support yet (no write-ahead log is implemented), so data held in memory is lost across a restart.

No - Loki was left running continuously for the whole test.

@daixiang0
Contributor

https://github.com/grafana/loki/blob/master/pkg/querier/http.go#L20
It is a default setting.
@tomwilkie hi, are there any thoughts about this setting?

@candlerb
Contributor Author

candlerb commented Apr 8, 2019

I'm not sure that's the same.

  • defaulSince (sic) is one hour before the current time, but as you can see from my sleep 300 loop, labels disappear after 30 minutes.
  • I tried making calls to /api/prom/label with explicit start and end times, but it didn't make any difference. Therefore I think start/end are only used for log queries, not label requests.

@candlerb
Contributor Author

candlerb commented Apr 8, 2019

I did find this 30-minute constant:

./pkg/ingester/ingester.go:	f.DurationVar(&cfg.MaxChunkIdle, "ingester.chunks-idle-period", 30*time.Minute, "")

Could it be related? When chunks are flushed, are the labels forgotten?
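
If that is the cause, a possible stop-gap (untested here, and it keeps chunks in memory for longer) would be to raise that flag when starting loki:

# ./loki -config.file=./cmd/loki/loki-local-config.yaml -ingester.chunks-idle-period=2h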

@candlerb changed the title from "Labels vanishing after a while" to "Labels vanishing after 30 minutes of inactivity" on Apr 8, 2019
@candlerb
Contributor Author

candlerb commented Apr 8, 2019

I will turn this into a reproducer which doesn't require promtail and doesn't depend on Ubuntu's cron.hourly behaviour.

  1. Install loki
  2. Ensure /tmp/loki doesn't exist
  3. Start loki in a separate terminal or screen session:
    cd ~/go/src/github.com/grafana/loki; ./loki -config.file=./cmd/loki/loki-local-config.yaml
  4. Add a log entry, check labels, wait >30 minutes, check labels again. Like this:
# curl -H Content-Type:application/json -d '{"streams":[{"labels":"{foo=\"bar\"}","entries":[{ "ts": "2018-12-18T08:28:06.801064-04:00", "line": "baz" }]}]}' localhost:3100/api/prom/push
# curl localhost:3100/api/prom/label
{"values":["foo"]}
# sleep 1860; curl localhost:3100/api/prom/label
{}
# curl 'localhost:3100/api/prom/label?start=1545049686000000000&end=1545222486000000000'
{}

(Aside: for consistency I would have expected {"values":[]} here?)

But the main thing is that the labels have vanished, even when you give an explicit time range. The range specified above runs from 24 hours before to 24 hours after the given ts value:

irb(main):003:0> require 'time'
=> true
irb(main):004:0> t = Time.parse("2018-12-18T08:28:06.801064-04:00")
=> 2018-12-18 08:28:06 -0400
irb(main):005:0> t.to_i
=> 1545136086
irb(main):006:0> t.to_i - 86400
=> 1545049686
irb(main):007:0> t.to_i + 86400
=> 1545222486
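
The same bounds can be computed in shell, e.g. with GNU date (equivalent to the irb session above):

# t=$(date -d '2018-12-18T08:28:06-04:00' +%s)
# echo "start=$(( t - 86400 ))000000000&end=$(( t + 86400 ))000000000"
start=1545049686000000000&end=1545222486000000000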

@garo
Contributor

garo commented Apr 9, 2019

Based on what I learned from the source code, /api/prom/label only queries labels from the ingester instances, which means you only get labels that are currently "active" (or have been very recently); see https://github.com/grafana/loki/blob/master/pkg/querier/querier.go#L130.

When you query actual log data, the query is sent to both the ingesters and the index backend. For this you need to specify the "start" option, as the label indices are spread along the time axis. I just created PR #461 which documents this behaviour, but not the first part (active labels).

Does this explain what you are getting? Whether this should be improved is another question.
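
One way to see that split side by side, once a stream has been idle for more than 30 minutes, is to hit both endpoints with the same explicit range (a sketch reusing the curl pattern from earlier in this thread; the 24-hour range is arbitrary). The label endpoint (ingesters only) can come back empty while the query endpoint (ingesters plus store) still returns the logs:

# range="start=$(( $(date +%s) - 86400 ))000000000&end=$(date +%s)000000000"
# curl "localhost:3100/api/prom/label?${range}"
# curl "localhost:3100/api/prom/query?query=%7bjob=\"varlogs\"%7d&${range}" | jq .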

@candlerb
Contributor Author

candlerb commented Apr 9, 2019

Does this explain what you are getting?

Sounds like it could. The 30-minute ingester idle period is not documented anywhere.

This behaviour isn't good. When I'm in the Grafana UI, I cannot see any logs (including historical logs) until I've selected at least one label; but if that particular label hasn't been active for more than 30 minutes, it isn't shown. This makes Loki pretty useless.

@davkal
Contributor

davkal commented Apr 10, 2019

This issue is being worked on over the next couple of days. Thanks for the thorough reporting.

@devAcad

devAcad commented Apr 17, 2019

We are hitting the same issue. Is there an ETA for when the fix will be available?

@cyriltovena
Contributor

The fix is in review!

Thank you.

@devAcad

devAcad commented May 28, 2019

Are there any further updates on this?

@rasple

rasple commented Nov 25, 2019

Is there a fix yet? I'm having the same problem with the latest Docker image.

@slim-bean
Collaborator

This particular issue was fixed by #521, which made labels also be queried from the store.

If you are missing labels in Explore, expand the timeframe of your query and reload the page. (The reload is only required to work around a temporary change made in Explore to accommodate slow label query performance; that is fixed by the v11 schema update, which will be made the default as soon as we figure out how best to adapt it in the Helm chart.)

@chaudum added the "type/bug: Something is not working as expected" label on Jun 14, 2023