No labels after restarting loki 1.3.0 #1858

Closed
Mario-Hofstaetter opened this issue Mar 27, 2020 · 21 comments

@Mario-Hofstaetter
Contributor

Mario-Hofstaetter commented Mar 27, 2020

Describe the bug
After restarting Loki, previously existing labels are no longer returned by label_values / labels.
Old log messages can still be retrieved, though. Tested with v1.3.0 AND the current Docker image grafana/loki:latest 398b0bc8f72b

To Reproduce

[EDIT 2020-03-30: See below for second description and debug logs]

The issue has something to do with my log data; I was able to simplify it.

Via docker:

wget https://gist.githubusercontent.com/Mario-Hofstaetter/d47d1194e9b3eeb11874f880a69fedb9/raw/fdabae0861589fa3aa541dc105de45ba71fa1a18/docker-compose.yaml
# this is modified from https://github.com/grafana/loki/blob/v1.3.0/production/docker-compose.yaml

wget https://gist.github.com/Mario-Hofstaetter/d47d1194e9b3eeb11874f880a69fedb9/raw/fdabae0861589fa3aa541dc105de45ba71fa1a18/docker-config.yaml
wget https://gist.github.com/Mario-Hofstaetter/d47d1194e9b3eeb11874f880a69fedb9/raw/fdabae0861589fa3aa541dc105de45ba71fa1a18/lorem.exe.log

# pull and start
docker-compose pull && docker-compose up -d loki promtail

Loki is running and has received logs. Using logcli 1.3.0, logcli labels returns:

http://localhost:3100/loki/api/v1/labels
__name__
filename
instance
job
level
logger
sitename

The labels are correctly shown.

# Now restart loki and query again
docker-compose restart -t 30 loki

After the restart, all labels except __name__ are missing. logcli labels:

http://localhost:3100/loki/api/v1/labels
__name__
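
To pin down exactly which labels vanish across the restart, the two logcli outputs can be saved and compared. This is only a minimal sketch: the helper name missing_labels and the file names are hypothetical, and it assumes each file holds one label name per line.

```shell
# Print the labels present before the restart but absent afterwards.
# $1: file with label names before the restart, $2: file with label names after.
missing_labels() {
  before_sorted=$(mktemp)
  after_sorted=$(mktemp)
  # comm needs sorted input; -u also drops duplicate lines.
  sort -u "$1" > "$before_sorted"
  sort -u "$2" > "$after_sorted"
  # -23 keeps only the lines unique to the first file.
  comm -23 "$before_sorted" "$after_sorted"
  rm -f "$before_sorted" "$after_sorted"
}

# Example (not run here), assuming logcli prints the label names on stdout:
#   logcli labels > before.txt
#   docker-compose restart -t 30 loki
#   logcli labels > after.txt
#   missing_labels before.txt after.txt
```

With the outputs shown in this report, the result would be every label except __name__.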

This is also visible in Grafana Explore:
[screenshot: loki-no-labels]

Expected behavior
Restarting loki should not have any effect on persisted data.

Environment:

  • Loki v1.3.0 on Windows 10 1903
  • Also reproduced, as described above, with the current Docker images running on CentOS 7

Related issue?
#453 #521
However, it does not happen after a certain time, but immediately when restarting Loki.

EDIT 2020-03-30

Configuration, including the debug-logs:

https://github.com/Mario-Hofstaetter/demo-loki-issue-1858.git

asciinema demo of the steps including UTC timestamps:

https://asciinema.org/a/V4CF7850IU0bQ5LbB7EMsvwFP

Note: I have an alias dc for docker-compose
This ran using the latest docker images, after pulling, that is:

grafana/promtail          latest                   3e26c1100beb        4 hours ago         135MB
grafana/loki              latest                   398b0bc8f72b        3 days ago          47.8MB

@cyriltovena
Contributor

cyriltovena commented Mar 28, 2020 via email

Can you try with a bigger restart timeout, something like 600?

@Mario-Hofstaetter
Contributor Author

> Can you try with a bigger restart timeout, something like 600?

Actually, the container stops in under 5 seconds; the 30 seconds is already extra.

@cyriltovena
Contributor

I'm not able to reproduce:

~/go/src/github.com/grafana/loki/production master
❯ docker-compose restart -t 600  loki
Restarting production_loki_1 ... done

~/go/src/github.com/grafana/loki/production master
❯ logcli labels  --addr="http://localhost:3100"
http://localhost:3100/loki/api/v1/labels
__name__
component
level
service

@cyriltovena
Contributor

cyriltovena commented Mar 28, 2020

If you have more information, let us know. This is weird. FYI, you can't switch between 1.3 and latest (or the other way around). This will cause issues because they use different schemas.

You should enable debug logging on Loki and let us know if you see anything there: --log.level=debug

@Mario-Hofstaetter
Contributor Author

Did you use my example log file? Your labels are different from mine.
It has something to do with my data; the only "special" thing there is the logger label.

Otherwise I will provide a more complete example, including debug logs.

@Mario-Hofstaetter
Contributor Author

If I keep the job from the example:

  - job_name: system
    static_configs:
    - targets:
        - localhost
      labels:
        job: varlogs
        __path__: /var/log/*log

Then it looks like this after restarting Loki:

logcli labels

http://localhost:3100/loki/api/v1/labels
__name__
filename
job

logcli labels job

http://localhost:3100/loki/api/v1/label/job/values
varlogs

So only the labels from - job_name: problemjob are disappearing.

Also, I just noticed my logger label violates the Prometheus label naming guidelines.
https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels

I have a logger label with values containing dots, such as

DATA.API.DataConnection.Trace

Could that be an issue? The sitename label is just MySite1, though.
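
For reference, the Prometheus data model page linked above restricts label names to the pattern [a-zA-Z_][a-zA-Z0-9_]*, while label values may contain any Unicode characters. A quick way to check a name against that pattern is sketched below; the helper name is_valid_label_name is made up for illustration, and the check says nothing about how Loki itself validates labels.

```shell
# Succeed if $1 matches the Prometheus label-name pattern.
is_valid_label_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-zA-Z_][a-zA-Z0-9_]*$'
}

# Check the two label names from this report, plus one of the dotted values.
for name in logger sitename DATA.API.DataConnection.Trace; do
  if is_valid_label_name "$name"; then
    echo "$name: valid as a label name"
  else
    echo "$name: not valid as a label name (still acceptable as a label value)"
  fi
done
```

By that rule the names logger and sitename are fine, and the dots only matter if the string is used as a name rather than a value.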

@cyriltovena
Contributor

If a label is rejected, you'll see an error in the logs.

@cyriltovena
Contributor

Can you share logs from before and after the restart, please?

@previ

previ commented Mar 29, 2020

I have a similar problem: logs are not available after a Loki restart.
Also, logs from a pod which rarely produces logs seem to become unavailable after exactly 5'.

My configuration:

  • loki version: 1.3.0
  • installed in k8s cluster via helm charts
  • store: cassandra
  • object_store: azure blob
  • auth_enabled: true (tenant defined by k8s namespaces)

full configuration here: https://gist.github.com/previ/63d8fd4f52925ab0a0c752e6df5554a6

@Mario-Hofstaetter
Contributor Author

I just noticed something.
If I change my logs to have timestamps of about NOW, the labels are available after restarting Loki.
My original log file had timestamps of around 2020-03-25 15:41:10 UTC

What time range is used for getting labels? Does Loki look at ALL stored logs, or just the time range currently requested by Grafana?

@Mario-Hofstaetter
Contributor Author

Please see the attached video.
Logs are shown, labels are shown (timestamps from 2020-03-25).

After refreshing Grafana (F5), the labels are not shown in Explore, but log messages are.

Repeatedly, after exactly 30 seconds, the labels are shown again in Grafana.
Of course, this is an issue when used in dashboard templating variables (using Refresh on Time Range Change does not help).

labels-gone-appearing-30sec.mp4.zip

@cyriltovena
Contributor

@Mario-Hofstaetter Label queries use a time range (start and end), by default the last hour. So if you don't have logs in the last hour, we don't show any labels. If you increase the time range and refresh the page, it will run a larger label query.
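
To query labels for an explicit window instead of the default one, the label endpoint of the Loki HTTP API accepts start and end parameters as nanosecond Unix epochs. The following is a rough sketch only: the helper name labels_url is hypothetical, and the timestamps are just an example window covering the 2020-03-25 log data mentioned earlier in this thread.

```shell
# Build a Loki label-query URL for an explicit time window.
# $1: Loki base address, $2: window start, $3: window end (both Unix epoch seconds).
labels_url() {
  # The API expects nanoseconds, so nine zeros are appended to each second count.
  printf '%s/loki/api/v1/labels?start=%s000000000&end=%s000000000\n' "$1" "$2" "$3"
}

# Example (not run here): 1585094400 is 2020-03-25 00:00:00 UTC and
# 1585180800 is 2020-03-26 00:00:00 UTC.
#   curl -s "$(labels_url http://localhost:3100 1585094400 1585180800)"
```

If labels come back for the wider window but not for the default query, that points at the time range rather than at lost data.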

@Mario-Hofstaetter
Contributor Author

OK, but that does not explain the behavior of Grafana Explore in the video attached above?

It does explain why logcli confused me too. After ingestion, logcli labels does return labels that are NOT within the 1 hour window; it appears the information is still in RAM?
After restarting Loki, logcli no longer returns any labels, since I was working with logs that are a few days old.

@previ

previ commented Apr 11, 2020

In my case, the problem was related to Azure Blob. It seems Loki was not able to read the blobs back.
I configured Loki to use S3 storage (via MinIO as an adapter for Azure Blob) and it works.
The problem with Azure Blob might be related to this issue in the Cortex parent project, which was resolved with this PR.

@slim-bean
Collaborator

Thanks for the follow-up @previ

Loki v1.4.0 should have vendored a new enough version of Cortex to include this fix.

I'm going to close this issue; however, if you try Loki 1.4.0 against Azure and still have issues, please comment here and we will reopen it!

@previ

previ commented Apr 16, 2020

I can confirm that Loki 1.4.0 works with Azure blobs, thanks!

@vsile

vsile commented Aug 20, 2020

I have the same issue with loki-docker-driver:1.6.0 labels (compose_project, compose_service, container_name and host).

loki-docker-driver-labels-error-2020-08-20_12.34.15.mp4.zip

Grafana 7.1.3
Loki 1.6.0

@MrAmbiG

MrAmbiG commented Jan 28, 2021

Same problem with the latest version of loki or loki-stack as of today.

@MaestroJurko

MaestroJurko commented Aug 11, 2022

I have the same issue as well. I am using MinIO, and I can see older data in the /fake folder, but nothing is returned when I try to get logs from before the restart. What could be the reason?

My configuration:

      compactor:
        retention_enabled: true
      limits_config:
        max_cache_freshness_per_query: 10m
        # global retention period
        retention_period: 24h
        # per selector retention period
        retention_stream:
          - selector: '{app="appname"}'
            priority: 1
            period: 744h
      common:
        storage:
          filesystem: null
          s3:
            endpoint: minio.monitoring.svc.cluster.local:9000
            insecure: true
            bucketnames: loki-data
            access_key_id: loki
            secret_access_key: supersecret
            s3forcepathstyle: true
      schema_config:
        configs:
          - from: "2020-09-07"
            store: boltdb-shipper
            object_store: s3
            schema: v11
            index:
              period: 24h
              prefix: loki_index_

@southquist

@puppeteer701

Did you ever figure this out? I'm having a similar issue using loki with s3.
