Label endpoint returns empty despite labels being present #1308

Closed
rasple opened this issue Nov 22, 2019 · 7 comments

Comments

rasple commented Nov 22, 2019

Describe the bug
I have Loki + Promtail + Grafana deployed as a stack on a single-node Docker Swarm (latest images). Promtail scans logs on a volume mounted inside the container, and positions.yaml as well as Loki's storage are persisted on a mount. When I deploy the stack, everything works fine for a couple of minutes and I can query the logs via Grafana and Loki. After some time the following error message occurs:

Error connecting to datasource: Data source connected, but no labels received. Verify that Loki and Promtail is configured properly.

Querying Loki's API for the values of a specific label

curl -G -s "http://<host>:3100/loki/api/v1/label/log_name/values" | jq .

returns the values as it should:

{
  "values": [
    "access",
    "error"
  ]
}

However, the call that Grafana most likely makes to list the available labels,

curl -G -s "http://<host>:3100/loki/api/v1/label" | jq .

returns

{}

I have done everything in the troubleshooting section relating to this error. Promtail works perfectly until Loki stops serving labels for some reason, most likely because there are no new logs.
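
One way to check whether this is purely a recency issue: if the label endpoint honors explicit start and end parameters (nanosecond Unix epoch), widening the query window should bring the labels back even when nothing has been ingested recently. A sketch, with the time-range support assumed rather than verified:

# Ask for labels over the last 7 days instead of the default window
START=$(date -d '-7 days' +%s)000000000
END=$(date +%s)000000000
curl -G -s "http://<host>:3100/loki/api/v1/label" \
  --data-urlencode "start=${START}" \
  --data-urlencode "end=${END}" | jq .

It may also be worth noting that chunk_store_config.max_look_back_period is set to 30s in the config below; if that setting bounds label queries as well, it would explain why the results vanish as soon as the newest logs are more than 30 seconds old.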

Expected behavior
When I delete positions.yaml and redeploy the stack, it works for some time and returns the labels:

curl -G -s "http://<host>:3100/loki/api/v1/label" | jq .

returns

{
  "values": [
    "app_name",
    "container_id",
    "filename",
    "hostname",
    "http_method",
    "job",
    "level",
    "log_name",
    "pid",
    "protocol",
    "service_name",
    "stack_name",
    "tid",
    "user_agent"
  ]
}

Maybe I am getting this wrong, but since Loki persists logs, it should be able to serve their labels even when no new logs are coming in from Promtail.

Environment:
Single-node Docker Swarm on an Ubuntu server.

Screenshots, Promtail config, or terminal output
promtail-config.yaml (pipeline stages omitted, since they work correctly)

server:
  http_listen_port: 9080
  grpc_listen_port: 0
  http_listen_host: 0.0.0.0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/api/prom/push
    backoff_config:
      minbackoff: 1s
      maxbackoff: 5s
      maxretries: 10000

scrape_configs:

loki-config.yaml

auth_enabled: false

server:
  http_listen_port: 3100

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s
  max_transfer_retries: 1

schema_config:
  configs:
  - from: 2018-04-15
    store: boltdb
    object_store: filesystem
    schema: v9
    index:
      prefix: index_
      period: 168h

storage_config:
  boltdb:
    directory: /tmp/loki/index

  filesystem:
    directory: /tmp/loki/chunks

limits_config:
  enforce_metric_name: false
  reject_old_samples: false
  #reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 30s

table_manager:
  chunk_tables_provisioning:
    inactive_read_throughput: 0
    inactive_write_throughput: 0
    provisioned_read_throughput: 0
    provisioned_write_throughput: 0
  index_tables_provisioning:
    inactive_read_throughput: 0
    inactive_write_throughput: 0
    provisioned_read_throughput: 0
    provisioned_write_throughput: 0
  retention_deletes_enabled: false
  retention_period: 0

docker-stack.yml (hostnames removed for privacy)

version: "3.4"

services:
  loki:
    image: private-dtr/repo/loki:latest
    volumes:
      - /mnt/loki_data_monitoring:/tmp/loki
      - ./loki-config.yaml:/etc/loki/loki-config.yaml:ro
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/loki-config.yaml
    networks:
      - loki

  grafana:
    image: private-dtr/repo/grafana:latest
    #ports:
    #  - "3000:3000"
    depends_on:
      - influxdb
      - loki
    environment:
      - GF_SERVER_ROOT_URL=http://<host>/grafana
    volumes:
      - /mnt/grafana_config_monitoring/:/var/lib/grafana/
    deploy:
      labels:
          - "traefik.frontend.rule=PathPrefixStrip:/grafana"
          - "traefik.port=3000"
          - "traefik.backend=grafana"
          - "traefik.docker.network=traefik-net"
    networks:
      - loki
      - influxdb
      - traefik-net

  promtail:
      image: private-dtr/repo/promtail:latest
      depends_on:
        - loki
      ports:
        - "9080:9080"
      volumes:
        - /var/log:/var/log:ro
        - /var/lib/docker/containers/:/var/lib/docker/containers/:ro
        - /var/lib/docker/volumes/:/var/lib/docker/volumes/:ro
        - ./promtail-config.yaml:/etc/promtail/promtail-config.yaml:ro
        - /tmp:/tmp
      command: -config.file=/etc/promtail/promtail-config.yaml -log.level=debug
      deploy:
        mode: global
      networks:
        - loki
        - traefik-net
networks:
  influxdb:
  loki:
  traefik-net:
    external: true

rasple commented Nov 25, 2019

Possible duplicate of #453, though I thought this was fixed a long time ago.

@slim-bean
Collaborator

This is interesting; the problem seems to be the low volume of logs in relation to how Grafana does a health check on Loki. I'm guessing others haven't seen this because they have some volume of logs that continues to trickle in.

Ultimately we should probably find a better way for Grafana to do a health check on Loki rather than running a label query.

For now I think you might need something that logs at least every few minutes to keep Grafana happy.
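
Something like the following could serve as a stopgap (a sketch only; it assumes the Promtail scrape_configs, omitted above, pick up files under /var/log, and the heartbeat file name is made up): write a heartbeat line every couple of minutes so Loki keeps receiving recent entries and the label query stays non-empty.

# Hypothetical heartbeat logger; the path must match a file Promtail actually scrapes
while true; do
  echo "$(date -Is) heartbeat" >> /var/log/loki-heartbeat.log
  sleep 120
done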


rasple commented Nov 26, 2019

Thank you for your explanation. This should definitely be changed in either Loki or Grafana, as it causes a vanilla setup to fail if there are not enough (recent) logs. I will close this as a duplicate of #453.

rasple closed this as completed Nov 26, 2019
@slim-bean
Collaborator

@davkal what are your thoughts on changing how Grafana does a healthcheck on Loki? Should Loki add a specific endpoint for this?

slim-bean reopened this Nov 26, 2019
@Mortega5

I think the problem is related to this, but version 1.0.0 solves it because the API always responds with the __name__ label.


davkal commented Dec 11, 2019

Fixed via grafana/grafana#20971

davkal closed this as completed Dec 11, 2019

davkal commented Dec 11, 2019

The health check should be cheap, but confirm that things are operating nominally. For Prometheus we use 1 as a constant query and call it a day. But for Loki we decided to use the labels API to make sure the Loki instance is in a state where it's useful. The idea being: if you don't have labels, you're gonna have a bad time in Grafana.
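
Roughly what that amounts to from the command line (illustrative only, not Grafana's actual implementation): query the label endpoint and treat an empty values list as unhealthy.

curl -G -s "http://<host>:3100/loki/api/v1/label" \
  | jq -e '.values | length > 0' > /dev/null \
  && echo "healthy: labels received" \
  || echo "unhealthy: no labels received"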
