[ProxySQL] Metrics lost after proxysql restart #7629

Closed
reyiyo opened this issue Sep 19, 2020 · 2 comments

Comments

@reyiyo
Contributor

reyiyo commented Sep 19, 2020

After ProxySQL restarts, the proxysql check stops reporting metrics until the agent itself is restarted.

Output of the info page

Before ProxySQL restart:

Getting the status from the agent.

===============
Agent (v7.22.0)
===============

  Status date: 2020-09-19 08:59:01.097131 -04
  Agent start: 2020-09-19 08:55:18.206970 -04
  Pid: 386
  Go Version: go1.13.11
  Python Version: 3.8.5
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 4
  Log Level: info

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    System UTC time: 2020-09-19 08:59:01.097131 -04

  Host Info
  =========
    bootTime: 2020-07-18 17:51:15.000000 -04
    kernelArch: x86_64
    kernelVersion: 4.15.0-1073-aws
    os: linux
    platform: debian
    platformFamily: debian
    platformVersion: bullseye/sid
    procs: 143
    uptime: 1503h4m6s
    virtualizationRole: guest
    virtualizationSystem: docker

  Hostnames
  =========
    ec2-hostname: ******.ec2.internal
    hostname: i-******
    instance-id: i-******
    socket-fqdn: i-******-******.
    socket-hostname: i-******-******
    host tags:
      cloud_provider:aws
    hostname provider: aws
    unused hostname providers:
      configuration/environment: hostname is empty
      gce: unable to retrieve hostname from GCE: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname

  Metadata
  ========
    cloud_provider: AWS
    hostname_source: aws

=========
Collector
=========

  Running Checks
  ==============

    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
      Total Runs: 14
      Metric Samples: Last Run: 7, Total: 92
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2020-09-19 08:58:50.000000 -04
      Last Successful Execution Date : 2020-09-19 08:58:50.000000 -04


    disk (2.10.1)
    -------------
      Instance ID: disk:8fcf256b1be7c58a [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.yaml
      Total Runs: 15
      Metric Samples: Last Run: 84, Total: 1,260
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 20ms
      Last Execution Date : 2020-09-19 08:58:51.000000 -04
      Last Successful Execution Date : 2020-09-19 08:58:51.000000 -04


    docker
    ------
      Instance ID: docker [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/docker.d/conf.yaml.default
      Total Runs: 15
      Metric Samples: Last Run: 188, Total: 2,820
      Events: Last Run: 0, Total: 8
      Service Checks: Last Run: 1, Total: 15
      Average Execution Time : 23ms
      Last Execution Date : 2020-09-19 08:58:57.000000 -04
      Last Successful Execution Date : 2020-09-19 08:58:57.000000 -04


    file_handle
    -----------
      Instance ID: file_handle [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
      Total Runs: 14
      Metric Samples: Last Run: 5, Total: 70
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2020-09-19 08:58:49.000000 -04
      Last Successful Execution Date : 2020-09-19 08:58:49.000000 -04


    io
    --
      Instance ID: io [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
      Total Runs: 15
      Metric Samples: Last Run: 104, Total: 1,488
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2020-09-19 08:58:56.000000 -04
      Last Successful Execution Date : 2020-09-19 08:58:56.000000 -04


    load
    ----
      Instance ID: load [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
      Total Runs: 14
      Metric Samples: Last Run: 6, Total: 84
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2020-09-19 08:58:48.000000 -04
      Last Successful Execution Date : 2020-09-19 08:58:48.000000 -04


    memory
    ------
      Instance ID: memory [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
      Total Runs: 15
      Metric Samples: Last Run: 18, Total: 270
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2020-09-19 08:58:55.000000 -04
      Last Successful Execution Date : 2020-09-19 08:58:55.000000 -04


    network (1.17.0)
    ----------------
      Instance ID: network:e0204ad63d43c949 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
      Total Runs: 14
      Metric Samples: Last Run: 31, Total: 434
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 2ms
      Last Execution Date : 2020-09-19 08:58:47.000000 -04
      Last Successful Execution Date : 2020-09-19 08:58:47.000000 -04


    ntp
    ---
      Instance ID: ntp:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
      Total Runs: 1
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 1
      Average Execution Time : 20.026s
      Last Execution Date : 2020-09-19 08:55:41.000000 -04
      Last Successful Execution Date : 2020-09-19 08:55:41.000000 -04


    proxysql (1.2.1)
    ----------------
      Instance ID: proxysql:930ef00729a3325b [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/proxysql.d/conf.yaml
      Total Runs: 15
      Metric Samples: Last Run: 853, Total: 12,795
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 7, Total: 105
      Average Execution Time : 68ms
      Last Execution Date : 2020-09-19 08:58:58.000000 -04
      Last Successful Execution Date : 2020-09-19 08:58:58.000000 -04
      metadata:
        version.build: g58a909a0
        version.major: 2
        version.minor: 0
        version.patch: 12
        version.raw: 2.0.12-38+g58a909a0
        version.release: 38
        version.scheme: semver


    uptime
    ------
      Instance ID: uptime [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
      Total Runs: 15
      Metric Samples: Last Run: 1, Total: 15
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2020-09-19 08:58:54.000000 -04
      Last Successful Execution Date : 2020-09-19 08:58:54.000000 -04

========
JMXFetch
========

  Initialized checks
  ==================
    no checks

  Failed checks
  =============
    no checks

=========
Forwarder
=========

  Transactions
  ============
    CheckRunsV1: 14
    Connections: 0
    Containers: 0
    Deployments: 0
    Dropped: 0
    DroppedOnInput: 0
    Events: 0
    HostMetadata: 0
    IntakeV1: 3
    Metadata: 0
    Pods: 0
    Processes: 0
    RTContainers: 0
    RTProcesses: 0
    ReplicaSets: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Series: 0
    ServiceChecks: 0
    Services: 0
    SketchSeries: 0
    Success: 31
    TimeseriesV1: 14

  API Keys status
  ===============
    API key ending with 1ac47: API Key valid

==========
Endpoints
==========
  http://***** - API Key ending with:
      - 1ac47

==========
Logs Agent
==========


  Logs Agent is not running

=========
APM Agent
=========
  Status: Not running or unreachable on localhost:8126.
  Error: Get http://localhost:8126/debug/vars: dial tcp 127.0.0.1:8126: connect: connection refused

=========
Aggregator
=========
  Checks Metric Sample: 19,622
  Dogstatsd Metric Sample: 491
  Event: 9
  Events Flushed: 9
  Number Of Flushes: 14
  Series Flushed: 17,031
  Service Check: 268
  Service Checks Flushed: 267

=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 490
  Metric Parse Errors: 0
  Service Check Packets: 0
  Service Check Parse Errors: 0
  Udp Bytes: 87,134
  Udp Packet Reading Errors: 0
  Udp Packets: 226
  Uds Bytes: 0
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 0

After ProxySQL restart:

===============
Agent (v7.22.0)
===============

  Status date: 2020-09-19 09:04:30.733853 -04
  Agent start: 2020-09-19 08:55:18.206970 -04
  Pid: 386
  Go Version: go1.13.11
  Python Version: 3.8.5
  Build arch: amd64
  Agent flavor: agent
  Check Runners: 4
  Log Level: info

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    System UTC time: 2020-09-19 09:04:30.733853 -04

  Host Info
  =========
    bootTime: 2020-07-18 17:51:15.000000 -04
    kernelArch: x86_64
    kernelVersion: 4.15.0-1073-aws
    os: linux
    platform: debian
    platformFamily: debian
    platformVersion: bullseye/sid
    procs: 143
    uptime: 1503h4m6s
    virtualizationRole: guest
    virtualizationSystem: docker

  Hostnames
  =========
    ec2-hostname: ******.ec2.internal
    hostname: i-******
    instance-id: i-******
    socket-fqdn: i-******-******.
    socket-hostname: i-******-******
    host tags:
      cloud_provider:aws
    hostname provider: aws
    unused hostname providers:
      configuration/environment: hostname is empty
      gce: unable to retrieve hostname from GCE: status code 404 trying to GET http://169.254.169.254/computeMetadata/v1/instance/hostname

  Metadata
  ========
    cloud_provider: AWS
    hostname_source: aws

=========
Collector
=========

  Running Checks
  ==============

    cpu
    ---
      Instance ID: cpu [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/cpu.d/conf.yaml.default
      Total Runs: 36
      Metric Samples: Last Run: 7, Total: 246
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2020-09-19 09:04:20.000000 -04
      Last Successful Execution Date : 2020-09-19 09:04:20.000000 -04


    disk (2.10.1)
    -------------
      Instance ID: disk:8fcf256b1be7c58a [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.yaml
      Total Runs: 37
      Metric Samples: Last Run: 84, Total: 3,108
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 18ms
      Last Execution Date : 2020-09-19 09:04:21.000000 -04
      Last Successful Execution Date : 2020-09-19 09:04:21.000000 -04


    docker
    ------
      Instance ID: docker [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/docker.d/conf.yaml.default
      Total Runs: 37
      Metric Samples: Last Run: 188, Total: 6,932
      Events: Last Run: 0, Total: 11
      Service Checks: Last Run: 1, Total: 37
      Average Execution Time : 20ms
      Last Execution Date : 2020-09-19 09:04:27.000000 -04
      Last Successful Execution Date : 2020-09-19 09:04:27.000000 -04


    file_handle
    -----------
      Instance ID: file_handle [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/file_handle.d/conf.yaml.default
      Total Runs: 36
      Metric Samples: Last Run: 5, Total: 180
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2020-09-19 09:04:19.000000 -04
      Last Successful Execution Date : 2020-09-19 09:04:19.000000 -04


    io
    --
      Instance ID: io [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/io.d/conf.yaml.default
      Total Runs: 37
      Metric Samples: Last Run: 104, Total: 3,776
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2020-09-19 09:04:26.000000 -04
      Last Successful Execution Date : 2020-09-19 09:04:26.000000 -04


    load
    ----
      Instance ID: load [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/load.d/conf.yaml.default
      Total Runs: 36
      Metric Samples: Last Run: 6, Total: 216
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2020-09-19 09:04:18.000000 -04
      Last Successful Execution Date : 2020-09-19 09:04:18.000000 -04


    memory
    ------
      Instance ID: memory [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/memory.d/conf.yaml.default
      Total Runs: 37
      Metric Samples: Last Run: 18, Total: 666
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2020-09-19 09:04:25.000000 -04
      Last Successful Execution Date : 2020-09-19 09:04:25.000000 -04


    network (1.17.0)
    ----------------
      Instance ID: network:e0204ad63d43c949 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/network.d/conf.yaml.default
      Total Runs: 36
      Metric Samples: Last Run: 31, Total: 1,116
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 1ms
      Last Execution Date : 2020-09-19 09:04:17.000000 -04
      Last Successful Execution Date : 2020-09-19 09:04:17.000000 -04


    ntp
    ---
      Instance ID: ntp:d884b5186b651429 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/ntp.d/conf.yaml.default
      Total Runs: 1
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 1
      Average Execution Time : 20.026s
      Last Execution Date : 2020-09-19 08:55:41.000000 -04
      Last Successful Execution Date : 2020-09-19 08:55:41.000000 -04


    proxysql (1.2.1)
    ----------------
      Instance ID: proxysql:6ff7fe576dd01d77 [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/proxysql.d/conf.yaml
      Total Runs: 11
      Metric Samples: Last Run: 0, Total: 0
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 1, Total: 11
      Average Execution Time : 92ms
      Last Execution Date : 2020-09-19 09:04:16.000000 -04
      Last Successful Execution Date : 2020-09-19 09:04:16.000000 -04


    uptime
    ------
      Instance ID: uptime [OK]
      Configuration Source: file:/etc/datadog-agent/conf.d/uptime.d/conf.yaml.default
      Total Runs: 37
      Metric Samples: Last Run: 1, Total: 37
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 0s
      Last Execution Date : 2020-09-19 09:04:24.000000 -04
      Last Successful Execution Date : 2020-09-19 09:04:24.000000 -04

========
JMXFetch
========

  Initialized checks
  ==================
    no checks

  Failed checks
  =============
    no checks

=========
Forwarder
=========

  Transactions
  ============
    CheckRunsV1: 36
    Connections: 0
    Containers: 0
    Deployments: 0
    Dropped: 0
    DroppedOnInput: 0
    Events: 0
    HostMetadata: 0
    IntakeV1: 8
    Metadata: 0
    Pods: 0
    Processes: 0
    RTContainers: 0
    RTProcesses: 0
    ReplicaSets: 0
    Requeued: 0
    Retried: 0
    RetryQueueSize: 0
    Series: 0
    ServiceChecks: 0
    Services: 0
    SketchSeries: 0
    Success: 80
    TimeseriesV1: 36

  API Keys status
  ===============
    API key ending with 1ac47: API Key valid

==========
Endpoints
==========
  http://****** - API Key ending with:
      - 1ac47

==========
Logs Agent
==========


  Logs Agent is not running

=========
APM Agent
=========
  Status: Not running or unreachable on localhost:8126.
  Error: Get http://localhost:8126/debug/vars: dial tcp 127.0.0.1:8126: connect: connection refused

=========
Aggregator
=========
  Checks Metric Sample: 46,864
  Dogstatsd Metric Sample: 1,326
  Event: 12
  Events Flushed: 12
  Number Of Flushes: 36
  Series Flushed: 35,510
  Service Check: 650
  Service Checks Flushed: 679

=========
DogStatsD
=========
  Event Packets: 0
  Event Parse Errors: 0
  Metric Packets: 1,325
  Metric Parse Errors: 0
  Service Check Packets: 0
  Service Check Parse Errors: 0
  Udp Bytes: 233,410
  Udp Packet Reading Errors: 0
  Udp Packets: 606
  Uds Bytes: 0
  Uds Origin Detection Errors: 0
  Uds Packet Reading Errors: 0
  Uds Packets: 0

Additional environment details (Operating System, Cloud provider, etc):

Steps to reproduce the issue:

  1. Have ProxySQL and datadog agent running and reporting metrics.
  2. Restart ProxySQL

Describe the results you received:
ProxySQL metrics are no longer collected: the check keeps running and reports [OK], but with 0 metric samples.

Describe the results you expected:
Metric collection should resume on its own once ProxySQL is back up.

Additional information you deem important (e.g. issue happens only occasionally):

@hithwen
Contributor

hithwen commented Sep 28, 2020

Hello @reyiyo, can you provide the output of datadog-agent check proxysql after the restart? Also, is this running in a containerized environment?

@FlorianVeaux
Member

FlorianVeaux commented Oct 13, 2020

Hello @reyiyo,
Circling back here: we found a cause for your issue and fixed it with #7750.
In short, the "queries" performed by the proxysql integration were shared between check instances (an instance being a configuration stanza). This went unnoticed because sharing the queries between check instances is not a problem as long as the agent still considers all instances "valid".
With autodiscovery, instances are created and deleted on the fly. After the proxysql container restarts, a new instance of the check is created, but it tries to report metrics through the initial check instance, which has been removed and is no longer valid. This unexpected behavior led to the agent not submitting the metrics that were collected.
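
As a rough illustration of the failure mode described above, here is a simplified Python sketch. It is not the actual datadog_checks_base or proxysql integration code: the Query, Check, SharedQueryCheck, PerInstanceQueryCheck, and simulate_restart names, the "example.metric" metric, and the way an invalid instance drops submissions are all assumptions made for the example. Only the two instance IDs come from the status output in this issue.

```python
class Query:
    """Hypothetical query object that remembers the check it reports through."""

    def __init__(self, name):
        self.name = name
        self.check = None  # bound the first time the query is compiled

    def compile(self, check):
        # Once bound, the query keeps its first check; later calls are no-ops.
        if self.check is None:
            self.check = check

    def run(self):
        # Metrics go to whichever check the query was first bound to.
        self.check.submit(self.name, 1)


class Check:
    """Hypothetical check instance; the agent marks it invalid when removing it."""

    def __init__(self, instance_id):
        self.instance_id = instance_id
        self.valid = True
        self.submitted = []

    def submit(self, metric, value):
        # Assumption: submissions from a removed (invalid) instance are dropped.
        if self.valid:
            self.submitted.append((metric, value))


# Problem pattern: one list of queries shared by every instance of the check.
SHARED_QUERIES = [Query("example.metric")]


class SharedQueryCheck(Check):
    def run(self):
        for query in SHARED_QUERIES:
            query.compile(self)
            query.run()


class PerInstanceQueryCheck(Check):
    """Alternative pattern: each instance builds and owns its queries."""

    def __init__(self, instance_id):
        super().__init__(instance_id)
        self.queries = [Query("example.metric")]

    def run(self):
        for query in self.queries:
            query.compile(self)
            query.run()


def simulate_restart(check_cls):
    """Run one instance, replace it as autodiscovery would, run the new one."""
    first = check_cls("proxysql:930ef00729a3325b")
    first.run()
    first.valid = False  # container restarted: the old instance is removed
    second = check_cls("proxysql:6ff7fe576dd01d77")
    second.run()
    return len(second.submitted)


if __name__ == "__main__":
    print("shared queries:", simulate_restart(SharedQueryCheck))
    print("per-instance queries:", simulate_restart(PerInstanceQueryCheck))
```

With shared queries, the second instance runs without error but records no metric samples, because its submissions are routed to the removed first instance. That matches the symptom in the status output, where the check shows [OK] with "Metric Samples: Last Run: 0". When each instance owns its queries, the second instance reports normally.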

The fix will be included in the next version of the datadog-agent (v7.24.0 and v6.24.0), but if you would like to test before that, you can use the following Dockerfile:

FROM datadog/agent:7.23.0
RUN pip --disable-pip-version-check install datadog_checks_base==15.0.0
CMD ["/init"]
