
psutil version 5.4.7 does not support /proc/diskstats in linux kernels 4.18+ #2707

Closed

vail130 opened this issue Dec 8, 2018 · 5 comments

vail130 commented Dec 8, 2018

Here is the offending line in psutil==5.4.7: https://github.com/giampaolo/psutil/blob/0699c04eb20132dc6f15f69e29565c67d264f00e/psutil/_pslinux.py#L1084

Here is the fixed line in master, which is not in a release yet (newest release right now is 5.4.8): https://github.com/giampaolo/psutil/blob/master/psutil/_pslinux.py#L1079
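For context, a /proc/diskstats line grew from 14 fields (major, minor, device name, plus 11 I/O counters) to 18 fields in kernel 4.18, which added four discard counters (and to 20 fields in kernel 5.5, which added two flush counters). A minimal parsing sketch of the issue, illustrative only and not psutil's actual code:

```python
# Illustrative sketch of the field-count issue; not psutil's actual code.
# /proc/diskstats fields per line (including major, minor, device name):
#   14 -> kernels before 4.18
#   18 -> kernels 4.18+ (four discard counters added)
#   20 -> kernels 5.5+  (two flush counters added)
def parse_diskstats_line(line):
    fields = line.split()
    if len(fields) not in (14, 18, 20):
        # psutil 5.4.7 effectively only accepted the pre-4.18 layouts,
        # hence the ValueError on newer kernels
        raise ValueError("not sure how to interpret line %r" % line)
    name = fields[2]
    reads_completed = int(fields[3])
    writes_completed = int(fields[7])
    return name, reads_completed, writes_completed

# The 18-field line from the traceback below parses fine once 18 is accepted:
line = (" 259       0 nvme0n1 7088 1 569342 20336 20023 36470 "
        "3492694 372963 0 1697310 2068563 0 0 0 0\n")
print(parse_diskstats_line(line))  # -> ('nvme0n1', 7088, 20023)
```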

Output of the info page

Getting the status from the agent.

==============
Agent (v6.7.0)
==============

  Status date: 2018-12-08 13:15:11.801329 UTC
  Pid: 346
  Python Version: 2.7.15
  Logs:
  Check Runners: 4
  Log Level: info

  Paths
  =====
    Config File: /etc/datadog-agent/datadog.yaml
    conf.d: /etc/datadog-agent/conf.d
    checks.d: /etc/datadog-agent/checks.d

  Clocks
  ======
    NTP offset: -1.378ms
    System UTC time: 2018-12-08 13:15:11.801329 UTC

  Host Info
  =========
    bootTime: 2018-12-08 12:44:28.000000 UTC
    kernelVersion: 4.19.0-1.el7.elrepo.x86_64
    os: linux
    platform: debian
    platformFamily: debian
    platformVersion: buster/sid
    procs: 72
    uptime: 1m16s
    virtualizationRole: guest
    virtualizationSystem: docker

  Hostnames
  =========
    ec2-hostname: ip-****.ec2.internal
    hostname: gitlab-runner-i-****
    instance-id: i-****
    socket-fqdn: gitlab-autoscaler-utility-****
    socket-hostname: gitlab-autoscaler-utility-****
    hostname provider: configuration

=========
Collector
=========

  Running Checks
  ==============

    cpu
    ---
        Instance ID: cpu [OK]
        Total Runs: 118
        Metric Samples: 6, Total: 702
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s


    disk (1.4.0)
    ------------
        Instance ID: disk:9cc7d03419d80578 [ERROR]
        Total Runs: 118
        Metric Samples: 120, Total: 14,160
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 63ms
        Error: not sure how to interpret line ' 259       0 nvme0n1 7088 1 569342 20336 20023 36470 3492694 372963 0 1697310 2068563 0 0 0 0\n'
        Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/base/checks/base.py", line 366, in run
          self.check(copy.deepcopy(self.instances[0]))
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/disk/disk.py", line 43, in check
          self.collect_metrics_psutil()
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/disk/disk.py", line 148, in collect_metrics_psutil
          self.collect_latency_metrics()
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/datadog_checks/disk/disk.py", line 230, in collect_latency_metrics
          for disk_name, disk in psutil.disk_io_counters(True).iteritems():
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/psutil/__init__.py", line 2017, in disk_io_counters
          rawdict = _psplatform.disk_io_counters(**kwargs)
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/psutil/_pslinux.py", line 1117, in disk_io_counters
          for entry in gen:
        File "/opt/datadog-agent/embedded/lib/python2.7/site-packages/psutil/_pslinux.py", line 1090, in read_procfs
          raise ValueError("not sure how to interpret line %r" % line)
      ValueError: not sure how to interpret line ' 259       0 nvme0n1 7088 1 569342 20336 20023 36470 3492694 372963 0 1697310 2068563 0 0 0 0\n'

    docker
    ------
        Instance ID: docker [OK]
        Total Runs: 117
        Metric Samples: 34, Total: 3,978
        Events: 0, Total: 4
        Service Checks: 1, Total: 117
        Average Execution Time : 6ms


    file_handle
    -----------
        Instance ID: file_handle [OK]
        Total Runs: 118
        Metric Samples: 5, Total: 590
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s


    io
    --
        Instance ID: io [OK]
        Total Runs: 117
        Metric Samples: 26, Total: 3,024
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s


    load
    ----
        Instance ID: load [OK]
        Total Runs: 118
        Metric Samples: 6, Total: 708
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s


    memory
    ------
        Instance ID: memory [OK]
        Total Runs: 117
        Metric Samples: 17, Total: 1,989
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s


    network (1.7.0)
    ---------------
        Instance ID: network:2a218184ebe03606 [OK]
        Total Runs: 118
        Metric Samples: 20, Total: 2,360
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s


    ntp
    ---
        Instance ID: ntp:b4579e02d1981c12 [OK]
        Total Runs: 118
        Metric Samples: 1, Total: 118
        Events: 0, Total: 0
        Service Checks: 1, Total: 118
        Average Execution Time : 0s


    uptime
    ------
        Instance ID: uptime [OK]
        Total Runs: 118
        Metric Samples: 1, Total: 118
        Events: 0, Total: 0
        Service Checks: 0, Total: 0
        Average Execution Time : 0s

========
JMXFetch
========

  Initialized checks
  ==================
    no checks

  Failed checks
  =============
    no checks

=========
Forwarder
=========

  CheckRunsV1: 117
  Dropped: 0
  DroppedOnInput: 0
  Events: 0
  HostMetadata: 0
  IntakeV1: 11
  Metadata: 0
  Requeued: 0
  Retried: 0
  RetryQueueSize: 0
  Series: 0
  ServiceChecks: 0
  SketchSeries: 0
  Success: 245
  TimeseriesV1: 117

  API Keys status
  ===============
    API key ending with **** on endpoint https://app.datadoghq.com: API Key valid

=========
DogStatsD
=========

  Checks Metric Sample: 29,868
  Event: 5
  Events Flushed: 5
  Number Of Flushes: 117
  Series Flushed: 30,705
  Service Check: 1,296
  Service Checks Flushed: 1,402
  Dogstatsd Metric Sample: 1,933

Additional environment details (Operating System, Cloud provider, etc):

  • AWS EC2 running the Datadog agent in a Docker container
docker run \
    -e DD_API_KEY=$_dd_api_key \
    -e DD_APM_ENABLED=true \
    -e DD_DOGSTATSD_NON_LOCAL_TRAFFIC=true \
    -e DD_HOSTNAME=$_name-$_instance_id \
    -e DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=false \
    -e DD_LOGS_ENABLED=true \
    -e DD_PROCESS_AGENT_ENABLED=true \
    -e \"DD_TAGS=env:$_environment region:$_region az:$_az app:$_name\" \
    --log-driver=awslogs \
    --log-opt awslogs-group=$_log_group_name \
    --log-opt awslogs-region=$_region \
    --log-opt awslogs-stream=$_instance_id-datadog \
    --name datadog-agent \
    --net host \
    --privileged \
    -v /opt/datadog-agent/conf.d:/conf.d:ro \
    -v /opt/datadog-agent/run:/opt/datadog-agent/run:rw \
    -v /proc/:/host/proc/:ro \
    -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
    -v /var/run/docker.sock:/var/run/docker.sock:ro \
    datadog/agent:$_dd_version

Steps to reproduce the issue:

  1. Upgrade Linux to kernel 4.18+
  2. Use datadog disk integration

Describe the results you received:
The disk integration errors on every run.

Describe the results you expected:
No disk integration errors.

Additional information you deem important (e.g. issue happens only occasionally):

ofek (Contributor) commented Dec 10, 2018

@vail130 Thanks for the heads-up! I've begun watching their repo for releases and when I get a notification I'll update our pinned version straight away.

vail130 (Author) commented Jan 2, 2019

@ofek I was just taking a look at this issue: giampaolo/psutil#1354

Is there any way to update to the latest git version rather than the PyPI release? If not, I understand, but they have fixed the issue in master and seem to be very slow to cut a new release.

Thanks!

aerostitch (Contributor) commented Jan 15, 2019

I'm guessing the only way to get this fixed in the Docker image right now is to build a custom image that runs `sed -i 's#flen == 14:#flen == 14 or flen == 18:#g' /opt/datadog-agent/embedded/lib/python2.7/site-packages/psutil/_pslinux.py` before starting the agent, right?

There's no plan on the Datadog side to fix this any sooner than whenever psutil gets around to releasing the new version containing the patch from over 2 months ago, right?

Right now it's a bit of a problem, as we no longer have disk monitoring...
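For what it's worth, the substitution that sed one-liner performs can be sketched in Python; the source line shown is the kind of field-count check the pattern matches in psutil 5.4.x, and is illustrative rather than an exact quote of the installed file:

```python
# Sketch of what the sed one-liner above does to psutil/_pslinux.py:
# widen the field-count check from 14 fields to 14-or-18 fields.
import re

source_line = "            if flen == 14:"
patched = re.sub(r"flen == 14:", "flen == 14 or flen == 18:", source_line)
print(patched)  # ->             if flen == 14 or flen == 18:
```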

rmoriz commented Jan 18, 2019

Since we have some machines running 4.20, Datadog has started flooding the logs because of this issue. Please cut a release with the updated psutil ASAP. Thanks.

ofek mentioned this issue Jan 23, 2019
KarthikRangaraju commented
@ofek I'm still seeing this issue in the Datadog 7.18.1 Docker image.

    disk (2.7.0)
    ------------
      Instance ID: disk:e5dffb8bef24336f [ERROR]
      Configuration Source: file:/etc/datadog-agent/conf.d/disk.d/conf.yaml.default
      Total Runs: 17
      Metric Samples: Last Run: 672, Total: 11,424
      Events: Last Run: 0, Total: 0
      Service Checks: Last Run: 0, Total: 0
      Average Execution Time : 308ms
      Last Execution Date : 2020-04-27 21:20:26.000000 UTC
      Last Successful Execution Date : Never
      Error: not sure how to interpret line '   8       0 sda 45066 9000 10004532 512126 138525639 193915661 3435741544 440462636 0 127530527 368668748 0 0 0 0 0 0\n'
      Traceback (most recent call last):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/base/checks/base.py", line 713, in run
          self.check(instance)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/disk/disk.py", line 121, in check
          self.collect_latency_metrics()
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/datadog_checks/disk/disk.py", line 244, in collect_latency_metrics
          for disk_name, disk in iteritems(psutil.disk_io_counters(True)):
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/psutil/__init__.py", line 2168, in disk_io_counters
          rawdict = _psplatform.disk_io_counters(**kwargs)
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/psutil/_pslinux.py", line 1125, in disk_io_counters
          for entry in gen:
        File "/opt/datadog-agent/embedded/lib/python3.8/site-packages/psutil/_pslinux.py", line 1098, in read_procfs
          raise ValueError("not sure how to interpret line %r" % line)
      ValueError: not sure how to interpret line '   8       0 sda 45066 9000 10004532 512126 138525639 193915661 3435741544 440462636 0 127530527 368668748 0 0 0 0 0 0\n'
root@9eabbd05b5df:/opt/datadog-agent/embedded/lib/python3.8/site-packages# ls | grep psutil
psutil
psutil-5.6.7.dist-info

It uses psutil 5.6.7, and in your PR #3019 you seem to have pinned it to 5.5.0. Although I'm on a higher version than that, I'm still facing this issue.
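A likely explanation for the recurrence: kernel 5.5 extended /proc/diskstats again, from 18 to 20 fields (two flush counters), and the failing line in the traceback above has 20 fields, which this psutil version evidently does not accept yet. A quick illustrative count:

```python
# The failing line from the 7.18.1 report: kernel 5.5+ added two flush
# counters, bringing a /proc/diskstats line to 20 fields.
line = ("   8       0 sda 45066 9000 10004532 512126 138525639 193915661 "
        "3435741544 440462636 0 127530527 368668748 0 0 0 0 0 0\n")
print(len(line.split()))  # -> 20
```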
