
Docker Memory Input Usage Not Consistent with Docker Engine's stats #10640

Closed
mbentley opened this issue Feb 13, 2022 · 3 comments
Labels
area/nvidia bug unexpected problem or unintended behavior

Comments


mbentley commented Feb 13, 2022

Relevant telegraf.conf

# all other defaults are set
[[inputs.docker]]                         
  endpoint = "unix:///var/run/docker.sock"
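
The two deprecation warnings in the test output below come from the 'perdevice' and 'total' defaults rather than from this bug; a minimal sketch of the replacement settings those warnings point to (the include lists are my assumption of the intended equivalents, not part of the original config):

[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  ## replacements for the deprecated 'perdevice'/'total' settings
  perdevice = false
  perdevice_include = ["cpu", "blkio", "network"]
  total = true
  total_include = ["cpu", "blkio", "network"]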

Logs from Telegraf

# telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d --test --input-filter docker | grep plex
2022-02-13T16:51:23Z I! Starting Telegraf 1.21.3
2022-02-13T16:51:23Z I! Loaded inputs: docker
2022-02-13T16:51:23Z I! Loaded aggregators:
2022-02-13T16:51:23Z I! Loaded processors:
2022-02-13T16:51:23Z W! Outputs are not used in testing mode!
2022-02-13T16:51:23Z I! Tags enabled: host=athena
2022-02-13T16:51:23Z W! [inputs.docker] 'perdevice' setting is set to 'true' so 'blkio' and 'network' metrics will be collected. Please set it to 'false' and use 'perdevice_include' instead to control this behaviour as 'perdevice' will be deprecated
2022-02-13T16:51:23Z W! [inputs.docker] 'total' setting is set to 'false' so 'blkio' and 'network' metrics will not be collected. Please set it to 'true' and use 'total_include' instead to control this behaviour as 'total' will be deprecated
> docker_container_status,container_image=plexinc/pms-docker,container_name=plex,container_status=running,container_version=unknown,engine_host=athena,host=athena,server_version=20.10.12 container_id="64c8abf723d9f3b136e1ffa5841a0b6fa254228e948ced8020e3ef91d3aeba2b",exitcode=0i,finished_at=1644506302938896168i,oomkilled=false,pid=3897902i,started_at=1644507836533416653i,uptime_ns=263249512760295i 1644771086000000000
> docker_container_mem,container_image=plexinc/pms-docker,container_name=plex,container_status=running,container_version=unknown,engine_host=athena,host=athena,server_version=20.10.12 active_anon=540672i,active_file=10407936i,container_id="64c8abf723d9f3b136e1ffa5841a0b6fa254228e948ced8020e3ef91d3aeba2b",inactive_anon=147820544i,inactive_file=8379084800i,limit=8589934592i,max_usage=0i,pgfault=1887028803i,pgmajfault=858i,unevictable=0i,usage=8587341824i,usage_percent=99.96981620788574 1644771086000000000
> docker_container_cpu,container_image=plexinc/pms-docker,container_name=plex,container_status=running,container_version=unknown,cpu=cpu-total,engine_host=athena,host=athena,server_version=20.10.12 container_id="64c8abf723d9f3b136e1ffa5841a0b6fa254228e948ced8020e3ef91d3aeba2b",throttling_periods=1311336i,throttling_throttled_periods=45i,throttling_throttled_time=3417204000i,usage_in_kernelmode=8200527599000i,usage_in_usermode=13830553385000i,usage_percent=4.256972151898735,usage_system=12326356720000000i,usage_total=22031080985000i 1644771086000000000
> docker_container_net,container_image=plexinc/pms-docker,container_name=plex,container_status=running,container_version=unknown,engine_host=athena,host=athena,network=eth0,server_version=20.10.12 container_id="64c8abf723d9f3b136e1ffa5841a0b6fa254228e948ced8020e3ef91d3aeba2b",rx_bytes=796952777i,rx_dropped=0i,rx_errors=0i,rx_packets=5113303i,tx_bytes=45161602957i,tx_dropped=0i,tx_errors=0i,tx_packets=2581821i 1644771086000000000
> docker_container_blkio,container_image=plexinc/pms-docker,container_name=plex,container_status=running,container_version=unknown,device=230:192,engine_host=athena,host=athena,server_version=20.10.12 container_id="64c8abf723d9f3b136e1ffa5841a0b6fa254228e948ced8020e3ef91d3aeba2b",io_service_bytes_recursive_read=11804672i,io_service_bytes_recursive_write=12212419072i 1644771086000000000
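
For reference, the usage_percent above is simply the raw cgroup usage divided by the limit; a quick arithmetic check with the values copied from the docker_container_mem line:

# awk 'BEGIN { printf "%.4f\n", 8587341824 / 8589934592 * 100 }'
99.9698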

System info

Telegraf 1.21.3, Debian 11, Docker 20.10.12

Docker

This isn't specific to this one container, but here is an example of how I am running the container:

docker run -d --restart=always --name plex \
  --runtime=nvidia \
  --gpus all,capabilities=video \
  --no-healthcheck \
  --hostname plex \
  --network macvlan1 \
  --ip 192.168.2.206 \
  -e TZ="US/Eastern" \
  -e PLEX_CLAIM="" \
  -e PLEX_UID=500 \
  -e PLEX_GID=1501 \
  -e CUDA_DRIVER_CAPABILITIES="compute,video,utility" \
  -e NVIDIA_DRIVER_CAPABILITIES="compute,video,utility" \
  -e NVIDIA_VISIBLE_DEVICES="all" \
  --mount type=bind,source=/zfs/apps/plex/database,destination=/config,readonly=false \
  --mount type=tmpfs,destination=/transcode,tmpfs-size=10G \
  --mount type=bind,source=/zfs/storage/media,destination=/data,readonly=false \
  --cpus 6 \
  --memory 8g \
  --memory-swap 9g \
  plexinc/pms-docker
# docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Docker Buildx (Docker Inc., v0.7.1-docker)

Server:
 Containers: 52
  Running: 51
  Paused: 0
  Stopped: 1
 Images: 72
 Server Version: 20.10.12
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: local
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
 runc version: v1.0.2-0-g52b36a2
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: default
  cgroupns
 Kernel Version: 5.10.0-11-amd64
 Operating System: Debian GNU/Linux 11 (bullseye)
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 125.5GiB
 Name: athena
 ID: BDEV:YXXO:FAGS:OHLJ:N64Q:LQTC:JJDK:TCKM:WN6T:P4HG:LKEQ:HHTE
 Docker Root Dir: /zfs/apps/docker
 Debug Mode: false
 Username: mbentley
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  https://registry-mirror.casa.mbentley.net/
 Live Restore Enabled: false

Steps to reproduce

Not sure I can specifically reproduce this or know how to trigger it.

  1. Run a container; the memory usage stats do not seem to match what Docker sees:

(screenshot: Grafana memory usage graph for the plex container)

The container does not seem to be having memory issues:

# docker exec -it plex top -bn1
top - 11:59:17 up 8 days, 23:37,  0 users,  load average: 0.95, 1.00, 1.04
Tasks:   9 total,   1 running,   8 sleeping,   0 stopped,   0 zombie
%Cpu(s): 20.0 us,  0.0 sy,  0.0 ni, 80.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 128494.8 total,  16400.5 free,  95074.0 used,  17020.2 buff/cache
MiB Swap:   4096.0 total,   4032.0 free,     64.0 used.  32140.6 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1 root      20   0     204      4      0 S   0.0   0.0   0:00.04 s6-svsc+
     39 root      20   0     204      4      0 S   0.0   0.0   0:00.00 s6-supe+
    235 root      20   0     204      4      0 S   0.0   0.0   0:00.00 s6-supe+
    238 plex      20   0 4759960 116800  37820 S   0.0   0.1  78:39.32 PMS Run+
    275 plex      35  15   91604  72204  10880 S   0.0   0.1   5:14.15 Plex Sc+
    316 plex      20   0   40200  10896   9308 S   0.0   0.0   3:50.92 Plex Tu+
  25446 plex      20   0    2564   1008    884 S   0.0   0.0   1:22.84 EasyAud+
  36575 plex      20   0   46264  17788   9292 S   0.0   0.0   8:27.50 Plex Tr+
  36679 root      20   0    7728   3356   2916 R   0.0   0.0   0:00.03 top

Influx query in Grafana:

SELECT mean("usage_percent") FROM "docker_container_mem" WHERE ("container_name" =~ /^$ContainerName$/) AND $timeFilter GROUP BY time($__interval), "container_name" fill(null)

Expected behavior

Stats would match what docker stats shows:

# docker stats plex --no-stream
CONTAINER ID   NAME      CPU %     MEM USAGE / LIMIT   MEM %     NET I/O          BLOCK I/O         PIDS
64c8abf723d9   plex      6.99%     198MiB / 8GiB       2.42%     799MB / 45.4GB   12.2MB / 12.4GB   89
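
The gap comes from what each side counts: on a cgroup v2 host (Cgroup Version: 2 in docker info above), the Docker CLI subtracts the page cache (inactive_file) from the raw usage before computing MEM USAGE and MEM %, whereas Telegraf 1.21.3's usage_percent is based on the raw usage. Plugging the values from the docker_container_mem metric above into that formula reproduces both columns (a back-of-the-envelope check, not Telegraf code):

# awk 'BEGIN { used = 8587341824 - 8379084800; limit = 8589934592;
  printf "%.1fMiB / %.2f%%\n", used / 1048576, used / limit * 100 }'
198.6MiB / 2.42%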

Actual behavior

(screenshot: Grafana graph of the plex container's usage_percent, sitting near 100%)

Additional info

No response

mbentley added the bug label Feb 13, 2022
powersj (Contributor) commented Feb 14, 2022

I believe this is fixed by #10491. You could try one of our nightlies, or wait for our next bug fix release (this week, I believe) and try that out.

Thanks!

powersj closed this as completed Feb 14, 2022
mbentley (Author) commented Feb 14, 2022

Ah shoot, my search-fu failed me 🤦 Sorry and thanks for digging up the PR mentioned, will take a look.

powersj (Contributor) commented Feb 14, 2022

> Ah shoot, my search-fu failed me 🤦 Sorry and thanks for digging up the PR mentioned, will take a look.

No worries! If that doesn't seem to resolve the issue for you, please do let us know.
