SMART does not separately tag sub-devices under a single device (megaraid) #6284

mattster98 · 2019-08-20T13:08:05Z

Relevant telegraf.conf:

devices = [ "/dev/bus/0 -d megaraid,0", "/dev/bus/0 -d megaraid,1" ]

System info:

Telegraf 1.11.4 (git: HEAD d9ca76e)
Linux r820 4.15.0-48-generic #51-Ubuntu SMP Wed Apr 3 08:28:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-48-generic] (local build)

Steps to reproduce:

Issue "SMART does not found device behind Raid-Controller" (#4881) prevents SMART from automatically including the devices underneath a RAID controller.
Manually list the devices so that they are included.
Since the top-level device (/dev/bus/0) is the same, the values gathered for the two sub-devices get rolled up under the same device tag, making it impossible to report on them separately.

Expected behavior:

The two sub-devices should be tagged so that they can be told apart, for example "0-0" and "0-1" for the config snippet above.

Actual behavior:

The only device that shows up is "0", and as best I can tell, the values for both devices are recorded against that device.
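To illustrate why both entries end up as "0", here is a minimal Python sketch. It assumes the `device` tag is taken from the basename of the device path with the smartctl options ignored, which matches the observed tag but is not confirmed in this thread. `device_tag` and `distinct_tag` are hypothetical helpers, not part of Telegraf; `distinct_tag` only shows one way the "0-0" / "0-1" naming from the expected behavior could be derived.

```python
import posixpath


def device_tag(device_entry: str) -> str:
    """Tag built from the device path only (assumed behavior)."""
    device_path = device_entry.split(" ")[0]
    return posixpath.basename(device_path)


def distinct_tag(device_entry: str) -> str:
    """Hypothetical tag that appends the megaraid index when one is given."""
    parts = device_entry.split()
    base = posixpath.basename(parts[0])
    for i, part in enumerate(parts):
        has_next = i + 1 < len(parts)
        if part == "-d" and has_next and parts[i + 1].startswith("megaraid,"):
            index = parts[i + 1].split(",", 1)[1]
            return f"{base}-{index}"
    return base


devices = ["/dev/bus/0 -d megaraid,0", "/dev/bus/0 -d megaraid,1"]
print([device_tag(d) for d in devices])    # ['0', '0']
print([distinct_tag(d) for d in devices])  # ['0-0', '0-1']
```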

Additional info:

I tried this workaround but haven't had any luck. Either my syntax is wrong or it's ignoring the tag specified.

[[inputs.smart]]
   path = "/usr/sbin/smartctl"
   use_sudo = true
   attributes = false
   devices = [ "/dev/nvme0n1 -d nvme" ]

[[inputs.smart]]
   path = "/usr/sbin/smartctl"
   use_sudo = true
   attributes = false
   devices = [ "/dev/bus/0 -d megaraid,0" ]
   [inputs.smart.tags]
      device = "0"

[[inputs.smart]]
   path = "/usr/sbin/smartctl"
   use_sudo = true
   attributes = false
   devices = [ "/dev/bus/0 -d megaraid,1" ]
   [inputs.smart.tags]
      device = "1"


mattster98 · 2019-08-20T13:23:02Z

My InfluxQL is weak, but I hacked together a query that I think confirms it's recording both values against the same tag:

> select "temp_c" from "smart_device" where "device" =~ /^(0)$/ and time >= now() - 2m;
name: smart_device
time                temp_c
----                ------
1566307230000000000 25
1566307230000000000 26
1566307240000000000 25
1566307240000000000 26
1566307250000000000 25
1566307250000000000 26
1566307260000000000 25
1566307260000000000 26
1566307270000000000 26
1566307270000000000 26
1566307280000000000 26
1566307280000000000 26
1566307290000000000 25
1566307290000000000 26
>

danielnelson · 2019-08-21T00:22:56Z

Easiest way to check how we are recording the values is with:

telegraf --input-filter smart --test

Once the values hit the database, later data overwrites earlier data if the measurement + tag set + field + timestamp is the same; you can only have one value for each combination of these. Since there are two values for temp_c at the same timestamp above, we know at least one tag must differ.
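The overwrite rule can be modelled with a dictionary keyed on that combination. This is only a Python sketch of the rule as described in this comment, not InfluxDB code; `write_point` and the serial numbers "A" and "B" are hypothetical placeholders.

```python
def write_point(store, measurement, tags, field, timestamp, value):
    """Store one value per (measurement, tag set, field, timestamp)."""
    key = (measurement, frozenset(tags.items()), field, timestamp)
    store[key] = value  # an identical key replaces the earlier value


store = {}
ts = 1566307230000000000

# Same device tag, different serial_no tag: two separate values survive.
write_point(store, "smart_device", {"device": "0", "serial_no": "A"}, "temp_c", ts, 25)
write_point(store, "smart_device", {"device": "0", "serial_no": "B"}, "temp_c", ts, 26)
print(len(store))  # 2

# Identical tag set and timestamp: the later write overwrites the earlier one.
write_point(store, "smart_device", {"device": "0", "serial_no": "A"}, "temp_c", ts, 27)
print(len(store))  # 2
```

Seeing two values at one timestamp in the query result therefore implies that the two series differ in at least one tag.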

With the workaround try grouping by disk and device, or for debugging it can be useful to group by '*':

select temp_c from smart_device where time >= now() - 2m group by *;

mattster98 · 2019-08-22T13:52:46Z

That's super helpful, thanks! Yes, the serial number and WWN are unique, which explains why there are two separate values for the same timestamp.

mattster98 · 2019-08-22T14:08:09Z

Interesting side-effect, again maybe due to bad syntax on my part, but when I use the above config (added attributes=true for the nvme), it just reports the last device in the config file three times rather than reporting three distinct devices. It does add the "disk" tag to two of the three! If I reorder them, the last one is reported 3 times.

Is this a separate bug?

[serial numbers replaced with dashes]

$ sudo telegraf --input-filter smart --test | grep -i temp
2019-08-22T13:58:27Z I! Starting Telegraf 1.11.4
2019-08-22T13:58:27Z I! Using config file: /etc/telegraf/telegraf.conf
> smart_attribute,device=nvme0n1,disk=0,host=r820,id=194,name=Temperature_Celsius,serial_no=-------- raw_value=31i 1566482308000000000
> smart_device,device=nvme0n1,disk=0,host=r820,model=INTEL\ SSDPEDME016T4S,serial_no=---------------- exit_status=0i,health_ok=true,temp_c=31i 1566482308000000000
> smart_attribute,device=nvme0n1,disk=1,host=r820,id=194,name=Temperature_Celsius,serial_no=------------------ raw_value=31i 1566482308000000000
> smart_device,device=nvme0n1,disk=1,host=r820,model=INTEL\ SSDPEDME016T4S,serial_no=----------- exit_status=0i,health_ok=true,temp_c=31i 1566482308000000000
> smart_attribute,device=nvme0n1,host=r820,id=194,name=Temperature_Celsius,serial_no=------------------- raw_value=31i 1566482308000000000
> smart_device,device=nvme0n1,host=r820,model=INTEL\ SSDPEDME016T4S,serial_no=-------------- exit_status=0i,health_ok=true,temp_c=31i 1566482308000000000
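For comparing tag sets in `--test` output like the above, a small parser can pull out just the tags of interest. The Python sketch below handles only the simple case shown here (escaped spaces in tag values, no quoted string fields); `parse_tags` is a hypothetical helper, and the sample lines are shortened copies of the `smart_device` lines above.

```python
import re


def parse_tags(line: str):
    """Return (measurement, tags) for one line of `telegraf --test` output."""
    if line.startswith("> "):
        line = line[2:]
    # The first unescaped space separates measurement+tags from the fields.
    head = re.split(r"(?<!\\) ", line, maxsplit=1)[0]
    # Unescaped commas separate the measurement name and each tag pair.
    measurement, *pairs = re.split(r"(?<!\\),", head)
    tags = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        tags[key] = value.replace("\\ ", " ")
    return measurement, tags


lines = [
    r"> smart_device,device=nvme0n1,disk=0,host=r820,model=INTEL\ SSDPEDME016T4S exit_status=0i,temp_c=31i 1566482308000000000",
    r"> smart_device,device=nvme0n1,disk=1,host=r820,model=INTEL\ SSDPEDME016T4S exit_status=0i,temp_c=31i 1566482308000000000",
    r"> smart_device,device=nvme0n1,host=r820,model=INTEL\ SSDPEDME016T4S exit_status=0i,temp_c=31i 1566482308000000000",
]
for line in lines:
    measurement, tags = parse_tags(line)
    print(measurement, tags.get("device"), tags.get("disk"))
```

Every line reports `device=nvme0n1`; only the `disk` tag differs, and it is missing from the third line.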

danielnelson · 2019-08-23T05:00:30Z

It seems like another bug, though I'm not able to reproduce this on my all-SATA system:

[[inputs.smart]]
  devices = ["/dev/sda"]
  attributes = false
  use_sudo = true

[[inputs.smart]]
  devices = ["/dev/sdb"]
  attributes = false
  use_sudo = true
  [inputs.smart.tags]
    disk = "0"

[[inputs.smart]]
  devices = ["/dev/sdc"]
  attributes = false
  use_sudo = true
  [inputs.smart.tags]
    disk = "1"

> smart_device,capacity=500107862016,device=sda,enabled=Enabled,host=loki,model=Samsung\ SSD\ 850\ EVO\ 500GB,serial_no=S21HNXAGB00873F,wwn=5002538d4075fd17 exit_status=0i,health_ok=true,temp_c=27i,udma_crc_errors=0i 1566536286000000000
> smart_device,capacity=500107862016,device=sdb,disk=0,enabled=Enabled,host=loki,model=Samsung\ SSD\ 850\ EVO\ 500GB,serial_no=S2RANB0J505626W,wwn=5002538d4200c738 exit_status=0i,health_ok=true,temp_c=28i,udma_crc_errors=0i 1566536286000000000
> smart_device,capacity=640135028736,device=sdc,disk=1,enabled=Enabled,host=loki,model=WDC\ WD6400AAKS-00A7B2,serial_no=WD-WMASY7276305,wwn=50014ee0abd36d7e exit_status=0i,health_ok=true,read_error_rate=0i,seek_error_rate=0i,temp_c=34i,udma_crc_errors=0i 1566536286000000000

mattster98 · 2019-08-23T15:14:50Z

Interesting.. I'm able to reproduce this problem on both of my Dell servers. One running 1.7.2 and the other 1.11.4.

Neither is plain SATA. One has built-in RAID plus the PCIe SSD, and the other has HBA interfaces to disk arrays and whatnot. Not sure how that would affect the output when the plugin is run in parallel, though.

I'll file a separate bug.

glinton · 2019-08-26T21:35:07Z

@mattster98 can you test this with a nightly build? I'm wondering if the cause of this is similar to your other bug.

reimda · 2022-08-08T18:41:29Z

Hi @mattster98, is this still a problem for you? Is it still happening with recent releases of Telegraf?

mattster98 · 2022-08-08T19:42:09Z

This does appear to be breaking out by serial number now. Thanks! Tested with 1.23.3

danielnelson added the area/smart and bug labels Aug 21, 2019

reimda added the waiting for response label Aug 8, 2022

mattster98 closed this as completed Aug 8, 2022

telegraf-tiger bot removed the waiting for response label Aug 8, 2022
