[metricbeat] Provide detailed network counters via netstat/snmp and sysctl #12210

fearful-symmetry · 2019-05-20T18:05:33Z

This is a continuation of #11951, now that #12050 was merged.

What this is

So, right now the data we have coming from system/network is pretty limited:

        "network": {
            "in": {
                "bytes": 37904869172,
                "dropped": 32,
                "errors": 0,
                "packets": 32143403
            },
            "name": "wlp4s0",
            "out": {
                "bytes": 6299331926,
                "dropped": 0,
                "errors": 0,
                "packets": 13362703
            }
        }

In a way this makes sense, as the metricset is cross-platform and different OSes can expose wildly different metrics and data. What we have now is basically a distillation of what we can probably get regardless of platform. With #12050, at least, we have a demo of how to report certain metrics depending on the OS, so it stands to reason that we might want to do this for other metricsets and data sources as well. It should be noted that netdata and node_exporter both provide far more comprehensive data than system/network.

On Linux, the two sources of data that come to mind are /proc/net/snmp and /proc/net/netstat.

/proc/net/netstat provides a wealth of extended TCP/IP counters, as does /proc/net/snmp, which also includes counters and stats for ICMP and UDP. Together, these procfs files make up the majority of the information you would get via netstat -s on a Linux system.
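
Both files share the same layout: lines come in pairs, a header line of counter names followed by a line of values, each prefixed with the protocol name. A minimal sketch of a parser for that layout follows. The function name `parse_proc_net` and the sample text are illustrative assumptions, not Metricbeat code, and the Udp values in the sample are made up.

```python
def parse_proc_net(text):
    """Parse the header/value line pairs used by /proc/net/snmp and /proc/net/netstat.

    Returns a dict such as {"Ip": {"Forwarding": 2, ...}, "Udp": {...}}.
    """
    stats = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Lines come in pairs: "Proto: Name1 Name2 ..." then "Proto: val1 val2 ...".
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        value_proto, nums = values.split(":", 1)
        if proto != value_proto:
            raise ValueError(f"mismatched pair: {proto!r} vs {value_proto!r}")
        # int() also accepts negative values, which some counters can report.
        stats[proto] = {n: int(v) for n, v in zip(names.split(), nums.split())}
    return stats


# Hypothetical, shortened sample in the /proc/net/snmp layout.
SAMPLE = """\
Ip: Forwarding DefaultTTL InReceives
Ip: 2 64 50851
Udp: InDatagrams NoPorts
Udp: 120 3
"""
```

On a Linux host, `parse_proc_net(open("/proc/net/snmp").read())` should yield one entry per protocol block in the file.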

We can also do work on BSD. Although netstat is very different there, netstat -s still exists and provides a great deal of TCP/IP stack data, along with info from UDP and ICMP. As this is BSD, it uses sysctl under the hood as opposed to a pseudo file system.

How the data would look

The most obvious way to present these files is a 1-to-1 direct dump of the procfs files/sysctl counters. This would provide a familiar interface for users, and make blog posts straightforward (monitoring netstat with metricbeat...).

In the data itself, if the operating system supports it, we would just append an extra object to the existing system/network metricset:

        "network": {
            "in": {
                "bytes": 37904869172,
                "dropped": 32,
                "errors": 0,
                "packets": 32143403
            },
            "name": "wlp4s0",
            "out": {
                "bytes": 6299331926,
                "dropped": 0,
                "errors": 0,
                "packets": 13362703
            },
            "snmp": {
                "ip": {
                    "forwarding": 2,
                    "default_ttl": 64,
                    "in_receives": 50851,
                    .....
                }
            }
        }
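
The keys in the example above (forwarding, default_ttl, in_receives) look like snake_case versions of the names the kernel prints in /proc/net/snmp (Forwarding, DefaultTTL, InReceives). One way that renaming could be done is sketched below. The helper names `to_snake` and `snmp_object` are hypothetical, and this is an illustration of the mapping rather than how Metricbeat does it.

```python
import re


def to_snake(name):
    """Convert a kernel counter name such as "DefaultTTL" to "default_ttl"."""
    # Split lower-to-upper boundaries: "InReceives" -> "In_Receives".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Split acronym-to-word boundaries: "TCPLoss" -> "TCP_Loss".
    name = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", name)
    return name.lower()


def snmp_object(counters):
    """Turn {"Ip": {"DefaultTTL": 64}} into {"ip": {"default_ttl": 64}}."""
    return {
        to_snake(proto): {to_snake(key): value for key, value in fields.items()}
        for proto, fields in counters.items()
    }
```

The result of `snmp_object` is the kind of object that would sit under the "snmp" key in the example event.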

At least on Linux, this should be an 'easy' win, as dealing with procfs is something we're already used to. On BSD this might get a tad harder, as dealing with the more complex opaque types that sysctl returns can be a snarl in Go.

How can I look at this myself?

Create some Linux and FreeBSD VMs as you normally do.
On Linux:

$ cat /proc/net/netstat
$ cat /proc/net/snmp

On FreeBSD:

$ netstat -s

jsoriano · 2019-05-24T16:34:20Z

@fearful-symmetry following up on my comment in #11951 (comment), I think that snmp and netstat metrics are system-wide, while the metrics collected by the network metricset are specific to each interface.
I agree that it'd be great to collect these additional metrics, but we have to look for a metricset where they fit better. Maybe socket_summary, or a new one? Another option would be to add another metricset for network interfaces and use the network one for general network metrics, but this would be a breaking change, which I'd avoid.
We can also extend the network interface data with other metrics, like the additional per-interface metrics that gopsutil already collects and that we are not using yet.

fearful-symmetry · 2019-05-24T16:53:10Z

> I think that snmp and netstat metrics are system-wide, while the metrics collected by the network metricset are specific of each interface.

Bah, I keep on forgetting about that! I should change the name. I think I kinda gravitate towards network because the name seems to fit the bill but yes, we're already using it for interface data, which is different. The data in socket_summary seems like a good fit, but the name would perhaps be a little misleading as this isn't socket-specific either. Maybe a new metricset? network_counters or network_global or something like that?

jsoriano · 2019-05-24T17:11:12Z

> Maybe a new metricset? network_counters or network_global or something like that?

Maybe an option is to name it network_summary (as we have socket and socket_summary), but I am not sure; all the options look a bit confusing to me.
We could go with any name for now and consider reorganizing all network metrics in 8.0 as a breaking change.
@exekias do you have an opinion about this?

fearful-symmetry · 2019-05-24T17:15:07Z

> network_summary

Yah, that seems the most 'native' option considering our naming scheme now.

> We could go with any name by now and consider reorganizing all network metrics in 8.0 as breaking change.

Also true. IMO, network should be interface_summary and network becomes some kind of global counter data.

exekias · 2019-05-27T11:24:02Z

We may be at risk of bloating system.* with network-related namespaces. In general it sounds like system.network should be a common namespace for all these metrics. As of today, it's already used for basic packet counters (network.in.packets and friends), but we could try to reorganize it into a tree to give more room for new data, i.e.:

1. Keep current metrics to avoid a breaking change, and move them to system.network.interface in 8.0.
2. Add more subcategories where needed, i.e. system.network.snmp, system.network.netstat.

We can do step 2 without a breaking change, just by putting metrics on different events? The question remains whether we should keep adding more metricsets or make the right decisions for the user from a few of them.

fearful-symmetry · 2019-05-30T14:04:20Z

Just getting back to this, I thought @exekias raised some good issues as well.

> We may be at risk of bloating system.* with network related namespaces.

I agree. Considering the rather large number of network counters/namespaces/weird APIs that many OSes expose, I wonder if we should have a network metricset? If we're gonna make breaking changes, it's something to consider.

> Add more sub categories where needed, ie system.network.snmp, system.network.netstat

How would this work? Would it be some kind of sub-metricset? Do we want to do that?

exekias · 2019-06-04T09:51:25Z

Sorry for the late response 🤕 :

How about adding an snmp metricset that reports under system.network.snmp? That would allow you to add more metrics under network while keeping them namespaced. Then the only breaking change needed in 8.0 is moving the current network metrics into network.interfaces (or similar, tbd):

- module: system
  metricsets:
    - cpu
    - memory
    - network
    - snmp
     ...

snmp reporting like:

{
  "system": {
    "network": {
      "snmp": {
        "ip": {
          "forwarding": 2,
          "default_ttl": 64,
          "in_receives": 50851,
          .....
        }
      }
    }
  }
}
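
The namespacing idea can be illustrated with a small helper that wraps a counter payload in one nested level per dotted namespace part, so snmp and netstat data each go out as their own event under system.network. This is a sketch only: the helper name `nest` and the netstat field name in the usage example are hypothetical, and Metricbeat itself builds events in Go rather than like this.

```python
def nest(namespace, payload):
    """Wrap payload in nested dicts, one level per dot-separated namespace part."""
    event = payload
    for part in reversed(namespace.split(".")):
        event = {part: event}
    return event


# Two separate events, each under its own sub-namespace of system.network.
snmp_event = nest("system.network.snmp", {"ip": {"forwarding": 2, "default_ttl": 64}})
netstat_event = nest("system.network.netstat", {"tcp_ext": {"example_counter": 0}})
```

Because each source lands in its own sub-namespace on its own event, adding it does not touch the existing interface fields.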

fearful-symmetry · 2019-10-30T14:55:49Z

Picking this up again since it's next in line after I finish systemd work.

So, I'm not worried too much about metricsets/namespaces right now. I think either network or network_summary would make sense. There are too many name collisions with snmp or netstat.

What I am worried about is cross-compatibility. A lot of similar but different metrics are gathered in BSD. We have 3 ways to deal with this:

1. Don't collect BSD data.
2. Collect BSD data, and try to 'standardize' what we can across platforms. Not sure it's worth it, and we'll never be able to standardize all the counters.
3. Don't standardize anything. This is the easiest to do, and perhaps also easier for users who know what metrics they're looking for, but it might make cross-OS/cross-node event correlation harder.

fearful-symmetry · 2019-10-30T15:02:35Z

@jsoriano / @exekias Do either of you have thoughts on how we should handle BSD data?

A while ago I looked at expanding BSD support for some network counters, and getting it done natively with sysctl in Go was an absolute nightmare. Extensive use of cgo was absolutely required.

jsoriano · 2019-11-09T23:30:07Z

@fearful-symmetry regarding the BSD data, we could definitely go without it for now, while keeping in mind that we can add more platforms in the future.
If now or later we decide to add BSD data, we could go without standardization and, once we are collecting metrics, decide if some of them could go to a common field because they represent equivalent things on the different platforms.
We could consider duplicating the most important metrics, or using aliases, so they can be found both in common fields and in OS-specific fields.
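
A rough sketch of the duplication idea: keep every counter under an OS-specific namespace and copy a hand-picked few into a shared one. All field names, the `common` namespace, and the mapping table here are hypothetical placeholders, not real Metricbeat fields or real BSD counter names.

```python
# Hypothetical mapping from OS-specific counter names to a shared field name.
COMMON_FIELDS = {
    "linux": {"ip.in_receives": "ip.packets_received"},
    "freebsd": {"ip.total_packets_received": "ip.packets_received"},
}


def with_common_fields(os_name, counters, mapping=COMMON_FIELDS):
    """Keep all counters under the OS namespace and duplicate selected ones
    into a "common" namespace so they can be queried across platforms."""
    event = {os_name: dict(counters)}
    common = {}
    for src, dst in mapping.get(os_name, {}).items():
        if src in counters:
            common[dst] = counters[src]
    if common:
        event["common"] = common
    return event
```

Unlike an alias, the duplicated value is present in the event itself, at the cost of storing it twice.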

fearful-symmetry · 2019-11-12T20:00:33Z

> regarding the BSD data, we could definitely go without BSD data by now, but keeping in mind that we can add more platforms in the future.

Yep.

> We could consider duplicating the most important metrics, or use aliases, so they can be found both in common fields and also in OS-specific fields.

I didn't consider that. Is aliasing/duplication something we do often?

jsoriano · 2019-11-13T12:30:31Z

> > We could consider duplicating the most important metrics, or use aliases, so they can be found both in common fields and also in OS-specific fields.

> I didn't consider that. Is aliasing/duplication something we do often?

We used aliasing mainly for the migration to ECS, but I guess we can consider using it for other purposes. The problem with aliasing is that I think it only works at query time; the resulting events don't have the aliased fields.

About duplicating, I would consider it for cases where it improves usability, and only for the fields that we feel are going to be the most used. I think that we have also used it in the past to rename fields without breaking backwards compatibility.

fearful-symmetry · 2020-01-14T14:30:11Z

Addressed in #15196

fearful-symmetry added the enhancement, Metricbeat, and Team:Integrations labels May 20, 2019

fearful-symmetry self-assigned this May 20, 2019

fearful-symmetry changed the title ~~[metricbeat] enhance system/network metricset with data from netstat/snmp~~ [metricbeat] Provide detailed network counters via netstat/snmp and sysctl May 24, 2019

andresrc added the [zube]: Inbox and discuss labels Jun 17, 2019

fearful-symmetry mentioned this issue Dec 3, 2019

Add Linux Networking counters elastic/go-sysinfo#72

Merged

fearful-symmetry mentioned this issue Dec 18, 2019

[Metricbeat] add a network_summary metricset #15196

Merged

fearful-symmetry closed this as completed Jan 14, 2020

zube bot added [zube]: Done and removed [zube]: Inbox labels Jan 14, 2020

andresrc removed the [zube]: Done label Jan 20, 2020
