[metricbeat] Provide detailed network counters via netstat/snmp and sysctl #12210

Closed
fearful-symmetry opened this issue May 20, 2019 · 13 comments
Assignees
Labels
discuss Issue needs further discussion. enhancement Metricbeat Metricbeat Team:Integrations Label for the Integrations team

Comments

@fearful-symmetry
Contributor

fearful-symmetry commented May 20, 2019

This is a continuation of #11951, now that #12050 was merged.

What this is

So, right now the data we have coming from system/network is pretty limited:

        "network": {
            "in": {
                "bytes": 37904869172,
                "dropped": 32,
                "errors": 0,
                "packets": 32143403
            },
            "name": "wlp4s0",
            "out": {
                "bytes": 6299331926,
                "dropped": 0,
                "errors": 0,
                "packets": 13362703
            }
        }

In some ways this makes sense, as the metricset is cross-platform and different OSes can expose wildly different metrics and data. What we have now is basically a distillation of what we can probably get regardless of platform. With #12050, at least, we have a demo of how to report certain metrics depending on the OS, so it stands to reason that we might want to do this for other metricsets and data sources as well. It should be noted that netdata and node_exporter both provide data that is far more comprehensive than system/network.

On Linux, the two sources of data that come to mind are /proc/net/snmp and /proc/net/netstat.

/proc/net/netstat provides a wealth of extended data on TCP/IP counters, as does /proc/net/snmp, which also includes counters and stats for ICMP and UDP. Together, these procfs files make up the majority of the information you would get via netstat -s on a Linux system.
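For reference, here is a minimal sketch of reading these files in Go. It relies on the layout both files share: each protocol gets a header line of column names followed by a line of values, with both lines starting with the same `Proto:` prefix. The function name `parseProcNet` and the sample text are illustrative assumptions, not existing Metricbeat code; the `Ip` values come from the example in this issue, and the `Tcp` row is made up.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseProcNet parses text in the paired-line layout used by
// /proc/net/snmp and /proc/net/netstat: a header line of column names
// followed by a value line, both starting with the same "Proto:" prefix.
// (Hypothetical helper, for illustration only.)
func parseProcNet(text string) (map[string]map[string]int64, error) {
	out := make(map[string]map[string]int64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		header := strings.Fields(sc.Text())
		if len(header) == 0 {
			continue // skip blank lines
		}
		if !sc.Scan() {
			return nil, fmt.Errorf("header %q has no value line", header[0])
		}
		values := strings.Fields(sc.Text())
		if len(values) != len(header) || values[0] != header[0] {
			return nil, fmt.Errorf("header/value mismatch for %q", header[0])
		}
		proto := strings.TrimSuffix(header[0], ":")
		counters := make(map[string]int64, len(header)-1)
		for i := 1; i < len(header); i++ {
			// Signed parse: some fields (e.g. Tcp MaxConn) can be -1.
			v, err := strconv.ParseInt(values[i], 10, 64)
			if err != nil {
				return nil, fmt.Errorf("%s %s: %w", proto, header[i], err)
			}
			counters[header[i]] = v
		}
		out[proto] = counters
	}
	return out, sc.Err()
}

// sample mimics a fragment of /proc/net/snmp; the Tcp row is made up.
const sample = `Ip: Forwarding DefaultTTL InReceives
Ip: 2 64 50851
Tcp: RtoAlgorithm MaxConn ActiveOpens
Tcp: 1 -1 120
`

func main() {
	stats, err := parseProcNet(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(stats["Ip"]["DefaultTTL"], stats["Tcp"]["MaxConn"])
}
```

A real collector would read the file contents from disk and pass them in; taking a string keeps the sketch testable without procfs.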

We can do work on BSD as well. Although netstat is very different on BSD, netstat -s is still a thing, and it provides a great deal of TCP/IP stack data, plus info from UDP and ICMP. As this is BSD, it uses sysctl under the hood as opposed to a pseudo file system.

How the data would look

The most obvious way to present these files is a 1-to-1 direct dump of the procfs files/sysctl counters. This would provide a familiar interface for users, and make blog posts straightforward (monitoring netstat with metricbeat...).

In the data itself, if the operating system supports it, we would just append an extra object to the existing system/network metricset:

        "network": {
            "in": {
                "bytes": 37904869172,
                "dropped": 32,
                "errors": 0,
                "packets": 32143403
            },
            "name": "wlp4s0",
            "out": {
                "bytes": 6299331926,
                "dropped": 0,
                "errors": 0,
                "packets": 13362703
            },
            "snmp": {
                "ip": {
                    "forwarding": 2,
                    "default_ttl": 64,
                    "in_receives": 50851,
                    .....
                }
            }
        }

At least on Linux, this should be an 'easy' win, as dealing with procfs is something we're already used to. On BSD this might get a tad harder, as dealing with the more complex opaque types that sysctl returns can be a snarl in Go.
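The field names in the example above (forwarding, default_ttl, in_receives) read like snake_case versions of the procfs column names (Forwarding, DefaultTTL, InReceives). A rough sketch of that conversion might look like this; `toSnake` is a hypothetical helper, and the acronym handling is an assumption about how names with a run of capitals should be split.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// toSnake converts a procfs column name such as "DefaultTTL" to
// "default_ttl". An underscore is inserted before an upper-case letter
// that follows a lower-case letter or digit, or that ends a run of
// capitals (so "TCPLossProbes" becomes "tcp_loss_probes").
// (Hypothetical helper, for illustration only.)
func toSnake(name string) string {
	rs := []rune(name)
	var b strings.Builder
	for i, r := range rs {
		if i > 0 && unicode.IsUpper(r) {
			prev := rs[i-1]
			nextLower := i+1 < len(rs) && unicode.IsLower(rs[i+1])
			if unicode.IsLower(prev) || unicode.IsDigit(prev) ||
				(unicode.IsUpper(prev) && nextLower) {
				b.WriteByte('_')
			}
		}
		b.WriteRune(unicode.ToLower(r))
	}
	return b.String()
}

func main() {
	for _, n := range []string{"Forwarding", "DefaultTTL", "InReceives"} {
		fmt.Println(n, "->", toSnake(n))
	}
}
```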

How can I look at this myself?

Create some Linux and FreeBSD VMs as you normally do.
On Linux:

$ cat /proc/net/netstat
$ cat /proc/net/snmp

On FreeBSD:

$ netstat -s
@fearful-symmetry fearful-symmetry added enhancement Metricbeat Metricbeat Team:Integrations Label for the Integrations team labels May 20, 2019
@fearful-symmetry fearful-symmetry self-assigned this May 20, 2019
@jsoriano
Member

@fearful-symmetry following up on my comment in #11951 (comment), I think that snmp and netstat metrics are system-wide, while the metrics collected by the network metricset are specific to each interface.
I agree that it'd be great to collect these additional metrics, but we have to look for a metricset where they fit better. Maybe socket_summary, or a new one? Another option would be to add another metricset for network interfaces and use the network one for general network metrics, but this would be a breaking change, so I'd avoid doing it.
We can also extend the network metricset with other per-interface metrics, like the additional ones that gopsutil already collects per interface and that we are not using yet.

@fearful-symmetry
Contributor Author

I think that snmp and netstat metrics are system-wide, while the metrics collected by the network metricset are specific to each interface.

Bah, I keep on forgetting about that! I should change the name. I think I kinda gravitate towards network because the name seems to fit the bill but yes, we're already using it for interface data, which is different. The data in socket_summary seems like a good fit, but the name would perhaps be a little misleading as this isn't socket-specific either. Maybe a new metricset? network_counters or network_global or something like that?

@fearful-symmetry fearful-symmetry changed the title from "[metricbeat] enhance system/network metricset with data from netstat/snmp" to "[metricbeat] Provide detailed network counters via netstat/snmp and sysctl" May 24, 2019
@jsoriano
Member

Maybe a new metricset? network_counters or network_global or something like that?

One option is to name it network_summary (as we have socket and socket_summary), but I am not sure; all the options look a bit confusing to me.
We could go with any name for now and consider reorganizing all network metrics in 8.0 as a breaking change.
@exekias do you have an opinion about this?

@fearful-symmetry
Contributor Author

network_summary

Yah, that seems the most 'native' option considering our naming scheme now.

We could go with any name for now and consider reorganizing all network metrics in 8.0 as a breaking change.

Also true. IMO, the current network metricset should become interface_summary, and network should become some kind of global counter data.

@exekias
Contributor

exekias commented May 27, 2019

We may be at risk of bloating system.* with network-related namespaces. In general it sounds like system.network should be a common namespace for all these metrics. As of today, it's already used for basic packet counters (network.in.packets and friends), but we could try to reorganize it into a tree to give more room for new data, e.g.:

  • Keep current metrics to avoid a breaking change, move them to system.network.interface in 8.0
  • Add more sub categories where needed, ie system.network.snmp, system.network.netstat.

We can do step 2 without a breaking change, just by putting metrics on different events? The question remains whether we should keep adding more metricsets or make the right decisions for the user from a few of them.

@fearful-symmetry
Contributor Author

fearful-symmetry commented May 30, 2019

Just getting back to this, I thought @exekias raised some good issues as well.

We may be at risk of bloating system.* with network related namespaces.

I agree. Considering the rather large number of network counters/namespaces/weird APIs that many OSes expose, I wonder if we should have a network metricset? If we're gonna make breaking changes, it's something to consider.

Add more sub categories where needed, ie system.network.snmp, system.network.netstat

How would this work? Would it be some kind of sub-metricset? Do we want to do that?

@exekias
Contributor

exekias commented Jun 4, 2019

Sorry for the late response 🤕 :

How about adding an snmp metricset that reports under system.network.snmp? That would allow you to add more metrics under network while keeping them namespaced. Then the only breaking change needed in 8.0 is moving current network metrics into network.interfaces (or similar, tbd):

- module: system
  metricsets:
    - cpu
    - memory
    - network
    - snmp
    ...

snmp reporting like:

{
  "system": {
    "network": {
      "snmp": {
        "ip": {
          "forwarding": 2,
          "default_ttl": 64,
          "in_receives": 50851,
          .....
        }
      }
    }
  }
}

@andresrc andresrc added [zube]: Inbox discuss Issue needs further discussion. labels Jun 17, 2019
@fearful-symmetry
Contributor Author

Picking this up again since it's next in line after I finish systemd work.

So, I'm not worried too much about metricsets/namespaces right now. I think either network or network_summary would make sense. There are too many name collisions with snmp or netstat.

What I am worried about is cross-compatibility. A lot of similar but different metrics are gathered in BSD. We have 3 ways to deal with this:

  • Don't collect BSD data.
  • Collect BSD data, and try to 'standardize' what we can across platforms. Not sure it's worth it, and we'll never be able to standardize all the counters.
  • Don't standardize anything. This is the easiest to do, and perhaps also easier for users who know what metrics they're looking for. It might make cross-OS/cross-node event correlation harder, though.

@fearful-symmetry
Contributor Author

fearful-symmetry commented Oct 30, 2019

@jsoriano / @exekias Do either of you have thoughts on how we should handle BSD data?

A while ago I looked at expanding BSD support for some network counters, and getting it done natively with sysctl in Go was an absolute nightmare. Extensive use of cgo was absolutely required.

@jsoriano
Member

jsoriano commented Nov 9, 2019

@fearful-symmetry regarding the BSD data, we could definitely go without it for now, while keeping in mind that we can add more platforms in the future.
If, now or later, we decide to add BSD data, we could go without standardization and, once we are collecting metrics, decide whether some of them could go to a common field because they represent equivalent things on the different platforms.
We could consider duplicating the most important metrics, or using aliases, so they can be found both in common fields and in OS-specific fields.

@fearful-symmetry
Contributor Author

regarding the BSD data, we could definitely go without it for now, while keeping in mind that we can add more platforms in the future.

Yep.

We could consider duplicating the most important metrics, or use aliases, so they can be found both in common fields and also in OS-specific fields.

I didn't consider that. Is aliasing/duplication something we do often?

@jsoriano
Member

We could consider duplicating the most important metrics, or use aliases, so they can be found both in common fields and also in OS-specific fields.

I didn't consider that. Is aliasing/duplication something we do often?

We used aliasing mainly for the migration to ECS, and I guess we can consider using it for other purposes. The problem with aliasing is that I think it only works at query time: the resulting events don't contain the aliased fields.

As for duplicating, I would consider it for cases where it improves usability, and only for the fields that we feel are going to be the most used ones. I think that we have also used it in the past to rename fields without breaking backwards compatibility.
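As a rough illustration of the duplication idea, the sketch below copies a hand-picked set of OS-specific fields into shared field names at collection time, so both appear in the event itself (unlike aliases, which only resolve at query time). The flat key layout, the `addCommon` helper, and the common field name are all hypothetical; nothing here reflects an agreed schema.

```go
package main

import "fmt"

// commonFields maps an OS-specific field to a shared, cross-platform
// field. The mapping and the "common" name are hypothetical.
var commonFields = map[string]string{
	"snmp.ip.in_receives": "common.ip.in_packets",
}

// addCommon returns a copy of event with the mapped fields duplicated
// under their common names. Source fields that are missing are skipped,
// and an existing common field is never overwritten.
func addCommon(event map[string]int64, mapping map[string]string) map[string]int64 {
	out := make(map[string]int64, len(event)+len(mapping))
	for k, v := range event {
		out[k] = v
	}
	for src, dst := range mapping {
		v, ok := event[src]
		if !ok {
			continue
		}
		if _, exists := out[dst]; !exists {
			out[dst] = v
		}
	}
	return out
}

func main() {
	event := map[string]int64{
		"snmp.ip.in_receives": 50851,
		"snmp.ip.default_ttl": 64,
	}
	fmt.Println(addCommon(event, commonFields))
}
```

Keeping the mapping as a small table makes it easy to start with only the most used fields and grow the list once metrics from more platforms are actually being collected.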

@fearful-symmetry
Contributor Author

Addressed in #15196

4 participants