Add support for group process monitoring #38

Merged 15 commits on Dec 16, 2024
7 changes: 7 additions & 0 deletions Makefile
@@ -298,6 +298,13 @@ endif
docker-otelcontribcol:
	COMPONENT=otelcontribcol $(MAKE) docker-component

ubuntu-component: check-component
	GOOS=linux GOARCH=amd64 $(MAKE) $(COMPONENT)
	cp ./bin/$(COMPONENT)_linux_amd64 ./cmd/$(COMPONENT)/$(COMPONENT)

ubuntu-otelcontribcol:
	COMPONENT=otelcontribcol $(MAKE) ubuntu-component

.PHONY: docker-ubuntu-component # Not intended to be used directly
docker-ubuntu-component: check-component
	GOOS=linux GOARCH=amd64 $(MAKE) $(COMPONENT)
56 changes: 44 additions & 12 deletions receiver/hostmetricsreceiver/README.md
@@ -38,18 +38,19 @@ hostmetrics:

The available scrapers are:

| Scraper | Supported OSs | Description |
| ------------ | ---------------------------- | ------------------------------------------------------ |
| [cpu] | All except Mac<sup>[1]</sup> | CPU utilization metrics |
| [disk] | All except Mac<sup>[1]</sup> | Disk I/O metrics |
| [load] | All | CPU load metrics |
| [filesystem] | All | File System utilization metrics |
| [memory] | All | Memory utilization metrics |
| [network] | All | Network interface I/O metrics & TCP connection metrics |
| [paging] | All | Paging/Swap space utilization and I/O metrics |
| [processes] | Linux, Mac | Process count metrics |
| [process] | Linux, Windows, Mac | Per process CPU, Memory, and Disk I/O metrics |
| [system] | Linux, Windows, Mac | Miscellaneous system metrics |
| Scraper | Supported OSs | Description |
|----------------| ---------------------------- |---------------------------------------------------------------------------------------------|
| [cpu] | All except Mac<sup>[1]</sup> | CPU utilization metrics |
| [disk] | All except Mac<sup>[1]</sup> | Disk I/O metrics |
| [load] | All | CPU load metrics |
| [filesystem] | All | File System utilization metrics |
| [memory] | All | Memory utilization metrics |
| [network] | All | Network interface I/O metrics & TCP connection metrics |
| [paging] | All | Paging/Swap space utilization and I/O metrics |
| [processes] | Linux, Mac | Process count metrics |
| [process] | Linux, Windows, Mac | Per process CPU, Memory, and Disk I/O metrics |
| [system] | Linux, Windows, Mac | Miscellaneous system metrics |
| [groupprocess] | Linux, Windows, Mac | Aggregated metrics for all processes in a group: CPU, memory, threads, open file descriptors, and process count |

[cpu]: ./internal/scraper/cpuscraper/documentation.md
[disk]: ./internal/scraper/diskscraper/documentation.md
@@ -61,6 +62,7 @@ The available scrapers are:
[processes]: ./internal/scraper/processesscraper/documentation.md
[process]: ./internal/scraper/processscraper/documentation.md
[system]: ./internal/scraper/systemscraper/documentation.md
[groupprocess]: ./internal/scraper/groupprocessscraper/documentation.md

### Notes

@@ -134,6 +136,36 @@ The following settings are optional:
- `mute_process_exe_error` (default: false): mute the error encountered when trying to read the executable path of a process the collector does not have permission to read (Linux only). This flag is ignored when `mute_process_all_errors` is set to true as all errors are muted.
- `mute_process_user_error` (default: false): mute the error encountered when trying to read a uid which doesn't exist on the system, eg. is owned by a user that only exists in a container. This flag is ignored when `mute_process_all_errors` is set to true as all errors are muted.

### GroupProcess
The `groupprocess` scraper collects aggregated metrics for groups of processes, where each group is defined by user-supplied match rules (exact names or regular expressions). This makes it possible to monitor a specific set of processes as a single unit.

```yaml
receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      groupprocess:
        process_configs:
          - group_name: "kube_process"
            comm:
              names:
                - "kube-proxy"
              match_type: "strict"
            exe:
              names:
                - "/usr/local/bin/kube-proxy"
              match_type: "strict"
            cmdline:
              names:
                - "--config=/var/lib/kube-proxy/config.conf"
              match_type: "regexp"
```
In this example, the `groupprocess` scraper collects metrics for processes that match the specified `comm`, `exe`, and `cmdline` criteria. The `group_name` identifies the group of processes being monitored.
For `comm` and `exe`, the list of names is combined with `OR`: a process that matches any entry is added to the process group.
For `cmdline`, the list of regexes is combined with `AND`: every entry must match. <br />

**Note:** If more than one of the `comm`, `exe`, and `cmdline` selectors is given, all of them must match. Once a process matches one group, it is not added to any other group.
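
The matching rules above can be sketched in Go as follows. This is an illustrative sketch, not the scraper's actual implementation: the `Selector`, `Group`, `Proc`, and `groupFor` names are hypothetical, and it assumes that an omitted selector is ignored, that a group with no selectors matches nothing, and that groups are tried in configuration order so the first match wins.

```go
package main

import (
	"fmt"
	"regexp"
)

// Selector is a hypothetical stand-in for one comm/exe/cmdline entry.
type Selector struct {
	Names     []string
	MatchType string // "strict" or "regexp", as in the README example
}

// Group is a hypothetical stand-in for one process_configs entry.
type Group struct {
	Name    string
	Comm    Selector
	Exe     Selector
	Cmdline Selector
}

// Proc holds the three process properties the selectors are compared against.
type Proc struct {
	Comm    string
	Exe     string
	Cmdline string
}

// matchOne reports whether value satisfies a single pattern.
// An invalid regular expression is treated as a non-match in this sketch.
func matchOne(pattern, matchType, value string) bool {
	if matchType == "regexp" {
		re, err := regexp.Compile(pattern)
		return err == nil && re.MatchString(value)
	}
	return pattern == value
}

// matchAny implements the OR rule used for comm and exe.
func matchAny(s Selector, value string) bool {
	for _, n := range s.Names {
		if matchOne(n, s.MatchType, value) {
			return true
		}
	}
	return false
}

// matchAll implements the AND rule used for cmdline.
func matchAll(s Selector, value string) bool {
	for _, n := range s.Names {
		if !matchOne(n, s.MatchType, value) {
			return false
		}
	}
	return true
}

// groupFor returns the first group whose configured selectors all match p.
// Returning on the first hit keeps a process out of every later group.
func groupFor(groups []Group, p Proc) (string, bool) {
	for _, g := range groups {
		hasComm := len(g.Comm.Names) > 0
		hasExe := len(g.Exe.Names) > 0
		hasCmdline := len(g.Cmdline.Names) > 0
		if !hasComm && !hasExe && !hasCmdline {
			continue // assumption: a group without selectors matches nothing
		}
		if hasComm && !matchAny(g.Comm, p.Comm) {
			continue
		}
		if hasExe && !matchAny(g.Exe, p.Exe) {
			continue
		}
		if hasCmdline && !matchAll(g.Cmdline, p.Cmdline) {
			continue
		}
		return g.Name, true
	}
	return "", false
}

func main() {
	groups := []Group{{
		Name:    "kube_process",
		Comm:    Selector{Names: []string{"kube-proxy"}, MatchType: "strict"},
		Exe:     Selector{Names: []string{"/usr/local/bin/kube-proxy"}, MatchType: "strict"},
		Cmdline: Selector{Names: []string{"--config=/var/lib/kube-proxy/config.conf"}, MatchType: "regexp"},
	}}
	p := Proc{
		Comm:    "kube-proxy",
		Exe:     "/usr/local/bin/kube-proxy",
		Cmdline: "/usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf",
	}
	name, ok := groupFor(groups, p)
	fmt.Println(name, ok) // prints: kube_process true
}
```

Compiling each regular expression on every call keeps the sketch short; a real scraper would more likely compile the patterns once when the configuration is loaded.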

Review comment (Collaborator):

Also, add a line: if more than one of `comm`, `exe`, and `cmdline` is given, all of them must match. If a process matches one group, the same process will not be part of any other group.

## Advanced Configuration

### Filtering
22 changes: 12 additions & 10 deletions receiver/hostmetricsreceiver/factory.go
@@ -21,6 +21,7 @@ import (
	"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/cpuscraper"
	"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/diskscraper"
	"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/filesystemscraper"
	"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/groupprocessscraper"
	"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/loadscraper"
	"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/memoryscraper"
	"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/networkscraper"
@@ -37,16 +38,17 @@ const (
// This file implements Factory for HostMetrics receiver.
var (
	scraperFactories = map[string]internal.ScraperFactory{
		cpuscraper.TypeStr:        &cpuscraper.Factory{},
		diskscraper.TypeStr:       &diskscraper.Factory{},
		loadscraper.TypeStr:       &loadscraper.Factory{},
		filesystemscraper.TypeStr: &filesystemscraper.Factory{},
		memoryscraper.TypeStr:     &memoryscraper.Factory{},
		networkscraper.TypeStr:    &networkscraper.Factory{},
		pagingscraper.TypeStr:     &pagingscraper.Factory{},
		processesscraper.TypeStr:  &processesscraper.Factory{},
		processscraper.TypeStr:    &processscraper.Factory{},
		systemscraper.TypeStr:     &systemscraper.Factory{},
		cpuscraper.TypeStr:          &cpuscraper.Factory{},
		diskscraper.TypeStr:         &diskscraper.Factory{},
		loadscraper.TypeStr:         &loadscraper.Factory{},
		filesystemscraper.TypeStr:   &filesystemscraper.Factory{},
		memoryscraper.TypeStr:       &memoryscraper.Factory{},
		networkscraper.TypeStr:      &networkscraper.Factory{},
		pagingscraper.TypeStr:       &pagingscraper.Factory{},
		processesscraper.TypeStr:    &processesscraper.Factory{},
		processscraper.TypeStr:      &processscraper.Factory{},
		systemscraper.TypeStr:       &systemscraper.Factory{},
		groupprocessscraper.TypeStr: &groupprocessscraper.Factory{},
	}
)

@@ -0,0 +1,67 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package groupprocessscraper // import "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/groupprocessscraper"

import (
	"time"

	"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal"
	"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/groupprocessscraper/internal/metadata"
)

// Config relating to Group Process Metric Scraper.
type Config struct {
	// MetricsBuilderConfig allows customizing scraped metrics/attributes representation.
	metadata.MetricsBuilderConfig `mapstructure:",squash"`
	internal.ScraperConfig

	// GroupConfig lists the process groups to monitor, one entry per group.
	GroupConfig []GroupMatchConfig `mapstructure:"process_configs"`

	// MuteProcessAllErrors is a flag that will mute all the errors encountered when trying to read metrics of a process.
	// When this flag is enabled, there is no need to activate any other error suppression flags.
	MuteProcessAllErrors bool `mapstructure:"mute_process_all_errors,omitempty"`

	// MuteProcessNameError is a flag that will mute the error encountered when trying to read a process name the
	// collector does not have permission to read.
	// See https://github.com/open-telemetry/opentelemetry-collector/issues/3004 for more information.
	// This flag is ignored when MuteProcessAllErrors is set to true as all errors are muted.
	MuteProcessNameError bool `mapstructure:"mute_process_name_error,omitempty"`

	// MuteProcessIOError is a flag that will mute the error encountered when trying to read IO metrics of a process
	// the collector does not have permission to read.
	// This flag is ignored when MuteProcessAllErrors is set to true as all errors are muted.
	MuteProcessIOError bool `mapstructure:"mute_process_io_error,omitempty"`

	// MuteProcessCgroupError is a flag that will mute the error encountered when trying to read the cgroup of a process
	// the collector does not have permission to read.
	// This flag is ignored when MuteProcessAllErrors is set to true as all errors are muted.
	MuteProcessCgroupError bool `mapstructure:"mute_process_cgroup_error,omitempty"`

	// MuteProcessExeError is a flag that will mute the error encountered when trying to read the executable path of a process
	// the collector does not have permission to read (Linux).
	// This flag is ignored when MuteProcessAllErrors is set to true as all errors are muted.
	MuteProcessExeError bool `mapstructure:"mute_process_exe_error,omitempty"`

	// MuteProcessUserError is a flag that will mute the error encountered when trying to read a uid which
	// doesn't exist on the system, e.g. one owned by a user that exists only in a container.
	// This flag is ignored when MuteProcessAllErrors is set to true as all errors are muted.
	MuteProcessUserError bool `mapstructure:"mute_process_user_error,omitempty"`

	// ScrapeProcessDelay is used to indicate the minimum amount of time a process must be running
	// before metrics are scraped for it. The default value is 0 seconds (0s).
	ScrapeProcessDelay time.Duration `mapstructure:"scrape_process_delay"`
}

// MatchConfig describes how a single process property (comm, exe, or cmdline) is matched.
// Names holds the strings or regular expressions to compare against, and MatchType selects
// how they are interpreted (the README example uses "strict" and "regexp").
type MatchConfig struct {
	Names     []string `mapstructure:"names"`
	MatchType string   `mapstructure:"match_type"`
}

// GroupMatchConfig defines one process group. ProcessName is the group's name (the group_name key),
// and Comm, Exe, and Cmdline are the selectors a process must satisfy to be added to the group.
type GroupMatchConfig struct {
	Names       []string    `mapstructure:"names"`
	ProcessName string      `mapstructure:"group_name"`
	Comm        MatchConfig `mapstructure:"comm"`
	Exe         MatchConfig `mapstructure:"exe"`
	Cmdline     MatchConfig `mapstructure:"cmdline"`
}
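
The structs above accept any string for `match_type` and any pattern in `names`, so a configuration check at load time may be worth considering. The following is a minimal sketch under stated assumptions, not part of this PR: `validateMatch` and `validateGroup` are hypothetical helpers, the types are simplified local copies without `mapstructure` tags or the `Names` field on the group, and it assumes "strict" and "regexp" are the only supported match types.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// MatchConfig is a simplified local copy of the struct in config.go.
type MatchConfig struct {
	Names     []string
	MatchType string
}

// GroupMatchConfig is a simplified local copy of the struct in config.go.
type GroupMatchConfig struct {
	ProcessName string
	Comm        MatchConfig
	Exe         MatchConfig
	Cmdline     MatchConfig
}

// validateMatch checks one selector. An empty selector is treated as unset.
func validateMatch(field string, m MatchConfig) error {
	if len(m.Names) == 0 {
		return nil
	}
	switch m.MatchType {
	case "strict":
		return nil
	case "regexp":
		// Reject patterns that cannot compile so the error surfaces at startup.
		for _, n := range m.Names {
			if _, err := regexp.Compile(n); err != nil {
				return fmt.Errorf("%s: invalid regexp %q: %w", field, n, err)
			}
		}
		return nil
	default:
		// Assumption: only "strict" and "regexp" are valid match types.
		return fmt.Errorf("%s: unsupported match_type %q", field, m.MatchType)
	}
}

// validateGroup checks the group name and every configured selector.
func validateGroup(g GroupMatchConfig) error {
	if g.ProcessName == "" {
		return errors.New("group_name must not be empty")
	}
	if len(g.Comm.Names)+len(g.Exe.Names)+len(g.Cmdline.Names) == 0 {
		return fmt.Errorf("group %q: at least one of comm, exe, or cmdline is required", g.ProcessName)
	}
	// errors.Join returns nil when every argument is nil.
	return errors.Join(
		validateMatch("comm", g.Comm),
		validateMatch("exe", g.Exe),
		validateMatch("cmdline", g.Cmdline),
	)
}

func main() {
	g := GroupMatchConfig{
		ProcessName: "kube_process",
		Comm:        MatchConfig{Names: []string{"kube-proxy"}, MatchType: "strict"},
		Cmdline:     MatchConfig{Names: []string{"--config=("}, MatchType: "regexp"},
	}
	if err := validateGroup(g); err != nil {
		fmt.Println("invalid group:", err)
	}
}
```

Failing fast here would turn a silently non-matching group into a startup error, which is usually easier to diagnose than missing metrics.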
@@ -0,0 +1,6 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

//go:generate mdatagen metadata.yaml

package groupprocessscraper // import "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/groupprocessscraper"
@@ -0,0 +1,67 @@
[comment]: <> (Code generated by mdatagen. DO NOT EDIT.)

# hostmetricsreceiver/groupprocess

**Parent Component:** hostmetrics

## Default Metrics

The following metrics are emitted by default. Each of them can be disabled by applying the following configuration:

```yaml
metrics:
  <metric_name>:
    enabled: false
```

### process.count

Total number of processes

| Unit | Metric Type | Value Type | Aggregation Temporality | Monotonic |
| ---- | ----------- | ---------- | ----------------------- | --------- |
| count | Sum | Int | Cumulative | false |

### process.cpu.percent

Total CPU percent used by the process

| Unit | Metric Type | Value Type |
| ---- | ----------- | ---------- |
| % | Gauge | Double |

#### Attributes

| Name | Description | Values |
| ---- | ----------- | ------ |
| state | Breakdown of CPU usage by type. | Str: ``system``, ``user``, ``wait``, ``total`` |

### process.memory.percent

Total memory percent used by the process

| Unit | Metric Type | Value Type |
| ---- | ----------- | ---------- |
| % | Gauge | Double |

### process.open_file_descriptors

Total number of open file descriptors

| Unit | Metric Type | Value Type | Aggregation Temporality | Monotonic |
| ---- | ----------- | ---------- | ----------------------- | --------- |
| count | Sum | Int | Cumulative | false |

### process.threads

Total number of threads

| Unit | Metric Type | Value Type | Aggregation Temporality | Monotonic |
| ---- | ----------- | ---------- | ----------------------- | --------- |
| count | Sum | Int | Cumulative | false |

## Resource Attributes

| Name | Description | Values | Enabled |
| ---- | ----------- | ------ | ------- |
| process.name | Name of the process | Any Str | true |
@@ -0,0 +1,74 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package groupprocessscraper // import "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/groupprocessscraper"

import (
	"context"
	"errors"
	"runtime"

	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/featuregate"
	"go.opentelemetry.io/collector/receiver"
	"go.opentelemetry.io/collector/receiver/scraperhelper"

	"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal"
	"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/groupprocessscraper/internal/metadata"
)

// This file implements Factory for Group Process scraper.

const (
	// TypeStr the value of "type" key in configuration.
	TypeStr = "groupprocess"
)

var (
	// scraperType is the component type used for the built scraper.
	scraperType component.Type = component.MustNewType(TypeStr)
)

var (
	bootTimeCacheFeaturegateID = "hostmetrics.groupprocess.bootTimeCache"
	bootTimeCacheFeaturegate   = featuregate.GlobalRegistry().MustRegister(
		bootTimeCacheFeaturegateID,
		featuregate.StageBeta,
		featuregate.WithRegisterDescription("When enabled, all groupprocess scrapes will use the boot time value that is cached at the start of the process."),
		featuregate.WithRegisterReferenceURL("https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/28849"),
		featuregate.WithRegisterFromVersion("v0.98.0"),
	)
)

// Factory is the Factory for scraper.
type Factory struct {
}

// CreateDefaultConfig creates the default configuration for the Scraper.
func (f *Factory) CreateDefaultConfig() internal.Config {
	return &Config{
		MetricsBuilderConfig: metadata.DefaultMetricsBuilderConfig(),
	}
}

// CreateMetricsScraper creates a resource scraper based on provided config.
func (f *Factory) CreateMetricsScraper(
	_ context.Context,
	settings receiver.Settings,
	cfg internal.Config,
) (scraperhelper.Scraper, error) {
	if runtime.GOOS != "linux" && runtime.GOOS != "windows" && runtime.GOOS != "darwin" {
		return nil, errors.New("groupprocess scraper only available on Linux, Windows, or MacOS")
	}

	s, err := newGroupProcessScraper(settings, cfg.(*Config))
	if err != nil {
		return nil, err
	}

	return scraperhelper.NewScraper(
		scraperType,
		s.scrape,
		scraperhelper.WithStart(s.start),
	)
}
@@ -0,0 +1,35 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package groupprocessscraper

import (
	"context"
	"runtime"
	"testing"

	"github.com/stretchr/testify/assert"
	"go.opentelemetry.io/collector/receiver/receivertest"
)

func TestCreateDefaultConfig(t *testing.T) {
	factory := &Factory{}
	cfg := factory.CreateDefaultConfig()
	assert.IsType(t, &Config{}, cfg)
}

func TestCreateResourceMetricsScraper(t *testing.T) {
	factory := &Factory{}
	cfg := &Config{}

	scraper, err := factory.CreateMetricsScraper(context.Background(), receivertest.NewNopSettings(), cfg)

	if runtime.GOOS == "linux" || runtime.GOOS == "windows" || runtime.GOOS == "darwin" {
		assert.NoError(t, err)
		assert.NotNil(t, scraper)
		assert.Equal(t, scraperType.String(), scraper.ID().String())
	} else {
		assert.Error(t, err)
		assert.Nil(t, scraper)
	}
}
@@ -0,0 +1,9 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package handlecount // import "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/groupprocessscraper/internal/handlecount"

type Manager interface {
	Refresh() error
	GetProcessHandleCount(pid int64) (uint32, error)
}