[extension/cgroupruntime]: Initial implementation (#35472)
**Description:**

This PR adds the initial implementation of a new component that
dynamically sets the `GOMEMLIMIT` and `GOMAXPROCS` values used by the
Go runtime. These values are normally aligned by hand with the cgroup
resource limits to prevent CPU throttling or out-of-memory scenarios.

The component removes the manual step of configuring these
environment variables in K8s deployments (e.g. Helm
[templates](https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/templates/_helpers.tpl#L169))
and also allows fine-grained values (e.g. 90% of the memory resource
limit).
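
To make the two values concrete, here is a minimal sketch (not the extension's code) of how a memory limit and a CPU quota could be turned into `GOMEMLIMIT` and `GOMAXPROCS` using only the standard library. The helper names `memLimitFromRatio` and `procsFromQuota`, the 512 MiB limit, and the 1.5 CPU quota are assumptions for illustration; rounding the quota down with a floor of 1 is also an assumption here.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
	"runtime/debug"
)

// memLimitFromRatio returns the soft memory limit in bytes for a given
// cgroup memory limit and ratio. Hypothetical helper for illustration.
func memLimitFromRatio(cgroupLimitBytes int64, ratio float64) int64 {
	return int64(float64(cgroupLimitBytes) * ratio)
}

// procsFromQuota rounds a fractional CPU quota down, never going below 1.
// Hypothetical helper; the rounding policy is an assumption.
func procsFromQuota(quota float64) int {
	return int(math.Max(1, math.Floor(quota)))
}

func main() {
	const cgroupLimit = 512 << 20 // assumed 512 MiB container memory limit

	// SetMemoryLimit returns the previous limit, which lets us restore it.
	prevLimit := debug.SetMemoryLimit(memLimitFromRatio(cgroupLimit, 0.9))
	defer debug.SetMemoryLimit(prevLimit)

	// GOMAXPROCS returns the previous setting as well.
	prevProcs := runtime.GOMAXPROCS(procsFromQuota(1.5)) // assumed 1.5 CPU quota
	defer runtime.GOMAXPROCS(prevProcs)

	// A negative argument to SetMemoryLimit and a zero argument to
	// GOMAXPROCS only query the current value.
	fmt.Println("GOMEMLIMIT:", debug.SetMemoryLimit(-1))
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```

In the extension itself the limits come from the cgroup filesystem through the two third-party packages rather than from constants.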

**Link to tracking Issue:** #30289

**Testing:** Unit tests for the component have been added (config and
extension start/stop). Ideally, an integration test that asserts the
actual runtime modifications should be added as well. The extension
relies on the "github.com/KimMachineGun/automemlimit/memlimit" and
"go.uber.org/automaxprocs/maxprocs" packages for the runtime
modifications, but they don't provide a way to mock the cgroup
filesystem, which is where they read the resource quota limits from.

- Automemlimit package tests expect to run in a cgroup environment:
https://github.com/KimMachineGun/automemlimit/blob/main/memlimit/cgroups_test.go#L18
- Automaxprocs does not expose the CPU quota retrieval:
https://github.com/uber-go/automaxprocs/blob/master/maxprocs/maxprocs.go#L41

Any suggestions on how to perform these integration tests in the contrib
repository? One possibility is to use the
https://github.com/containerd/cgroups package to set the quota, but this
requires privileged permissions (also in GHA).
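
For context on what such a test would need to fake: on cgroup v2 the CPU quota is read from a `cpu.max` file whose contents are `<quota> <period>`, or `max <period>` when no limit is set. The sketch below parses that format from a string, which is the kind of seam a mockable reader would need. `parseCPUMax` is a hypothetical helper and is not part of automaxprocs or automemlimit.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// parseCPUMax parses the contents of a cgroup v2 cpu.max file and returns the
// CPU quota in cores. ok is false when no limit is set ("max").
// Hypothetical helper for illustration only.
func parseCPUMax(contents string) (quota float64, ok bool, err error) {
	fields := strings.Fields(contents)
	if len(fields) != 2 {
		return 0, false, errors.New("cpu.max: expected two fields")
	}
	if fields[0] == "max" {
		return 0, false, nil // unlimited
	}
	q, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, false, fmt.Errorf("cpu.max: invalid quota: %w", err)
	}
	p, err := strconv.ParseFloat(fields[1], 64)
	if err != nil || p <= 0 {
		return 0, false, errors.New("cpu.max: invalid period")
	}
	return q / p, true, nil
}

func main() {
	// Sample file contents; a real test would read them from a fake filesystem.
	for _, in := range []string{"200000 100000\n", "max 100000\n"} {
		quota, ok, err := parseCPUMax(in)
		fmt.Println(quota, ok, err)
	}
}
```

Because the parser takes a string rather than a path, it can be exercised without privileged access to a real cgroup hierarchy.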


**Documentation:** <Describe the documentation added.>

---------

Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
rogercoll and mx-psi authored Nov 26, 2024
1 parent fa0fe10 commit 28f23aa
Showing 23 changed files with 684 additions and 0 deletions.
27 changes: 27 additions & 0 deletions .chloggen/add_cgroupruntime_extension.yaml
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: new_component

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: extension/cgroupruntime

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Initial implementation for cgroupruntime extension.

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [30289]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: []
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -90,6 +90,7 @@ extension/asapauthextension/ @open-telemetry/collector-cont
extension/awsproxy/ @open-telemetry/collector-contrib-approvers @Aneurysm9 @mxiamxia
extension/basicauthextension/ @open-telemetry/collector-contrib-approvers @jpkrohling @frzifus
extension/bearertokenauthextension/ @open-telemetry/collector-contrib-approvers @jpkrohling @frzifus
extension/cgroupruntimeextension/ @open-telemetry/collector-contrib-approvers @mx-psi @rogercoll
extension/encoding/ @open-telemetry/collector-contrib-approvers @atoulme @dao-jun @dmitryax @MovieStoreGuy @VihasMakwana
extension/encoding/avrologencodingextension/ @open-telemetry/collector-contrib-approvers @thmshmm
extension/encoding/jaegerencodingextension/ @open-telemetry/collector-contrib-approvers @MovieStoreGuy @atoulme
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yaml
@@ -89,6 +89,7 @@ body:
- extension/awsproxy
- extension/basicauth
- extension/bearertokenauth
- extension/cgroupruntime
- extension/encoding
- extension/encoding/avrologencoding
- extension/encoding/jaegerencoding
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/feature_request.yaml
@@ -83,6 +83,7 @@ body:
- extension/awsproxy
- extension/basicauth
- extension/bearertokenauth
- extension/cgroupruntime
- extension/encoding
- extension/encoding/avrologencoding
- extension/encoding/jaegerencoding
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/other.yaml
@@ -83,6 +83,7 @@ body:
- extension/awsproxy
- extension/basicauth
- extension/bearertokenauth
- extension/cgroupruntime
- extension/encoding
- extension/encoding/avrologencoding
- extension/encoding/jaegerencoding
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/unmaintained.yaml
@@ -88,6 +88,7 @@ body:
- extension/awsproxy
- extension/basicauth
- extension/bearertokenauth
- extension/cgroupruntime
- extension/encoding
- extension/encoding/avrologencoding
- extension/encoding/jaegerencoding
1 change: 1 addition & 0 deletions extension/cgroupruntimeextension/Makefile
@@ -0,0 +1 @@
include ../../Makefile.Common
42 changes: 42 additions & 0 deletions extension/cgroupruntimeextension/README.md
@@ -0,0 +1,42 @@
# Cgroup Go runtime extension


<!-- status autogenerated section -->
| Status | |
| ------------- |-----------|
| Stability | [development] |
| Distributions | [contrib] |
| Issues | [![Open issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aopen%20label%3Aextension%2Fcgroupruntime%20&label=open&color=orange&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aopen+is%3Aissue+label%3Aextension%2Fcgroupruntime) [![Closed issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aclosed%20label%3Aextension%2Fcgroupruntime%20&label=closed&color=blue&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aclosed+is%3Aissue+label%3Aextension%2Fcgroupruntime) |
| [Code Owners](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/CONTRIBUTING.md#becoming-a-code-owner) | [@mx-psi](https://www.github.com/mx-psi), [@rogercoll](https://www.github.com/rogercoll) |

[development]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/component-stability.md#development
[contrib]: https://github.com/open-telemetry/opentelemetry-collector-releases/tree/main/distributions/otelcol-contrib
<!-- end autogenerated section -->

## Overview

The cgroup Go runtime extension optimizes Go runtime performance in containerized environments by automatically configuring `GOMAXPROCS` and `GOMEMLIMIT` based on the Linux cgroup filesystem. It uses the [automaxprocs](https://github.com/uber-go/automaxprocs) and [automemlimit](https://github.com/KimMachineGun/automemlimit) packages to adjust these Go runtime variables so that resource usage stays aligned with the container limits.

## Configuration

The following settings can be configured:

- `gomaxprocs`: Configures the behavior of setting `GOMAXPROCS`, the maximum number of CPUs for the Go runtime. Options:
  - `enabled`: A boolean value to enable or disable automatic configuration of `GOMAXPROCS` based on the system’s cgroup settings (default: true).

- `gomemlimit`: Configures the behavior of setting `GOMEMLIMIT`, the soft memory limit for the Go runtime. Options:
  - `enabled`: A boolean value to enable or disable automatic configuration of `GOMEMLIMIT` (default: true).
  - `ratio`: A floating-point value greater than 0 and at most 1 that sets the fraction of the detected memory limit to allocate to the Go runtime (default: 0.9).

## Examples

```yaml
extensions:
  # extension name: cgroupruntime
  cgroupruntime:
    gomaxprocs:
      enabled: true
    gomemlimit:
      enabled: true
      ratio: 0.8
```
28 changes: 28 additions & 0 deletions extension/cgroupruntimeextension/config.go
@@ -0,0 +1,28 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package cgroupruntimeextension // import "github.com/open-telemetry/opentelemetry-collector-contrib/extension/cgroupruntimeextension"

import "errors"

// Config contains the configuration for the cgroup runtime extension.
type Config struct {
	GoMaxProcs GoMaxProcsConfig `mapstructure:"gomaxprocs"`
	GoMemLimit GoMemLimitConfig `mapstructure:"gomemlimit"`
}

// GoMaxProcsConfig controls whether GOMAXPROCS is set automatically.
type GoMaxProcsConfig struct {
	Enabled bool `mapstructure:"enabled"`
}

// GoMemLimitConfig controls whether GOMEMLIMIT is set automatically and
// which fraction of the detected memory limit is used.
type GoMemLimitConfig struct {
	Enabled bool    `mapstructure:"enabled"`
	Ratio   float64 `mapstructure:"ratio"`
}

// Validate checks if the extension configuration is valid.
// The ratio is checked regardless of whether gomemlimit is enabled.
func (cfg *Config) Validate() error {
	if cfg.GoMemLimit.Ratio <= 0 || cfg.GoMemLimit.Ratio > 1 {
		return errors.New("gomemlimit ratio must be in the (0.0,1.0] range")
	}
	return nil
}
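
As an illustration of the half-open range enforced by `Validate`, the following standalone sketch applies the same comparison to a few sample ratios. `validateRatio` and `errRatio` are hypothetical names used only here; the extension performs the check on `Config` directly.

```go
package main

import (
	"errors"
	"fmt"
)

// errRatio reuses the error message from the extension's Validate method.
var errRatio = errors.New("gomemlimit ratio must be in the (0.0,1.0] range")

// validateRatio accepts ratios in (0, 1]: zero and negative values are
// rejected, while exactly 1 is allowed. Hypothetical helper for illustration.
func validateRatio(ratio float64) error {
	if ratio <= 0 || ratio > 1 {
		return errRatio
	}
	return nil
}

func main() {
	for _, r := range []float64{0.9, 1.0, 0, -0.5, 1.1} {
		fmt.Printf("ratio=%v valid=%t\n", r, validateRatio(r) == nil)
	}
}
```

Note that a ratio of 0 is rejected, so a configuration that explicitly sets the ratio to 0 fails validation even when `gomemlimit` is disabled.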
81 changes: 81 additions & 0 deletions extension/cgroupruntimeextension/config_test.go
@@ -0,0 +1,81 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package cgroupruntimeextension

import (
	"path/filepath"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
	"go.opentelemetry.io/collector/component"
	"go.opentelemetry.io/collector/confmap/confmaptest"

	"github.com/open-telemetry/opentelemetry-collector-contrib/extension/cgroupruntimeextension/internal/metadata"
)

func TestLoadConfig(t *testing.T) {
	t.Parallel()

	tests := []struct {
		id                    component.ID
		expected              component.Config
		unmarshalErrorMessage string
		validateErrorMessage  string
	}{
		{
			id: component.NewID(metadata.Type),
			expected: &Config{
				GoMaxProcs: GoMaxProcsConfig{Enabled: true},
				GoMemLimit: GoMemLimitConfig{
					Enabled: true,
					Ratio:   0.9,
				},
			},
		},
		{
			id:                   component.NewIDWithName(metadata.Type, "invalid_ratio"),
			validateErrorMessage: "gomemlimit ratio must be in the (0.0,1.0] range",
		},
		{
			id:                   component.NewIDWithName(metadata.Type, "invalid_ratio_disabled"),
			validateErrorMessage: "gomemlimit ratio must be in the (0.0,1.0] range",
		},
		{
			id:                   component.NewIDWithName(metadata.Type, "invalid_ratio_negative"),
			validateErrorMessage: "gomemlimit ratio must be in the (0.0,1.0] range",
		},
		{
			id:                    component.NewIDWithName(metadata.Type, "invalid_ratio_type"),
			unmarshalErrorMessage: "decoding failed due to the following error(s):\n\n'gomemlimit.ratio' expected type 'float64', got unconvertible type 'string', value: 'not_valid'",
		},
	}

	for _, tt := range tests {
		t.Run(tt.id.String(), func(t *testing.T) {
			cm, err := confmaptest.LoadConf(filepath.Join("testdata", "config.yaml"))
			require.NoError(t, err)

			factory := NewFactory()
			cfg := factory.CreateDefaultConfig()

			sub, err := cm.Sub(tt.id.String())
			require.NoError(t, err)

			if tt.unmarshalErrorMessage != "" {
				assert.ErrorContains(t, sub.Unmarshal(cfg), tt.unmarshalErrorMessage)
				return
			}
			require.NoError(t, sub.Unmarshal(cfg))

			if tt.validateErrorMessage != "" {
				assert.EqualError(t, component.ValidateConfig(cfg), tt.validateErrorMessage)
				return
			}

			assert.NoError(t, component.ValidateConfig(cfg))
			assert.Equal(t, tt.expected, cfg)
		})
	}
}
6 changes: 6 additions & 0 deletions extension/cgroupruntimeextension/doc.go
@@ -0,0 +1,6 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

//go:generate mdatagen metadata.yaml

package cgroupruntimeextension // import "github.com/open-telemetry/opentelemetry-collector-contrib/extension/cgroupruntimeextension"
77 changes: 77 additions & 0 deletions extension/cgroupruntimeextension/extension.go
@@ -0,0 +1,77 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package cgroupruntimeextension // import "github.com/open-telemetry/opentelemetry-collector-contrib/extension/cgroupruntimeextension"

import (
	"context"
	"runtime"
	"runtime/debug"

	"go.opentelemetry.io/collector/component"
	"go.uber.org/zap"
)

// Function types for the runtime modifiers. They are passed to
// newCgroupRuntime, which lets tests substitute mocks.
type (
	undoFunc            func()
	maxProcsFn          func() (undoFunc, error)
	memLimitWithRatioFn func(float64) (undoFunc, error)
)

type cgroupRuntimeExtension struct {
	config *Config
	logger *zap.Logger

	// runtime modifiers
	maxProcsFn
	undoMaxProcsFn undoFunc

	memLimitWithRatioFn
	undoMemLimitFn undoFunc
}

func newCgroupRuntime(cfg *Config, logger *zap.Logger, maxProcsFn maxProcsFn, memLimitFn memLimitWithRatioFn) *cgroupRuntimeExtension {
	return &cgroupRuntimeExtension{
		config:              cfg,
		logger:              logger,
		maxProcsFn:          maxProcsFn,
		memLimitWithRatioFn: memLimitFn,
	}
}

// Start applies the enabled runtime modifiers and stores their undo
// functions so that Shutdown can revert them.
func (c *cgroupRuntimeExtension) Start(_ context.Context, _ component.Host) error {
	var err error
	if c.config.GoMaxProcs.Enabled {
		c.undoMaxProcsFn, err = c.maxProcsFn()
		if err != nil {
			return err
		}

		// GOMAXPROCS(-1) only queries the current setting.
		c.logger.Info("GOMAXPROCS has been set",
			zap.Int("GOMAXPROCS", runtime.GOMAXPROCS(-1)),
		)
	}

	if c.config.GoMemLimit.Enabled {
		c.undoMemLimitFn, err = c.memLimitWithRatioFn(c.config.GoMemLimit.Ratio)
		if err != nil {
			return err
		}

		// SetMemoryLimit(-1) only queries the current limit.
		c.logger.Info("GOMEMLIMIT has been set",
			zap.Int64("GOMEMLIMIT", debug.SetMemoryLimit(-1)),
		)
	}
	return nil
}

// Shutdown reverts any runtime modification applied by Start.
func (c *cgroupRuntimeExtension) Shutdown(_ context.Context) error {
	if c.undoMaxProcsFn != nil {
		c.undoMaxProcsFn()
	}
	if c.undoMemLimitFn != nil {
		c.undoMemLimitFn()
	}

	return nil
}
67 changes: 67 additions & 0 deletions extension/cgroupruntimeextension/extension_test.go
@@ -0,0 +1,67 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package cgroupruntimeextension

import (
	"context"
	"testing"

	"github.com/stretchr/testify/require"
	"go.opentelemetry.io/collector/component/componenttest"
	"go.opentelemetry.io/collector/extension/extensiontest"
)

func TestExtension(t *testing.T) {
	tests := []struct {
		name          string
		config        *Config
		expectedCalls int
	}{
		{
			name: "all enabled",
			config: &Config{
				GoMaxProcs: GoMaxProcsConfig{
					Enabled: true,
				},
				GoMemLimit: GoMemLimitConfig{
					Enabled: true,
					Ratio:   0.5,
				},
			},
			expectedCalls: 4,
		},
		{
			name: "everything disabled",
			config: &Config{
				GoMaxProcs: GoMaxProcsConfig{
					Enabled: false,
				},
				GoMemLimit: GoMemLimitConfig{
					Enabled: false,
				},
			},
			expectedCalls: 0,
		},
	}

	for _, test := range tests {
		t.Run(test.name, func(t *testing.T) {
			allCalls := 0
			var _err error
			setterMock := func() (undoFunc, error) {
				allCalls++
				return func() { allCalls++ }, _err
			}
			settings := extensiontest.NewNopSettings()
			cg := newCgroupRuntime(test.config, settings.Logger, setterMock, func(_ float64) (undoFunc, error) { return setterMock() })
			ctx := context.Background()

			err := cg.Start(ctx, componenttest.NewNopHost())
			require.NoError(t, err)

			require.NoError(t, cg.Shutdown(ctx))
			require.Equal(t, test.expectedCalls, allCalls)
		})
	}
}