
Memory leak caused by otelhttp when no Meter Provider is configured. #5190

Closed
strideynet opened this issue Mar 1, 2024 · 9 comments
Labels: area: instrumentation, bug, instrumentation: otelhttp

@strideynet

strideynet commented Mar 1, 2024

Description

When calling otelhttp.NewTransport() multiple times, memory is leaked.


This was supposedly resolved by open-telemetry/opentelemetry-go#4820, but I do not think that fix covers the case where no meter provider has been configured and the default global delegating meter provider is in place.

Related: #4226
Introduced by: #4707

Environment

go version go1.22.0 darwin/arm64
	go.opentelemetry.io/contrib/instrumentation/github.com/aws/aws-sdk-go-v2/otelaws v0.49.0
	go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.49.0
	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.49.0
	go.opentelemetry.io/otel v1.24.0
	go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.24.0
	go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.24.0
	go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.24.0
	go.opentelemetry.io/otel/metric v1.24.0
	go.opentelemetry.io/otel/sdk v1.24.0
	go.opentelemetry.io/otel/trace v1.24.0
	go.opentelemetry.io/proto/otlp v1.1.0

Steps To Reproduce

package bench

import (
	"testing"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/metric/noop"
)

func BenchmarkWithNoMeterProviderSet(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		otelhttp.NewTransport(nil)
	}
}

func BenchmarkWithNoopMeterProviderSet(b *testing.B) {
	b.ReportAllocs()
	otel.SetMeterProvider(noop.MeterProvider{})
	for i := 0; i < b.N; i++ {
		otelhttp.NewTransport(nil)
	}
}

Run each of these with a memory profile enabled (-test.memprofile / go test -memprofile) and compare the in-use heap objects/space between the two.
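
For example (a sketch of the profiling commands; the bench.test binary name assumes the package directory is named bench):

	go test -run='^$' -bench=BenchmarkWithNoMeterProviderSet -memprofile=no_provider.out
	go test -run='^$' -bench=BenchmarkWithNoopMeterProviderSet -memprofile=noop_provider.out
	go tool pprof -sample_index=inuse_space bench.test no_provider.out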

Alternatively, write an application which calls otelhttp.NewTransport and analyze heap usage.

Expected behavior

Calling otelhttp.NewTransport() repeatedly doesn't leak memory

Workaround

Call otel.SetMeterProvider(noop.MeterProvider{}) at the start of your application.
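
For example, a minimal sketch of the workaround (placement in main and the use of http.DefaultTransport are illustrative, not taken from the report):

package main

import (
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/metric/noop"
)

func main() {
	// Install a no-op meter provider before any otelhttp transports or
	// handlers are constructed, so instruments are not accumulated in the
	// global delegating meter provider.
	otel.SetMeterProvider(noop.MeterProvider{})

	// Transports created after this point no longer grow global state
	// (at the cost of disabling otelhttp metrics entirely).
	http.DefaultTransport = otelhttp.NewTransport(http.DefaultTransport)
}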

@tsuna

tsuna commented Mar 28, 2024

We also ran into this issue at Arista.

mattmoor pushed a commit to chainguard-dev/terraform-infra-common that referenced this issue Apr 27, 2024
dashpole self-assigned this Aug 28, 2024
@dashpole
Contributor

Sorry I didn't see this the first time around. Taking a look now

@dashpole
Contributor

I am able to reproduce the issue with these benchmarks:

package bench

import (
	"testing"

	"go.opentelemetry.io/otel"
)

const (
	meterName   = "foo"
	metricName  = "bar"
)

func BenchmarkSimple(b *testing.B) {
	b.ReportAllocs()
	meter := otel.Meter(meterName)
	for i := 0; i < b.N; i++ {
		meter.Int64Counter(metricName)
	}
}

It reports 1 allocation per operation. Since we are repeatedly creating the same instrument, it should report 0 allocations. If I replace otel.Meter with a noop meter provider or the real meter provider, it reports 0 allocations. I believe the root cause is that we always append the instrument to the list of instruments for the global meter provider: e.g. https://github.com/open-telemetry/opentelemetry-go/blob/d776f74f7cd9cdaf5e4bb97aeb6b7e2d102e7d68/internal/global/meter.go#L142

We should instead maintain a map of instruments, and ensure that we only keep unique instruments, similar to what we did for the SDK.
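
Roughly, the idea looks like the following standalone sketch (types and names are made up for illustration; this is not the actual internal/global code or the eventual fix):

package global

import "sync"

// instID is an illustrative identity key; a real implementation would also
// include unit and description so that distinct instruments stay distinct.
type instID struct {
	name string
	kind string
}

type int64Counter struct{}

type meter struct {
	mu          sync.Mutex
	instruments map[instID]*int64Counter
}

func (m *meter) Int64Counter(name string) *int64Counter {
	m.mu.Lock()
	defer m.mu.Unlock()
	id := instID{name: name, kind: "Int64Counter"}
	if inst, ok := m.instruments[id]; ok {
		// Creating the same instrument again returns the cached entry
		// instead of appending another delegate, so memory stays bounded.
		return inst
	}
	if m.instruments == nil {
		m.instruments = make(map[instID]*int64Counter)
	}
	inst := &int64Counter{}
	m.instruments[id] = inst
	return inst
}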

@dashpole
Contributor

The workaround provided above is correct.

@strideynet
Author

I believe the root cause is that we always append the instrument to the list of instruments for the global meter provider: e.g. https://github.com/open-telemetry/opentelemetry-go/blob/d776f74f7cd9cdaf5e4bb97aeb6b7e2d102e7d68/internal/global/meter.go#L142

Looking back at my notes when I wrote this ticket - I came to the same conclusion so I think you're on the right lines here. Thank you for confirming the issue and looking into it 🙏

@dashpole
Contributor

open-telemetry/opentelemetry-go#5754

MrAlias added this to the v1.30.0 milestone Aug 29, 2024
@dashpole
Contributor

Fixed by open-telemetry/opentelemetry-go#5754

@tjsampson

I am pretty sure this is still an issue, or there is something else leaking in the Go OTel ecosystem. I will get a pprof set up, possibly tomorrow, but here's some anecdotal evidence in the meantime:

(screenshot of the service's memory usage graph omitted)

It's pretty easy to see from that graph when tracing was implemented. And yes, after removing our tracing implementation, memory usage is back to normal.

Here is a rough draft of our setup. Please let me know if I am doing anything egregiously dumb, but for the most part it's all pretty standard stuff taken from various docs:

go.mod

go 1.23.2

require (
	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.56.0
	go.opentelemetry.io/otel v1.31.0
	go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetricgrpc v1.31.0
	go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.31.0
	go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.31.0
	go.opentelemetry.io/otel/sdk v1.31.0
	go.opentelemetry.io/otel/trace v1.31.0
)

We wrap the top-level muxer with otelhttp.NewHandler, so everything is traced. Yes, I know this is expensive, but it shouldn't leak memory. Eventually we will add include/exclude/drop rules so we aren't taking in so much volume (ping routes, health checks, etc.), and do more aggressive downsampling.

func NewApi(c config.Config) *Api {
	return &Api{
		c:           &c,
		controllers: []Controller{newControllers(c)},
		server: &http.Server{
			ReadTimeout:  c.ReadTimeout,
			WriteTimeout: c.WriteTimeout,
			IdleTimeout:  c.IdleTimeout,
			Addr:         fmt.Sprintf(":%d", c.Port),
			Handler:      otelhttp.NewHandler(chi.NewMux(), "INGRESS", otelhttp.WithFilter(traceFilter)),
		},
		done:       make(chan bool),
		sigChannel: make(chan os.Signal, 1024),
	}
}

Here is how we are initializing our trace and metrics providers once on boot:

// TracerProvider creates an OTLP exporter and configures the corresponding trace provider.
func TracerProvider(ctx context.Context, res *resource.Resource) (func(context.Context) error, error) {
	// Set up a trace exporter
	traceExporter, err := otlptrace.New(ctx, otlptracegrpc.NewClient())
	if err != nil {
		return nil, errors.Wrap(err, "failed to create trace exporter")
	}

	// Register the trace exporter with a TracerProvider, using a batch
	// span processor to aggregate spans before export.
	tracerProvider := sdktrace.NewTracerProvider(
		sdktrace.WithSampler(sdktrace.AlwaysSample()),
		sdktrace.WithResource(res),
		sdktrace.WithBatcher(traceExporter),
	)
	otel.SetTracerProvider(tracerProvider)
	otel.SetTextMapPropagator(
		propagation.NewCompositeTextMapPropagator(
			propagation.TraceContext{},
			propagation.Baggage{},
		))

	// Shutdown will flush any remaining spans and shut down the exporter.
	return tracerProvider.Shutdown, nil
}

// MeterProvider creates an OTLP exporter and configures the corresponding meter provider.
func MeterProvider(ctx context.Context, res *resource.Resource) (func(context.Context) error, error) {
	metricExporter, err := otlpmetricgrpc.New(ctx)
	if err != nil {
		return nil, errors.Wrap(err, "failed to create metric exporter")
	}

	meterProvider := sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(sdkmetric.NewPeriodicReader(metricExporter)),
		sdkmetric.WithResource(res),
	)
	otel.SetMeterProvider(meterProvider)

	return meterProvider.Shutdown, nil
}

Then called and shutdown on main:

	shutDownTracer, err := traceinstrument.TracerProvider(ctx, traceRes)
	if err != nil {
		log.Logger.Fatal("failed to create trace provider", zap.Error(err))
	}
	
	defer func(onShutdown func(ctx context.Context) error) {
		if errr := onShutdown(ctx); errr != nil {
			log.Logger.Error("error shutting down trace provider", zap.Error(errr))
		}
	}(shutDownTracer)

	shutdownTraceMetrics, err := traceinstrument.MeterProvider(ctx, traceRes)
	if err != nil {
		log.Logger.Fatal("failed to create meter provider", zap.Error(err))
	}

	defer func(onShutdown func(ctx context.Context) error) {
		if errr := onShutdown(ctx); errr != nil {
			log.Logger.Error("error shutting down metrics provider", zap.Error(errr))
		}
	}(shutdownTraceMetrics)

Note: we are also using otelhttp.NewTransport to wrap the default logging transport:

http.DefaultTransport = otelhttp.NewTransport(http.DefaultTransport)

If we remove the tracing setup, memory usage goes back to normal, so the leak is definitely related to our tracing setup.

@dmathieu
Member

dmathieu commented Nov 8, 2024

You should open a new issue for this; it isn't the same problem, since you are configuring a meter provider.
