Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: optimizes file copies to and from containers #2450

Merged
merged 3 commits into from
Apr 3, 2024

Conversation

codefromthecrypt
Copy link
Contributor

@codefromthecrypt codefromthecrypt commented Mar 27, 2024

What does this PR do?

This changes code interacting with file copies to and from the container to use optimized functions. Doing so reduces buffering and gives a chance for to use Go 1.22's optimized paths for linux.

Why is it important?

I'm using rather large images in k3s. For example, nodejs images get easily over a GB each. I found this copy logic accounts for the majority of test fixture setup, in our case sometimes over a minute is spent here even when images are available locally.

Related issues

Originally added in #347

How to test this PR

you can use k3s and its LoadImages function which will hit all the paths here.

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@codefromthecrypt codefromthecrypt requested a review from a team as a code owner March 27, 2024 04:33
Copy link

netlify bot commented Mar 27, 2024

Deploy Preview for testcontainers-go ready!

Name Link
🔨 Latest commit 61a8c3d
🔍 Latest deploy log https://app.netlify.com/sites/testcontainers-go/deploys/6603aae270d3130008b05817
😎 Deploy Preview https://deploy-preview-2450--testcontainers-go.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Adrian Cole added 2 commits March 27, 2024 13:58
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Signed-off-by: Adrian Cole <adrian@tetrate.io>
@codefromthecrypt
Copy link
Contributor Author

I think the failure is a flake?

           	            	-  RyukDisabled: (bool) true,
          	            	+  RyukDisabled: (bool) false,

@mdelapenya
Copy link
Member

Hey @codefromthecrypt I've done a super quick and dirty benchmark for this improvement, and I'm sharing here the results.

I'm not doing them to demonstrate anything against the PR, but to learn myself from the process, as I've been thinking about benchmarks more and more in the recent times. So please take this as me doing an exercise to learn, and I'd love to receive feedback if possible.

Here I go!

Env

goos: darwin
goarch: arm64
GOMAXPROCS=8

Code

I created a LoadImagesOld method in the k3s module, which is a copy of the LoadImages one, but calling a copy of the old CopyFileToContainer method (before these changes). I also created a benchmark function to be run in sub-benchmarks: one for the new method, and one for the old one. Each benchmark will load the nginx image into the k3s cluster.

func BenchmarkLoadImages(b *testing.B) {
	// Give up to three minutes to run this test
	ctx, cancel := context.WithDeadline(context.Background(), time.Now().Add(3*time.Minute))
	defer cancel()

	k3sContainer, err := k3s.RunContainer(ctx,
		testcontainers.WithImage("docker.io/rancher/k3s:v1.27.1-k3s1"),
	)
	if err != nil {
		b.Fatal(err)
	}

	// Clean up the container
	defer func() {
		if err := k3sContainer.Terminate(ctx); err != nil {
			b.Fatal(err)
		}
	}()

	provider, err := testcontainers.ProviderDocker.GetProvider()
	if err != nil {
		b.Fatal(err)
	}

	// ensure nginx image is available locally
	err = provider.PullImage(ctx, "nginx")
	if err != nil {
		b.Fatal(err)
	}

	b.ResetTimer() // Reset the benchmark timer

	b.Run("Old copy method", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			ctx, cancel := context.WithDeadline(context.Background(), time.Now().Add(3*time.Minute))
			defer cancel()
			err := k3sContainer.LoadImagesOld(ctx, "nginx")
			if err != nil {
				b.Fatal(err)
			}
		}
	})

	b.Run("New copy method", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			ctx, cancel := context.WithDeadline(context.Background(), time.Now().Add(3*time.Minute))
			defer cancel()
			err := k3sContainer.LoadImages(ctx, "nginx")
			if err != nil {
				b.Fatal(err)
			}
		}
	})
}

Benchmark execution

Run benchmarks 5 times, including memory profile (bytes and allocations per operation):

go test -bench=. -benchmem -count 5 -run=^#

Benchmarks results

Benchmark ns/op B/op allocs/op
BenchmarkLoadImages/Old_copy_method-8 33,176,551,708 460,569,136 3,702
BenchmarkLoadImages/Old_copy_method-8 33,715,342,958 460,480,384 2,715
BenchmarkLoadImages/Old_copy_method-8 31,288,304,583 460,465,152 2,632
BenchmarkLoadImages/Old_copy_method-8 32,554,835,500 460,480,336 2,716
BenchmarkLoadImages/Old_copy_method-8 31,915,499,583 460,471,904 2,714
BenchmarkLoadImages/New_copy_method-8 33,774,419,125 269,623,592 2,720
BenchmarkLoadImages/New_copy_method-8 31,926,620,417 269,623,272 2,707
BenchmarkLoadImages/New_copy_method-8 31,896,055,375 269,616,432 2,720
BenchmarkLoadImages/New_copy_method-8 33,230,731,916 269,640,440 2,827
BenchmarkLoadImages/New_copy_method-8 33,030,320,250 269,622,944 2,712

Results

With the above numbers, it seems obvious that the Bytes per operation is way lower with the new method (from 460's to 270's). The other two values, ns and allocations per operation seems more or less the same: the code is not much faster nor produce less allocations, but it uses less memory.

Comment on lines +617 to +618
// In Go 1.22 os.File is always an io.WriterTo. However, testcontainers
// currently allows Go 1.21, so we need to trick the compiler a little.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We test against both versions of the language, although we always develop in the lowest one. It could be probably interesting working the other way around: always develop in the latest release, and run the tests for both (current and current -1). Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, I think we should test at least 2 versions of Go anyway, and usually things like this don't come up too often.

In some projects I tend to do the develop in latest and test the floor model, just because a lot of devs always use latest first.

Copy link
Member

@mdelapenya mdelapenya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@mdelapenya mdelapenya self-assigned this Apr 3, 2024
@mdelapenya mdelapenya added the feature New functionality or new behaviors on the existing one label Apr 3, 2024
@mdelapenya mdelapenya merged commit 697c264 into testcontainers:main Apr 3, 2024
102 checks passed
@codefromthecrypt codefromthecrypt deleted the reduce-buffering branch April 3, 2024 12:56
@codefromthecrypt
Copy link
Contributor Author

Thanks for sharing the results @mdelapenya, and also merging! I'm not too surprised about the benchmarks at the moment as most of the potential is only implemented in linux right now.

It could be neat to run the same in a linux container to see how much difference it makes where the go side is optimized, but not required on my side. Cheers!

mdelapenya added a commit to coffeegoddd/testcontainers-go that referenced this pull request Apr 12, 2024
* main: (115 commits)
  chore: create TLS certs in a consistent manner (testcontainers#2478)
  chore(deps): bump idna from 3.6 to 3.7 (testcontainers#2480)
  Elasticsearch disable CA retrieval when ssl is disabled (testcontainers#2475)
  fix: handle dockerignore exclusions properly (testcontainers#2476)
  chore: prepare for next minor development cycle (0.31.0)
  chore: use new version (v0.30.0) in modules and examples
  Fix url creation to handle query params when using HTTP wait strategy (testcontainers#2466)
  fix: data race on container run (testcontainers#2345)
  fix: logging deadlock (testcontainers#2346)
  feat(k6):Add remote test scripts (testcontainers#2350)
  feat: optimizes file copies to and from containers (testcontainers#2450)
  fix(exec): updates the `Multiplexed` opt to combine stdout and stderr (testcontainers#2452)
  Upgrade neo4j module to use features from v0.29.1 of testcontainers-go (testcontainers#2463)
  bug:Fix AMQPS url (testcontainers#2462)
  chore: more compose updates in comments
  chore: use "docker compose" (v2) instead of "docker-compose" (v1) (testcontainers#2464)
  chore(deps): bump github/codeql-action from 2.22.12 to 3.24.9 (testcontainers#2459)
  refactor: Add Weaviate modules tests (testcontainers#2447)
  feat(exitcode): Add exit code sugar method (testcontainers#2342)
  feat: add module to support InfluxDB v1.x (testcontainers#1703)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New functionality or new behaviors on the existing one
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants