Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/stanza] Extract file reading #11076

Merged

Conversation

djaglowski
Copy link
Member

@djaglowski djaglowski commented Jun 16, 2022

This is a major step towards #10286.

I expect to follow this up with some cleanup PRs, but wanted to leave as much as possible in place so as not to further bloat an already huge PR.

  • The new package name and location should be considered temporary. I expect that this package will be exposed and available for other use cases, but would like to at least clean it up before doing this.
  • Some variable names are a very poor fit in the new design, but were intentionally left in place to minimize the diff.
  • The biggest point of deficiency in this design relates to encodings. I think the file tracking logic should either emit []byte and leave decoding to the caller, or it should manage all decoding. Currently, this is somewhat awkwardly split across the interface. This would be a good topic to address in a followup PR.
  • Many tests are duplicated, but I believe there is some value in ensuring that the higher level caller is validating expectations on its own.
  • The design document is duplicated. Later, I expect it will reside in the new package only.
  • Many of the configuration tests need to be reviewed and updated in the new package (eg those that contain operator IDs)

@djaglowski djaglowski added the Skip Changelog PRs that do not require a CHANGELOG.md entry label Jun 16, 2022
@djaglowski djaglowski marked this pull request as ready for review June 16, 2022 21:01
@djaglowski djaglowski requested review from a team and dmitryax June 16, 2022 21:01
@TylerHelmuth
Copy link
Member

@djaglowski do all the file additions actually need reviewed or are they just being moved into this repo for the first time from logs-collection?

@djaglowski
Copy link
Member Author

@TylerHelmuth, the files are moved & slightly modified from their previous location in pkg/stanza/operator/input/file.

Perhaps @jsirianni could give this a once over?

@djaglowski
Copy link
Member Author

There are some minor memory utilization improvements here. This is somewhat expected because Reader's are no longer maintaining their own decode buffers, but instantiating new ones as necessary. This was prompted by the challenge of cleaning up decode logic, mentioned above, but appears to be a notable positive impact.

main

pkg: github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/file
BenchmarkFileInput/Single-10         	 1238762	       983.8 ns/op	     672 B/op	       9 allocs/op
BenchmarkFileInput/Glob-10           	  295359	      4146 ns/op	    2691 B/op	      36 allocs/op
BenchmarkFileInput/MultiGlob-10      	  284752	      4243 ns/op	    2691 B/op	      36 allocs/op
BenchmarkFileInput/MaxConcurrent-10  	  154560	      6515 ns/op	    2690 B/op	      36 allocs/op
BenchmarkFileInput/FngrPrntLarge-10  	 1242770	       983.3 ns/op	     672 B/op	       9 allocs/op
BenchmarkFileInput/FngrPrntSmall-10  	 1294358	       961.0 ns/op	     672 B/op	       9 allocs/op
PASS
coverage: 72.0% of statements
ok  	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/input/file	54.505s

This branch:

pkg: github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/internal/fileconsumer
BenchmarkFileInput/Single-10         	 1298786	       938.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkFileInput/Glob-10           	  244022	      4179 ns/op	       3 B/op	       0 allocs/op
BenchmarkFileInput/MultiGlob-10      	  246051	      4086 ns/op	       3 B/op	       0 allocs/op
BenchmarkFileInput/MaxConcurrent-10  	  134785	      7449 ns/op	       2 B/op	       0 allocs/op
BenchmarkFileInput/FngrPrntLarge-10  	 1329999	       921.2 ns/op	       0 B/op	       0 allocs/op
BenchmarkFileInput/FngrPrntSmall-10  	 1037469	       978.9 ns/op	       0 B/op	       0 allocs/op
PASS
coverage: 73.4% of statements
ok  	github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/internal/fileconsumer	53.433s

@jsirianni
Copy link
Member

@TylerHelmuth, the files are moved & slightly modified from their previous location in pkg/stanza/operator/input/file.

Perhaps @jsirianni could give this a once over?

Running great for me 👍

@djaglowski djaglowski force-pushed the pkg-stanza-extract-file-reading branch from cfa3d1e to b9de38e Compare June 17, 2022 20:04
@djaglowski
Copy link
Member Author

@TylerHelmuth, do you mind taking another look at this, now that #11119 and #11124 are merged?

@djaglowski djaglowski merged commit 6673d2f into open-telemetry:main Jun 20, 2022
@djaglowski djaglowski deleted the pkg-stanza-extract-file-reading branch June 20, 2022 15:32
@djaglowski djaglowski changed the title Pkg stanza extract file reading [pkg/stanza] Extract file reading Jun 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Skip Changelog PRs that do not require a CHANGELOG.md entry
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants