feat(cloud_pubsub): Add support for gzip compression #13094

Merged
merged 16 commits into from
May 2, 2023

Conversation

dayvar14
Contributor

@dayvar14 dayvar14 commented Apr 14, 2023

Required for all PRs

resolves #9157

Revives a PR that was adding support for Cloud PubSub gzip compression

@telegraf-tiger
Contributor

Thanks so much for the pull request!
🤝 ✒️ Just a reminder that the CLA has not yet been signed, and we'll need it before merging. Please sign the CLA when you get a chance, then post a comment here saying !signed-cla

@dayvar14 dayvar14 changed the title from "Gzip pubsub" to "feat: Add support for gzip compression to cloud_pubsub input and output" Apr 14, 2023
@dayvar14 dayvar14 changed the title from "feat: Add support for gzip compression to cloud_pubsub input and output" to "feat(inputs.cloud_pubsub, outputs.cloud_pubsub): Add support for gzip compression" Apr 14, 2023
@dayvar14
Contributor Author

!signed-cla

@dayvar14 dayvar14 marked this pull request as ready for review April 15, 2023 12:54
@dayvar14 dayvar14 changed the title from "feat(inputs.cloud_pubsub, outputs.cloud_pubsub): Add support for gzip compression" to "feat(cloud_pubsub): Add support for gzip compression" Apr 15, 2023
Contributor

@powersj powersj left a comment

Thanks for putting this up! Have you been able to try the artifacts with cloud_pubsub input and output?

I have a few comments but no big concerns at this time.

@powersj powersj self-assigned this Apr 19, 2023
@dayvar14
Contributor Author

> Thanks for putting this up! Have you been able to try the artifacts with cloud_pubsub input and output?
>
> I have a few comments but no big concerns at this time.

We currently run a modified version of this code at my company. It runs at very large scale with no problems and significant cost savings. I plan to test this branch directly to see whether we hit any issues, and will update here.

@dayvar14
Contributor Author

dayvar14 commented Apr 26, 2023

I ran this in a Linux amd64 container, with 18k metrics input from a pubsub and output to a pubsub over a period of 6 hours (3k/hour).

Contributor

@powersj powersj left a comment

Thank you for submitting this PR and making some changes. I left a few small comments, but I am going to mark this as approved, so Sven can take a look as well.

@powersj powersj added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Apr 27, 2023
Member

@srebhan srebhan left a comment

@dayvar14 thanks for reviving the matter! I do have a few comments...

@srebhan srebhan added the feat, cloud, and plugin/input labels Apr 28, 2023
@srebhan srebhan added the plugin/output label Apr 28, 2023
Member

@srebhan srebhan left a comment

This looks good. I have three more comments that improve the formatting a bit. Nothing too big...

@dayvar14
Contributor Author

Once this looks good, I want to test once more in a high-traffic environment to confirm it still works as expected.

Member

@srebhan srebhan left a comment

One more suggestion...

@telegraf-tiger
Contributor

telegraf-tiger bot commented May 2, 2023

Download PR build artifacts for linux_amd64.tar.gz, darwin_amd64.tar.gz, and windows_amd64.zip.
Downloads for additional architectures and packages are available below.

🥳 This pull request decreases the Telegraf binary size by 3.00 % for linux amd64 (new size: 167.6 MB, nightly size: 172.8 MB)

📦 Click here to get additional PR build artifacts

Artifact URLs

| DEB | RPM | TAR GZ | ZIP |
| --- | --- | --- | --- |
| amd64.deb | aarch64.rpm | darwin_amd64.tar.gz | windows_amd64.zip |
| arm64.deb | armel.rpm | darwin_arm64.tar.gz | windows_arm64.zip |
| armel.deb | armv6hl.rpm | freebsd_amd64.tar.gz | windows_i386.zip |
| armhf.deb | i386.rpm | freebsd_armv7.tar.gz | |
| i386.deb | ppc64le.rpm | freebsd_i386.tar.gz | |
| mips.deb | riscv64.rpm | linux_amd64.tar.gz | |
| mipsel.deb | s390x.rpm | linux_arm64.tar.gz | |
| ppc64el.deb | x86_64.rpm | linux_armel.tar.gz | |
| riscv64.deb | | linux_armhf.tar.gz | |
| s390x.deb | | linux_i386.tar.gz | |
| | | linux_mips.tar.gz | |
| | | linux_mipsel.tar.gz | |
| | | linux_ppc64le.tar.gz | |
| | | linux_riscv64.tar.gz | |
| | | linux_s390x.tar.gz | |

Member

@srebhan srebhan left a comment

Thanks a lot @dayvar14 for taking care of this!

@srebhan srebhan merged commit 872d517 into influxdata:master May 2, 2023
@dayvar14
Contributor Author

dayvar14 commented May 2, 2023

@srebhan @powersj Thanks for the reviews!

@dayvar14 dayvar14 deleted the gzip-pubsub branch May 2, 2023 19:04
@dayvar14
Contributor Author

dayvar14 commented May 2, 2023

Getting this error. @srebhan Are we sure the mutex lock wasn't necessary?

2023-05-02T19:49:16Z I! Loading config: /etc/telegraf/telegraf.conf
2023-05-02T19:49:16Z I! Starting Telegraf 1.27.0-04f26f45
2023-05-02T19:49:16Z I! Available plugins: 235 inputs, 9 aggregators, 27 processors, 22 parsers, 57 outputs, 2 secret-stores
2023-05-02T19:49:16Z I! Loaded inputs: cloud_pubsub internal
2023-05-02T19:49:16Z I! Loaded aggregators: 
2023-05-02T19:49:16Z I! Loaded processors: 
2023-05-02T19:49:16Z I! Loaded secretstores: 
2023-05-02T19:49:16Z I! Loaded outputs: influxdb prometheus_client
2023-05-02T19:49:16Z I! Tags enabled: 
2023-05-02T19:49:16Z I! [agent] Config: Interval:15s, Quiet:false, Hostname:"", Flush Interval:15s
2023-05-02T19:49:16Z I! [outputs.prometheus_client] Listening on http://[::]:9273/metrics
2023-05-02T19:49:16Z I! [inputs.cloud_pubsub] Starting receiver for subscription test-subscription...
2023-05-02T19:49:17Z E! [inputs.cloud_pubsub] Error in plugin: unable to add message from subscription test-subscription: unable to decompress gzip message: flate: corrupt input before offset 75
2023-05-02T19:49:17Z E! [inputs.cloud_pubsub] Error in plugin: unable to add message from subscription test-subscription: unable to decompress gzip message: flate: corrupt input before offset 80
2023-05-02T19:49:17Z E! [inputs.cloud_pubsub] Error in plugin: unable to add message from subscription test-subscription: unable to decompress gzip message: flate: corrupt input before offset 1
2023-05-02T19:49:17Z E! [inputs.cloud_pubsub] Error in plugin: unable to add message from subscription test-subscription: unable to decompress gzip message: flate: corrupt input before offset 1
2023-05-02T19:49:18Z E! [inputs.cloud_pubsub] Error in plugin: unable to add message from subscription test-subscription: unable to decompress gzip message: flate: corrupt input before offset 10855
2023-05-02T19:49:18Z E! [inputs.cloud_pubsub] Error in plugin: unable to add message from subscription test-subscription: unable to decompress gzip message: flate: corrupt input before offset 10855

My environment consists of one Telegraf instance writing gzip-compressed messages to a pubsub and a second Telegraf instance reading from that pubsub. I used the same configuration when testing previous commits.

goroutine 203 [running]:
compress/flate.(*decompressor).Read(0xc00079cc00, {0xc007000000, 0x80000, 0x0?})
        /usr/local/go/src/compress/flate/inflate.go:338 +0x20c
compress/gzip.(*Reader).Read(0xc00094a580, {0xc007000000, 0x80000, 0x80000})
        /usr/local/go/src/compress/gzip/gunzip.go:252 +0xbb
io.(*LimitedReader).Read(0xc007c912c0, {0xc007000000?, 0x0?, 0xc0083adc18?})
        /usr/local/go/src/io/io.go:477 +0x45
bytes.(*Buffer).ReadFrom(0xc000811ec0, {0x6fcf600, 0xc007c912c0})
        /usr/local/go/src/bytes/buffer.go:202 +0x98
io.copyBuffer({0x6fbe780, 0xc000811ec0}, {0x6fcf600, 0xc007c912c0}, {0x0, 0x0, 0x0})
        /usr/local/go/src/io/io.go:413 +0x14b
io.Copy(...)
        /usr/local/go/src/io/io.go:386
io.CopyN({0x6fbe780, 0xc000811ec0}, {0x6fc1400?, 0xc00094a580}, 0x1f400000)
        /usr/local/go/src/io/io.go:362 +0x9a
github.com/influxdata/telegraf/internal.(*GzipDecoder).Decode(0xc000319f00, {0xc00812e000, 0x8143, 0xa000}, 0x1f400000)
        /go/src/github.com/influxdata/telegraf/internal/content_coding.go:230 +0xdc
github.com/influxdata/telegraf/plugins/inputs/cloud_pubsub.(*PubSub).decompressData(0x0?, {0xc00812e000?, 0xc000216680?, 0x0?})
        /go/src/github.com/influxdata/telegraf/plugins/inputs/cloud_pubsub/cloud_pubsub.go:219 +0x6f
github.com/influxdata/telegraf/plugins/inputs/cloud_pubsub.(*PubSub).onMessage(0xc00045d680, {0x7024370, 0xc006c8c900}, {0x7031ed0?, 0xc007c70620})
        /go/src/github.com/influxdata/telegraf/plugins/inputs/cloud_pubsub/cloud_pubsub.go:174 +0xe7
github.com/influxdata/telegraf/plugins/inputs/cloud_pubsub.(*PubSub).startReceiver.func1({0x7024370?, 0xc006c8c900?}, {0x7031ed0?, 0xc007c70620?})
        /go/src/github.com/influxdata/telegraf/plugins/inputs/cloud_pubsub/cloud_pubsub.go:154 +0x4f
github.com/influxdata/telegraf/plugins/inputs/cloud_pubsub.(*gcpSubscription).Receive.func1({0x7024370, 0xc006c8c900}, 0xc000316ee0)
        /go/src/github.com/influxdata/telegraf/plugins/inputs/cloud_pubsub/subscription_gcp.go:42 +0x77
cloud.google.com/go/pubsub.(*Subscription).Receive.func2.2({0x5ae4740?, 0xc000316ee0?})
        /go/pkg/mod/cloud.google.com/go/pubsub@v1.28.0/subscription.go:1184 +0x87
cloud.google.com/go/pubsub/internal/scheduler.(*ReceiveScheduler).Add.func1()
        /go/pkg/mod/cloud.google.com/go/pubsub@v1.28.0/internal/scheduler/receive_scheduler.go:84 +0x2e
created by cloud.google.com/go/pubsub/internal/scheduler.(*ReceiveScheduler).Add
        /go/pkg/mod/cloud.google.com/go/pubsub@v1.28.0/internal/scheduler/receive_scheduler.go:82 +0x352
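
For readers unfamiliar with the code path in the trace, here is a minimal sketch of a size-limited gzip decode built on `io.CopyN`, which is the call the trace passes through. It is not Telegraf's `internal.GzipDecoder`; the helper names `gzipCompress` and `gzipDecompress` and the error text are hypothetical, and only standard library calls are used.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
)

// gzipCompress is a hypothetical stand-in for the publishing side:
// it gzip-compresses a serialized payload before it is sent.
func gzipCompress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes the remaining data and writes the gzip trailer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gzipDecompress is a hypothetical stand-in for the receiving side.
// It inflates data but refuses to produce more than maxSize bytes.
func gzipDecompress(data []byte, maxSize int64) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()

	var out bytes.Buffer
	// io.CopyN returns io.EOF when the stream ends before maxSize bytes,
	// which is the normal case for a message under the limit.
	n, err := io.CopyN(&out, zr, maxSize)
	if err != nil && !errors.Is(err, io.EOF) {
		return nil, err
	}
	if n == maxSize {
		// The limit was reached exactly; probe for one more byte to
		// tell "fits exactly" apart from "too large".
		var probe [1]byte
		if m, _ := io.ReadFull(zr, probe[:]); m > 0 {
			return nil, fmt.Errorf("decompressed size exceeds limit of %d bytes", maxSize)
		}
	}
	return out.Bytes(), nil
}

func main() {
	payload := []byte("cpu,host=a usage_idle=98.2 1683056956000000000\n")
	compressed, err := gzipCompress(payload)
	if err != nil {
		panic(err)
	}
	restored, err := gzipDecompress(compressed, 1<<20)
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(payload, restored))
}
```

This version allocates a fresh `gzip.Reader` per message, so it has no shared state between calls. A decoder that reuses one reader and one buffer avoids those allocations but then needs its own synchronization.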

@srebhan
Member

srebhan commented May 3, 2023

Can you check with the mutex back in? Please open an issue anyway!

@dayvar14
Contributor Author

dayvar14 commented May 3, 2023

> Can you check with the mutex back in? Please open an issue anyway!

Putting the mutex lock back fixed the problem. Opening an issue now.
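
To illustrate why a lock can matter here: the Pub/Sub Go client may invoke the receive callback from several goroutines at once, and the trace above shows the decode running on a goroutine created by the client's receive scheduler. The sketch below is an assumption about the general pattern, not a copy of the Telegraf code. `sharedDecoder`, `compress`, and `decodeConcurrently` are hypothetical names for a decoder that reuses one `gzip.Reader` and one buffer, guarded by a `sync.Mutex`.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"sync"
)

// sharedDecoder is a hypothetical decoder that reuses a single gzip.Reader
// and output buffer across calls. That shared state is what the mutex guards.
type sharedDecoder struct {
	mu     sync.Mutex
	reader *gzip.Reader
	buf    bytes.Buffer
}

// Decode inflates data. The lock makes sure only one goroutine at a time
// resets and reads from the shared reader and buffer.
func (d *sharedDecoder) Decode(data []byte) ([]byte, error) {
	d.mu.Lock()
	defer d.mu.Unlock()

	if d.reader == nil {
		r, err := gzip.NewReader(bytes.NewReader(data))
		if err != nil {
			return nil, err
		}
		d.reader = r
	} else if err := d.reader.Reset(bytes.NewReader(data)); err != nil {
		return nil, err
	}

	d.buf.Reset()
	if _, err := io.Copy(&d.buf, d.reader); err != nil {
		return nil, err
	}
	// Return a copy: the internal buffer is overwritten by the next call.
	return append([]byte(nil), d.buf.Bytes()...), nil
}

// compress gzip-compresses data (helper for the demonstration only).
func compress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeConcurrently decodes n distinct messages from n goroutines through
// one shared decoder and returns how many did not round-trip correctly.
func decodeConcurrently(n int) int {
	dec := &sharedDecoder{}
	ok := make([]bool, n)

	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			want := []byte(fmt.Sprintf("metric,id=%d value=%d\n", i, i*i))
			z, err := compress(want)
			if err != nil {
				return
			}
			got, err := dec.Decode(z)
			ok[i] = err == nil && bytes.Equal(got, want)
		}(i)
	}
	wg.Wait()

	failed := 0
	for _, good := range ok {
		if !good {
			failed++
		}
	}
	return failed
}

func main() {
	fmt.Println("failed decodes:", decodeConcurrently(64))
}
```

Without the `Lock`/`Unlock` pair, two callbacks could reset the reader or the buffer while another is still reading, which is one plausible way to end up with `flate: corrupt input` errors like those in the log. The alternative design is one decoder per message or per goroutine, trading extra allocations for no locking.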

@dayvar14
Contributor Author

dayvar14 commented May 3, 2023

#13236

Labels
cloud: Issues or requests around cloud environments
feat: Improvement on an existing feature such as adding a new setting/mode to an existing plugin
plugin/input: 1. Request for new input plugins 2. Issues/PRs that are related to input plugins
plugin/output: 1. Request for new output plugins 2. Issues/PRs that are related to out plugins
ready for final review: This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Google Cloud Pubsub: allows messages to be gzipped for sending and receiving
4 participants