feat(inputs.gcs): Google Cloud Storage Input Plugin #8413

Merged
merged 110 commits into from
Sep 19, 2022

gkatzioura
Contributor

@gkatzioura gkatzioura commented Nov 15, 2020

This pull request addresses issue #8412.

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

How this Telegraf plugin works with Google Cloud Storage

GCS Telegraf integration

Google Cloud Storage lists objects in lexicographic order. Files are expected to be stored with a date prefix, or otherwise named so that their lexicographic order follows the time they were created. A GCS crawler lists the files in the bucket, parses them, and submits the result to an output. After every Gather execution, the last file processed is stored on GCS as an offset, and the next Gather continues from that offset.
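To illustrate why the naming scheme matters, here is a minimal sketch (not code from this PR) showing that fixed-width timestamp names sort lexicographically in creation order, while names of differing width do not. The helper `sortedNames` is a hypothetical name introduced only for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedNames is a hypothetical helper: it returns a lexicographically
// sorted copy of names, mimicking the order of an object listing.
func sortedNames(names []string) []string {
	out := append([]string(nil), names...)
	sort.Strings(out)
	return out
}

func main() {
	// Fixed-width ISO-style names: lexicographic order equals time order.
	fmt.Println(sortedNames([]string{
		"2020-01-10T05:45:01",
		"2019-12-31T23:59:59",
		"2020-01-09T18:00:00",
	}))

	// Caveat: numeric names of different widths do not sort numerically,
	// because "1000" compares before "999" character by character.
	fmt.Println(sortedNames([]string{"999", "1000"}))
}
```

Epoch-millisecond names such as 1604148850991 keep a constant width for a very long time, so in practice they behave like the first case.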

How Gather works

  • Entries on GCS are stored under names that represent their creation time, e.g. 1604148850991 or 2020-01-10T05:45:01.
  • The Telegraf input plugin is configured to crawl data in a GCS bucket under a specified prefix.
  • The input plugin can use an offset to list only the data after that offset.
  • If there is no offset, it starts from the beginning.
  • On every Gather() execution, the input plugin saves the name of the last file it processed on GCS, so that the next iteration can retrieve it and use it as the offset.
  • The files on GCS should contain measurements that Telegraf can parse.
  • Because a page of items fetched from GCS can be larger than what Telegraf can process in one Gather() call, the user can specify a limit on the number of files processed.
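The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the plugin's actual code: the bucket is simulated with an in-memory map, and the names `bucket`, `offsetKey`, `listAfter`, and `gather` are hypothetical. It also assumes the offset is exclusive, i.e. the file named by the offset has already been processed.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// bucket is a hypothetical in-memory stand-in for a GCS bucket:
// object names mapped to their contents.
type bucket map[string]string

// offsetKey is a hypothetical object name under which the offset is saved.
const offsetKey = "offset"

// listAfter returns up to limit object names under prefix that sort
// strictly after offset. An empty offset means "start from the beginning";
// a limit of zero or less means "no limit".
func listAfter(b bucket, prefix, offset string, limit int) []string {
	var names []string
	for name := range b {
		if name == offsetKey || !strings.HasPrefix(name, prefix) {
			continue
		}
		if offset != "" && name <= offset {
			continue // already processed in an earlier Gather
		}
		names = append(names, name)
	}
	sort.Strings(names) // GCS returns listings in lexicographic order
	if limit > 0 && len(names) > limit {
		names = names[:limit]
	}
	return names
}

// gather processes one batch and saves the last processed name as the
// offset for the next call.
func gather(b bucket, prefix string, limit int) []string {
	batch := listAfter(b, prefix, b[offsetKey], limit)
	if len(batch) > 0 {
		b[offsetKey] = batch[len(batch)-1]
	}
	return batch
}

func main() {
	b := bucket{
		"metrics/1604148850991": "cpu usage=1",
		"metrics/1604148850992": "cpu usage=2",
		"metrics/1604148850993": "cpu usage=3",
	}
	fmt.Println(gather(b, "metrics/", 2)) // first two files
	fmt.Println(gather(b, "metrics/", 2)) // the remaining file
	fmt.Println(gather(b, "metrics/", 2)) // nothing new
}
```

Storing the offset in the bucket itself, rather than in local state, means a restarted Telegraf instance can resume where the previous one stopped.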

Offset Diagram

gkatzioura added a commit to gkatzioura/telegraf that referenced this pull request Nov 15, 2020
@gkatzioura
Contributor Author

gkatzioura commented Nov 16, 2020

The errors are related to the protobuf update and ericchiang/k8s@v1.2.0/watch/versioned/generated.pb.go.

This is already described as an issue here:
ericchiang/k8s#125

What could the options be?
Swapping out the offending generated file?
Maybe an external plugin could be a solution (although this is a Go pull request).

@gkatzioura
Contributor Author

gkatzioura commented Nov 17, 2020

I added the following lines to a CircleCI build to demonstrate the dependency problem (purely to highlight the issue; I don't intend to change the build):

- run: chmod 777 /go/pkg/mod/github.com/ericchiang/k8s@v1.2.0/watch/versioned/generated.pb.go
- run: sed 's/github.com\/ericchiang.k8s.watch.versioned.Event/k8s.io.kubernetes.pkg.watch.versioned.Event/g' /go/pkg/mod/github.com/ericchiang/k8s@v1.2.0/watch/versioned/generated.pb.go > newfile.go
- run: mv newfile.go /go/pkg/mod/github.com/ericchiang/k8s@v1.2.0/watch/versioned/generated.pb.go

Any Google SDK upgrade that requires the new protobuf package will hit this problem because of that line; see ericchiang/k8s#125.

@sjwang90 sjwang90 added the plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins label Jan 22, 2021
@sjwang90 sjwang90 added the cloud Issues or requests around cloud environments label Jul 14, 2021
@sspaink
Contributor

sspaink commented Mar 1, 2022

@gkatzioura sorry no one has responded to you yet. Are you still interested in working on this plugin? We recently moved away from ericchiang/k8s to kubernetes/client-go, so maybe the problem has been resolved in the newer repo. Creating an external plugin, as you suggested, might be a good option as well.

@sspaink sspaink added the waiting for response waiting for response from contributor label Mar 1, 2022
@gkatzioura
Contributor Author

gkatzioura commented Mar 1, 2022

Hi @sspaink More than happy! Will spend some time to get back on track.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Mar 1, 2022
@gkatzioura
Contributor Author

Hi @sspaink, seems the issues on the build are resolved. Will do some extra testing.
I think it's good to go for a review.

@sspaink
Contributor

sspaink commented Mar 29, 2022

!signed-cla

@sspaink sspaink changed the title Google Cloud Storage Input Plugin feat(inputs.gcs): Google Cloud Storage Input Plugin Mar 29, 2022
Contributor

@sspaink sspaink left a comment

Thank you for your continued work on this plugin. You will also need to add a README.md that contains the sample configuration, a description of the plugin, how to use it, etc. The more detail the better.

@gkatzioura gkatzioura requested a review from sspaink April 25, 2022 06:46
gkatzioura added a commit to gkatzioura/telegraf that referenced this pull request Apr 25, 2022
gkatzioura added a commit to gkatzioura/telegraf that referenced this pull request Apr 25, 2022
@gkatzioura
Contributor Author

Hi @sspaink. I made the changes: some renaming, and registering the plugin among the other input plugins.

Contributor

@sspaink sspaink left a comment

@gkatzioura thank you for your continued work on this. The Telegraf project recently changed how sample configs are maintained: instead of being duplicated in the README.md and the code, the sample config is now generated from a single source of truth. I added some suggested changes to help migrate your plugin to the new structure; hopefully that makes it easy. You will also need to add a file called "sample.conf" that contains the sample configuration.

The input plugin guidelines were updated as well if that makes it more clear: https://github.com/influxdata/telegraf/blob/master/docs/INPUTS.md#input-plugin-guidelines

@sspaink sspaink added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Aug 31, 2022
@sspaink
Contributor

sspaink commented Aug 31, 2022

@gkatzioura do you have time to resolve the conflicts and address the sample config change? No worries if you don't, I wouldn't mind creating a separate pull request to resolve the issues.

@sspaink sspaink added waiting for response waiting for response from contributor and removed ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. labels Aug 31, 2022
@gkatzioura
Contributor Author

Hi @sspaink checking!

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Sep 9, 2022
gkatzioura added a commit to gkatzioura/telegraf that referenced this pull request Sep 11, 2022
gkatzioura added a commit to gkatzioura/telegraf that referenced this pull request Sep 11, 2022
gkatzioura and others added 6 commits September 19, 2022 17:41
Co-authored-by: Sebastian Spaink <3441183+sspaink@users.noreply.github.com>
@sspaink sspaink added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Sep 19, 2022
@telegraf-tiger
Contributor

Download PR build artifacts for linux_amd64.tar.gz, darwin_amd64.tar.gz, and windows_amd64.zip.
Downloads for additional architectures and packages are available below.

⚠️ This pull request increases the Telegraf binary size by 1.01 % for linux amd64 (new size: 154.1 MB, nightly size 152.6 MB)

📦 Click here to get additional PR build artifacts

Artifact URLs

DEB: amd64.deb, arm64.deb, armel.deb, armhf.deb, i386.deb, mips.deb, mipsel.deb, ppc64el.deb, riscv64.deb, s390x.deb
RPM: aarch64.rpm, armel.rpm, armv6hl.rpm, i386.rpm, ppc64le.rpm, riscv64.rpm, s390x.rpm, x86_64.rpm
TAR GZ: darwin_amd64.tar.gz, darwin_arm64.tar.gz, freebsd_amd64.tar.gz, freebsd_armv7.tar.gz, freebsd_i386.tar.gz, linux_amd64.tar.gz, linux_arm64.tar.gz, linux_armel.tar.gz, linux_armhf.tar.gz, linux_i386.tar.gz, linux_mips.tar.gz, linux_mipsel.tar.gz, linux_ppc64le.tar.gz, linux_riscv64.tar.gz, linux_s390x.tar.gz, static_linux_amd64.tar.gz
ZIP: windows_amd64.zip, windows_i386.zip

@gkatzioura
Contributor Author

@sspaink all green

@MyaLongmire MyaLongmire merged commit e5ee9e1 into influxdata:master Sep 19, 2022
dba-leshop pushed a commit to dba-leshop/telegraf that referenced this pull request Oct 30, 2022
Labels
  • cloud: Issues or requests around cloud environments
  • new plugin
  • plugin/input: 1. Request for new input plugins 2. Issues/PRs that are related to input plugins
  • ready for final review: This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
4 participants