
feat: Add use_batch_format for HTTP output plugin #8184

Merged
merged 7 commits on Oct 29, 2021
Conversation

HeikoSchlittermann
Contributor

@HeikoSchlittermann commented Sep 27, 2020

Code shamelessly stolen from the file output plugin.

Co-authored-by: Heiko Schlittermann hs@schlittermann.de

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

On unit tests: there is no existing unit test anywhere for the default batched output format. But I did manual testing using a simple "echo" HTTP server.

It would be great if you could include this patch, as we need it in production, and you would make our life a lot easier. Please do not ask why we need to send the records non-batched; this is a requirement from the side processing the data.

@reimda
Contributor

reimda commented Sep 28, 2020

If a change doesn't benefit many users, it doesn't make sense for us to merge it.

Why do you need to send records non-batched? It must be bad for performance. I know you asked us not to ask, but we won't merge unless there's a reason that is generally useful. Does it help you debug or log something?

@HeikoSchlittermann
Contributor Author

HeikoSchlittermann commented Sep 29, 2020

We deliver the metrics to a business unit that uses logstash/elasticsearch. They do not want to process the metrics from a JSON array, as this would be an exception for their input processing. (They (and we) accept the performance penalty, as we are not going to send a significant amount of data.)

The default (batched format) isn't touched, and the code change is small; actually it's one additional condition that shouldn't sacrifice much performance for users who don't enable it. And this change makes the HTTP output plugin more compatible with the file output plugin.

And yes, it even helped me in understanding and debugging HTTP related issues.

@M0rdecay
Contributor

M0rdecay commented Oct 1, 2020

Why not route the Telegraf data into a dedicated pipeline and split each event (the incoming JSON array) into separate events (one per metric) there?

Logstash seems to be a better place to do this. Moreover, this is a fairly simple task - https://www.elastic.co/guide/en/logstash/current/plugins-filters-split.html.
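Following the linked documentation, such a split might look roughly like this (the field name `events` is an assumption about where the incoming array lands in the event):

```conf
filter {
  split {
    # Split one event whose "events" field holds the incoming JSON
    # array into one event per array element. The field name here is
    # only an example; use whatever field actually holds the batch.
    field => "events"
  }
}
```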

@HeikoSchlittermann
Contributor Author

From a technical point of view I fully agree, and I would never have had the idea to use a single HTTP request per event (metric). But integration into an existing infrastructure requires us to do so. For this reason we implemented (copied from the file output plugin) the unbatched output. I believe the comment in the README and the example config make clear that unbatched output should not be used without good reason.

The impact of this change on existing use cases (batched output) should be minimal (one additional conditional).

Merging this change would greatly improve the chance of integrating Telegraf into an existing large-scale infrastructure.

@HeikoSchlittermann
Contributor Author

Is there any chance to get this PR merged into the upstream sources? It would greatly improve the chances of integrating Telegraf into a legacy environment.

@donvipre
Contributor

I think Heiko is only trying to get behaviour similar to this:
https://www.elastic.co/guide/en/logstash/current/plugins-outputs-http.html

format string, one of ["json", "json_batch", "form", "message"]

Unbatched JSON seems a valid use case that is already implemented in Telegraf via the file plugin; this PR only imitates existing code and existing behaviour.

I would argue for merging it, as it works quite well for its intended purpose.
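For reference, a sketch of how the option from this PR might appear in `telegraf.conf` (the URL and serializer choice are examples, not part of the PR):

```toml
[[outputs.http]]
  url = "http://logstash.example.com:8080/telegraf"
  data_format = "json"

  ## When false, send one HTTP request per metric instead of one
  ## request per batch. Defaults to true; disabling it costs
  ## performance and is intended for debugging or for legacy
  ## consumers that cannot handle batched input.
  use_batch_format = false
```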

@sjwang90 added the feat label (improvement on an existing feature, such as adding a new setting/mode to an existing plugin) Nov 25, 2020
@HeikoSchlittermann
Contributor Author

I added a unit test for HTTP batched/unbatched output. Hopefully this increases the chance of getting this enhancement merged upstream.

@HeikoSchlittermann
Contributor Author

Rebased onto the latest master; go test in plugins/outputs/http/ runs successfully. We still see the "unbatched" HTTP output as a useful extension and would be more than happy if it could be included upstream, as it makes integration of Telegraf into a large-scale (but legacy) infrastructure easier.

@HeikoSchlittermann changed the title from "Add use_batch_format for HTTP output plugin" to "feat: Add use_batch_format for HTTP output plugin" Aug 6, 2021
@HeikoSchlittermann
Contributor Author

Please review and merge into upstream master. We still use this code in production and would be more than happy to have it included in your upstream source, as this would greatly simplify our build pipeline.

Besides this, we still think this is a valuable option for debugging and for integrating with other environments (which can't handle batched HTTP output).

@HeikoSchlittermann
Contributor Author

If #8229 is applied before this one (#8184), the unit tests will fail, as there is a circular dependency, which I will fix as soon as #8229 is applied.

Otherwise both PRs (this one and #8229) are independent of each other.

@HeikoSchlittermann
Contributor Author

HeikoSchlittermann commented Sep 22, 2021

Hi @sspaink, I'd like to bring this PR to your attention; it is rebased onto the current master now. Having it integrated would help the deployment of Telegraf in a large environment where, for several (legacy) reasons, the HTTP data has to be sent unbatched.

@HeikoSchlittermann
Contributor Author

Rebased onto the current master and did a successful test run.

@HeikoSchlittermann
Contributor Author

Hi, is there anything I can do?

@HeikoSchlittermann
Contributor Author

Hi Upstream,
I rebased my proposed changes onto the current master; make test succeeds locally and the CI tests are in progress.
We (here at "big tech company") would be more than happy if you would integrate our enhancement.

@sspaink added the ready for final review label (this pull request has been reviewed and/or tested by multiple users and is ready for a final review) Oct 13, 2021
@sspaink
Contributor

sspaink commented Oct 13, 2021

@HeikoSchlittermann sorry for the delay in response. I've marked the PR as ready for final review and we will review it soon.

@HeikoSchlittermann
Contributor Author

HeikoSchlittermann commented Oct 13, 2021 via email

@powersj left a comment

One comment about naming, but @reimda needs to re-review as well.

@powersj
Contributor

powersj commented Oct 21, 2021

@HeikoSchlittermann - I realize this has taken a year to get reviewed, but I am still questioning who would actually use this, and where, outside of your single use case. Given that, I would suggest you carry this as an external plugin rather than merging it in. I want to avoid creating confusion and potential performance issues for users.

What do you think?

@HeikoSchlittermann
Contributor Author

@powersj Thank you for your response.

  • I do not see any performance impact as long as the batched output is enabled (which it is by default).
  • The option can be clearly documented as a debugging aid, performance killer, and last resort, so the average user is advised not to use it.
  • The unbatched output exists in a similar form in the file output plugin, if I remember correctly.

Creating an external plugin for a small enhancement to an existing plugin seems like wasted effort, at least to me. Of course, if you reject this extension we either need to maintain our own fork of Telegraf, or we can spend the time to turn it into an external plugin, depending on what our management decides. But in either case, the enhancement won't be available to the public (for internal management/licensing reasons). I'm not predicting the future, but I can imagine that one day somebody else will do about the same job, and IMHO that's not how Open Source is supposed to work.

(Management/Licensing: I can easily convince the management that we're contributing to open source and therefore have to make the change public. If we build this as an "own piece of software", I can't force them to release this to the public.)

@HeikoSchlittermann
Contributor Author

HeikoSchlittermann commented Oct 29, 2021

Thanks, @reimda, I updated the docs as required, rebased onto master, and ran the test suite. Looks good from my POV.

@reimda merged commit 8552c11 into influxdata:master Oct 29, 2021
@HeikoSchlittermann
Contributor Author

Thank you for merging and thank you for your great work @telegraf.

@HeikoSchlittermann deleted the http-unbatched branch November 1, 2021