-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add script processor #10850
Add script processor #10850
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love this!
I think it may be worth aligning option names with the equivalent names in ingest node. E.g. "code" => "source" & so on.
// session is a javascript runtime environment used throughout the life of | ||
// the processor instance. | ||
type session struct { | ||
vm *goja.Runtime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the doc
Is it goroutine-safe?
No. An instance of goja.Runtime can only be used by a single goroutine at a time. You can create as many instances of Runtime as you like but it's not possible to pass object values between runtimes.
IIRC, I think this might be fine because when we call publish on the client we actually have a mutex around the processor calls. But if we ever change the contract that could cause issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change that assumption we'll have to take a look at each processor to make sure they are thread-safe.
} | ||
rtn, err := run(event) | ||
return rtn, annotateError(p.ID, err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are actually opening the door for all kinds of scripts here, what comes to my mind are really long execution times or ReDoS cases, looking at the readme the VM support interruption
So I think we should set a default timeout and allow people to overrides that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea.
I love the idea of this! However, it would be great if you could elaborate more on the motivation for adding this. I think taking on 120k lines of code and a Javascript interpreter is a pretty big deal. Especially considering:
How about making this processor a "real module" that beats can import if they need it? |
Big +1 on this from me. I'd rather give each Beat the choice to opt in to importing the processor. |
2b6fb06
to
67c0820
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made updates to address the comments.
- Added a timeout option.
- Removed automatic inclusion of the processor by libbeat. I imported it in the log processing beats and made the documentation conditional.
- Added a test for
Get
ing andObject
and mentioned this in the docs.
// session is a javascript runtime environment used throughout the life of | ||
// the processor instance. | ||
type session struct { | ||
vm *goja.Runtime |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change that assumption we'll have to take a look at each processor to make sure they are thread-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy with the decision of keeping this experimental for now, I understand the usefulness of having the power of a scripting language in beats but I am also worried about the activity around the javascript engine that we vendor.
NIT: I would probably change the options to be close with the ingest node instead of the ruby filter, since we are usually interacting more often with pipelines.
t := time.AfterFunc(s.timeout, func() { | ||
s.vm.Interrupt(timeoutError) | ||
}) | ||
defer t.Stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrewkroh By curiosity, what was the impact of this change in your benchmark? I am asking this because I know there is channel involved when under the hood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see the resulting benchmark in this commit:
427d5d9
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BenchmarkBeatEventV0/timeout_Put-12 1000000 1510 ns/op
BenchmarkBeatEventV0/Object_Put_Key-12 2000000 631 ns/op
OK let's keep it as is, for now, we might be able to optimize it later.
Yep. We can try it out. If we find a major show stopper bug with the engine that would take significant effort to fix we “cancel” the experiment. And I’ll make the change to align the config more with the painless script processor in ingest node. |
The script processor executes Javascript code to process an event. The processor uses a pure Go implementation of ECMAScript 5.1. This can be useful in situations where one of the other processors doesn’t provide the functionality you need to filter events. The processor can be configured by embedding Javascript in your configuration file or by pointing the processor at external file(s). See the included documentation for the full details. processors: - script: type: javascript code: > function process(event) { event.Tag("js"); } For observability it can add metrics with a histogram of the execution time and a counter for the number of exceptions that occur. Benchmark Results Here's a benchmark to get a rough idea of how long it takes to process an event in the processor's runtime goos: darwin goarch: amd64 pkg: github.com/elastic/beats/libbeat/processors/script/javascript BenchmarkBeatEventV0/Put-12 2000000 712 ns/op BenchmarkBeatEventV0/Object_Put_Key-12 2000000 615 ns/op BenchmarkBeatEventV0/Get-12 2000000 769 ns/op BenchmarkBeatEventV0/Get_Undefined_Key-12 1000000 1010 ns/op BenchmarkBeatEventV0/fields_get_key-12 2000000 836 ns/op BenchmarkBeatEventV0/Get_@metadata-12 2000000 676 ns/op BenchmarkBeatEventV0/Put_@metadata-12 2000000 736 ns/op BenchmarkBeatEventV0/Delete_@metadata-12 2000000 651 ns/op BenchmarkBeatEventV0/Cancel-12 3000000 565 ns/op BenchmarkBeatEventV0/Tag-12 2000000 865 ns/op BenchmarkBeatEventV0/AppendTo-12 2000000 630 ns/op
This adds an optional timeout configuration option to the processor that will timeout the execution by interrupting the JS code. I updated the benchmark to test each case with and without the timeout enabled. BenchmarkBeatEventV0/Put-12 2000000 707 ns/op BenchmarkBeatEventV0/timeout_Put-12 1000000 1510 ns/op BenchmarkBeatEventV0/Object_Put_Key-12 2000000 631 ns/op BenchmarkBeatEventV0/timeout_Object_Put_Key-12 1000000 1252 ns/op BenchmarkBeatEventV0/Get-12 2000000 750 ns/op BenchmarkBeatEventV0/timeout_Get-12 1000000 1495 ns/op BenchmarkBeatEventV0/Get_Undefined_Key-12 1000000 1039 ns/op BenchmarkBeatEventV0/timeout_Get_Undefined_Key-12 1000000 2108 ns/op BenchmarkBeatEventV0/fields_get_key-12 2000000 1044 ns/op BenchmarkBeatEventV0/timeout_fields_get_key-12 1000000 2051 ns/op BenchmarkBeatEventV0/Get_@metadata-12 2000000 750 ns/op BenchmarkBeatEventV0/timeout_Get_@metadata-12 1000000 1745 ns/op BenchmarkBeatEventV0/Put_@metadata-12 1000000 1048 ns/op BenchmarkBeatEventV0/timeout_Put_@metadata-12 500000 2623 ns/op BenchmarkBeatEventV0/Delete_@metadata-12 2000000 842 ns/op BenchmarkBeatEventV0/timeout_Delete_@metadata-12 1000000 1629 ns/op BenchmarkBeatEventV0/Cancel-12 2000000 759 ns/op BenchmarkBeatEventV0/timeout_Cancel-12 1000000 1329 ns/op BenchmarkBeatEventV0/Tag-12 1000000 1189 ns/op BenchmarkBeatEventV0/timeout_Tag-12 1000000 1973 ns/op BenchmarkBeatEventV0/AppendTo-12 2000000 644 ns/op BenchmarkBeatEventV0/timeout_AppendTo-12 1000000 1347 ns/op
Rather than have libbeat force the inclusion of the script processor in all Beats, make each Beat opt into the processor.
Module is an overloaded term.
e8477db
to
ea47f77
Compare
jenkins, test this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, let's make a followup PR to reject it from CM.
This sounds really interesting! I have a question related to that, will it make their way to v6 or it will be v7 only? |
@JCMais It will be a feature in v7+ only. |
Follows the strategy used in #10850 to check for goroutine leaks on tests in some places related to the leaks investigated as part of #9302. Several things here: * New helper to reuse the goroutine checker, it also prints now the goroutines dump on failure. * Add goroutine checks for autodiscover tests * Add goroutine checks for CloseOnSignal and SubOutlet tests (they detect the issues solved by #11263) * Add goroutine checks for log input tests (they detect issues solved by #12125 and #12164)
The script processor executes Javascript code to process an event. The processor uses a pure Go implementation of ECMAScript 5.1. This can be useful in situations where one of the other processors doesn’t provide the functionality you need to filter events. The processor can be configured by embedding Javascript in your configuration file or by pointing the processor at external file(s). See the included documentation for the full details. processors: - script: type: javascript code: > function process(event) { event.Tag("js"); } For observability it can add metrics with a histogram of the execution time and a counter for the number of exceptions that occur. There is an optional timeout configuration option to the processor that will timeout the execution by interrupting the JS code. Rather than have libbeat force the inclusion of the script processor in all Beats, make each Beat opt into the processor. It is imported by the log processing Beats (filebeat, journalbeat, and winlogbeat). Benchmark Results Here's a benchmark to get a rough idea of how long it takes to process an event in the processor's runtime I updated the benchmark to test each case with and without the timeout enabled. BenchmarkBeatEventV0/Put-12 2000000 707 ns/op BenchmarkBeatEventV0/timeout_Put-12 1000000 1510 ns/op BenchmarkBeatEventV0/Object_Put_Key-12 2000000 631 ns/op BenchmarkBeatEventV0/timeout_Object_Put_Key-12 1000000 1252 ns/op BenchmarkBeatEventV0/Get-12 2000000 750 ns/op BenchmarkBeatEventV0/timeout_Get-12 1000000 1495 ns/op BenchmarkBeatEventV0/Get_Undefined_Key-12 1000000 1039 ns/op BenchmarkBeatEventV0/timeout_Get_Undefined_Key-12 1000000 2108 ns/op BenchmarkBeatEventV0/fields_get_key-12 2000000 1044 ns/op BenchmarkBeatEventV0/timeout_fields_get_key-12 1000000 2051 ns/op BenchmarkBeatEventV0/Get_@metadata-12 2000000 750 ns/op BenchmarkBeatEventV0/timeout_Get_@metadata-12 1000000 1745 ns/op BenchmarkBeatEventV0/Put_@metadata-12 1000000 1048 ns/op BenchmarkBeatEventV0/timeout_Put_@metadata-12 500000 2623 ns/op BenchmarkBeatEventV0/Delete_@metadata-12 2000000 842 ns/op BenchmarkBeatEventV0/timeout_Delete_@metadata-12 1000000 1629 ns/op BenchmarkBeatEventV0/Cancel-12 2000000 759 ns/op BenchmarkBeatEventV0/timeout_Cancel-12 1000000 1329 ns/op BenchmarkBeatEventV0/Tag-12 1000000 1189 ns/op BenchmarkBeatEventV0/timeout_Tag-12 1000000 1973 ns/op BenchmarkBeatEventV0/AppendTo-12 2000000 644 ns/op BenchmarkBeatEventV0/timeout_AppendTo-12 1000000 1347 ns/op
I recommend reading the docs then reviewing the non-vendor changes which is ~1400 lines.
The script processor executes Javascript code to process an event. The processor uses a pure
Go implementation of ECMAScript 5.1. This can be useful in situations where one of the other
processors doesn’t provide the functionality you need to filter events.
The processor can be configured by embedding Javascript in your configuration file or by pointing
the processor at external file(s). See the included documentation for the full details.
For observability it can add metrics with a histogram of the execution time and a counter
for the number of exceptions that occur.
Benchmark Results
Here's a benchmark to get a rough idea of how long it takes to process an event in the processor's runtime.