GrokTests unit tests fail with "Unable to find pattern [] in Grok's pattern dictionary" #43673
Comments
Pinging @elastic/es-search |
Pinging @elastic/es-core-features |
It is related to the ingest grok processor, @talevy can you take a look and reassign if needed ? |
I investigated this issue in the past and wasn't sure how to resolve it. I'll take a look. |
"testExponentialExpressions" failed for the same reason on CI: |
I hit this locally on master but wasn't able to reproduce.
|
ugh sorry folks. I'll get back to this. I still have no clue how this is happening |
A while ago jcodings was upgraded (#43334), but joni wasn't. I've opened a pr to upgrade joni (#47374). This may not be causing the grok failures, but it isn't ideal either. The version jcodings ( |
thanks Jack, I was hoping some code I changed would expose the error here, but it doesn't look like it |
Add more information to the error message to figure out what is going on. Relates to elastic#43673
Looks like this failure is caused by the fact that the thread that executes this test suite is interrupted. I found this because I opened #48284, which adds more info to an exception message. I think this failure happens because the test framework reuses threads and perhaps the interrupted status isn't reset, which causes this test to fail (joni checks whether a thread is interrupted and then stops searching for matches). I think we need to check the current thread's interrupted status before executing these tests and clear it if it is set. |
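A minimal sketch of the failure mode described above, assuming a matcher that gives up whenever the calling thread's interrupted flag is set. `InterruptFlagDemo`, `matchUnlessInterrupted` and `clearStaleInterrupt` are hypothetical names for illustration, not joni or Elasticsearch code; only `Thread.interrupted()` and `Thread.isInterrupted()` are real JDK calls.

```java
public class InterruptFlagDemo {
    // Hypothetical stand-in for a library that aborts its search
    // when the calling thread's interrupted flag is set.
    static boolean matchUnlessInterrupted(String input, String needle) {
        if (Thread.currentThread().isInterrupted()) {
            return false; // gives up without looking at the input
        }
        return input.contains(needle);
    }

    // Clears a stale interrupted flag on the current thread.
    // Thread.interrupted() returns the flag's value and resets it.
    static boolean clearStaleInterrupt() {
        return Thread.interrupted();
    }

    public static void main(String[] args) {
        // Simulate a reused thread whose flag was left set by earlier work.
        Thread.currentThread().interrupt();
        System.out.println(matchUnlessInterrupted("hello world", "world"));

        // Clearing the flag before the test body restores normal matching.
        System.out.println(clearStaleInterrupt());
        System.out.println(matchUnlessInterrupted("hello world", "world"));
    }
}
```

In a test class, a call like `clearStaleInterrupt()` would typically sit in a setup method so that each test starts with a clean flag.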
There is a watchdog in place to avoid long-running (and expensive) grok expressions. Currently the watchdog is thread based: threads that run grok expressions are registered, and unregistered after completion. If a thread stays registered for too long, the watchdog interrupts it. Joni (the library that powers grok expressions) has a mechanism that checks whether the current thread is interrupted and, if so, aborts the pattern matching. Newer versions have an additional way to abort long-running pattern matching inside joni. Instead of only checking the thread's interrupted flag, joni now also checks a volatile field that can be set via a `Matcher` instance. This is a more efficient method for aborting long-running matches (joni checks every 30k iterations whether the interrupted flag is set, versus just checking a volatile field). Recently we upgraded to a recent joni version (elastic#47374), and this PR is a follow-up to that PR. This change should also fix elastic#43673, since it appears that when unit tests are run, a test runner thread's interrupted flag may already have been set due to some thread reuse.
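The idea of aborting a match through a flag on the matcher rather than through the thread can be sketched as follows. This is an illustration under assumptions, not the actual joni or Elasticsearch implementation: `CancellableMatcher`, its `abort()` method, the `-2` return value and `searchWithTimeout` are all hypothetical.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class MatcherWatchdogSketch {
    /** Hypothetical stand-in for a matcher that carries its own abort flag. */
    static final class CancellableMatcher {
        private volatile boolean aborted = false;

        void abort() {
            aborted = true;
        }

        /** Returns the index of target, -1 if absent, or -2 if the search was aborted. */
        int search(byte[] haystack, byte target) {
            for (int i = 0; i < haystack.length; i++) {
                if (aborted) {
                    return -2; // only the matcher's own flag is consulted
                }
                if (haystack[i] == target) {
                    return i;
                }
            }
            return -1;
        }
    }

    /** Runs a search and schedules an abort of this matcher after timeoutMillis. */
    static int searchWithTimeout(CancellableMatcher matcher, byte[] haystack, byte target,
                                 long timeoutMillis, ScheduledExecutorService scheduler) {
        ScheduledFuture<?> pendingAbort =
            scheduler.schedule(matcher::abort, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            return matcher.search(haystack, target);
        } finally {
            pendingAbort.cancel(false); // the search finished, so the abort is no longer needed
        }
    }
}
```

Because the abort flag lives on the matcher, a stale interrupted status on a reused thread has no effect on the search, which is the property the change relies on to avoid the test failure.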
This problem occurred in the 7.5 branch in https://gradle-enterprise.elastic.co/s/rck3jol2wuye6 Is it worth backporting the fix to 7.5? |
Tests are failing regularly on the 7.5 branch, so I muted the whole GrokTests class on that branch in a508b10. Here is the build scan of today's failure (https://gradle-enterprise.elastic.co/s/y4nfudrjljl3s), but you can find ~10 for the last 30 days in the build stats. Also, these tests are timing out regularly too in both 7.5 and 7.5 branches (build scans: https://gradle-enterprise.elastic.co/s/mhly7mq3el6vy, https://gradle-enterprise.elastic.co/s/2xdryigg2jsn6) |
I'm going to close this, as there have been no failures since February and we are done with the 7.5 branch |
Example build failure
https://scans.gradle.com/s/iepue6nvnpjco/tests/vmr3zh2fll5n6-jtrpes23gv6dm
Reproduction line
does not reproduce locally
Example relevant log:
Frequency
reproduces ~ 6 times in the past 30 days
/cc @talevy