Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix task event handler simul multi warnings #2807

Merged

Conversation

matthewrmshin
Copy link
Contributor

@matthewrmshin matthewrmshin commented Oct 17, 2018

Fix #2806.

(Plus new functionality: task output based events).

@matthewrmshin matthewrmshin added the bug Something is wrong :( label Oct 17, 2018
@matthewrmshin matthewrmshin added this to the next release milestone Oct 17, 2018
@matthewrmshin matthewrmshin self-assigned this Oct 17, 2018
The bug: on receiving multiple simultaneous warning messages from a task
job, only 1 warning event handler gets triggered.

This fix ensures that multiple warning event handlers are triggered.
@matthewrmshin matthewrmshin force-pushed the fix-task-event-handler-multi-simul-warnings branch from 170f3d2 to 8fa52c5 Compare October 17, 2018 14:50
Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, tests as working - thanks - just one question.

Copy link
Member

@kinow kinow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me, but alas I don't know much about event handlers in Cylc to give a more useful review here (yet!!!). Feel free to request others to review it too if you need anybody else besides @hjoliver reviewing it 👍

@hjoliver
Copy link
Member

alas I don't know much about event handlers in Cylc to give a more useful review here (yet!!!).

(@matthewrmshin - we're planning to remedy this situation tomorrow 💥 ... maybe no need to reassign)

@kinow
Copy link
Member

kinow commented Oct 19, 2018

@matthewrmshin had a session on event handlers and on this pull request with @hjoliver (thanks!!!). Feeling like I know a bit more about how that works in Cylc and that I can try to help with this review again (bear with me a little longer).

Successfully reproduced the problem on master (left; note that it printed cat and fish, so the output in master is not deterministic), then used the fix in this pull request and it produced the correct output (right).

ok

But then decided to log fish twice... maybe because I had two files with the same name, or just wanted to raise that message twice. And it was logged once.

other

@matthewrmshin , from what I understood, the solution in this pull request is now using the message as an identifier of the event? If that's correct, then I think this is the desired output.

But would there be cases where users still want to submit the same message multiple times?

Will review the code with more care from home either tonight or during this weekend (a break for when the weather is not so nice, or am too sunburned 🌤 ).

Cheers
Bruno

@hjoliver
Copy link
Member

@matthewrmshin - actually I think @kinow raises a good point. I think I can imagine use cases where the same WARNING event, for instance, happens multiple times within the same job, potentially with the same message (e.g. "database offline", or whatever). So maybe we do need to handle every single message as a unique event.

@hjoliver
Copy link
Member

(@kinow - don't feel obliged to work on the weekend! 🌞)

@matthewrmshin
Copy link
Contributor Author

In which case, I'll try to use a counter to number the non-unique events.

@kinow
Copy link
Member

kinow commented Oct 19, 2018

Tested, and works as expected @matthewrmshin ! Thanks for the quick fix! Here's my output

kinow@ranma:/tmp/foo$ cylc cat-log foo | grep WARNING
2018-10-20T09:16:55+13 WARNING - [foo.1] -(current:running)> cat at 2018-10-20T09:16:54+13
2018-10-20T09:16:55+13 WARNING - [foo.1] -(current:running)> dog at 2018-10-20T09:16:54+13
2018-10-20T09:16:55+13 WARNING - [foo.1] -(current:running)> fish at 2018-10-20T09:16:55+13
2018-10-20T09:16:55+13 WARNING - [foo.1] -(current:running)> fish at 2018-10-20T09:16:55+13
kinow@ranma:/tmp/foo$ cylc cat-log -f a foo foo.1
[jobs-submit ret_code] 0
[jobs-submit out] 2018-10-20T09:15:59+13|1/foo/01|0|6273
2018-10-20T09:15:59+13 [STDOUT] 6273
[(('event-handler-00', 'warning-1'), 1) cmd] echo "HANDLED cat"
[(('event-handler-00', 'warning-1'), 1) ret_code] 0
[(('event-handler-00', 'warning-1'), 1) out] HANDLED cat
[(('event-handler-00', 'warning-3'), 1) cmd] echo "HANDLED fish"
[(('event-handler-00', 'warning-3'), 1) ret_code] 0
[(('event-handler-00', 'warning-3'), 1) out] HANDLED fish
[(('event-handler-00', 'warning-2'), 1) cmd] echo "HANDLED dog"
[(('event-handler-00', 'warning-2'), 1) ret_code] 0
[(('event-handler-00', 'warning-2'), 1) out] HANDLED dog
[jobs-submit ret_code] 0
[jobs-submit out] 2018-10-20T09:16:54+13|1/foo/01|0|6524
2018-10-20T09:16:54+13 [STDOUT] 6524
[(('event-handler-00', 'warning-1'), 1) cmd] echo "HANDLED cat"
[(('event-handler-00', 'warning-1'), 1) ret_code] 0
[(('event-handler-00', 'warning-1'), 1) out] HANDLED cat
[(('event-handler-00', 'warning-3'), 1) cmd] echo "HANDLED fish"
[(('event-handler-00', 'warning-3'), 1) ret_code] 0
[(('event-handler-00', 'warning-3'), 1) out] HANDLED fish
[(('event-handler-00', 'warning-2'), 1) cmd] echo "HANDLED dog"
[(('event-handler-00', 'warning-2'), 1) ret_code] 0
[(('event-handler-00', 'warning-2'), 1) out] HANDLED dog
[(('event-handler-00', 'warning-4'), 1) cmd] echo "HANDLED fish"
[(('event-handler-00', 'warning-4'), 1) ret_code] 0
[(('event-handler-00', 'warning-4'), 1) out] HANDLED fish

Minor note, is that in the docs I think we would have to mention that the message would happen twice, but in no guaranteed order. if a user needed messages in order, we would have to include the time in the key? But there is a workaround of asking users to generate a message with the timestamp and sort out the ordering in their side, which could be a messaging system or log handler system.

So LGTM +1 🎉

@hjoliver
Copy link
Member

hjoliver commented Oct 21, 2018

@kinow - regarding ordering, I think this is fine as-is. Normally (in real jobs) there would be a sufficient delay between messages that they will be processed in order by virtue of the fact that they are received one at a time by the suite daemon. But if messages are emitted near-simultaneously, users should probably not rely on the order. Plus I can't think of a situation where that would matter.

@kinow
Copy link
Member

kinow commented Oct 21, 2018

Sure thing @hjoliver , only mentioned because a co-worker in a previous company was telling me about an interesting case he had to use Kafka and keep the order of messages (can't recall what for though). I think it is one of the guarantees in Kafka. But agree it may not be worth worrying about it now 👍 what's more important is that this fix appears to be working OK 🎉

@matthewrmshin
Copy link
Contributor Author

(Note: event handlers are currently invoked using a parallel pool of subprocesses - so order cannot be guaranteed unless we impose a workflow on them.)

@hjoliver
Copy link
Member

Merging 🎆

@hjoliver hjoliver merged commit 7f56b7b into cylc:master Oct 22, 2018
@matthewrmshin matthewrmshin deleted the fix-task-event-handler-multi-simul-warnings branch October 23, 2018 08:33
@@ -18,6 +18,7 @@

"""Provide a class to represent a task proxy in a running suite."""

from collections import Counter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collections.Counter is a Python2.7 feature!

https://docs.python.org/2/library/collections.html

@hjoliver
Copy link
Member

(Retroactively approving this - I meant to/thought I had before merging ... @oliver-sanders' comment on Counter notwithstanding).

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved (I guess the official approval switch does not appear, post merge!)

@hjoliver
Copy link
Member

collections.Counter is a Python2.7 feature!

Good spotting (very). Can we just require 2.7 at this point 🐑-😁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is wrong :(
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants