-
Notifications
You must be signed in to change notification settings - Fork 900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix ruby2 trap logging and worker row status update (log writing failed. can't be called from trap context) #2386
Conversation
@matthewd @tenderlove @Fryguy Please toss 🍅 I was going to try forking to test the server/worker stuff, but I'd need to rewrite too much code to get that working. I chose brittle stubbing for now. Note, I still have to add worker_base specs and make sure no subclasses modify |
could you rebase this? there is a fix on master to fix this build error |
this is great joe. |
b71223f
to
e161028
Compare
This makes testing the EvmServer class possible. https://bugzilla.redhat.com/show_bug.cgi?id=1205407
Among other things, you can't log in the trap signal handler in ruby 2.0. https://bugs.ruby-lang.org/issues/7917 https://bugzilla.redhat.com/show_bug.cgi?id=1205407
Among other things, you can't log in the trap signal handler in ruby 2.0. https://bugs.ruby-lang.org/issues/7917 https://bugzilla.redhat.com/show_bug.cgi?id=1205407
Because thin traps interrupts, SignalException/Interrupt exceptions are not raised to the UI and Web Service workers, causing them to exit without logging and updating it's worker status. The latter causes the server to not restart the worker as quickly. https://bugzilla.redhat.com/show_bug.cgi?id=1205407
e161028
to
1d5ecf9
Compare
@@ -63,12 +59,18 @@ def start | |||
end | |||
|
|||
at_exit { | |||
# TODO: should this be called on SOFT ints and SystemExit, not SIGINT? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kbrock I think this is the cause of some of the "shutdown_and_exit" messages on the queue. We run MiqServer.stop
on SIGINT... something for another day.
@jrafanie This looks good to me. Anything special I should keep in mind before merging? |
Not really @kbrock. Only things to look for is that rails server still honors ctrl c, rake evm:start works as before, and that kill, kill -TERM pid, kill -INT pid all work gracefully on the evm_server and workers processes. They log their exit and update their status. Note, Ui/ws workers now exit and get restarted quickly on these interrupts. Also, there's also a lingering bug to do later in the at_exit for the server where it logs it's exit but some other stuff it shouldn't on SIGINT. |
Checked commits jrafanie@b125f45 .. jrafanie@de72d60 with rubocop 0.27.1 vmdb/lib/workers/evm_server.rb
vmdb/lib/workers/worker_base.rb
|
Note, pushed a new commit to remove the unused shebang lines in the worker/evm_server launching scripts since they're called with a custom command line and weren't marked as executable so they could never be launched without specifying the ruby/runner location. |
👍 Will merge when green. |
Fix ruby2 trap logging and worker row status update
Add `extend ActiveSupport::Concern` to fix `wrong number of arguments (0 for 1) (ArgumentError)` Follow up to: ManageIQ#2386 https://bugzilla.redhat.com/show_bug.cgi?id=1205407
Since ruby 2.0, you cannot use a mutex in a trap context such as writing a log message. irb(main):001:0> require 'logger' => true irb(main):002:0> trap(:TERM) { Logger.new("stdout").info "EventMonitor#start: ignoring SIGTERM" } => "DEFAULT" irb(main):003:0> `kill #{Process.pid}` log writing failed. can't be called from trap context => "" Move this logic up to the caller of start and rescue a SignalException with "SIGTERM" message instead. See also: ManageIQ/manageiq#2386 I wrote a "throwaway test" to verify this...it's not worth keeping though: diff --git a/spec/models/manageiq/providers/kubernetes/container_manager/event_catcher_mixin_spec.rb b/spec/models/manageiq/providers/kubernetes/container_manager/event_catcher_mixin_spec.rb index 1dfbdfb..b51eaba 100644 --- a/spec/models/manageiq/providers/kubernetes/container_manager/event_catcher_mixin_spec.rb +++ b/spec/models/manageiq/providers/kubernetes/container_manager/event_catcher_mixin_spec.rb @@ -25,6 +25,17 @@ describe ManageIQ::Providers::Kubernetes::ContainerManager::EventCatcherMixin do end end + describe "something" do + it "does things" do + MyClass = test_class + $kube_log = Logger.new("logger.log") + cm = test_class.new(ems) + handle = cm.event_monitor_handle + allow(handle).to receive(:start).and_raise(SignalException.new("SIGTERM")) + cm.monitor_events + end + end + describe '#extract_event_data' do context 'given container event (Pod event with fieldPath)' do let(:kubernetes_event) do Results in this: 08:51:14 ~/Code/manageiq-providers-kubernetes (cannot_log_in_trap_context) (2.4.1) + cat logger.log I, [2017-08-17T08:45:30.523852 #51050] INFO -- : MyClass#monitor_events: ignoring SIGTERM
Note, this is nearly identical to ManageIQ/manageiq-providers-kubernetes#95 Since ruby 2.0, you cannot use a mutex in a trap context such as writing a log message. irb(main):001:0> require 'logger' => true irb(main):002:0> trap(:TERM) { Logger.new("stdout").info "EventMonitor#start: ignoring SIGTERM" } => "DEFAULT" irb(main):003:0> `kill #{Process.pid}` log writing failed. can't be called from trap context => "" Move this logic up to the caller of start and rescue a SignalException with "SIGTERM" message instead. See also: ManageIQ/manageiq#2386
It was added originally back in 2011 in manageiq: 946657b11a0a65abbcdca9975206d71f7c2b7afc This was actually copied from the vmware vim monitoring events code from 2009: 22a911716b40428b6d822f0f1e95aabd022e59c9 ``` def monitorEvents raise "monitorEvents: no block given" if !block_given? trap(:TERM) { $log.info "monitorEvents: ignoring SIGTERM" } ... ``` First of all, trap with logging doesn't work since ruby 2.0+: ``` irb(main):001:0> require 'logger' => true irb(main):002:0> trap(:TERM) { Logger.new("stdout").info "EventMonitor#start: ignoring SIGTERM" } => "DEFAULT" irb(main):003:0> `kill #{Process.pid}` log writing failed. can't be called from trap context => "" ``` See ManageIQ/manageiq#2386 Second, we need to TERM processes in container land.
See the same change in ovirt: ManageIQ/manageiq-providers-ovirt#80 It was added originally back in 2011 in manageiq: 946657b11a0a65abbcdca9975206d71f7c2b7afc This was actually copied from the vmware vim monitoring events code from 2009: 22a911716b40428b6d822f0f1e95aabd022e59c9 ``` def monitorEvents raise "monitorEvents: no block given" if !block_given? trap(:TERM) { $log.info "monitorEvents: ignoring SIGTERM" } ... ``` First of all, trap with logging doesn't work since ruby 2.0+: ``` irb(main):001:0> require 'logger' => true irb(main):002:0> trap(:TERM) { Logger.new("stdout").info "EventMonitor#start: ignoring SIGTERM" } => "DEFAULT" irb(main):003:0> `kill #{Process.pid}` log writing failed. can't be called from trap context => "" ``` See ManageIQ/manageiq#2386 Second, we need to TERM processes in container land.
EDIT: Added worker_base tests and added signal handling of 'thin' based workers (UI/WebService workers).
Note, these changes to the 'thin' based workers also fixes an issue that was a problem with ruby 1.9.3, namely, UI/Webservice workers that are sent SIGINT/SIGTERM don't gracefully log their "exit" or update their worker row. This caused them the server to wait for them to timeout, instead of restarting them immediately.
Among other things, you can't log in the trap signal handler in ruby 2.0.
https://bugs.ruby-lang.org/issues/7917
Results in the following ugly messages: 'Log writing failed. can't be called from trap context'
Add a EvmServer binstub so EvmServer can be required directly.
Don't try to trap SIGKILL since it can't be trapped/ignored.
Cleanup WorkerBase initialization to simplify test setup.
Convert server signal 'trap' to normal SignalException handling.
Convert worker 'trap' to normal SignalException handling.
Add at_exit for thin based workers since thin traps interrupts.
Extract all of the commmon WS/UI worker script code to a mixin.
https://bugzilla.redhat.com/show_bug.cgi?id=1205407