Fix ruby2 trap logging and worker row status update (log writing failed. can't be called from trap context) #2386

jrafanie · 2015-03-26T22:31:34Z

EDIT: Added worker_base tests and added signal handling of 'thin' based workers (UI/WebService workers).

Note, these changes to the 'thin' based workers also fixes an issue that was a problem with ruby 1.9.3, namely, UI/Webservice workers that are sent SIGINT/SIGTERM don't gracefully log their "exit" or update their worker row. This caused them the server to wait for them to timeout, instead of restarting them immediately.

Among other things, you can't log in the trap signal handler in ruby 2.0.
https://bugs.ruby-lang.org/issues/7917

Results in the following ugly messages: 'Log writing failed. can't be called from trap context'

Add a EvmServer binstub so EvmServer can be required directly.
Don't try to trap SIGKILL since it can't be trapped/ignored.
Cleanup WorkerBase initialization to simplify test setup.
Convert server signal 'trap' to normal SignalException handling.
Convert worker 'trap' to normal SignalException handling.
Add at_exit for thin based workers since thin traps interrupts.
Extract all of the commmon WS/UI worker script code to a mixin.

https://bugzilla.redhat.com/show_bug.cgi?id=1205407

jrafanie · 2015-03-26T22:33:29Z

@matthewd @tenderlove @Fryguy Please toss 🍅 I was going to try forking to test the server/worker stuff, but I'd need to rewrite too much code to get that working. I chose brittle stubbing for now. Note, I still have to add worker_base specs and make sure no subclasses modify start.

kbrock · 2015-03-27T13:58:05Z

could you rebase this? there is a fix on master to fix this build error
thanks

kbrock · 2015-03-27T13:59:54Z

this is great joe.
👍 LGTM (even without that spec)

This makes testing the EvmServer class possible. https://bugzilla.redhat.com/show_bug.cgi?id=1205407

See ManageIQ#2313 https://bugzilla.redhat.com/show_bug.cgi?id=1205407

https://bugzilla.redhat.com/show_bug.cgi?id=1205407

Among other things, you can't log in the trap signal handler in ruby 2.0. https://bugs.ruby-lang.org/issues/7917 https://bugzilla.redhat.com/show_bug.cgi?id=1205407

Because thin traps interrupts, SignalException/Interrupt exceptions are not raised to the UI and Web Service workers, causing them to exit without logging and updating it's worker status. The latter causes the server to not restart the worker as quickly. https://bugzilla.redhat.com/show_bug.cgi?id=1205407

https://bugzilla.redhat.com/show_bug.cgi?id=1205407

jrafanie · 2015-03-27T20:27:47Z

vmdb/lib/workers/evm_server.rb

@@ -63,12 +59,18 @@ def start
    end

    at_exit {
+      # TODO: should this be called on SOFT ints and SystemExit, not SIGINT?


@kbrock I think this is the cause of some of the "shutdown_and_exit" messages on the queue. We run MiqServer.stop on SIGINT... something for another day.

jrafanie · 2015-03-27T20:29:30Z

@Fryguy @kbrock @matthewd updated, ready for review.

kbrock · 2015-03-27T22:49:44Z

@jrafanie This looks good to me. Anything special I should keep in mind before merging?

jrafanie · 2015-03-27T23:02:34Z

Not really @kbrock. Only things to look for is that rails server still honors ctrl c, rake evm:start works as before, and that kill, kill -TERM pid, kill -INT pid all work gracefully on the evm_server and workers processes. They log their exit and update their status. Note, Ui/ws workers now exit and get restarted quickly on these interrupts.

Also, there's also a lingering bug to do later in the at_exit for the server where it logs it's exit but some other stuff it shouldn't on SIGINT.

https://bugzilla.redhat.com/show_bug.cgi?id=1205407

miq-bot · 2015-03-30T14:12:28Z

Checked commits jrafanie@b125f45 .. jrafanie@de72d60 with rubocop 0.27.1
9 files checked, 2 offenses detected

vmdb/lib/workers/evm_server.rb

🔶 - Line 5, Col 28 - Style/WordArray - Use %w or %W for array of words.

vmdb/lib/workers/worker_base.rb

🔶 - Line 8, Col 23 - Style/WordArray - Use %w or %W for array of words.

jrafanie · 2015-03-30T14:27:25Z

Note, pushed a new commit to remove the unused shebang lines in the worker/evm_server launching scripts since they're called with a custom command line and weren't marked as executable so they could never be launched without specifying the ruby/runner location.

Fryguy · 2015-03-30T14:51:27Z

👍 Will merge when green.

Fix ruby2 trap logging and worker row status update

Follow up to: ManageIQ#2386 https://bugzilla.redhat.com/show_bug.cgi?id=1205407

Add `extend ActiveSupport::Concern` to fix `wrong number of arguments (0 for 1) (ArgumentError)` Follow up to: ManageIQ#2386 https://bugzilla.redhat.com/show_bug.cgi?id=1205407

Since ruby 2.0, you cannot use a mutex in a trap context such as writing a log message. irb(main):001:0> require 'logger' => true irb(main):002:0> trap(:TERM) { Logger.new("stdout").info "EventMonitor#start: ignoring SIGTERM" } => "DEFAULT" irb(main):003:0> `kill #{Process.pid}` log writing failed. can't be called from trap context => "" Move this logic up to the caller of start and rescue a SignalException with "SIGTERM" message instead. See also: ManageIQ/manageiq#2386 I wrote a "throwaway test" to verify this...it's not worth keeping though: diff --git a/spec/models/manageiq/providers/kubernetes/container_manager/event_catcher_mixin_spec.rb b/spec/models/manageiq/providers/kubernetes/container_manager/event_catcher_mixin_spec.rb index 1dfbdfb..b51eaba 100644 --- a/spec/models/manageiq/providers/kubernetes/container_manager/event_catcher_mixin_spec.rb +++ b/spec/models/manageiq/providers/kubernetes/container_manager/event_catcher_mixin_spec.rb @@ -25,6 +25,17 @@ describe ManageIQ::Providers::Kubernetes::ContainerManager::EventCatcherMixin do end end + describe "something" do + it "does things" do + MyClass = test_class + $kube_log = Logger.new("logger.log") + cm = test_class.new(ems) + handle = cm.event_monitor_handle + allow(handle).to receive(:start).and_raise(SignalException.new("SIGTERM")) + cm.monitor_events + end + end + describe '#extract_event_data' do context 'given container event (Pod event with fieldPath)' do let(:kubernetes_event) do Results in this: 08:51:14 ~/Code/manageiq-providers-kubernetes (cannot_log_in_trap_context) (2.4.1) + cat logger.log I, [2017-08-17T08:45:30.523852 #51050] INFO -- : MyClass#monitor_events: ignoring SIGTERM

Note, this is nearly identical to ManageIQ/manageiq-providers-kubernetes#95 Since ruby 2.0, you cannot use a mutex in a trap context such as writing a log message. irb(main):001:0> require 'logger' => true irb(main):002:0> trap(:TERM) { Logger.new("stdout").info "EventMonitor#start: ignoring SIGTERM" } => "DEFAULT" irb(main):003:0> `kill #{Process.pid}` log writing failed. can't be called from trap context => "" Move this logic up to the caller of start and rescue a SignalException with "SIGTERM" message instead. See also: ManageIQ/manageiq#2386

It was added originally back in 2011 in manageiq: 946657b11a0a65abbcdca9975206d71f7c2b7afc This was actually copied from the vmware vim monitoring events code from 2009: 22a911716b40428b6d822f0f1e95aabd022e59c9 ``` def monitorEvents raise "monitorEvents: no block given" if !block_given? trap(:TERM) { $log.info "monitorEvents: ignoring SIGTERM" } ... ``` First of all, trap with logging doesn't work since ruby 2.0+: ``` irb(main):001:0> require 'logger' => true irb(main):002:0> trap(:TERM) { Logger.new("stdout").info "EventMonitor#start: ignoring SIGTERM" } => "DEFAULT" irb(main):003:0> `kill #{Process.pid}` log writing failed. can't be called from trap context => "" ``` See ManageIQ/manageiq#2386 Second, we need to TERM processes in container land.

See the same change in ovirt: ManageIQ/manageiq-providers-ovirt#80 It was added originally back in 2011 in manageiq: 946657b11a0a65abbcdca9975206d71f7c2b7afc This was actually copied from the vmware vim monitoring events code from 2009: 22a911716b40428b6d822f0f1e95aabd022e59c9 ``` def monitorEvents raise "monitorEvents: no block given" if !block_given? trap(:TERM) { $log.info "monitorEvents: ignoring SIGTERM" } ... ``` First of all, trap with logging doesn't work since ruby 2.0+: ``` irb(main):001:0> require 'logger' => true irb(main):002:0> trap(:TERM) { Logger.new("stdout").info "EventMonitor#start: ignoring SIGTERM" } => "DEFAULT" irb(main):003:0> `kill #{Process.pid}` log writing failed. can't be called from trap context => "" ``` See ManageIQ/manageiq#2386 Second, we need to TERM processes in container land.

jrafanie added ruby2 wip appliance labels Mar 26, 2015

jrafanie force-pushed the fix_ruby2_trap_logging branch 2 times, most recently from b71223f to e161028 Compare March 27, 2015 20:16

jrafanie added 7 commits March 27, 2015 16:21

Add a EvmServer binstub so EvmServer can be required directly.

b125f45

This makes testing the EvmServer class possible. https://bugzilla.redhat.com/show_bug.cgi?id=1205407

Don't try to trap SIGKILL since it can't be trapped/ignored.

400f9d9

See ManageIQ#2313 https://bugzilla.redhat.com/show_bug.cgi?id=1205407

Cleanup WorkerBase initialization to simplify test setup.

2222b20

https://bugzilla.redhat.com/show_bug.cgi?id=1205407

Convert server signal 'trap' to normal SignalException handling.

859b8a7

Among other things, you can't log in the trap signal handler in ruby 2.0. https://bugs.ruby-lang.org/issues/7917 https://bugzilla.redhat.com/show_bug.cgi?id=1205407

Convert worker 'trap' to normal SignalException handling.

5d67a7b

Among other things, you can't log in the trap signal handler in ruby 2.0. https://bugs.ruby-lang.org/issues/7917 https://bugzilla.redhat.com/show_bug.cgi?id=1205407

Extract all of the commmon WS/UI worker script code to a mixin.

1d5ecf9

https://bugzilla.redhat.com/show_bug.cgi?id=1205407

jrafanie force-pushed the fix_ruby2_trap_logging branch from e161028 to 1d5ecf9 Compare March 27, 2015 20:22

jrafanie reviewed Mar 27, 2015
View reviewed changes

jrafanie changed the title ~~[WIP] Fix ruby2 trap logging~~ Fix ruby2 trap logging Mar 27, 2015

jrafanie changed the title ~~Fix ruby2 trap logging~~ Fix ruby2 trap logging and worker row status update Mar 27, 2015

jrafanie removed the wip label Mar 27, 2015

These shebang lines are wrong and not used so remove them.

de72d60

https://bugzilla.redhat.com/show_bug.cgi?id=1205407

Fryguy added a commit that referenced this pull request Mar 30, 2015

Merge pull request #2386 from jrafanie/fix_ruby2_trap_logging

f826715

Fix ruby2 trap logging and worker row status update

Fryguy merged commit f826715 into ManageIQ:master Mar 30, 2015

Fryguy deleted the fix_ruby2_trap_logging branch March 30, 2015 15:30

Fryguy added this to the Sprint 21 Ending Mar 30, 2015 milestone Mar 30, 2015

jrafanie added a commit to jrafanie/manageiq that referenced this pull request Mar 30, 2015

Rename worker model mixin to be consistent with worker models.

3ded28b

Follow up to: ManageIQ#2386 https://bugzilla.redhat.com/show_bug.cgi?id=1205407

jrafanie mentioned this pull request Mar 30, 2015

Fix module collision and concern #2444

Merged

jrafanie added the needs changelog label Mar 31, 2015

jrafanie changed the title ~~Fix ruby2 trap logging and worker row status update~~ Fix ruby2 trap logging and worker row status update (log writing failed. can't be called from trap context) Mar 31, 2015

jrafanie mentioned this pull request Sep 9, 2015

Pass TERM Signal to VixDiskLibServer #4277

Closed

jrafanie mentioned this pull request Nov 8, 2016

Run a cockpit-ws as a worker #12506

Merged

jrafanie mentioned this pull request Aug 17, 2017

Move logging in trap context to a normal signal handler ManageIQ/manageiq-providers-kubernetes#95

Closed

jrafanie mentioned this pull request Aug 17, 2017

Move logging in trap context to a normal signal handler ManageIQ/manageiq-providers-ovirt#79

Closed

jrafanie mentioned this pull request Aug 17, 2017

Remove seemingly unnecessary ignoring of SIGTERM ManageIQ/manageiq-providers-ovirt#80

Merged

jrafanie mentioned this pull request Aug 17, 2017

Remove seemingly unnecessary ignoring of SIGTERM ManageIQ/manageiq-providers-kubernetes#96

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix ruby2 trap logging and worker row status update (log writing failed. can't be called from trap context) #2386

Fix ruby2 trap logging and worker row status update (log writing failed. can't be called from trap context) #2386

jrafanie commented Mar 26, 2015

jrafanie commented Mar 26, 2015

kbrock commented Mar 27, 2015

kbrock commented Mar 27, 2015

jrafanie Mar 27, 2015

jrafanie commented Mar 27, 2015

kbrock commented Mar 27, 2015

jrafanie commented Mar 27, 2015

miq-bot commented Mar 30, 2015

jrafanie commented Mar 30, 2015

Fryguy commented Mar 30, 2015

Fix ruby2 trap logging and worker row status update (log writing failed. can't be called from trap context) #2386

Fix ruby2 trap logging and worker row status update (log writing failed. can't be called from trap context) #2386

Conversation

jrafanie commented Mar 26, 2015

jrafanie commented Mar 26, 2015

kbrock commented Mar 27, 2015

kbrock commented Mar 27, 2015

jrafanie Mar 27, 2015

Choose a reason for hiding this comment

jrafanie commented Mar 27, 2015

kbrock commented Mar 27, 2015

jrafanie commented Mar 27, 2015

miq-bot commented Mar 30, 2015

jrafanie commented Mar 30, 2015

Fryguy commented Mar 30, 2015