Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ruby2 trap logging and worker row status update (log writing failed. can't be called from trap context) #2386

Merged
merged 8 commits into from
Mar 30, 2015

Conversation

jrafanie
Copy link
Member

EDIT: Added worker_base tests and added signal handling of 'thin' based workers (UI/WebService workers).

Note, these changes to the 'thin' based workers also fixes an issue that was a problem with ruby 1.9.3, namely, UI/Webservice workers that are sent SIGINT/SIGTERM don't gracefully log their "exit" or update their worker row. This caused them the server to wait for them to timeout, instead of restarting them immediately.

Among other things, you can't log in the trap signal handler in ruby 2.0.
https://bugs.ruby-lang.org/issues/7917

Results in the following ugly messages: 'Log writing failed. can't be called from trap context'

  • Add a EvmServer binstub so EvmServer can be required directly.

  • Don't try to trap SIGKILL since it can't be trapped/ignored.

  • Cleanup WorkerBase initialization to simplify test setup.

  • Convert server signal 'trap' to normal SignalException handling.

  • Convert worker 'trap' to normal SignalException handling.

  • Add at_exit for thin based workers since thin traps interrupts.

  • Extract all of the commmon WS/UI worker script code to a mixin.

    https://bugzilla.redhat.com/show_bug.cgi?id=1205407

@jrafanie
Copy link
Member Author

@matthewd @tenderlove @Fryguy Please toss 🍅 I was going to try forking to test the server/worker stuff, but I'd need to rewrite too much code to get that working. I chose brittle stubbing for now. Note, I still have to add worker_base specs and make sure no subclasses modify start.

@kbrock
Copy link
Member

kbrock commented Mar 27, 2015

could you rebase this? there is a fix on master to fix this build error
thanks

@kbrock
Copy link
Member

kbrock commented Mar 27, 2015

this is great joe.
👍 LGTM (even without that spec) :shipit:

@jrafanie jrafanie force-pushed the fix_ruby2_trap_logging branch 2 times, most recently from b71223f to e161028 Compare March 27, 2015 20:16
@jrafanie jrafanie force-pushed the fix_ruby2_trap_logging branch from e161028 to 1d5ecf9 Compare March 27, 2015 20:22
@@ -63,12 +59,18 @@ def start
end

at_exit {
# TODO: should this be called on SOFT ints and SystemExit, not SIGINT?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kbrock I think this is the cause of some of the "shutdown_and_exit" messages on the queue. We run MiqServer.stop on SIGINT... something for another day.

@jrafanie jrafanie changed the title [WIP] Fix ruby2 trap logging Fix ruby2 trap logging Mar 27, 2015
@jrafanie jrafanie changed the title Fix ruby2 trap logging Fix ruby2 trap logging and worker row status update Mar 27, 2015
@jrafanie jrafanie removed the wip label Mar 27, 2015
@jrafanie
Copy link
Member Author

@Fryguy @kbrock @matthewd updated, ready for review.

@kbrock
Copy link
Member

kbrock commented Mar 27, 2015

@jrafanie This looks good to me. Anything special I should keep in mind before merging?

@jrafanie
Copy link
Member Author

Not really @kbrock. Only things to look for is that rails server still honors ctrl c, rake evm:start works as before, and that kill, kill -TERM pid, kill -INT pid all work gracefully on the evm_server and workers processes. They log their exit and update their status. Note, Ui/ws workers now exit and get restarted quickly on these interrupts.

Also, there's also a lingering bug to do later in the at_exit for the server where it logs it's exit but some other stuff it shouldn't on SIGINT.

@miq-bot
Copy link
Member

miq-bot commented Mar 30, 2015

Checked commits jrafanie@b125f45 .. jrafanie@de72d60 with rubocop 0.27.1
9 files checked, 2 offenses detected

vmdb/lib/workers/evm_server.rb

vmdb/lib/workers/worker_base.rb

@jrafanie
Copy link
Member Author

Note, pushed a new commit to remove the unused shebang lines in the worker/evm_server launching scripts since they're called with a custom command line and weren't marked as executable so they could never be launched without specifying the ruby/runner location.

@Fryguy
Copy link
Member

Fryguy commented Mar 30, 2015

👍 Will merge when green.

Fryguy added a commit that referenced this pull request Mar 30, 2015
Fix ruby2 trap logging and worker row status update
@Fryguy Fryguy merged commit f826715 into ManageIQ:master Mar 30, 2015
@Fryguy Fryguy deleted the fix_ruby2_trap_logging branch March 30, 2015 15:30
@Fryguy Fryguy added this to the Sprint 21 Ending Mar 30, 2015 milestone Mar 30, 2015
jrafanie added a commit to jrafanie/manageiq that referenced this pull request Mar 30, 2015
Add `extend ActiveSupport::Concern` to fix `wrong number of arguments (0 for 1) (ArgumentError)`

Follow up to: ManageIQ#2386
https://bugzilla.redhat.com/show_bug.cgi?id=1205407
@jrafanie jrafanie changed the title Fix ruby2 trap logging and worker row status update Fix ruby2 trap logging and worker row status update (log writing failed. can't be called from trap context) Mar 31, 2015
jrafanie added a commit to jrafanie/manageiq-providers-kubernetes that referenced this pull request Aug 17, 2017
Since ruby 2.0, you cannot use a mutex in a trap context such as writing
a log message.

irb(main):001:0> require 'logger'
=> true
irb(main):002:0> trap(:TERM) { Logger.new("stdout").info "EventMonitor#start: ignoring SIGTERM" }
=> "DEFAULT"
irb(main):003:0> `kill #{Process.pid}`
log writing failed. can't be called from trap context
=> ""

Move this logic up to the caller of start and rescue a SignalException
with "SIGTERM" message instead.

See also: ManageIQ/manageiq#2386

I wrote a "throwaway test" to verify this...it's not worth keeping
though:

diff --git a/spec/models/manageiq/providers/kubernetes/container_manager/event_catcher_mixin_spec.rb b/spec/models/manageiq/providers/kubernetes/container_manager/event_catcher_mixin_spec.rb
index 1dfbdfb..b51eaba 100644
--- a/spec/models/manageiq/providers/kubernetes/container_manager/event_catcher_mixin_spec.rb
+++ b/spec/models/manageiq/providers/kubernetes/container_manager/event_catcher_mixin_spec.rb
@@ -25,6 +25,17 @@ describe ManageIQ::Providers::Kubernetes::ContainerManager::EventCatcherMixin do
     end
   end

+  describe "something" do
+    it "does things" do
+      MyClass = test_class
+      $kube_log = Logger.new("logger.log")
+      cm = test_class.new(ems)
+      handle = cm.event_monitor_handle
+      allow(handle).to receive(:start).and_raise(SignalException.new("SIGTERM"))
+      cm.monitor_events
+    end
+  end
+
   describe '#extract_event_data' do
     context 'given container event (Pod event with fieldPath)' do
       let(:kubernetes_event) do

Results in this:
08:51:14 ~/Code/manageiq-providers-kubernetes (cannot_log_in_trap_context) (2.4.1) + cat logger.log
I, [2017-08-17T08:45:30.523852 #51050]  INFO -- : MyClass#monitor_events: ignoring SIGTERM
jrafanie added a commit to jrafanie/manageiq-providers-ovirt that referenced this pull request Aug 17, 2017
Note, this is nearly identical to
ManageIQ/manageiq-providers-kubernetes#95

Since ruby 2.0, you cannot use a mutex in a trap context such as writing
a log message.

irb(main):001:0> require 'logger'
=> true
irb(main):002:0> trap(:TERM) { Logger.new("stdout").info "EventMonitor#start: ignoring SIGTERM" }
=> "DEFAULT"
irb(main):003:0> `kill #{Process.pid}`
log writing failed. can't be called from trap context
=> ""

Move this logic up to the caller of start and rescue a SignalException
with "SIGTERM" message instead.

See also: ManageIQ/manageiq#2386
jrafanie added a commit to jrafanie/manageiq-providers-ovirt that referenced this pull request Aug 17, 2017
It was added originally back in 2011 in manageiq:
946657b11a0a65abbcdca9975206d71f7c2b7afc

This was actually copied from the vmware vim monitoring events
code from 2009: 22a911716b40428b6d822f0f1e95aabd022e59c9

```
def monitorEvents
  raise "monitorEvents: no block given" if !block_given?

 trap(:TERM) { $log.info "monitorEvents: ignoring SIGTERM" }
...
```

First of all, trap with logging doesn't work since ruby 2.0+:

```
irb(main):001:0> require 'logger'
=> true
irb(main):002:0> trap(:TERM) { Logger.new("stdout").info
"EventMonitor#start: ignoring SIGTERM" }
=> "DEFAULT"
irb(main):003:0> `kill #{Process.pid}`
log writing failed. can't be called from trap context
=> ""
```

See ManageIQ/manageiq#2386

Second, we need to TERM processes in container land.
jrafanie added a commit to jrafanie/manageiq-providers-kubernetes that referenced this pull request Aug 17, 2017
See the same change in ovirt:
ManageIQ/manageiq-providers-ovirt#80

It was added originally back in 2011 in manageiq:
946657b11a0a65abbcdca9975206d71f7c2b7afc

This was actually copied from the vmware vim monitoring events
code from 2009: 22a911716b40428b6d822f0f1e95aabd022e59c9

```
def monitorEvents
  raise "monitorEvents: no block given" if !block_given?

 trap(:TERM) { $log.info "monitorEvents: ignoring SIGTERM" }
...
```

First of all, trap with logging doesn't work since ruby 2.0+:

```
irb(main):001:0> require 'logger'
=> true
irb(main):002:0> trap(:TERM) { Logger.new("stdout").info
"EventMonitor#start: ignoring SIGTERM" }
=> "DEFAULT"
irb(main):003:0> `kill #{Process.pid}`
log writing failed. can't be called from trap context
=> ""
```

See ManageIQ/manageiq#2386

Second, we need to TERM processes in container land.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants