-
Notifications
You must be signed in to change notification settings - Fork 900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass TERM Signal to VixDiskLibServer #4277
Conversation
Catch TERM signals in the SmartProxyWorker code that starts the VixDiskLibServer so that if the worker gets the signal, it will be sent to the VixDiskLibServer prior to exiting. The VixDiskLibServer already has a handler for TERM signals which causes it to shut down. This is one of a couple of PRs required to handle the following BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1258985
@jerryk55 Can you verify this works? We had to remove traps with ruby 2.0 because logging in the signal handler was having this issue:
See: https://bugs.ruby-lang.org/issues/7917 I had to fix it here Note, you should be able to set the constant in the Smart Proxy worker: INTERRUPT_SIGNALS = ["SIGINT"] I believe, then the WorkerBase#start will raise the SignalException in the else below: def start
prepare
run
rescue SignalException => e
if e.kind_of?(Interrupt) || self.class.interrupt_signals.include?(e.message)
do_exit("Interrupt signal (#{e}) received.")
else
raise
end
end I guess you can do something like this then in the smart_proxy_worker.rb: INTERRUPT_SIGNALS = ["SIGINT"]
...
def start
super
rescue SignalException => e
if e.message.include?("SIGTERM")
# Handle the kill of the sub-thread/process
else
raise
end
end |
Checked commit jerryk55@7a90ea7 with ruby 1.9.3, rubocop 0.33.0, and haml-lint 0.13.0 gems/pending/VixDiskLib/VixDiskLib.rb
|
It'd be safer to use SIGPIPE. Instead of the child relying on the parent to tell it when it's dying, let the child actively detect the death of the parent. |
@matthewd one of the issues is that the child isn't behaving correctly - there is already a timeout that is supposed to happen after x amount of time without any new requests. I'm trying to actively instruct the child to shut down. The signal we use to tell the child to shut down should not matter assuming it is one that may be caught and handled. SIGPIPE is relevant when there is a pipe between the processes. We are using DRb for interprocess communication in this case. |
@jerryk55 sorry, I meant create a pipe between the processes. Pass a new pipe in as stdin, and then have the child die when stdin closes. |
@matthewd this would also require that the parent and child processes be actively communicating via the pipe. Closing one end of a pipe in and of itself would not generate the signal. So yes, we could then set up a separate pair of threads responsible for heartbeating across the pipe, but I'm not sure we are making anything simpler by doing this. We already have a timing thread in the server (child) that seems to be having issues waking back up and killing the process appropriately in certain situations. I'd rather concentrate on getting that timer to work successfully, and handle propragating signals that the worker gets to the child appropriately. In this way, should we choose to handle different signals in different ways we would be prepared for it. |
Thread.new { $stdin.read; $stderr.puts "Parent went away"; exit } (okay, that's not a SIGPIPE.. but still.) |
That neither wakes up the thread nor generates a SIGPIPE in a test program I've written which spawns a child and sets up a DRb communication channel as well as using the pipe. The child does get the SIGINT when I Control-C out of the parent (which I am handling by printing a message but not exiting) but the child continues to run. |
😕 This does what I expect: a, b = IO.pipe
fork do
trap('INT') { puts "ignoring interrupt" }
b.close
Thread.new do
a.read
puts "parent went away"
exit
end
puts "child #$$ going to sleep"
sleep 120
end
a.close
puts "parent #$$ going to sleep"
sleep 60 |
Sorry yes. We use Kernel.spawn in this scenario and pass the FD as an environment variable so there were several other steps involved but this does work. |
Woah, holy complication batman! We, umm... we don't need to do that. |
See #4318 for... a less unhelpful comment. |
@matthewd just some feedback - after successfully writing some test programs that do what we want, I implemented this change, using most of your suggestions in #4318. Unfortunately something is closing the write end of the pipe in the parent - the thread in the child (VixDiskLibServer.rb) is gaining control and shutting down the process after about 15-20 seconds, even though the SmartProxyWorker is in the midst of fleecing the VM. This is sad for all involved. |
@jerryk55 are we keeping a reference to it somewhere? Otherwise it'll just get closed when the parent GCs. |
Yup. That was it. On Thu, Sep 17, 2015 at 1:37 PM, Matthew Draper notifications@github.com
Jerry Keselman |
Added a thread to the VixDiskLibServer to read one end of a pipe that the SmartProxyWorker has opened. When the read returns the worker process has exited so the VixDiskLibServer shuts down. This fixes the BZ https://bugzilla.redhat.com/show_bug.cgi?id=1258985 This fix was implemented in lieu of previous PR ManageIQ#4277 - ManageIQ#4277 which passed SIGTERM from the worker process to the server process. The new implementation will handle any signal that has caused the worker to exit including SIGKILL. The previous PR will be closed. Instead of using an environment variable to let VixDiskLibServer know which FD contains the pipe we are using a known FD. In addition we have changed the previous logic for the other pipe being used by these two processes which is used to hand the DRb URI back to the worker so that it also uses a known FD. Finally we have fixed the retry logic in the worker to read the URI before attempting to establish the DRb connection. The "known FD" and retry logic code come from PR ManageIQ#4318 by @matthewd which will be closed in lieu of this new one.
This PR will be closed in lieu of #4411 |
Catch TERM signals in the SmartProxyWorker code that starts the
VixDiskLibServer so that if the worker gets the signal, it will
be sent to the VixDiskLibServer prior to exiting. The VixDiskLibServer
already has a handler for TERM signals which causes it to shut down.
This is one of a couple of PRs required to handle the following BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1258985
@roliveri @Fryguy @jrafanie Please review.