Pass TERM Signal to VixDiskLibServer #4277

jerryk55 · 2015-09-09T13:39:01Z

Catch TERM signals in the SmartProxyWorker code that starts the
VixDiskLibServer so that if the worker gets the signal, it will
be sent to the VixDiskLibServer prior to exiting. The VixDiskLibServer
already has a handler for TERM signals which causes it to shut down.

This is one of a couple of PRs required to handle the following BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1258985

@roliveri @Fryguy @jrafanie Please review.

Catch TERM signals in the SmartProxyWorker code that starts the VixDiskLibServer so that if the worker gets the signal, it will be sent to the VixDiskLibServer prior to exiting. The VixDiskLibServer already has a handler for TERM signals which causes it to shut down. This is one of a couple of PRs required to handle the following BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1258985

jrafanie · 2015-09-09T14:21:58Z

@jerryk55 Can you verify this works? We had to remove traps with ruby 2.0 because logging in the signal handler was having this issue:

log writing failed. can't be called from trap context

See: https://bugs.ruby-lang.org/issues/7917

I had to fix it here

Note, you should be able to set the constant in the Smart Proxy worker:
lib/workers/smart_proxy_worker.rb
INTERRUPT_SIGNALS = ["SIGINT", "SIGTERM"]
To:

INTERRUPT_SIGNALS = ["SIGINT"]

I believe, then the WorkerBase#start will raise the SignalException in the else below:

  def start
    prepare
    run

  rescue SignalException => e
    if e.kind_of?(Interrupt) || self.class.interrupt_signals.include?(e.message)
      do_exit("Interrupt signal (#{e}) received.")
    else
      raise
    end
  end

I guess you can do something like this then in the smart_proxy_worker.rb:

  INTERRUPT_SIGNALS = ["SIGINT"]
  ...

  def start
    super
  rescue SignalException => e
    if e.message.include?("SIGTERM")
     # Handle the kill of the sub-thread/process
    else
      raise
    end
  end

miq-bot · 2015-09-09T15:59:03Z

Checked commit jerryk55@7a90ea7 with ruby 1.9.3, rubocop 0.33.0, and haml-lint 0.13.0
1 file checked, 1 offense detected

gems/pending/VixDiskLib/VixDiskLib.rb

🔶 - Line 139, Col 37 - Style/Semicolon - Do not use semicolons to terminate expressions.

matthewd · 2015-09-09T16:41:15Z

It'd be safer to use SIGPIPE. Instead of the child relying on the parent to tell it when it's dying, let the child actively detect the death of the parent.

jerryk55 · 2015-09-09T17:38:19Z

@matthewd one of the issues is that the child isn't behaving correctly - there is already a timeout that is supposed to happen after x amount of time without any new requests. I'm trying to actively instruct the child to shut down. The signal we use to tell the child to shut down should not matter assuming it is one that may be caught and handled. SIGPIPE is relevant when there is a pipe between the processes. We are using DRb for interprocess communication in this case.

matthewd · 2015-09-09T18:23:25Z

@jerryk55 sorry, I meant create a pipe between the processes.

Pass a new pipe in as stdin, and then have the child die when stdin closes.

jerryk55 · 2015-09-09T19:13:22Z

@matthewd this would also require that the parent and child processes be actively communicating via the pipe. Closing one end of a pipe in and of itself would not generate the signal. So yes, we could then set up a separate pair of threads responsible for heartbeating across the pipe, but I'm not sure we are making anything simpler by doing this. We already have a timing thread in the server (child) that seems to be having issues waking back up and killing the process appropriately in certain situations. I'd rather concentrate on getting that timer to work successfully, and handle propragating signals that the worker gets to the child appropriately. In this way, should we choose to handle different signals in different ways we would be prepared for it.

matthewd · 2015-09-09T19:15:00Z

Thread.new { $stdin.read; $stderr.puts "Parent went away"; exit }

(okay, that's not a SIGPIPE.. but still.)

jerryk55 · 2015-09-09T20:45:41Z

That neither wakes up the thread nor generates a SIGPIPE in a test program I've written which spawns a child and sets up a DRb communication channel as well as using the pipe. The child does get the SIGINT when I Control-C out of the parent (which I am handling by printing a message but not exiting) but the child continues to run.

matthewd · 2015-09-09T21:29:15Z

😕

This does what I expect:

a, b = IO.pipe

fork do
  trap('INT') { puts "ignoring interrupt" }

  b.close

  Thread.new do
    a.read
    puts "parent went away"
    exit
  end

  puts "child #$$ going to sleep"
  sleep 120
end

a.close
puts "parent #$$ going to sleep"
sleep 60

jerryk55 · 2015-09-10T14:36:30Z

Sorry yes. We use Kernel.spawn in this scenario and pass the FD as an environment variable so there were several other steps involved but this does work.

matthewd · 2015-09-11T07:59:30Z

and pass the FD as an environment variable

Woah, holy complication batman! We, umm... we don't need to do that.

matthewd · 2015-09-11T08:41:48Z

See #4318 for... a less unhelpful comment.

jerryk55 · 2015-09-17T17:22:59Z

@matthewd just some feedback - after successfully writing some test programs that do what we want, I implemented this change, using most of your suggestions in #4318. Unfortunately something is closing the write end of the pipe in the parent - the thread in the child (VixDiskLibServer.rb) is gaining control and shutting down the process after about 15-20 seconds, even though the SmartProxyWorker is in the midst of fleecing the VM. This is sad for all involved.

matthewd · 2015-09-17T17:37:18Z

@jerryk55 are we keeping a reference to it somewhere? Otherwise it'll just get closed when the parent GCs.

jerryk55 · 2015-09-17T18:09:35Z

Yup. That was it.

On Thu, Sep 17, 2015 at 1:37 PM, Matthew Draper notifications@github.com
wrote:

@jerryk55 https://github.com/jerryk55 are we keeping a reference to it
somewhere? Otherwise it'll just get closed when the parent GCs.

—
Reply to this email directly or view it on GitHub
#4277 (comment).

Jerry Keselman
Red Hat Virtualization R&Djerryk@redhat.com | p. 212 530 7873

@matthewd

Added a thread to the VixDiskLibServer to read one end of a pipe that the SmartProxyWorker has opened. When the read returns the worker process has exited so the VixDiskLibServer shuts down. This fixes the BZ https://bugzilla.redhat.com/show_bug.cgi?id=1258985 This fix was implemented in lieu of previous PR ManageIQ#4277 - ManageIQ#4277 which passed SIGTERM from the worker process to the server process. The new implementation will handle any signal that has caused the worker to exit including SIGKILL. The previous PR will be closed. Instead of using an environment variable to let VixDiskLibServer know which FD contains the pipe we are using a known FD. In addition we have changed the previous logic for the other pipe being used by these two processes which is used to hand the DRb URI back to the worker so that it also uses a known FD. Finally we have fixed the retry logic in the worker to read the URI before attempting to establish the DRb connection. The "known FD" and retry logic code come from PR ManageIQ#4318 by @matthewd which will be closed in lieu of this new one.

jerryk55 · 2015-09-17T19:28:17Z

This PR will be closed in lieu of #4411

chessbyte added enhancement core/smart state labels Sep 9, 2015

jerryk55 mentioned this pull request Sep 17, 2015

Add Monitoring Thread to Watch for Worker Shutdown #4411

Merged

jerryk55 closed this Sep 17, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pass TERM Signal to VixDiskLibServer #4277

Pass TERM Signal to VixDiskLibServer #4277

jerryk55 commented Sep 9, 2015

jrafanie commented Sep 9, 2015

miq-bot commented Sep 9, 2015

matthewd commented Sep 9, 2015

jerryk55 commented Sep 9, 2015

matthewd commented Sep 9, 2015

jerryk55 commented Sep 9, 2015

matthewd commented Sep 9, 2015

jerryk55 commented Sep 9, 2015

matthewd commented Sep 9, 2015

jerryk55 commented Sep 10, 2015

matthewd commented Sep 11, 2015

matthewd commented Sep 11, 2015

jerryk55 commented Sep 17, 2015

matthewd commented Sep 17, 2015

jerryk55 commented Sep 17, 2015

jerryk55 commented Sep 17, 2015

Pass TERM Signal to VixDiskLibServer #4277

Pass TERM Signal to VixDiskLibServer #4277

Conversation

jerryk55 commented Sep 9, 2015

jrafanie commented Sep 9, 2015

miq-bot commented Sep 9, 2015

matthewd commented Sep 9, 2015

jerryk55 commented Sep 9, 2015

matthewd commented Sep 9, 2015

jerryk55 commented Sep 9, 2015

matthewd commented Sep 9, 2015

jerryk55 commented Sep 9, 2015

matthewd commented Sep 9, 2015

jerryk55 commented Sep 10, 2015

matthewd commented Sep 11, 2015

matthewd commented Sep 11, 2015

jerryk55 commented Sep 17, 2015

matthewd commented Sep 17, 2015

jerryk55 commented Sep 17, 2015

jerryk55 commented Sep 17, 2015