Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass TERM Signal to VixDiskLibServer #4277

Closed

Conversation

jerryk55
Copy link
Member

@jerryk55 jerryk55 commented Sep 9, 2015

Catch TERM signals in the SmartProxyWorker code that starts the
VixDiskLibServer so that if the worker gets the signal, it will
be sent to the VixDiskLibServer prior to exiting. The VixDiskLibServer
already has a handler for TERM signals which causes it to shut down.

This is one of a couple of PRs required to handle the following BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1258985

@roliveri @Fryguy @jrafanie Please review.

Catch TERM signals in the SmartProxyWorker code that starts the
VixDiskLibServer so that if the worker gets the signal, it will
be sent to the VixDiskLibServer prior to exiting.  The VixDiskLibServer
already has a handler for TERM signals which causes it to shut down.

This is one of a couple of PRs required to handle the following BZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1258985
@jrafanie
Copy link
Member

jrafanie commented Sep 9, 2015

@jerryk55 Can you verify this works? We had to remove traps with ruby 2.0 because logging in the signal handler was having this issue:

log writing failed. can't be called from trap context

See: https://bugs.ruby-lang.org/issues/7917

I had to fix it here

Note, you should be able to set the constant in the Smart Proxy worker:
lib/workers/smart_proxy_worker.rb
INTERRUPT_SIGNALS = ["SIGINT", "SIGTERM"]
To:

INTERRUPT_SIGNALS = ["SIGINT"]

I believe, then the WorkerBase#start will raise the SignalException in the else below:

  def start
    prepare
    run

  rescue SignalException => e
    if e.kind_of?(Interrupt) || self.class.interrupt_signals.include?(e.message)
      do_exit("Interrupt signal (#{e}) received.")
    else
      raise
    end
  end

I guess you can do something like this then in the smart_proxy_worker.rb:

  INTERRUPT_SIGNALS = ["SIGINT"]
  ...

  def start
    super
  rescue SignalException => e
    if e.message.include?("SIGTERM")
     # Handle the kill of the sub-thread/process
    else
      raise
    end
  end

@miq-bot
Copy link
Member

miq-bot commented Sep 9, 2015

Checked commit jerryk55@7a90ea7 with ruby 1.9.3, rubocop 0.33.0, and haml-lint 0.13.0
1 file checked, 1 offense detected

gems/pending/VixDiskLib/VixDiskLib.rb

@matthewd
Copy link
Contributor

matthewd commented Sep 9, 2015

It'd be safer to use SIGPIPE. Instead of the child relying on the parent to tell it when it's dying, let the child actively detect the death of the parent.

@jerryk55
Copy link
Member Author

jerryk55 commented Sep 9, 2015

@matthewd one of the issues is that the child isn't behaving correctly - there is already a timeout that is supposed to happen after x amount of time without any new requests. I'm trying to actively instruct the child to shut down. The signal we use to tell the child to shut down should not matter assuming it is one that may be caught and handled. SIGPIPE is relevant when there is a pipe between the processes. We are using DRb for interprocess communication in this case.

@matthewd
Copy link
Contributor

matthewd commented Sep 9, 2015

@jerryk55 sorry, I meant create a pipe between the processes.

Pass a new pipe in as stdin, and then have the child die when stdin closes.

@jerryk55
Copy link
Member Author

jerryk55 commented Sep 9, 2015

@matthewd this would also require that the parent and child processes be actively communicating via the pipe. Closing one end of a pipe in and of itself would not generate the signal. So yes, we could then set up a separate pair of threads responsible for heartbeating across the pipe, but I'm not sure we are making anything simpler by doing this. We already have a timing thread in the server (child) that seems to be having issues waking back up and killing the process appropriately in certain situations. I'd rather concentrate on getting that timer to work successfully, and handle propragating signals that the worker gets to the child appropriately. In this way, should we choose to handle different signals in different ways we would be prepared for it.

@matthewd
Copy link
Contributor

matthewd commented Sep 9, 2015

Thread.new { $stdin.read; $stderr.puts "Parent went away"; exit }

(okay, that's not a SIGPIPE.. but still.)

@jerryk55
Copy link
Member Author

jerryk55 commented Sep 9, 2015

That neither wakes up the thread nor generates a SIGPIPE in a test program I've written which spawns a child and sets up a DRb communication channel as well as using the pipe. The child does get the SIGINT when I Control-C out of the parent (which I am handling by printing a message but not exiting) but the child continues to run.

@matthewd
Copy link
Contributor

matthewd commented Sep 9, 2015

😕

This does what I expect:

a, b = IO.pipe

fork do
  trap('INT') { puts "ignoring interrupt" }

  b.close

  Thread.new do
    a.read
    puts "parent went away"
    exit
  end

  puts "child #$$ going to sleep"
  sleep 120
end

a.close
puts "parent #$$ going to sleep"
sleep 60

@jerryk55
Copy link
Member Author

Sorry yes. We use Kernel.spawn in this scenario and pass the FD as an environment variable so there were several other steps involved but this does work.

@matthewd
Copy link
Contributor

and pass the FD as an environment variable

Woah, holy complication batman! We, umm... we don't need to do that.

@matthewd
Copy link
Contributor

See #4318 for... a less unhelpful comment.

@jerryk55
Copy link
Member Author

@matthewd just some feedback - after successfully writing some test programs that do what we want, I implemented this change, using most of your suggestions in #4318. Unfortunately something is closing the write end of the pipe in the parent - the thread in the child (VixDiskLibServer.rb) is gaining control and shutting down the process after about 15-20 seconds, even though the SmartProxyWorker is in the midst of fleecing the VM. This is sad for all involved.

@matthewd
Copy link
Contributor

@jerryk55 are we keeping a reference to it somewhere? Otherwise it'll just get closed when the parent GCs.

@jerryk55
Copy link
Member Author

Yup. That was it.

On Thu, Sep 17, 2015 at 1:37 PM, Matthew Draper notifications@github.com
wrote:

@jerryk55 https://github.com/jerryk55 are we keeping a reference to it
somewhere? Otherwise it'll just get closed when the parent GCs.


Reply to this email directly or view it on GitHub
#4277 (comment).

Jerry Keselman
Red Hat Virtualization R&Djerryk@redhat.com | p. 212 530 7873

jerryk55 added a commit to jerryk55/manageiq that referenced this pull request Sep 17, 2015
Added a thread to the VixDiskLibServer to read one end of a pipe that
the SmartProxyWorker has opened.  When the read returns the worker process
has exited so the VixDiskLibServer shuts down.

This fixes the BZ https://bugzilla.redhat.com/show_bug.cgi?id=1258985

This fix was implemented in lieu of previous PR ManageIQ#4277 -
ManageIQ#4277 which passed SIGTERM
from the worker process to the server process.  The new implementation
will handle any signal that has caused the worker to exit including
SIGKILL. The previous PR will be closed.

Instead of using an environment variable to let VixDiskLibServer know which
FD contains the pipe we are using a known FD.  In addition we have changed
the previous logic for the other pipe being used by these two
processes which is used to hand the DRb URI back to the worker so that it
also uses a known FD.

Finally we have fixed the retry logic in the worker to read the URI before
attempting to establish the DRb connection.

The "known FD" and retry logic code come from PR ManageIQ#4318 by @matthewd which
will be closed in lieu of this new one.
@jerryk55
Copy link
Member Author

This PR will be closed in lieu of #4411

@jerryk55 jerryk55 closed this Sep 17, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants