Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible bug Errno: @message="SSL_shutdown: Operation now in progress" #3168

Closed
jmoriau opened this issue Aug 17, 2016 · 22 comments
Closed

Possible bug Errno: @message="SSL_shutdown: Operation now in progress" #3168

jmoriau opened this issue Aug 17, 2016 · 22 comments
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib

Comments

@jmoriau
Copy link

jmoriau commented Aug 17, 2016

Hi,

I have a long running process which controls the lights in my apartment, a small endpoint to receive IFTTT outgoing calls, and a loop inside a fiber (so it can run concurrently with the endpoint) which polls the lights status and sometimes update some of them with a POST call to LIFX's servers.

Every once in a while (like every few days) I noticed the fiber which is supposed to run the polling loop stops and the rescued exceptions prints this for e.inspect:

#<Errno:0xa59c40 @message="SSL_shutdown: Operation now in progress", @cause=nil, @callstack=CallStack(@callstack=[Pointer(Void)@0x414fbe, Pointer(Void)@0x449ac7, Pointer(Void)@0x48ddc5, Pointer(Void)@0x4f33b0, Pointer(Void)@0x505895, Pointer(Void)@0x417072, Pointer(Void).null], @backtrace=nil), @errno=115>"

Note the loop is supposed to always run since everything inside the loop is wrapped in the begin rescue block. But it seems this errors breaks out of the loop and then the fiber is done.

I'm sorry I don't have a proper backtrace yet (it usually takes a few days to reproduce this) I'll post one next time I get one.

@jhass jhass added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib labels Aug 17, 2016
@RX14
Copy link
Contributor

RX14 commented Aug 17, 2016

See: http://stackoverflow.com/questions/10204134/tcp-connect-error-115-operation-in-progress-what-is-the-cause

Seems like a bug in the interaction between the stdlib and openssl.

@jmoriau
Copy link
Author

jmoriau commented Aug 17, 2016

I just got a stack trace, you'll appreciate how useful it is 😄

[2016-08-17 17:40:44 +0200]:[LIFX]:[Errno]:[#<Errno:0x9f53c0 @message="SSL_shutdown: Operation now in progress", @cause=nil, @callstack=CallStack(@callstack=[Pointer(Void)@0x414fbe, Pointer(Void)@0x449b17, Pointer(Void)@0x48dea5, Pointer(Void)@0x4f3600, Pointer(Void)@0x505ce5, Pointer(Void)@0x417073, Pointer(Void).null], @backtrace=nil), @errno=115>]
SSL_shutdown: Operation now in progress (Errno)

[4280254] ???

[4496151] ???

[4775589] ???

[5191168] ???

[5266661] ???

[4288627] ???

[0] ???

I'll try to build without --release this time 😄

@jmoriau
Copy link
Author

jmoriau commented Aug 18, 2016

The only thing it printed without the --release flag is this:


[4947480] *raise<Errno>:NoReturn +24

@luislavena
Copy link
Contributor

@jmoriau can you try running your process with gdb?

@jmoriau
Copy link
Author

jmoriau commented Aug 18, 2016

@luislavena I used to use GDB cli back when I was still writing C (like 5 years ago), is it the same for crystal?
Do I need to compile with some flags to get the symbols added?
Also What would I be looking for since the process keeps running (it simply kills one of the fibers running, and the more I see it happen the more I think the fiber being killed has nothing to do with this, because sometimes the error happens and the fiber is not killed)?
Can I easily set a breakpoint at a file/line? Then what?

Sorry for all these questions, never used a debugger with crystal before (didn't even know it could be done). Thanks!

@ysbaddaden
Copy link
Contributor

Crystal compiles to native code and can be somewhat debugged with any native debugger tool like gdb or lldb. Except for displaying variable (we don't genrerate the necessary data yet) you can set breakpoints, catch signals, show backtraces, use gdb -tui (with some line errors), ...

File and line and symbols can be added with the --debug flag.

@jmoriau
Copy link
Author

jmoriau commented Aug 19, 2016

@ysbaddaden ok thanks, then I should know how to use it.
But I still wouldn't know what I'm looking for once I break in the rescue block.

@matiasgarciaisaia
Copy link
Member

StackOverflow's shell reported having this issue.

I was able to easily reproduce the bug with this program (based on his):

require "http/client"
require "json"

status = -1

10.times do
  spawn do
    while true
      puts "Pinging..."
      response = HTTP::Client.get "https://randomapi.com/api/6de6abfedb24f889e0b5f675edc50deb?fmt=raw&sole"
      puts "Pong"
      response_to_hash = JSON.parse response.body
      status = response_to_hash[0]["first"].to_s.size
      pp status
    end
  end
end

Fiber.yield

puts "Post yield"

sleep 20000.seconds # so the main thread doesn't exit yet

Running Crystal 0.18.7 (2016-08-05) from brew on OSX 10.11.6, I got this exception:

Unhandled exception:
SSL_shutdown: Operation now in progress (Errno)
[4440073698] *CallStack::unwind:Array(Pointer(Void)) +82
[4440073601] *CallStack#initialize:Array(Pointer(Void)) +17
[4440073560] *CallStack::new:CallStack +40
[4440034617] *raise<Errno>:NoReturn +25
[4440200954] *OpenSSL::SSL::Socket+@OpenSSL::SSL::Socket#close:Nil +346
[4440259434] *HTTP::Client#close:Nil +122
[4440256106] *HTTP::Client::exec<String, String, Nil, Nil, Nil>:HTTP::Client::Response +650
[4440255439] *HTTP::Client::get<String>:HTTP::Client::Response +31
[4440052194] ~procProc(Nil)@./tmp/test.cr:7 +66
[4440070932] *Fiber#run:(Int64 | Nil) +164
[4440042917] ~proc2Proc(Fiber, (Int64 | Nil))@/usr/local/Cellar/crystal-lang/0.18.7_1/src/fiber.cr:28 +21

You can see the whole output (until I've interrupted the execution) at this gist.

@sdogruyol
Copy link
Member

I can confirm that this also happens in master Crystal 0.19.4+142 [6b09c3b] (2016-11-19)

@andersondias
Copy link

I had the very same issue here:

WARN: Errno: SSL_shutdown: Operation now in progress
WARN: [4401960353] *raise<Exception+>:NoReturn +81
[4402184488] *Sidekiq::Processor#process<Sidekiq::BasicFetch::UnitOfWork>:Bool +1432
[4402153711] *Sidekiq::Processor#run:(Sidekiq::Processor | Nil) +95
[4402152488] ~procProc(Nil)@./libs/sidekiq/sidekiq/server/processor.cr:39 +296
[4402134017] ~procProc(Nil)@./libs/sidekiq/sidekiq/server/util.cr:20 +33
[4401345567] *Fiber#run:(IO::FileDescriptor | Nil) +47

@denysvitali
Copy link

Can confirm, I was trying Gangwolf/hncr and got this error too.
Crystal 0.20.1 (2016-12-06)
Linux denvit-ws1 4.8.13-1-ARCH #1 SMP PREEMPT Fri Dec 9 07:24:34 CET 2016 x86_64 GNU/Linux

@nobilik
Copy link

nobilik commented Feb 27, 2017

I've hit the same bug today. Crystal 0.20.5 [ccf46c0] (2017-01-20)

@ysbaddaden
Copy link
Contributor

The bug has been acknownledged, further "me too" messages won't help resolve the issue.

We need a lead to understand why and when OpenSSL fails with "operation in progress", if the behavior is expected, and what happens in the stdlib that generates this OpenSSL error. Further investigation, a reduced failing example or spec that always reproduces the bug (quickly) are more than welcomed.

@denysvitali
Copy link

denysvitali commented Feb 27, 2017

Minimal code to reproduce this issue:

require "http/client"
require "json"

2.times do
  spawn do
    i = 1;
    while i!=2
      response = HTTP::Client.get "https://google.com"
      i += 1;
    end
  end
end

sleep 10.seconds # so the main thread doesn't exit yet

Tested with Crystal 0.20.5 (2017-01-25), Linux denvit-ws1 4.9.11-1-ARCH and openssl 1.0.2.k-1

Output

dvitali@denvit-ws1:crystal/issue-3168 $ crystal run main.cr
Unhandled exception in spawn:
SSL_shutdown: Operation now in progress (Errno)
0x4afcf9: *OpenSSL::SSL::Socket+ at /usr/lib/crystal/openssl/ssl/socket.cr 148:13
0x4ba555: *HTTP::Client#close:Nil at /usr/lib/crystal/http/client.cr 635:19
0x4b7975: *HTTP::Client::exec<String, String, Nil, Nil, Nil>:HTTP::Client::Response at /usr/lib/crystal/http/client.cr 615:5
0x4b767d: *HTTP::Client::get<String>:HTTP::Client::Response at /home/dvitali/.cache/crystal/macro64371184.cr 37:7
0x45cd6a: *Fiber#run:(IO::FileDescriptor | Nil) at /usr/lib/crystal/fiber.cr 114:3
0x44d096: ~proc2Proc(Fiber, (IO::FileDescriptor | Nil)) at /usr/lib/crystal/concurrent.cr 60:3
0x0: ??? at ??

@nobilik
Copy link

nobilik commented Feb 27, 2017

Lucky me) Sorry, when I learn to write tests, I'll post the results.

@zatherz
Copy link
Contributor

zatherz commented Mar 12, 2017

Bump.

@akzhan
Copy link
Contributor

akzhan commented May 19, 2017

@ysbaddaden looks like we should retry on Errno::EINPROGRESS errors on unbuffered close.

Just like EAGAIN. Pull request #4433 sent.

@RX14
Copy link
Contributor

RX14 commented May 20, 2017

@denysvitali Just FYI, your example contains while i!=2 which is parsed as while i! = 2 which is the same as while 2 and while true. Please use crystal tool format to reduce human error and so that your code examples can be further reduced.

require "http/client"

2.times do
  spawn do
    loop do
      HTTP::Client.get "https://google.com"
    end
  end
end

sleep 2.seconds

@denysvitali
Copy link

denysvitali commented May 20, 2017

Thanks for the tip!

require "http/client"
require "json"

2.times do
  spawn do
    i = 1
    while i != 5
      response = HTTP::Client.get "https://google.com"
      i += 1
    end
  end
end

sleep 10.seconds # so the main thread doesn't exit yet
dvitali@denvit-ws1:crystal/test $ crystal run http_problem.cr                                                                                                                            -1-
Unhandled exception in spawn:
SSL_shutdown: Operation now in progress (Errno)
0x46e2cb: *CallStack::unwind:Array(Pointer(Void)) at ??
0x4eda2b: unbuffered_close at /usr/lib/crystal/openssl/ssl/socket.cr 150:13
0x4ed87f: close at /usr/lib/crystal/io/buffered.cr 197:5
0x4f4e75: close at /usr/lib/crystal/http/client.cr 634:19
0x4f1952: exec at /usr/lib/crystal/http/client.cr 614:5
0x4f164d: get at /usr/lib/crystal/http/client.cr 329:3
0x46a472: ~procProc(Nil) at /usr/lib/crystal/base64.cr 0:1
0x47fa7a: run at /usr/lib/crystal/fiber.cr 255:3
0x467d96: ~proc2Proc(Fiber, (IO::FileDescriptor | Nil)) at /usr/lib/crystal/concurrent.cr 0:3
0x0: ??? at ??
^C%                                                                                                                                                                                          dvitali@denvit-ws1:crystal/test $ uname -a                                                                                                                                               -1-
Linux denvit-ws1 4.10.13-1-ARCH #1 SMP PREEMPT Thu Apr 27 12:15:09 CEST 2017 x86_64 GNU/Linux

dvitali@denvit-ws1:crystal/test $ pacman -Q openssl 
openssl 1.1.0.e-1

dvitali@denvit-ws1:crystal/test $ crystal -v
Crystal 0.22.0 (2017-04-22) LLVM 4.0.0

@RX14
Copy link
Contributor

RX14 commented May 20, 2017

@denysvitali FYI #4433 fixes the bug.

@denysvitali
Copy link

You're right, just tested #4433 , it works!

dvitali@denvit-ws1:crystal/bin ‹retry-EINPROGRESS-on-unbuffered-close*›$ ./crystal ~/Documents/crystal/test/http_problem.cr 
Using compiled compiler at `.build/crystal'
dvitali@denvit-ws1:crystal/bin ‹retry-EINPROGRESS-on-unbuffered-close*›$ ./crystal -v                                      
Using compiled compiler at `.build/crystal'
Crystal 0.22.0+? [30c7410] (2017-05-20) LLVM 4.0.0

@RX14
Copy link
Contributor

RX14 commented May 20, 2017

@denysvitali For future reference, compiling the compiler isn't neccesary because it's a stdlib fix. Using bin/crystal is enough.

akzhan added a commit to akzhan/crystal that referenced this issue May 20, 2017
Just like EAGAIN handling behavior but due to another reason.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:stdlib
Projects
None yet
Development

No branches or pull requests