Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

grape is not threadsafe #517

Closed
bwalex opened this issue Nov 27, 2013 · 31 comments
Closed

grape is not threadsafe #517

bwalex opened this issue Nov 27, 2013 · 31 comments
Labels

Comments

@bwalex
Copy link
Contributor

bwalex commented Nov 27, 2013

When running grape on puma, all sorts of weirdness happens to the activerecord connection pool - things fail randomly and at times all the connections stay checked out, not leaving any connections in the pool.

The same code run on unicorn works just fine. It's worth noting that doing the same before/after thing with sinatra works fine on both puma and unicorn, so I think the problem is grape (rather than ActiveRecord or the DB driver).

Another odd thing happens when moving the 'after' block after the 'resource' block - then it is not run at all; but that's probably a separate issue.

class API < Grape::API
  version 'v1', using: :header, vendor: 'foobar'
  format :json
  prefix '/api'


  before do
    ActiveRecord::Base.connection_pool.connections.map(&:verify!)
  end

  after do
    ActiveRecord::Base.clear_active_connections!
  end


  resource :comments do
    ...
  end
end
@dblock
Copy link
Member

dblock commented Dec 2, 2013

I'd like to fix this, whatever it is. It would be great to have some infrastructure where some errors are reproducible. This could be a simple demo that uses Puma that we can ab against and see some of the responses start failing. If the problem is in Grape, you should be able to do this w/o AR.

@bwalex
Copy link
Contributor Author

bwalex commented Dec 4, 2013

I'll try to extract a test case, but it probably won't be until the weekend.

@bwalex
Copy link
Contributor Author

bwalex commented Dec 16, 2013

Well, I've been trying to reproduce this problem stand-alone but couldn't. However:

  1. there seems to be a threading issue with the mysql2 adapter - see mysql2 doesn't seem to be threadsafe brianmario/mysql2#461

  2. the pg adapter seems to expose an issue in grape - running with sinatra works fine, pretty much in line with my original report.

As for (2), see: https://github.com/bwalex/ar_threads - Towards the end of the ab run against grape, the last few requests fail with ActiveRecord::ConnectionTimeoutError. If you then run ab again (without restarting puma), all of the requests fail with that.

Running ab against sinatra everything's fine. Similarly, running with unicorn everything's fine as well.

@robertjpayne
Copy link

Grape's thread safety isn't going to affect AR's connection pool. This issue is strictly with AR and/or the way it's connection pool is managed, while Grape may be doing something weird here that makes the management of AR's connection pool wonky it's not thread safety.

Remember when Puma thread's die you loose that connection until the full process restarts. Have you turned on AR 4.0 connection reaper?

Your mysql2 issue seems unrelated completely. The segfault is happening inside the C code and is likely due to a blow up on memory due to high concurrency.

I hope when you are testing on your local machine you're using both 64 connections for Sinatra and Grape unlike your repo which shows 32 for Sinatra and 64 for Grape. Concurrency will matter here because AR only has 5 seconds to obtain a connection or it will fail.

It'd be really great to see a small setup of the issue, alternatively if you could swap AR out with something like Sequel and see if it's connection pool clobbers up?

@bwalex
Copy link
Contributor Author

bwalex commented Dec 23, 2013

It fails with both 32 and 64 - so the concurrency level doesn't really matter. I've tried both with both.

Grape's thread safety can perfectly affect AR's connection pool if, say, the before or after block aren't called the right number of times in the right order. As I said, it works perfectly fine with any concurrency level with sinatra.

The connection reaper isn't necessary - there's exactly one connection being checked out by validating, and then cleared again in the after block.

The mysql2 issue is unrelated indeed - as I mentioned, that's a separate issue over at mysql2 now, and very much looks like a use-after-free case. But as I said, the issue can be reproduced with the pg adapter.

@robertjpayne
Copy link

@bwalex The connection reaper is totally necessary. Puma threads will die undesirably at random times when application load is high. Any time this happens you're going to have a checked out connection that will never be checked in. The Reaper ensures these eventually make it back into the pool. I'm not saying this is the solution if there is a bug to be found but any application will eventually have dead DB connections over time.

You should be able to log off in every before/after hook with the thread ID and verify the ordering and ensure the pairing even if your production stack.

Remember that Sinatra boots a lot different than Grape does, are you sure no where is establishing an AR connection before a puma thread starts or a puma worker starts? I've load tested Grape with Sequel + Puma and never had an issue with the connection pool.

@bwalex
Copy link
Contributor Author

bwalex commented Dec 26, 2013

You are probably right about the connection reaper, thanks for the hint.

However, I get this to fail with something as low as -n 8 -c 8, so there really must be something terribly wrong here.

I've tried abstracting the issue with the connpool.rb thing, but that doesn't seem to exhibit the problem. I don't discard there being a problem with AR, but somehow, with sinatra working fine, it looks more like an issue with grape.

@robertjpayne
Copy link

@bwalex Is it possible for you to even just list your setup? How are you mounting grape when it's failing, are you inside of Rails? What is in front of the WebServer if anything? etc.. etc..

@bwalex
Copy link
Contributor Author

bwalex commented Dec 26, 2013

https://github.com/bwalex/ar_threads is exactly what I'm reproducing this with now. I don't use Rails - I just use ActiveRecord. Nothing is in front of the web server. config.ru is where Grape is mounted.

@bwalex
Copy link
Contributor Author

bwalex commented Dec 26, 2013

I've been instrumenting the relevant AR code, and I'm a bit closer to what's going on.

With grape, it tries to release the connection before checking it out, so it's never actually released. The ordering of the before and after block looks per se looks correct though.

With sinatra, things do happen in the right order - the checkout runs first, and then the checkin.

I get the impression this is some nasty interaction between synchronization blocks.

@bwalex
Copy link
Contributor Author

bwalex commented Dec 26, 2013

With sinatra:

[27334] - Worker 27337 booted, phase: 0
--verify 11784860--verify 70141179127780
++verify 70141179127780
--verify 4390520
++verify 4390520

++verify 11784860--verify 4394260
++verify 4394260

--verify 70141179127440
++verify 70141179127440
--verify 70141179128120
++verify 70141179128120
--verify 70141179128460--verify 11853180

++verify 70141179128460++verify 11853180

+checkout 70141179127780
--clear 70141179127780
+checkout 4390520
--clear 4390520
+checkout 11784860
--clear 11784860
+checkout 4394260
--clear 4394260
+checkout 70141179127440
+checkin  70141179127780
++clear 70141179127780--clear 70141179127440
+checkin  4390520

[27337] 127.0.0.1 - - [26/Dec/2013 09:13:07] "GET /test HTTP/1.0" 200 1003 0.0629
++clear 4390520+checkin  11784860

[27337] 127.0.0.1 - - [26/Dec/2013 09:13:07] "GET /test HTTP/1.0" 200 1003 0.0598
++clear 11784860+checkin  4394260

[27337] 127.0.0.1 - - [26/Dec/2013 09:13:07] "GET /test HTTP/1.0" 200 1003 0.1254
++clear 4394260
+checkin  70141179127440
++clear 70141179127440
[27337] 127.0.0.1 - - [26/Dec/2013 09:13:07] "GET /test HTTP/1.0" 200 1003 0.0593
+checkout 70141179128120
+checkout 11853180[27337] 127.0.0.1 - - [26/Dec/2013 09:13:07] "GET /test HTTP/1.0" 200 1003 0.0603

+checkout 70141179128460
--clear 70141179128120
+checkin  70141179128120--clear 11853180
--clear 70141179128460
++clear 70141179128120
[27337] 127.0.0.1 - - [26/Dec/2013 09:13:07] "GET /test HTTP/1.0" 200 1003 0.0637

+checkin  11853180
++clear 11853180
+checkin  70141179128460
++clear 70141179128460[27337] 127.0.0.1 - - [26/Dec/2013 09:13:07] "GET /test HTTP/1.0" 200 1003 0.0636

[27337] 127.0.0.1 - - [26/Dec/2013 09:13:07] "GET /test HTTP/1.0" 200 1003 0.0641
^C[27334] - Gracefully shutting down workers...

With grape:

--verify 11533120
++verify 11533120--verify 18634280
++verify 18634280--verify 18626200

((use 18634280))
--clear 18634280
++clear 18634280++verify 18626200
((use 18626200))--verify 11495620
++verify 11495620
((use 11495620))
--clear 11495620
++clear 11495620--verify 18633040
++verify 18633040
((use 11533120))
--clear 11533120
++clear 11533120
--verify 11527000
++verify 11527000


--verify 18635360((use 11527000))
--clear 18626200
++clear 18626200

++verify 18635360((use 18633040))
--clear 18633040
--clear 11527000

((use 18635360))
--clear 18635360


--verify 18632340
++verify 18632340
((use 18632340))
--clear 18632340
+checkout 11495620
++clear 18633040++clear 11527000

+checkout 18626200
[27199] 127.0.0.1 - - [26/Dec/2013 09:09:41] "GET /api/tests/open HTTP/1.0" 200 1003 0.0480
[27199] 127.0.0.1 - - [26/Dec/2013 09:09:41] "GET /api/tests/open HTTP/1.0" 200 1003 0.0339
+checkout 18634280
++clear 18635360
[27199] 127.0.0.1 - - [26/Dec/2013 09:09:41] "GET /api/tests/open HTTP/1.0" 200 1003 0.0530
+checkout 11533120
++clear 18632340
[27199] 127.0.0.1 - - [26/Dec/2013 09:09:41] "GET /api/tests/open HTTP/1.0" 200 1003 0.0949
+checkout 11527000
[27199] 127.0.0.1 - - [26/Dec/2013 09:09:41] "GET /api/tests/open HTTP/1.0" 200 1003 0.0527
2013-12-26 09:09:46 +0000: Rack app error: #<ActiveRecord::ConnectionTimeoutError: could not obtain a database connection within 5.000 seconds (waited 5.000 seconds)>

+checkin and +checkout are the actual AR connection pool checkin and checkout.

--verify is just before the verify call in the before block, ++verify is just after. Similarly, --clear,++clear are around the clear_active_connections in the after block.

In summary, the order is at follows, with grape somehow causing reordering of clear and checkout.
Note: 'use' is not so much when the connection is actually used, but rather when sinatra's/grape's 'get' block is executed.

sinatra:

  1. verify
  2. use
  3. checkout
  4. clear
  5. checkin

grape:

  1. verify
  2. use
  3. clear
  4. checkout
  5. ...

@bwalex
Copy link
Contributor Author

bwalex commented Dec 26, 2013

Ok, I think I found the smoking gun: the issue is due to AR's lazy evaluation - if instead of returning something like Test.all I return Test.all.to_a, forcing evaluation, everything works.

My conclusion from that is that Grape runs the 'after' block before forcing evaluation of the return value of the 'get' block. Not sure if there's something that could be done in Grape - otherwise the solution is obviously to force evaluation inside the get, etc blocks.

@robertjpayne
Copy link

Well that explains a lot more, how are you formatting your responses in Sinatra? Grape's formatter works inside middleware and will happen after any after hooks are fired thus your connection pool issues if AR keeps it open or opens a new one.

Even grape-entity is lazy evaluated in the formatter level so you wont escape it there either.

@bwalex
Copy link
Contributor Author

bwalex commented Dec 26, 2013

In sinatra I use to_json which obviously forces the evaluation.

Is there any way for grape to deal with this transparently or do you really have to call Relation#load on each relation before returning from the get, etc, blocks?

@robertjpayne
Copy link

You'll have to bypass Grape's formatters and render the response yourself something like:

Helper:

def render(object)
  body object.to_json # grape body method
  content_type 'application/json' # grape content type method
end

Endpoint:

get / do
    render <AR Model>
end

Obviously could do a bit more or whatever you need to do but somehow you need to ensure AR has done all it's lazy loading.

@bwalex
Copy link
Contributor Author

bwalex commented Dec 26, 2013

Another aspect of grape that has just bitten me is that 'after' blocks don't seem to be run after an error! invocation. Is that intended?

@bwalex
Copy link
Contributor Author

bwalex commented Dec 27, 2013

I also tried using a decorator, but even then the part after the @app.call(env) is not run. To be honest it feels wrong that there is no (simple?) way of running something after every route, not just after every route that returns successfully.


class GrapeARWrapper
  def initialize(app)
    @app = app
  end

  def call(env)
    puts "---"
    ActiveRecord::Base.connection_pool.connections.map(&:verify!)
    status, headers, bodies = @app.call(env)
    puts "+++"
    ActiveRecord::Base.clear_active_connections!
    [status, headers, bodies]
  end
end

class API < Grape::API
  use GrapeARWrapper
 ...

@robertjpayne
Copy link

Yea error! does a "throw" and unwinds the entire API call dead in it's tracks. I've created a "abort!" helper which raises an exception instead as it's more manageable.

@bwalex
Copy link
Contributor Author

bwalex commented Dec 27, 2013

This is what I've come up with that seems to do the trick, but it isn't particularly elegant. It relies on error! throwing a hash, unlike the normal call, which returns an array.

class GrapeARWrapper
  def initialize(app)
    @app = app
  end

  def call(env)
    ActiveRecord::Base.connection_pool.connections.map(&:verify!)
    status, headers, bodies = catch(:error) do
      @app.call(env)
    end
    ActiveRecord::Base.clear_active_connections!
    if status.is_a?(Hash)
      throw :error, status
    else
      [status, headers, bodies]
    end
  end
end

@robertjpayne
Copy link

If you're resorting to wrapping the call method you're better making Middleware which is exactly that but a lot cleaner.

Middleware will always execute ( unless you have a massive error in your app code ).

@bwalex
Copy link
Contributor Author

bwalex commented Dec 27, 2013

Well, I don't think a plain middleware without overwriting the call method works, as it'll still unwind to the catch(:error) in the error middleware, so it isn't any cleaner - I'll just get the helper methods for free (content_type, response, etc).

@dblock
Copy link
Member

dblock commented Dec 29, 2013

FWIW, I would welcome any change in Grape that helps with this and would appreciate documentation, maybe an FAQ, blog post, whatever, that describes how to resolve the issue here.

I think having an after block that runs every time, even after error!, would be a good addition, although you should be able to accomplish the same with a middleware that's not mounted inside a Grape API.

@idyll
Copy link
Contributor

idyll commented Jan 18, 2014

Where did this end up @bwalex? Is it working for you now and if so what did you end up doing to resolve it?

@bwalex
Copy link
Contributor Author

bwalex commented Jan 21, 2014

I solved it using a rack middleware. I'll write up a few words about it at some point in the near future.

@dblock
Copy link
Member

dblock commented Mar 7, 2014

@bwalex Looking forward to some docs/README.

Should this still be open or should we close this issue?

@dblock
Copy link
Member

dblock commented Mar 7, 2014

Related, #585

@bwalex
Copy link
Contributor Author

bwalex commented Mar 9, 2014

The most minimal solution to this is to use the ActiveRecord::ConnectionAdapters::ConnectionManagement middleware, e.g. something like this in config.ru

use ActiveRecord::ConnectionAdapters::ConnectionManagement

I think that's pretty much all to it if you want to use AR with grape safely.

@dblock
Copy link
Member

dblock commented Mar 10, 2014

@bwalex Mind PRing a new section in README on ActiveRecord? Maybe within something general for data, much like we have for API formats?

@dblock dblock closed this as completed Mar 10, 2014
@jhollinger
Copy link

tl;dr To future readers, please note that you must use ActiveRecord::ConnectionAdapters::ConnectionManagement before you run/mount your Grape app, not inside of your Grape app. I usually put it near the top of config.ru.

When I was using ConnectionManagement inside of my Grape app, I noticed that anytime I used error! my connection was nevery checked back in. That's because Grape's error middleware was run first and the catch it used prevented AR's connection middleware from ever finishing. Don't think that's exactly a bug, but it's unexpected behavior and easy to fall prey to.

@dblock
Copy link
Member

dblock commented Oct 30, 2014

@jhollinger I think this is documentation-worthy. We do document Rails integration, so maybe an ActiveRecord section?

@jhollinger
Copy link

Yes, I think that would definately be helpful.

bwalex added a commit to bwalex/grape that referenced this issue Nov 2, 2014
bwalex added a commit to bwalex/grape that referenced this issue Nov 2, 2014
dblock added a commit that referenced this issue Nov 3, 2014
README - update for AR without Rails (see #517)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants