This repository has been archived by the owner on Jan 15, 2024. It is now read-only.
Message for pool saturated, shutdown pool instead of disconnect, logging around failover, etc #338
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
In most scenarios,
node.disconnect
doesn't make much sense (ie connection pool is saturated, not going to be very helpful to try to get a connection again just to disconnect it).This is especially true given the way the new underlying connection pool works, which means if you are operating on separate
pool.with
blocks (not nested), the connection pool may have already returned the connection you were working with to the general pool (if it's the last on the stack of connections it goes back in the available pool and sincepool.with
always ensures that a connection is checked in, we would have to do a checkout and ensure checkin on our own if we want to work on the same connection or have disconnect happen within the existing block).It seems much cleaner to just shutdown the pool (as that is the only API we really have from the connection pool to disconnect all connections) and create a new one, so we can ensure we have a fresh pool and connections. Otherwise we need to change the underlying way a node deals with connections and the pool. In general, it's a bit frustrating the way sessions/clusters/nodes are not shared at all across threads, as it prevents them from really communicating events/failovers/state etc without all of them discovering it separately? Correct me if I'm wrong, but this seems at least true with Mongoid's implementation via the Session::SessionFactory when you call Mongoid.default_session? I was working on another PR to move the pools to the node objects to simplify that logic, when I realized every thread had a different copy of the node, which is clearly why ConnectionManager is there in the first place? If the current architecture remains, does it make sense to have a NodeManager that can keep the state across them to record and mark them down, etc? IMHO, there are 3 or 4 layers now of thread protections that seem redundant, but I'm not familiar enough with all of the pipelining etc that is happening, even though it already seems like that's built into the Executable module.
I've made some more comments in the code below. But in general, this PR rescues a Timeout::Error which might be thrown by ConnectionPool in response to either not being able to get a connection within pool_timeout because they are all in use and/or it couldn't create any new connection instances because there are already pool_size connections created (this Timeout::Error has nothing to do with whether a connection can be established to a host). It raises a new exception Errors::PoolSaturated (both Errors::PoolSaturated and Errors::PoolTimeout already existed in the Moped codebase, I chose the former) with a message about not being able to get a new connection in certain amount of time and a suggestion to increase pool_size (might be nice in the future to have an auto-increment for pool size?). In combination with this PR for Mongoid, https://github.com/mongoid/mongoid/pull/3884, exceptions that happen DURING a command/query should be logged with them so they don't look like they are succeeding.
I'm also adding logging around events that seem like they are crucial, especially given failover events that might need to be recorded. I'm going to continue expanding logging/debugging in PR's, as Mongoid/Moped are rather devoid of it, and while 99% of the time we have zero issues with either, we've recently run into some situations where it would have been extremely helpful.