httplib2.Http is not thread-safe #1214
Comments
Also, this is using gcloudoem, most (all?) of whose datastore code was taken from gcloud-python. If you think this is only an issue with gcloudoem and not gcloud-python, let's close this issue and reopen it for the gcloudoem owner. |
Unfortunately, even if we did find and fix a bug in gcloud-python, gcloud-datastore-oem / gcloudoem copied, pasted and modified the code, so our changes won't fix anything for this particular situation. I think it's worth a quick shot at trying to reproduce on our side, but we still need to open a bug at gcloud-datastore-oem. |
True, but based on the type of error I'd expect it to also be an error in gcloud-python (unless we've already fixed it). If we are fairly confident it isn't, we can just open a bug against gcloud-datastore-oem and forget about it (it may be worthwhile to do this anyway: the bug, not the forgetting). |
Can you provide code to repro? Like, loop 5,000 times and then it happens probably? |
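A loop like the one asked for here could be sketched as follows. This is a generic harness, not the reporter's script: `run_repeatedly` and `operation` are hypothetical names, and `operation` stands in for whatever single Datastore or Storage call is suspected of failing.

```python
import collections


def run_repeatedly(operation, iterations=5000):
    """Call operation() many times and tally any exceptions by type name."""
    failures = collections.Counter()
    for _ in range(iterations):
        try:
            operation()
        except Exception as exc:  # deliberately broad: we want to see every failure
            failures[type(exc).__name__] += 1
    return failures
```

Passing a function that performs one lookup and printing the returned counter afterwards would show how often, and with which exception types, the call fails.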
I originally reported this problem and I can provide a little more context. The problem happens quite regularly when I am developing code on my local machine against the production datastore. However, if I use the gcd tool locally instead of the actual datastore, it never happens. It happens less regularly in production from a compute instance. It might also be worth noting that the code is run in a docker container. I've tried installing pyOpenSSL to see if the problem is rectified, but it appears to have no effect. I will write some code using only gcloud-python to see if I can reproduce the problem. I will set it running locally first because I'm slightly concerned about the potential cost it could accumulate on gcloud. I'll report back with results. |
Here is the code I am running. I will report back when something goes wrong. |
Thanks @rstuart85! It looks like a very simple script 👍 |
The script has been running for a couple of days with no errors. So there appears to be something related to my setup that is causing the SSL errors. I'll keep investigating. |
Another update. It isn't just Datastore that gets the problem. Here is a stacktrace from trying to access a bucket using gcloud-python.
And here is the code:
As you can see, I've tried to sidestep the errors using retries. |
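For readers who want to try the same stopgap, a retry wrapper might look like the sketch below. It assumes the failure surfaces as `ssl.SSLError`; `call_with_retries` and its parameters are hypothetical names, not part of gcloud-python or httplib2.

```python
import ssl
import time


def call_with_retries(operation, attempts=3, delay_seconds=0.5):
    """Run operation(), retrying when it raises ssl.SSLError.

    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ssl.SSLError:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)
```

Retrying only hides the symptom. If the root cause is a connection shared between threads, as discussed later in this thread, the errors will keep occurring underneath.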
Just in case it is worthwhile, here is the Dockerfile I use to build the environment this runs from.
And the requirements.txt:
|
It's worrisome. I'm curious if the error is
Maybe we should call in @jcgregorio to see if he has encountered this anywhere else with |
I'm curious if turning off stdin buffering ( Have you tried catching the |
No, but I'll give that a try. It happens both inside and outside of docker (on the production server which uses docker and on my local laptop which doesn't), so I doubt it is docker related. My local machine is OSX and the server is created using docker-machine. |
Very good to know! Local laptop is running which OS? |
OSX |
Nice. And the docker instance is running on OS X or on a linux VM somewhere? |
Linux VM on Google Compute created using docker-machine. |
Good. This (almost) certainly means the issue is in the libraries. |
I've seen issues related to SSL with httplib2, but they are almost always related to httplib2 using its own bundled certificate store. I can't find anything related to these issues. |
"these issues" meaning the "bundled certificate store" or the "SSL: WRONG_VERSION_NUMBER" |
The two exceptions I get which are "SSL: WRONG_VERSION_NUMBER" and "SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC" |
Did you ever find a stack trace for Some places where the error occurs (they don't really illuminate): What versions of Alternate theory: this error occurs because you are finding a Google server which doesn't speak the right SSL protocol. From an openssl mailing list:
UPDATE: Some sections of |
You could use Wireshark to capture the traffic and make a Capture Filter It seems that Python
# SSLv3 has problematic security and is only required for really old
# clients such as IE6 on Windows XP
context.options |= OP_NO_SSLv3
Double-check that file on your systems to make sure SSLv3 isn't the problem? |
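One way to check this from the interpreter, rather than by reading the library source, is to inspect the options on a default context. This is a small sketch using only the standard `ssl` module; `sslv3_disabled` is a hypothetical helper name, and it only reports what Python's own default context does, not what httplib2 configures internally.

```python
import ssl


def sslv3_disabled(context):
    """Return True if the OP_NO_SSLv3 flag is set on the given SSLContext."""
    return bool(context.options & ssl.OP_NO_SSLv3)


# Inspect the context that Python builds by default on this machine.
default_context = ssl.create_default_context()
print("SSLv3 disabled by default:", sslv3_disabled(default_context))
```

`ssl.create_default_context()` requires Python 2.7.9 or later (or 3.4 or later), so it is not available on the Python 2.7.6 mentioned in this thread.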
I haven't had another stacktrace yet for The post from the mailing list is interesting but the only people that could elaborate on that would be Google Engineers I assume? Do all your servers speak SSL2? |
I will check SSLv3 support. |
After POODLE, SSLv1, SSLv2 and SSLv3 are deprecated (SSLv1 isn't even implemented). SSLv23 is actually the default in I checked on my machine with Python 2.7.6 (default on Ubuntu 14.04) and SSLv3 is still enabled. |
@rstuart85 Were you able to check your I may also try to reproduce on my machine and monitor with Wireshark to find out what was happening. |
@krisrogers is getting this problem regularly from his development machine so he is going to debug it and report back here. |
Great thanks. I've been running an infinite loop with just getting an object from a bucket and having no error (not even a 500) after
UPDATE: I killed the script after no failures in
This was on my local machine and it was connected to the web via my apartment WiFi. |
I'm pretty sure it is a client-side issue because it happens a lot more in development than in production. Either way, @krisrogers should be able to give some more info shortly. |
Is there any progress to report on this issue? I use this library in my flexible environment instances (which are required to be configured as threadsafe) and see a lot of possibly related errors in the logs. I've tried some of the workarounds listed here, but none of them resolved the issues. |
@eric-optimizely I would suggest you use the library in a thread-safe manner, e.g., construct a new client instance per request. |
@eric-optimizely, @jonparrott holding a client instance per thread, assuming your application can be set up like that, would work as well. |
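The one-client-per-thread suggestion can be sketched with `threading.local`. This is a generic illustration, not code from gcloud-python: `get_thread_client` and `factory` are hypothetical names, and `factory` is whatever callable builds your client (for example `httplib2.Http`).

```python
import threading

# Each thread sees its own independent set of attributes on this object.
_local = threading.local()


def get_thread_client(factory):
    """Return the calling thread's client, creating it with factory() on first use."""
    client = getattr(_local, "client", None)
    if client is None:
        client = factory()
        _local.client = client
    return client
```

Calling `get_thread_client(httplib2.Http)` from request-handling code would give each thread its own `Http` object, so no two threads read from the same socket. Each thread keeps its own connections, which costs some extra handshakes compared with sharing.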
@jonparrott @tseaver Using the
|
@eric-optimizely That error occurs when multiple threads read bytes from a payload at once, which gives invalid crypto bits. This likely means multiple threads have access to the same |
@dhermes Ok, thanks for confirming. I've used the |
Yeah, your current best bet is to create a thread-local |
@eric-optimizely Just in case it's useful: I fixed this for my stuff a while ago by making the connection object threadsafe. You can see the changes here. Hope it's useful. |
@rstuart85 Cool, thanks! This looks like it's based on #1274 which should probably just be merged by @dhermes and @jonparrott , unless it's still incomplete or there's some other blocker. I'm not particularly interested in maintaining a fork of this library -- I just want it to work correctly on Google's own cloud platform. |
@eric-optimizely I think the outcome of the discussion here was that it should be fixed upstream. Maybe @dhermes can confirm or deny that. |
Yes I am taking this on right now, the goal is to get a divorce from |
I just re-deployed my app and I went from never seeing this issue, to getting thousands of them. Did anything recently change? They come in a number of flavors:
I was previously sharing http connections across threads. I'll make them thread-safe and see if that helps. |
Making my use of httplib thread-safe fixed the issue. Not sure what I did to cause this error to pop up, but I was tinkering with some related code, so I may have caused it. If it helps anyone else, below is the code that I use to get the authentication token. It relies on a dictionary
import threading

import httplib2

PRIVATE_KEY = {'my-project': {'email': 'serviceaccount@foo.com', 'key': 'my_key.pem'}}
# Thread-local storage: every thread gets its own cache of authorized Http objects.
auth_cache = threading.local()
auth_cache_lock = threading.Lock()

def get_auth(project, force=False, scopes=None):
    # At some point oauth2client had breaking changes. There are different
    # copies of this library floating around, so handle whichever default imports.
    try:
        from oauth2client.client import SignedJwtAssertionCredentials
    except ImportError:
        SignedJwtAssertionCredentials = None
        from oauth2client import crypt as oauth_crypt
        from oauth2client.service_account import ServiceAccountCredentials
    if project not in PRIVATE_KEY:
        raise Exception("Cannot connect to %s" % project)
    with auth_cache_lock:
        if not hasattr(auth_cache, 'auths'):
            auth_cache.auths = {}
        if not force and project in auth_cache.auths:
            return auth_cache.auths[project]
        # Look for the key file in a few known locations.
        f = None
        key = PRIVATE_KEY[project]['key']
        for path in ('updater/', '../updater/', ''):
            pfile = path + key
            try:
                f = open(pfile, 'rb')
            except IOError:
                continue
            else:
                break
        if not f:
            raise Exception("Could not find key file.")
        pem_contents = f.read()
        f.close()
        if scopes is None:
            # DEFAULT_SCOPES is assumed to be defined elsewhere in the module.
            scopes = DEFAULT_SCOPES
        # The first parameter, service_account_name, is the Email address created
        # for the Service account. It must be the email address associated with
        # the key that was created.
        if SignedJwtAssertionCredentials:
            # The old way of doing it.
            credential = SignedJwtAssertionCredentials(
                PRIVATE_KEY[project]['email'], pem_contents, scope=scopes)
        else:
            # The new way (improvement?)
            signer = oauth_crypt.Signer.from_string(pem_contents)
            credential = ServiceAccountCredentials(
                PRIVATE_KEY[project]['email'], signer, scopes=scopes)
            credential._private_key_pkcs8_pem = pem_contents
        # Build a fresh Http object for this thread and cache the authorized version.
        http = httplib2.Http()
        http = credential.authorize(http)
        auth_cache.auths[project] = http
        return http
|
Superseded by the discussion in #1346 |
For those still following this issue, #3674 is putting the final nail in our usage of |
I have the issues with The workaround for me was to clear the connections before making each request: |
This is from a StackOverflow post. I've done some debugging from the Datastore side but I don't think these requests ever make it into the Datastore part of the stack. I'd appreciate it if someone from the gcloud-python team could look at this. Note that this issue was originally reported via gcd-discuss@google.com in July.
I have a Python Django application running on a Google Compute instance. It is using gcloudoem to interface from Django to Google Datastore. gcloudoem uses the same underlying code to communicate with Datastore as gcloud-python 0.5.x.
At what seems to be completely random times, I will get SSL errors happening when trying to talk to Datastore. There is no pattern in where in my application code these happen. It's just during a random call to Datastore. Here are the two flavours of errors:
Unfortunately, for the second, I don't have a full stacktrace handy:
These errors don't happen when I am using the GCD tool. Does anyone have any idea what is happening here? Is this some sort of networking problem?