Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Subscription does not recover from authentication credentials error (token expiry?) #318

Closed
eug48 opened this issue Nov 5, 2018 · 27 comments
Assignees
Labels
api: pubsub Issues related to the googleapis/nodejs-pubsub API. priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. triaged for GA type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@eug48
Copy link

eug48 commented Nov 5, 2018

Environment details

  • OS: Linux, GKE container
  • Node.js version: 8.12.0
  • npm version: 6.4.1
  • @google-cloud/pubsub version: 0.20.1

Steps to reproduce

  1. Created a subscription
import PubSub = require('@google-cloud/pubsub');
const gcpPubsubClient = new PubSub({ projectId });
const gcpPubsubSubscription = gcpPubsubClient.subscription(subscriptionId);
gcpPubsubSubscription.on('message', pubsubMsg => { ... })
gcpPubsubSubscription.on('error', err => { log(...) })
  1. Subscription works fine for many hours
  2. Suddenly in the middle of the night I get an uncaughtException:
uncaughtException { Error: 16 UNAUTHENTICATED: Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.
    at Object.exports.createStatusError (/app/node_modules/grpc/src/common.js:87:15)
    at Object.onReceiveStatus (/app/node_modules/grpc/src/client_interceptors.js:1188:28)
    at InterceptingListener._callNext (/app/node_modules/grpc/src/client_interceptors.js:564:42)
    at InterceptingListener.onReceiveStatus (/app/node_modules/grpc/src/client_interceptors.js:614:8)
    at callback (/app/node_modules/grpc/src/client_interceptors.js:841:24)

Bunyan also logged some metadata:

 error: {
   code: 16    
   details: "Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project."    
   
metadata: {
_internal_repr: {  
www-authenticate: [
      0: "Bearer realm="https://accounts.google.com/", error="invalid_token""       
     ]
    }
   }
  }

The gcpPubsubSubscription error handler also fires:

err: {
   code: 16    
   message: "Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project."    
   name: "Error"    
   stack: "Error: Request had invalid authentication credentials. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.
    at Immediate.onConnectionStatus (/app/node_modules/@google-cloud/pubsub/src/connection-pool.js:225:25)
    at runCallback (timers.js:812:20)
    at tryOnImmediate (timers.js:768:5)
    at processImmediate [as _immediateCallback] (timers.js:745:5)"    
  }
  1. After this the subscription object no longer receive messages & the application is in a broken state. A publisher created from the same gcpPubsubClient continues to work.

It would be nice if the subscription could try to recover from this automatically. I would also appreciate any recommendations about the best ways to recover from such errors.

@JustinBeckwith JustinBeckwith added triage me I really want to be triaged. 🚨 This issue needs some love. labels Nov 5, 2018
@JustinBeckwith JustinBeckwith added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Dec 2, 2018
@callmehiphop
Copy link
Contributor

@stephenplusplus Do you have an opinions on how we should handle this? Part of me thinks this might be a problem for google-gax to solve - attempt to re-auth on this error. But it might also be pretty easy for us to delete any cached gapic clients when we see an error like this.

@JustinBeckwith
Copy link
Contributor

This feels like the sort of thing that should be handled in gax. google-auth-library has all the retry logic for failed auth attempts when doing HTTP requests, but can't really help here in the case of a grpc channel. I have to imagine other clients would deal with this as well.

@stephenplusplus
Copy link
Contributor

@callmehiphop that sounds right to me.

@nikkatalnikov
Copy link

any news on this issue?
we suffer exactly same problem with GKE and node.

@callmehiphop
Copy link
Contributor

callmehiphop commented Feb 26, 2019

I've been trying to dig into this and I'm not entirely sure this is something we actually can fix in gax (not very easily at least). AFAIK grpc channels don't actually let you supply new credentials to a pre-existing clients, instead it looks like if the credentials expire you have to make a new client altogether. I think if we wanted gax to handle this, either grpc would need to expose a method for supplying new credentials or else gapics would have to add logic to create new grpc clients.

@alexander-fenster does this sound correct? Do you have any thoughts here?

@sduskis
Copy link

sduskis commented Apr 9, 2019

The credentials don't expire, but the auth token might be revoked before the expiry date of the token. There are two solutions to that:

  1. use JWT instead
  2. change GAX to refresh oauth2 tokens when the server returns UNAUTHENTICATED

@JustinBeckwith
Copy link
Contributor

Taking a closer look - this is a really old version of the @google-cloud/pubsub npm module. Could you upgrade to 0.28.1 and let us know if you have the same issue?

@nikkatalnikov
Copy link

For 0.28.1 it is the same

@sduskis
Copy link

sduskis commented Apr 24, 2019

Here's what I think is happening, based on some research I did in the Cloud Bigtable java client:

  1. The service account credentials or compute credentials can be used to get an oauth2 token. The client caches the token, which includes an expiration timeout.
  2. A bunch of requests get the oauth2 token in the header, and everything works fine
  3. Something happens on the server-side to cause the cached oauth2 token to be revoked.
  4. The requests that used to work don't work anymore, and the server sends UNAUTHORIZED for all requests. The client keeps trying with the same token to no avail.
  5. I'd guess that if you keep the client going long enough, specifically until after the timeout in 1) passes, the client will start working again.

I wrote a custom interceptor for the Bigtable client that intercepts situations like step 4, and automatically refreshes the token. You can see the java code here.

@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label May 4, 2019
@sduskis
Copy link

sduskis commented May 9, 2019

Ajay thinks that the issue relates to having a subscriber that's open for a long time. At some point, the expiration will occur, and the stream will fail with UNAUTHORIZED. In that case, the resolution is basically retrying the stream.

@ajaaym
Copy link

ajaaym commented Jun 7, 2019

@eug48 can you please use authentication as below? Closing this for now, please feel free to reopen if this doesnt help.

const { GoogleAuth } = require( 'google-auth-library');
const pubsub = new PubSub({auth : new GoogleAuth()});

@joshughes
Copy link

joshughes commented Jul 8, 2019

@ajaaym

Our team currently has to wrap the pubSub client library in our own retry logic to recover from auth issues.

I was hoping for some explanation on why you believe the fix you suggest above will help.

const { GoogleAuth } = require( 'google-auth-library');
const pubsub = new PubSub({auth : new GoogleAuth()});

It is not clear why this would be any different than what is going on in this library.

this.auth = new GoogleAuth(this.options);

It is also not clear how your proposed fix changes how this would behave.

Please consider reopening the ticket, as we continue to see the auth issue in production which can cause messages to not be delivered and never retired.

@ajaaym
Copy link

ajaaym commented Jul 8, 2019

@joshughes by passing auth object externally and setting credential to service account, it uses JWT authentication. In this case we observed that server sends closes stream with status code as 2(UNKNOWN) with text "stream removed" when token is expired. Are you using external auth object as described above?

@joshughes
Copy link

@ajaaym

Can you explain how passing that auth option overrides this? It seems like the library will still create its own auth client.

this.auth = new GoogleAuth(this.options);

@ajaaym
Copy link

ajaaym commented Jul 8, 2019

@joshughes please see here, its using the options which contains the auth object thats already passed in.

gaxClient = new v1[config.client](this.options) as gax.ClientStub;

@joshughes
Copy link

@ajaaym thanks... I think I have parsed through far enough to find what you mean

this.auth = gaxGrpc.auth;

I am still confused why the default for this library is going to remain broken with this bug. How do you see this being fixed in the long term?

It seems like for everyone not digging through the closed issues, this will mean when they use pubsub they are going to hit this auth error and lose messages.

@JustinBeckwith JustinBeckwith reopened this Jul 9, 2019
@JustinBeckwith
Copy link
Contributor

We should dig into this one a little more.

@xoraingroup
Copy link

Hi,
I have been observing this issue from quiet long and its a kind of disaster for us. I have tried what @ajaaym mentioned using new GoogleAuth but this does not work other than my local machine. The message is neither sent nor received, it just hangs there. Is there a way to debug this full process.

Test same code for sending and receiving messages with p12 json creds file.
How I am initialising?
const pubsub = new PubSub({ auth: new GoogleAuth({projectId: projectId, keyFilename: keyFilename}) });

  1. tested on a KVM vm. (does not work)
  2. tested on 2 bare metal servers with public ip. (does not work)
  3. tested on local machine MAC OS . (Works)

without GoogleAuth, same code works on all above machines.
const pubsub = new PubSub({ projectId: projectId, keyFilename: keyFilename });
Please some body help.

@callmehiphop
Copy link
Contributor

@xoraingroup are you attaching an error handler to the Subscription as well? You can try enabling grpc logging with the following environment variables

export GRPC_TRACE="http"
export GRPC_VERBOSITY="DEBUG"

@xoraingroup
Copy link

xoraingroup commented Jul 19, 2019

@callmehiphop thank you. Yes there is error handler, but no errors are thrown. It just hangs out there. Let me try with this debug mode. I was searching for how to enable debug mode to see whats going. Will update you with trace.

@xoraingroup
Copy link

xoraingroup commented Jul 19, 2019

@callmehiphop

So here are my findings with trace. I added api to grpc trace as well.

This is with normal pubsub without GoogleAuth and it works.

D0719 18:42:34.794252105   26691 dns_resolver.cc:334]        Using native dns resolver
I0719 18:42:34.794341921   26691 init.cc:155]                grpc_init(void)
I0719 18:42:34.794645063   26691 completion_queue.cc:481]    grpc_completion_queue_create_internal(completion_type=0, polling_type=0)
I0719 18:42:34.889978321   26691 ssl_credentials.cc:128]     grpc_ssl_credentials_create(pem_root_certs=(null), pem_key_cert_pair=(nil), verify_options=0x7ffce1751820, reserved=(nil))
I0719 18:42:34.891233257   26691 ssl_credentials.cc:128]     grpc_ssl_credentials_create(pem_root_certs=(null), pem_key_cert_pair=(nil), verify_options=0x7ffce1751820, reserved=(nil))
I0719 18:42:34.909548503   26691 ssl_credentials.cc:128]     grpc_ssl_credentials_create(pem_root_certs=(null), pem_key_cert_pair=(nil), verify_options=0x7ffce1750b90, reserved=(nil))
I0719 18:42:34.909742388   26691 ssl_credentials.cc:128]     grpc_ssl_credentials_create(pem_root_certs=(null), pem_key_cert_pair=(nil), verify_options=0x7ffce1750b90, reserved=(nil))
I0719 18:42:34.913697265   26691 plugin_credentials.cc:250]  grpc_metadata_credentials_create_from_plugin(reserved=(nil))
I0719 18:42:34.913776938   26691 composite_credentials.cc:206] grpc_composite_channel_credentials_create(channel_creds=0x3fbc160, call_creds=0x3fd2f40, reserved=(nil))
I0719 18:42:34.914062761   26691 secure_channel_create.cc:198] grpc_secure_channel_create(creds=0x3fd3320, target=pubsub.googleapis.com:443, args=0x3fd2820, reserved=(nil))
I0719 18:42:34.914126656   26691 channel.cc:285]             grpc_channel_get_target(channel=0x406fc20)
I0719 18:42:34.914385118   26691 plugin_credentials.cc:250]  grpc_metadata_credentials_create_from_plugin(reserved=(nil))
I0719 18:42:34.914410986   26691 composite_credentials.cc:206] grpc_composite_channel_credentials_create(channel_creds=0x40708f0, call_creds=0x3fd27d0, reserved=(nil))
I0719 18:42:34.914483962   26691 secure_channel_create.cc:198] grpc_secure_channel_create(creds=0x3ff6a40, target=pubsub.googleapis.com:443, args=0x3fd1990, reserved=(nil))
I0719 18:42:34.914527754   26691 channel.cc:285]             grpc_channel_get_target(channel=0x407d9c0)
I0719 18:42:34.914653548   26691 plugin_credentials.cc:250]  grpc_metadata_credentials_create_from_plugin(reserved=(nil))
I0719 18:42:34.914666070   26691 composite_credentials.cc:206] grpc_composite_channel_credentials_create(channel_creds=0x402df50, call_creds=0x4007210, reserved=(nil))
I0719 18:42:34.914791098   26691 secure_channel_create.cc:198] grpc_secure_channel_create(creds=0x40337b0, target=pubsub.googleapis.com:443, args=0x4070140, reserved=(nil))
I0719 18:42:34.914832816   26691 channel.cc:285]             grpc_channel_get_target(channel=0x40e95f0)
I0719 18:42:34.914965255   26691 plugin_credentials.cc:250]  grpc_metadata_credentials_create_from_plugin(reserved=(nil))
I0719 18:42:34.914977643   26691 composite_credentials.cc:206] grpc_composite_channel_credentials_create(channel_creds=0x4027da0, call_creds=0x40700f0, reserved=(nil))
I0719 18:42:34.915043961   26691 secure_channel_create.cc:198] grpc_secure_channel_create(creds=0x4027490, target=pubsub.googleapis.com:443, args=0x40457b0, reserved=(nil))
I0719 18:42:34.915071995   26691 channel.cc:285]             grpc_channel_get_target(channel=0x3fb4530)
I0719 18:42:34.916043114   26691 channel.cc:285]             grpc_channel_get_target(channel=0x3fb4530)
I0719 18:42:34.916494478   26691 metadata_array.cc:29]       grpc_metadata_array_init(array=0x3fd1998)
I0719 18:42:34.916514443   26691 metadata_array.cc:29]       grpc_metadata_array_init(array=0x40450a8)
I0719 18:42:34.916520942   26691 call.cc:1905]               grpc_call_start_batch(call=0x40dc950, ops=0x40273b0, nops=2, tag=0x40e9430, reserved=(nil))
I0719 18:42:34.916527827   26691 call.cc:1511]               ops[0]: SEND_INITIAL_METADATA(nil)
I0719 18:42:34.916532560   26691 call.cc:1511]               ops[1]: RECV_INITIAL_METADATA ptr=0x40450a8
D0719 18:42:34.9165402

When added GoogleAuth here is the trace. The last line in the trace where it hangs indefinitely.

D0719 18:46:06.122478049   28952 dns_resolver.cc:334]        Using native dns resolver
I0719 18:46:06.122548178   28952 init.cc:155]                grpc_init(void)
I0719 18:46:06.122857304   28952 completion_queue.cc:481]    grpc_completion_queue_create_internal(completion_type=0, polling_type=0)
I0719 18:46:06.218477847   28952 ssl_credentials.cc:128]     grpc_ssl_credentials_create(pem_root_certs=(null), pem_key_cert_pair=(nil), verify_options=0x7fffe87b9570, reserved=(nil))
I0719 18:46:06.219746199   28952 ssl_credentials.cc:128]     grpc_ssl_credentials_create(pem_root_certs=(null), pem_key_cert_pair=(nil), verify_options=0x7fffe87b9570, reserved=(nil))
I0719 18:46:06.242209889   28952 plugin_credentials.cc:250]  grpc_metadata_credentials_create_from_plugin(reserved=(nil))
I0719 18:46:06.242322095   28952 composite_credentials.cc:206] grpc_composite_channel_credentials_create(channel_creds=0x2ba7e40, call_creds=0x2bb3280, reserved=(nil))
I0719 18:46:06.242629101   28952 secure_channel_create.cc:198] grpc_secure_channel_create(creds=0x2b33570, target=pubsub.googleapis.com:443, args=0x2bb32f0, reserved=(nil))
I0719 18:46:06.242682825   28952 channel.cc:285]             grpc_channel_get_target(channel=0x2b883b0)
I0719 18:46:06.242948904   28952 plugin_credentials.cc:250]  grpc_metadata_credentials_create_from_plugin(reserved=(nil))
I0719 18:46:06.242981705   28952 composite_credentials.cc:206] grpc_composite_channel_credentials_create(channel_creds=0x2ba4190, call_creds=0x2be6ce0, reserved=(nil))
I0719 18:46:06.243059902   28952 secure_channel_create.cc:198] grpc_secure_channel_create(creds=0x2b89440, target=pubsub.googleapis.com:443, args=0x2aabb80, reserved=(nil))
I0719 18:46:06.243092981   28952 channel.cc:285]             grpc_channel_get_target(channel=0x2ba5620)
I0719 18:46:08.254372279   28952 credentials.cc:43]          grpc_channel_credentials_release(creds=0x2ba7e40)
I0719 18:46:08.254401576   28952 credentials.cc:43]          grpc_channel_credentials_release(creds=0x2ba4190)
I0719 18:46:08.254408719   28952 credentials.cc:49]          grpc_call_credentials_release(creds=0x2bb3280)
I0719 18:46:08.254424983   28952 credentials.cc:49]          grpc_call_credentials_release(creds=0x2be6ce0)
I0719 18:46:08.254430087   28952 credentials.cc:43]          grpc_channel_credentials_release(creds=0x2b89440)
I0719 18:46:08.441746312   28952 credentials.cc:43]          grpc_channel_credentials_release(creds=0x2b33570)

@xoraingroup
Copy link

Also with latest version v0.30.1 of pubsub the trace is not printing.

@xoraingroup
Copy link

Ok, if I try to export GOOGLE_APPLICATION_CREDENTIALS with service json file, then it works. But If i use new GoogleAuth({projectId: myGCPProjectId, keyFilename: serviceFilePath}), then it does not work in that vm or other servers, except my local computer.

@stephenplusplus
Copy link
Contributor

Related issue: googleapis/nodejs-speech#243

@eug48 @joshughes @xoraingroup We have recently made a fix that more eagerly refreshes an access token. Have you still been seeing this error after updating to the latest release?

@murgatroid99
Copy link

I have just been made aware of this issue so I want to clarify the information in #318 (comment) about how grpc handles credentials. It is true that if you construct a channel using credentials, those credentials do not change for the lifetime of the channel. But that credentials object can provide the corresponding metadata dynamically in a function that is called for every new request. The intention for a situation like this where credential information can expire is that that function will track the expiration time and automatically refresh the credentials information as necessary. I was under the impression that the Google Auth Library had that functionality.

@bcoe
Copy link
Contributor

bcoe commented Oct 8, 2019

@murgatroid99 the Google library does have this functionality, and we recently fixed a major bug with it (such that it actually works 😆). This is great to know.

@callmehiphop
Copy link
Contributor

Howdy folks! We think that this has been fixed by the latest auth library release. If anyone here could let us know if they're still experiencing this issue after updating dependencies, that would be great. Otherwise I'm planning on closing this out. Thanks!

@google-cloud-label-sync google-cloud-label-sync bot added the api: pubsub Issues related to the googleapis/nodejs-pubsub API. label Jan 31, 2020
feywind pushed a commit to feywind/nodejs-pubsub that referenced this issue Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: pubsub Issues related to the googleapis/nodejs-pubsub API. priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. triaged for GA type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests