-
-
Notifications
You must be signed in to change notification settings - Fork 198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Getting "invalid short sasl lifetime millis" periodically with AWS MSK using SASL authentication #136
Comments
@sharonx what milli is being logged there? It surprises me that AWS would send back that a token is valid for <5s. Also, does this seem to happen every 15 hours? |
Thanks @twmb for the quick reply. The milli logged are anywhere between 1000 and 5000. This happens irregularly from 2 hours - 12 hours. |
Have you seen <1000? I can remove the restriction on tokens being at least 5s. I just took a skim of the Java code, it looks like they use no restriction on the token lifetime, but choose somewhere between 85% and 95% of the lifetime to reauthenticate. It's pretty odd that the broker returns such short lived tokens IMO, but besides added latency, it's ok. My main worry would be spin looping a bit if I try to reauthenticate and get such a short lifetime that I reauthenticate again immediately, and then again. Are you open to trying the v1.4 branch? I'd rather put this in v1.4 rather than have another v1.3 path release, and I'm planning to tag v1.4 either tonight or tomorrow (waiting for some feedback on one commit in it). |
I haven't seen any log with < 1000. The smallest I've seen is I'm open to try the v1.4 branch. |
Cool, I'll push a commit to v1.4 in 40 minutes to an hour, this will be promoted to stable shortly. |
Awesome! Thanks again @twmb |
Mind trying dc51630? |
I just pulled the commit off and running the test now. I'll let it run for a while report back tomorrow. |
@sharonx actually I just realized a small mistake with that commit and have squashed into it. The commit as linked previously used the lifetime as given from the broker directly, rather than subtracting anything off of it (bad variable reuse, and the new linters I use mostly would have noticed this), if you can use 6e4c5b5, that now uses the new |
Interesting, I wonder if if reauthentication wasn't quick enough due to being 80% of the session lifetime if the session lifetime is <5s. Maybe I could change it to just reauthenticate immediately if <5s. |
I think reauthenticate immediately if <5s is a better idea. It'll be more flexible if I can configure the time but not a big deal. Another option would be to retry authentication when |
Per KIP-368, AWS should be killing the connection, not accepting requests and rejecting them as invalid auth. I'll take a look at another form of patching this and squash into place again (one benefit of an unmerged branch, cleaner commits...). Ideally, <1hr eta. I'm also working on a new feature or two that I want to squeeze into v1.4, so an official tag may not come until next week. I'll post here when I squash into place. |
New commit to try, 3e6f0fd Now, we always take 2.5s off the lifetime (or 1.1x the reauth latency, whichever is larger, but I expect 2.5s to always be larger). If this causes an immediate reauth, we sleep 100ms to avoid hot looping. If we reauth 15x in a row (which is overkill), the client will kill the connection itself and retry on a new connection. This should also avoid the TOPIC_AUTHORIZATION_FAILED error, which is indistinguishable to if you at runtime actually did remove permissions from this account (so we cannot react to that error reliably). |
Sounds good. I'm running the test with the new commit. I'll report back by tomorrow. |
@twmb I think this new commit is stable. I have two pods that are running for over 20 hours without any restarts 🥳! I think the next step is to make a new release with this commit. Is this something you can pick up soon? |
I'm currently testing whether it makes sense to serialize calling all promises, as well as touching up a few things in how promises are currently called. I expect 1.4 to be tagged sometime next week (preferably the earlier half of the week). |
1.4 is going to be a large release, so I'm going to split this into v1.3.5 if you'd rather avoid bumping to v1.4 (although I appreciate more testing of 1.4 😁 ) |
This has been released in v1.3.5! |
(also v1.4.0 is tagged now) |
Thank you again. I upgraded to 1.3.5 and everything work well so far! |
I'm getting "invalid short sasl lifetime millis" periodically (after a few hours) with AWS MSK using SASL authentication.
Looking at the code, it errors out when the
lifetimeMillis
is less than 5 sec but would re-authenticate if the time is greater than 5 sec.The periodical re-authentication works most of the time because AWS auth token expires every 15min and I only see this issue after a few hours. When it happens,
lifetimeMillis
become less than 5s. Could this be a clock sync issue? Any idea how I can prevent this error from happening?The text was updated successfully, but these errors were encountered: