-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduced timeout and log output for IMDS timeout #192
Conversation
@@ -46,7 +47,7 @@ | |||
AMI_ID, | |||
} | |||
|
|||
private static final int TIMEOUT_MILLIS = 2000; | |||
private static final int TIMEOUT_MILLIS = 100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about keeping a higher (maybe not this high) read timeout and only lowering connect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering wether there's a better detection mechanism for EC2, before we actually start up the plug-in... similar to how we detect Lambda environments from env vars. Failing fast is good too though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling it's possible to speed up the positive case but we may be stuck on the negative case. EC2 provides low level customization of everything so env cars, file system can't be relied. I found a post where the hostname will end with an Amazon suffix by default but that's only if it's not overridden...
@willarmiros Out of curiosity what JVM did you test the timeout on? Java 11+ have a completely revamped network stack and I wonder if this may have improved. I think most tools fail fast when an IP is completely unreachable, which should be what's happening with metadata service which uses a link local, not public, IP address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@awssandra as Rag said, can't rely on much other than IMDS. There's no single file to check for that exists on all instances on Windows & Linux. Hostname is overridden in VPCs which many EC2 Customers use.
@anuraaga I tested the timeout using Java 8. I can change it to have say a 100 ms Connect timeout and 1 second read timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think most tools fail fast when an IP is completely unreachable
Do you mean they throw an exception before even timing out? Would this reliably happen in Java 8 & 11?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I just meant that newer network stack might through an exception right away regardless of the timeout value. Not a big deal mostly just curious :)
Description of changes:
Currently,
EC2Plugin.isEnabled
calls the IMDS endpoint to determine whether it is operating on EC2. This is fine when we are operating on EC2, since from my tests it takes <3 milliseconds to reach IMDS on an instance.However, this is problematic as we move toward an opt-out plugin model. In this model, we check
isEnabled
for each plugin regardless of environment, so both failing and passing should be fast. CurrentlyEC2Plugin.isEnabled()
takes 4 seconds to fail, because the timeout for the IMDS endpoint check is 2 seconds and it checks IMDSv1 and v2 before failing. This is far too long for non-EC2 customers using opt-out plugins.This PR lowers the timeout quite a bit and suppresses logging if we hit it to be friendlier to non-EC2, opt-out plugin customers. Given how fast successful IMDS requests are, the lower timeout should not impact the plugin's detection ability. To be even faster, we could avoid checking the second check to IMDSv1 when the v2 request times out, but that would require a bit of weird throwing logic.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.