-
Notifications
You must be signed in to change notification settings - Fork 316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design Doc to redesign sys IP determination and promotion #6793
Comments
This is now ready for review! |
@nellshamrell the IP address is actually that of the Supervisor itself, so it's not anything that package authors can influence. Instead, I think we'd need to have some kind of CLI option to allow a user to (optionally) specify an interface to use for the Supervisor's communication. There is probably some unfortunate conflation of system and service information in the code, which just ends up confusing things If there's a desire to specify which interfaces individual services can use for their communication, that would likely be a separate issue. |
I think this may not be the way to go, mirroring @christophermaier's thoughts. This couples machine-specific configuration with Habitat Packages, and I think would overall make it less portable. At the very least I think having this at the build-time/plan level may not be ideal. A quick thought on this would draw on my example again, as I was testing a package that would require At the very least this introduces another scenario:
|
Quoting @christophermaier :
I think this is the way to go. |
Good points, all, reworking it to be a CLI option to use for the Supervisor's communication. |
This has been updated and is ready for review again! |
I see that the A few testing scenarios that prove the functionality on Linux and Windows will be great to have. Aside from that, this is looking pretty nice. I like the idea of failing if we can't come up with a sensible address, as opposed to falling back to |
👍
Last thought I have left is if it might make more sense to default to I might be overthinking this, and this would also result in a much greater downstream impact. Perhaps someone with more of a background on the use cases for |
@sirajrauff It's a little difficult to trace in the code, but I think that If you're able to confirm or deny any of this while you're in this code, @nellshamrell, this distinction would be nice to add to the relevant documentation (both user-facing and developer-facing 😅 ) As the distinction and use of these addresses is refined, a potential future enhancement might be to allow an operator to specify which interface / IP address a given service should advertise as its own (e.g., you want to get gossip on one interface, and service traffic on another). This would be something specified at |
Another thing to consider is that even for a given network interface, there can be multiple IP addresses. Something that's not entirely clear to me from reading the design (or the docs) is the intended use of Additionally, there's the question of whether this information belongs in the |
Thank you for the comments, all! @baumanj - while I had originally thought that the interface should be defined on the package level, it makes much more sense to define it on the supervisor level for the reasons you mention - network interface names will almost certainly change across development/deployment platforms. I updated the design doc with this in mind. I am also unsure of the intended use of sys.ip, but given the headaches that users have expressed in #5079, it certainly seems like they are depending on it. I'm happy to rethink this on a larger level, but am current hesitant at the idea of changing it drastically and making these headaches far worse. You are also right that a given network interface may have multiple IP addresses. Not sure how to incorporate that in, but I'm thinking on it (and welcome suggestions :) ) |
I think this indicates a need to understand how |
@christophermaier so I may not have phrased it correctly, but I was referring to the resolved value of
After loading sample-node-app on machine 1, I can query the API and get the following:
Note this is in the Vagrant multi-machine example I have given before, and after loading my second Supervisor configured as follows:
I then get the following (note
Network interfaces on machine 1:
What I find odd here is that if I then use an haproxy package with the configuration set to bind to So what I meant by my original comment above is that |
A quick secondary thought (i'm so sorry for these walls of text); the use case I detailed above uses bindings and concerns network traffic between machines, but I've also seen cases where |
@sirajrauff No need for apologies here. The information you've provided so far is very helpful. |
Briefly discussed this in our morning stand up - here's a summary: We don't know the original intent of sys.ip and sys.gossip_ip - it's not documented and the people who would know have left the project. We do have some insight into how our users are using the two values (as @sirajrauff mentions in his examples in previous comments). I believe we are at the point of needing to decide how users should use the two values - it's only going to get more confusing if we don't have a documented intent/convention. In order to do that, I believe we're at the point of needing help from our product team in defining this intent/convention before we can move to implementation. Sending this link to our product team today. |
My impression (which may be revised with further research!) is that
This falls down with the current implementation because Google's DNS isn't routable in all environments, leading to a fallback value of I'm really not clear why That's my brain dump for now; I'll add more as I think of things 😄 |
Just want to double back here on this comment where I referenced the core-plans Jenkins package:
I modified the plan to listen on |
Re: network interfaces having >1 IP address, we could also allow users to specify an IP address when they specify a network interface. If one is not specified, we can select a default (and return an error if no IP addresses are available for the selected network interface). |
The The There were several strategies for determining Even at the time we suspected that this would be an 80% solution/get it working approach because, as folks on this thread have discovered, you don't always have a route to the public internet. Personally, I thought that a key to getting this Hope this helps, or at least provides folks with a bit of project archeology along the way. Thanks to @eeyun for pointing me this way and letting me walk down memory lane 🥃 |
Hello all! Had a great zoom meeting about this tomorrow. Course of action we agreed on for now is:
|
Closed by #6960 |
Redesign Sys IP Determination and Reporting
This design doc was inspired by the investigation and discussion on this issue.
We currently have a bug where a service will occasionally start reporting it's sys.ip as 127.0.0.1. There may be a few causes, but one we have confirmed is that a service will fall back to reporting 127.0.0.1 as it's ip when Google's DNS (8.8.8.8) is not routable.
This is due to these two sections of the code:
habitat/components/core/src/util/sys.rs
Lines 9 to 16 in 4527c32
This code will use the kernel's routing tables to return the IP address of the interface that can route to '8.8.8.8'
(this is a technique that is detailed in Section 8.14, "Determining Outgoing Interface with UDP", of UNIX Network Programming, Volume 1, 3rd Edition by Stevens et al., the relevant text to which is available here.)
If 8.8.8.8. is not routable, the function will return an error, and we will recover from it later in the code.
habitat/components/sup/src/manager/sys.rs
Lines 38 to 45 in 4527c32
This IP address appears to only be used in templating contexts - through the top-level sys.ip key, as well as the sys.ip key of any svc_member structs (detailed at https://www.habitat.sh/docs/reference/#template-data).
(Thank you to @christophermaier for the initial code tracing and analysis!).
I tried a few approaches to returning the IP address of a host through Rust.
The most viable approach was to use the pnet crate to return the ip address of all network interfaces, though we would need to do some filtering of the network interfaces to determine which IP address we should return, likely similar to the way Ohai does this.
However, a couple of people also pointed out that there may be situations where a service instance will not have an externally resolvable IP address. This happens when a service cluster is running on an internal network - with VMs or containers.
@sirajrauff shared this real world example
In order to account for situations where Google DNS is not routable and/or the service instance has no IP address, we need a way for a user to specify which network interface they want to source their IP from. There should also be a sane default if the user does not specify a network interface - for example, the default route or the route that gets us to the gateway for a default route (taking a similar approach to Ohai's approach.
Motivation
As a Habitat user
I want to be able to specify which network interface my service should use to determine its exposed IP address
So that I can rely on the IP address to be accurate
Including when a service cluster is running on an internal network
Additionally, if I do not specify a network interface
Habitat will select a sane default network interface
For my service to use to determine its exposed IP address
Specification
When a user runs the Habitat supervisor with
They will be able to pass an additional option to select a specific network interface for the Supervisor to use to draw its IP from.
In this case, the sys.ip for any service running under this Supervisor will be the IP for eth0.
If that interface cannot be found, the supervisor will fail with an error (rather than default to 127.0.0.1).
If they do not pass that option when running that command, a default network will be selected to determine the IP address displays as sys.ip. (Perhaps the first interface with external internet access?).
If no interfaces can be found, the supervisor will return an error (rather than default to 127.0.0.1)
Downstream Impact
Some users may see their service's IP address change when they update Habitat to a new version with this functionality in it.
The text was updated successfully, but these errors were encountered: