Skip to content
This repository has been archived by the owner on Apr 27, 2022. It is now read-only.

Check for port in node_connect_network() #474

Closed
henn opened this issue Aug 6, 2015 · 21 comments
Closed

Check for port in node_connect_network() #474

henn opened this issue Aug 6, 2015 · 21 comments
Milestone

Comments

@henn
Copy link
Contributor

henn commented Aug 6, 2015

In node_connect_network(), we currently do not verify that a NIC is connected to a port before adding the connection request to the Network Action Service queue.

According to Ian in #455:

Started working on this, and found there was actually a test that checks for the behavior you're pointing out. Talked to @gsilvis, I think the reasoning at the time was that you should never actually have nics in the DB that aren't connected to the switch, and that this was an administrator error. The driver writes a warning to the log and just noops. I'm not sure I still think this is the right behaviour, but let's not add unrelated api changes to this patch.

I think this is pretty straightforward (even if it breaks compatibility). Anyone disagree? @gsilvis @zenhack

@zenhack
Copy link
Contributor

zenhack commented Aug 7, 2015

I think complaining loudly and as soon as possible is the right thing here (and usually). Should be straightforward to implement.

@henn henn added easy and removed discuss labels Aug 7, 2015
@henn
Copy link
Contributor Author

henn commented May 4, 2016

One tricky part of doing this is that we will want to verify that a NIC is connected to a PORT on all of the real drivers, but not on the mock or null switch drivers we use for testing.

One way around this that keeps the api.py calls driver-neutral would be to check dev_support.py:have_dry_run

@gsilvis
Copy link
Contributor

gsilvis commented May 4, 2016

I don't see any reason to change the behavior here.

@henn
Copy link
Contributor Author

henn commented May 4, 2016

Curious to hear your reasons?

This just bit us in one of our setups -- some folks spent hours trying to debug why their connected nodes couldn't talk to each other.

If we know ahead of time that a NIC is not connected to a PORT (and it's not in dev mode), then why not do the check so that we can tell the user immediately that the operation will fail?

There currently is an error message, but it's in the network service thread output:

WARNING:haas.deferred:Not modifying NIC eno1; NIC is not on a port.

@gsilvis
Copy link
Contributor

gsilvis commented May 4, 2016

...hmm, that's pretty convincing.

How about returning 200, but also including in the body "Warning---this NIC is not connected to any port, and will not provide any network connectivity. Contact your HaaS administrator to get this fixed." or something like that?

Whether the NIC has a PORT can be checked without looking into the driver at all. Now, there's no way to check that the port makes /sense/ without looking into the driver... but I don't know that that's doable at all.

@henn
Copy link
Contributor Author

henn commented May 4, 2016

Would there ever be a case (outside of dry_run) where trying to connect a NIC without a port would succeed? If not, I like returning an error right away rather than delaying it.

@gsilvis
Copy link
Contributor

gsilvis commented May 4, 2016

It "succeeds" fine. It just doesn't do anything.

If we're serious about ports without nics being bad news, then we should structure the API to make it impossible for that to happen in the first place---for instance, having nics and ports be the same thing.

@henn henn added this to the v0.2 milestone May 10, 2016
@shuwens
Copy link
Contributor

shuwens commented Jun 2, 2016

IMO the node_connect_network should only succeed when the work is done, s.t. the user will be convinced the node is successfully attached and move on.

I am going to work on this and submit a fix.

@henn
Copy link
Contributor Author

henn commented Jun 2, 2016

I agree with @shwsun - we shouldn't return success if we know there's an error.

@gsilvis - is this something you feel strongly about?

@henn
Copy link
Contributor Author

henn commented Jun 2, 2016

I also agree with @gsilvis that we should prevent this. Maybe we can use the revert_port logic in #567 to test a port after it's been added?

@gsilvis
Copy link
Contributor

gsilvis commented Jun 2, 2016

node_connect_network cannot wait until the work is done. We very deliberately made the API asynchronous---otherwise it is far too slow. We do need to add something you can poll, to see if it's done---for instance, running a GET on the NIC could return a "status: attaching" during the process.

@gsilvis
Copy link
Contributor

gsilvis commented Jun 2, 2016

Also not returning when we know there's an error, and not returning until the work is done, are VERY different. I'm not convinced you two are agreeing right now.

@gsilvis
Copy link
Contributor

gsilvis commented Jun 2, 2016

I'm going to repeat my recommendation: if we eliminate the distinction between ports and nics, and have each nic have a database field representing the port information, none of this is a problem anymore, and the API is simpler too

@shuwens
Copy link
Contributor

shuwens commented Jun 2, 2016

I see @gsilvis 's point now. @SahilTikale also suggest that having a way to poll the status later may be necessary. The "eliminate distinction between nics and ports" seems to be a good one, though I doubt we will be able to do it right now.

However I still believe that we should not add job to the queue if it will definitely fail, i.e. Nic is not connected to a port.

@gsilvis
Copy link
Contributor

gsilvis commented Jun 2, 2016

+1 for immediately checking that the NIC is attached, as a maybe only short-term solution. But I'm going to keep bringing the other option up, whenever we talk about this :)

@shuwens
Copy link
Contributor

shuwens commented Jun 2, 2016

Great that we finally sort it out! Jason has already started work on the other option :P

I will start to work on this issue.

@matsuuran
Copy link
Contributor

hey @shwsun just wanted to see if you had made it any where on this

@shuwens
Copy link
Contributor

shuwens commented Jul 7, 2016

Fix this is quite easy, but seems like it will break a lot of tests in tests/unitauth.py and main.py. I will fix the tests after revert port is done.

@henn
Copy link
Contributor Author

henn commented Jul 7, 2016

@shwsun discovered that this will also require some adjustments to the tests, including at least tests/unit/api/main.py.

Since there are so many node_connect_network() calls in there, and a lot of repeated boilerplate, it might make sense to define a fixture/setup function (described here), as was done for auth.py.

@shuwens shuwens mentioned this issue Jul 12, 2016
4 tasks
@shuwens
Copy link
Contributor

shuwens commented Sep 16, 2016

@knikolla could you please close it? #625 is already merged. thx

@matsuuran
Copy link
Contributor

closing, fix is in #625

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants