Provision error logs #310
I know it's just a draft and it'll be easier to review when the previous stuff lands, but I wanted to go ahead and start taking a look. Just a few comments for now but I'll take a deeper look later as it progresses.
Don't forget to fill in the PR details, especially doc/test sections
In the process of trying to do some testing with a real maas device using this change, I encountered something strange. I decided to see if giving it a bad image name would be enough to make it fail properly. The provisioning step definitely failed... but then it kept on going through the job as if nothing had happened, when it should have stopped at that point. I switched back to main and confirmed that main still has the correct behavior.
I haven't had a chance to run down what happened exactly, but I think we need to do some more testing and figure that out.
@val500 Ok, I looked into this a bit more and I see the problem. In `__init__.py` of the maas2 device connector (and possibly others, you will want to look at those also!), we seem to specifically intercept the ProvisioningError so that we can add a nice log message about it, but then instead of re-raising it, we only log an error. IIRC this was likely intended to avoid getting a lot of extra debug output for common issues, but in this case we need to expose it back up to the exception handler that will write the output so that it gets propagated to the agent. We don't want to show the full traceback in the user logs if we can avoid it; a nicer error message there would suffice. But we still need to write that output file with the full details.
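To illustrate the log-then-re-raise pattern described here, a minimal sketch follows. It is not the connector's actual code: `provision_device`, `provision`, `run_phase`, the local `ProvisioningError` stand-in, and the JSON keys are all hypothetical names chosen for the example. Only the file name `device-connector-error.json` comes from this PR.

```python
import json
import logging
import traceback
from pathlib import Path

logger = logging.getLogger(__name__)


class ProvisioningError(Exception):
    """Stand-in for the connector's provisioning exception (hypothetical)."""


def provision_device(image: str) -> None:
    """Hypothetical provisioning step that always fails, as with a bad image."""
    raise ProvisioningError(f"unable to deploy image {image!r}")


def provision(image: str) -> None:
    """Log a short, user-friendly message, then re-raise the error."""
    try:
        provision_device(image)
    except ProvisioningError as exc:
        # Short message for the user-facing log, no traceback here.
        logger.error("Provisioning failed: %s", exc)
        # Without this re-raise the caller never learns the phase failed.
        raise


def run_phase(image: str, error_file: Path) -> int:
    """Hypothetical top-level handler: write full details, return exit code."""
    try:
        provision(image)
    except Exception as exc:
        # The keys below are illustrative assumptions, not the real format.
        details = {
            "exception": type(exc).__name__,
            "message": str(exc),
            "traceback": traceback.format_exc(),
        }
        error_file.write_text(json.dumps(details))
        return 1
    return 0
```

With the bare `raise` in place, `run_phase` returns a non-zero code and writes the error file; if it is removed, `run_phase` returns 0 and the job would continue as if provisioning had succeeded.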
Thanks for the updates, this seems to be working for me now. I've tried it with the maas connector and also with some others like muxpi since those had to be altered. I also tried forcing a recovery error to ensure that this still works properly so I think it's doing the right things now. The one thing I did notice is that if the error output is REALLY long, then the UI shows up in kind of a strange way on the test-observer side. We might be able to iterate on distilling it down to the more usable message or alternatively, making it display nicer when this happens, but I think that will be for another time. It'll be more clear once we see what happens in real failure situations.
+1 thanks!
Description
This adds logging of provisioning errors in the device connector to be read by the Testflinger agent and propagated to the event log. The device connector logs errors for each phase in a file, device-connector-error.json, which is read by the agent. This JSON takes the following format:
Resolved issues
Resolves https://warthogs.atlassian.net/browse/CERTTF-359
Documentation
Tests
Adds new tests to the device connector that mock the MAAS CLI to throw an exception, and adds tests to the agent that mock the running of the provisioning command.
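As a rough sketch of the testing approach described above (mocking a CLI call so that it raises), the example below patches `subprocess.run` with `unittest.mock`. The `deploy` wrapper, the `example-cli` command, and the local `ProvisioningError` class are placeholders for illustration and do not reflect the real connector code or the real MAAS CLI arguments.

```python
import subprocess
from unittest.mock import patch


class ProvisioningError(Exception):
    """Stand-in for the connector's provisioning exception (hypothetical)."""


def deploy(image: str) -> None:
    """Hypothetical wrapper that shells out to a CLI and converts failures."""
    try:
        # "example-cli" is a placeholder command, not the real MAAS CLI.
        subprocess.run(
            ["example-cli", "deploy", image], check=True, capture_output=True
        )
    except subprocess.CalledProcessError as exc:
        raise ProvisioningError(exc.stderr.decode()) from exc


def check_deploy_failure() -> str:
    """Mock the CLI call to fail and return the resulting error message."""
    err = subprocess.CalledProcessError(
        1, ["example-cli"], stderr=b"no such image"
    )
    with patch("subprocess.run", side_effect=err):
        try:
            deploy("bad-image")
        except ProvisioningError as exc:
            return str(exc)
    raise AssertionError("deploy() did not raise ProvisioningError")
```

Because the mock replaces `subprocess.run` for the duration of the `with` block, no real command is executed, which keeps a test like this independent of any installed CLI.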