CLOUDSTACK-9397: Add Watchdog timer to KVM Instance #1707
@blueorangutan package |
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
Packaging result: ✖centos6 ✖centos7 ✖debian. JID-140 |
@blueorangutan package |
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-142 |
@blueorangutan package |
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-196 |
I fixed the merge conflict of this PR and it should cleanly merge now. |
Thanks @wido I'll kick some tests |
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
Packaging result: ✖centos6 ✔centos7 ✖debian. JID-337 |
@blueorangutan package |
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
Packaging result: ✔centos6 ✔centos7 ✖debian. JID-342 |
@blueorangutan test |
@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
Trillian test result (tid-614)
|
@blueorangutan package |
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-365 |
@blueorangutan test |
@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
Trillian test result (tid-638)
|
I rebased the code against master; it merges cleanly again. Tests are looking good. |
@wido I can run another round of testing. I'm using the #2225 test results as the baseline for comparison. |
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-1103 |
@blueorangutan test centos7 kvm-centos6 |
@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos6) has been kicked to run smoke tests |
Trillian test result (tid-1529)
|
@borisstoyanov can you check the libvirt-related test failures on kvm/centos6? Those failures are not seen on kvm/centos7. /cc @wido |
Is virtio-scsi supported in CentOS 6 by libvirt? I doubt it. |
According to Red Hat, it's supported as of RHEL 6 U4, so it's quite possible it's supported in CentOS 6.4 and beyond, but we haven't tested it. All of our testing was on CentOS 7.x. |
@wido I think Trillian may be using CentOS 6.8 for building KVM hosts. @borisstoyanov @PaulAngus can you comment? |
Understood! It was the only explanation I could give for this error. Right now I don't have another one. |
I can confirm that these tests fail on CentOS release 6.8 (Final) but pass on CentOS 7. It looks like the discard attribute is missing from the XML on CentOS 6. Please let me know if you need me to retest this once we have a fix. The same test failures are observed in the #2225 PR as well (smoke tests on master).
Comparing a recent smoketest result against centos6-kvm I don't see an issue with failing tests, they are failing on the baseline as well. However I do see additional failures around routers, isos etc that diverge from the baseline. We may run another round. |
@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-1110 |
@blueorangutan test centos7 kvm-centos6 |
@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos6) has been kicked to run smoke tests |
@blueorangutan test |
@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
Trillian test result (tid-1538)
|
Trillian test result (tid-1539)
|
Ignoring test failures on centos6, LGTM. @borisstoyanov @wido @kiwiflyer what do you think, should we merge this PR? |
LGTM
I personally think we can merge it. But hey, it's my PR, I can't make that decision. |
@wido can you comment on the potential NPE issue, see the code/comment? |
@rhtyd: I already did? I added an if-statement around the check in case libvirt doesn't return anything, which is highly unlikely. |
Alright based on test results and LGTMs presented on this PR, I'll merge this. If we hit regressions or error reports, we might revisit this (unlikely). |
…0-scclouds' Show the _disk offering_ used for the _root disk_ in the VM _deploy_ _wizard_ Closes apache#1707 See merge request scclouds/scclouds!722
The watchdog timer adds functionality where the Hypervisor can detect if an
instance has crashed or stopped functioning.
When the Instance has the 'watchdog' daemon running it will send heartbeats
to the /dev/watchdog device.
If these heartbeats are no longer received by the HV, it will reset the Instance.
If the Instance never sends any heartbeats, the HV does not take action. The HV
only takes action when an Instance that was sending heartbeats stops sending them.
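The rule above (no action before the first heartbeat, action once heartbeats stop) can be sketched as a small state model. This is only an illustration of the behaviour described here, not code from this PR or from QEMU/libvirt. The class name, the method names and the timeout value are all hypothetical.

```java
// Toy model of the watchdog behaviour described above.
// All names and the timeout are hypothetical; the real timer lives in the emulated device.
final class WatchdogTimerSketch {
    private final long timeoutMillis;
    // -1 means no heartbeat has been seen yet, so the timer is not armed.
    private long lastHeartbeatMillis = -1;

    WatchdogTimerSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called whenever the guest's watchdog daemon writes to /dev/watchdog.
    void heartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    // True only if heartbeats were seen before and have now been absent for longer than the timeout.
    boolean shouldTakeAction(long nowMillis) {
        if (lastHeartbeatMillis < 0) {
            return false; // the Instance never sent a heartbeat: do nothing
        }
        return nowMillis - lastHeartbeatMillis > timeoutMillis;
    }
}
```

An Instance without the watchdog daemon never arms the timer, so enabling the device on all Instances should not affect guests that do not use it.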
This has been supported since libvirt 0.7.3 and can be defined in the XML format as
described in the docs: https://libvirt.org/formatdomain.html#elementsWatchdog
To the 'devices' section this will be added:
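The XML fragment itself is missing from the text above. Going by the libvirt domain format, the element takes a model and an action attribute, so for the values used in this description it would presumably be <watchdog model='i6300esb' action='reset'/>. A minimal sketch of rendering it follows. The helper class is hypothetical and is not the class this PR adds to the agent.

```java
// Hypothetical helper, not the PR's actual code: renders the libvirt <watchdog> element
// that goes inside the <devices> section of the domain XML.
final class WatchdogXmlSketch {
    static String render(String model, String action) {
        return "<watchdog model='" + model + "' action='" + action + "'/>";
    }

    public static void main(String[] args) {
        // Uses the model and action named in this description.
        System.out.println(render("i6300esb", "reset"));
    }
}
```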
In the agent.properties the action to be taken can be defined:
vm.watchdog.action=reset
The model can be set in the same way. The Intel i6300esb is the most commonly used.
vm.watchdog.model=i6300esb
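As an illustration of how the two settings could be read, here is a minimal sketch using java.util.Properties. Only the property keys come from this description. The class name is hypothetical, and the fallback values simply reuse the values shown above as an assumption; the agent's real defaults may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical reader for the two watchdog settings; not the agent's actual code.
final class WatchdogConfigSketch {
    // Assumed fallbacks for illustration only; the real defaults may be different.
    static final String FALLBACK_MODEL = "i6300esb";
    static final String FALLBACK_ACTION = "reset";

    // Returns {model, action} parsed from the text of an agent.properties file.
    static String[] read(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String model = props.getProperty("vm.watchdog.model", FALLBACK_MODEL).trim();
        String action = props.getProperty("vm.watchdog.action", FALLBACK_ACTION).trim();
        return new String[] {model, action};
    }
}
```

Passing the file content as a string keeps the sketch self-contained. A real agent would load the file from its configuration directory instead.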