CLOUDSTACK-9993: Securing Agents Communications #2239

Merged on Aug 28, 2017 (3 commits)

Member

@rohityadavcloud rohityadavcloud commented Aug 18, 2017

This introduces a new certificate authority (CA) framework that allows
pluggable CA provider implementations to handle certificate issuance,
revocation and propagation. The framework plugs into NioServer to
secure agent connections. NioClient now assumes that a keystore with the
known name cloud.jks, if available, will be used for SSL negotiation and
handshake.

FS: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Secure+Agent+Communications

This includes a default 'root' CA provider plugin which creates its own
self-signed root certificate authority on first run and uses it for
issuing and provisioning certificates to CloudStack agents such as
the KVM, CPVM and SSVM agents, and also to the management server for
peer clustering.

Additional changes and notes:

  • A comma-separated list of management server IPs can be set in the 'host'
    global setting. Newly provisioned agents (KVM/CPVM/SSVM etc.) will get
    a randomized comma-separated list and will attempt connection or
    reconnection to the servers in the order provided. This removes the need
    for a TCP LB on port 8250 (default) of the management server(s).
  • All fresh deployments will enforce two-way SSL authentication, where
    connecting agents will be required to present certificates issued
    by the 'root' CA plugin.
  • Existing environments will, on upgrade, continue to use one-way SSL
    authentication, and connecting agents will not be required to present
    certificates.
  • A script, keystore-setup, is responsible for initial keystore setup
    and CSR generation on the agents/hosts.
  • A script, keystore-cert-import, is responsible for importing the provided
    certificate payload into the Java keystore file.
  • Agent security (keystore, certificates etc.) is set up initially using
    SSH, and later provisioning is handled via an existing agent connection
    using command-answers. The supported clients and agents are limited to
    CPVM, SSVM, and KVM agents, and clustered management servers (peering).
  • Certificate revocation does not terminate an existing agent-mgmt server
    connection; however, a revoked certificate is rejected when used during
    an SSL handshake.
  • The older cloudstackmanagement.keystore is deprecated and will no longer
    be used by mgmt server(s) for SSL negotiation and handshake. New
    keystores will be named cloud.jks. Additional SSL certificates
    should not be imported into it for use with Tomcat etc.; the cloud.jks
    keystore is strictly used for agent-server communications.
  • Management server keystores are validated and renewed on startup only;
    their validity is the same as that of the CA certificates.
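
As a rough illustration of the first note above (a randomized, comma-separated management server list that agents walk in order), here is a minimal sketch. The class and method names are hypothetical and are not taken from the CloudStack code base; the connectivity check is passed in as a predicate so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical helper, not CloudStack code.
final class HostListSketch {
    // Splits the comma-separated 'host' setting value, trimming entries and dropping blanks.
    static List<String> parse(String hostSetting) {
        List<String> hosts = new ArrayList<>();
        for (String h : hostSetting.split(",")) {
            String trimmed = h.trim();
            if (!trimmed.isEmpty()) {
                hosts.add(trimmed);
            }
        }
        return hosts;
    }

    // Returns a shuffled copy of the list, joined by commas, for one agent.
    static String randomizedFor(String hostSetting, Random rng) {
        List<String> hosts = parse(hostSetting);
        Collections.shuffle(hosts, rng);
        return String.join(",", hosts);
    }

    // Tries the hosts in the order provided; returns the first one that accepts, or null.
    static String firstReachable(List<String> ordered, Predicate<String> canConnect) {
        for (String host : ordered) {
            if (canConnect.test(host)) {
                return host;
            }
        }
        return null;
    }
}
```

Because each agent would receive a differently shuffled list, load spreads across management servers without a TCP load balancer, which is the effect the note describes.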

New APIs:

  • listCaProviders: lists all available CA provider plugins
  • listCaCertificate: lists the CA certificate(s)
  • issueCertificate: issues X509 client certificate with/without a CSR
  • provisionCertificate: provisions certificate to a host
  • revokeCertificate: revokes a client certificate using its serial

Global settings for the CA framework:

  • ca.framework.provider.plugin: The configured CA provider plugin
  • ca.framework.cert.keysize: The key size for certificate generation
  • ca.framework.cert.signature.algorithm: The certificate signature algorithm
  • ca.framework.cert.validity.period: Certificate validity in days
  • ca.framework.cert.automatic.renewal: Certificate auto-renewal setting
  • ca.framework.background.task.delay: CA background task delay/interval
  • ca.framework.cert.expiry.alert.period: Days to check and alert expiring certificates

Global settings for the default 'root' CA provider:

  • ca.plugin.root.private.key: (hidden/encrypted) CA private key
  • ca.plugin.root.public.key: (hidden/encrypted) CA public key
  • ca.plugin.root.ca.certificate: (hidden/encrypted) CA certificate
  • ca.plugin.root.issuer.dn: The CA issuer distinguished name
  • ca.plugin.root.auth.strictness: Whether clients are required to present certificates
  • ca.plugin.root.allow.expired.cert: Whether clients with expired certificates are allowed

UI changes:

  • Button to download/save the CA certificates

@rohityadavcloud
Member Author

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-961

@rohityadavcloud
Member Author

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-962

@rohityadavcloud rohityadavcloud force-pushed the new-ca-framework branch 2 times, most recently from 78c740c to 2a67c3c on August 18, 2017 19:10
@rohityadavcloud
Member Author

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-963

@rohityadavcloud
Member Author

@blueorangutan test

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1380)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 39106 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2239-t1380-kvm-centos7.zip
Intermitten failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermitten failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Intermitten failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermitten failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Test completed. 56 look OK, 2 have error(s)

Test Result Time (s) Test File
test_01_vpc_remote_access_vpn Failure 55.64 test_vpc_vpn.py
test_04_rvpc_privategw_static_routes Failure 379.53 test_privategw_acl.py
test_change_service_offering_for_vm_with_snapshots Skipped 0.00 test_vm_snapshots.py
test_09_copy_delete_template Skipped 0.01 test_templates.py
test_06_copy_template Skipped 0.00 test_templates.py
test_static_role_account_acls Skipped 0.01 test_staticroles.py
test_11_ss_nfs_version_on_ssvm Skipped 0.01 test_ssvm.py
test_01_scale_vm Skipped 0.00 test_scale_vm.py
test_01_primary_storage_iscsi Skipped 0.03 test_primary_storage.py
test_vm_nic_adapter_vmxnet3 Skipped 0.00 test_nic_adapter_type.py
test_nested_virtualization_vmware Skipped 0.00 test_nested_virtualization.py
test_06_copy_iso Skipped 0.00 test_iso.py
test_deploy_vgpu_enabled_vm Skipped 0.03 test_deploy_vgpu_enabled_vm.py
test_3d_gpu_support Skipped 0.03 test_deploy_vgpu_enabled_vm.py

@rohityadavcloud
Member Author

@blueorangutan test centos6 vmware-55u3

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos6 mgmt + vmware-55u3) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1381)
Environment: vmware-55u3 (x2), Advanced Networking with Mgmt server 6
Total time taken: 48058 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2239-t1381-vmware-55u3.zip
Intermitten failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermitten failure detected: /marvin/tests/smoke/test_routers_network_ops.py
Intermitten failure detected: /marvin/tests/smoke/test_volumes.py
Intermitten failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Test completed. 54 look OK, 4 have error(s)

Test Result Time (s) Test File
test_01_vpc_remote_access_vpn Failure 157.12 test_vpc_vpn.py
test_01_create_volume Failure 193.15 test_volumes.py
test_02_RVR_Network_FW_PF_SSH_default_routes_egress_false Failure 528.14 test_routers_network_ops.py
test_01_RVR_Network_FW_PF_SSH_default_routes_egress_true Failure 515.88 test_routers_network_ops.py
test_04_rvpc_privategw_static_routes Failure 855.26 test_privategw_acl.py
test_08_resize_volume Skipped 5.13 test_volumes.py
test_07_resize_fail Skipped 15.29 test_volumes.py
test_09_copy_delete_template Skipped 0.02 test_templates.py
test_06_copy_template Skipped 0.00 test_templates.py
test_static_role_account_acls Skipped 0.02 test_staticroles.py
test_11_ss_nfs_version_on_ssvm Skipped 0.02 test_ssvm.py
test_01_scale_vm Skipped 66.44 test_scale_vm.py
test_01_primary_storage_iscsi Skipped 0.04 test_primary_storage.py
test_vm_nic_adapter_vmxnet3 Skipped 0.00 test_nic_adapter_type.py
test_06_copy_iso Skipped 0.00 test_iso.py
test_06_verify_guest_lspci_again Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_05_change_vm_ostype_restart Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_04_verify_guest_lspci Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_03_verify_libvirt_attach_disk Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_02_verify_libvirt_after_restart Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_01_verify_libvirt Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_deploy_vgpu_enabled_vm Skipped 1.29 test_deploy_vgpu_enabled_vm.py

@rohityadavcloud
Member Author

The failing create_volume test is not related to this PR; like the rvpc/rvr failures, it is a general intermittent failure and can be ignored.

The failure was caused by:

sshClient: DEBUG:  Host: 10.1.34.164 Cmd: /sbin/fdisk -l | grep Disk Output:{'status': 'SUCCESS', 'stdin': None, 'stderr': [u"Disk /dev/sda doesn't contain a valid partition table\n"], 'stdout': [u'Disk /dev/hdb: 2147 MB, 2147483648 bytes\n', u'Disk /dev/sda: 1073 MB, 1073741824 bytes\n']}
test_01_create_volume (tests.smoke.test_volumes.TestCreateVolume): DEBUG:  Volume Size Expected 1073741824  Actual :Volume Not Found

@wido
Contributor

wido commented Aug 21, 2017

What I'm unsure about: this is enforced/mandatory for all agents, right?

What if somebody doesn't want to enable this and simply wants to run unencrypted? Why shouldn't we add an opt-out somewhere?

@rohityadavcloud
Member Author

rohityadavcloud commented Aug 21, 2017

@wido in the current implementation, all agent-mgmt server connections are encrypted and SSL-enabled, based on a random certificate that the mgmt server creates, stores and uses from cloudmanagement.keystore (read/updated via the ssl.keystore global setting). When agents connect to the mgmt server, they use a trust-all manager that trusts any certificate presented to them. So all CloudStack environments already have encrypted connections; however, those connections are not authenticated, i.e. not secured with one-way or two-way SSL authentication.
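
For readers unfamiliar with the term, a "trust-all manager" is typically a javax.net.ssl.X509TrustManager whose checks never fail. The sketch below is a generic illustration of that pattern (the class name is hypothetical, and this is not the actual CloudStack class); it shows why such connections are encrypted but not authenticated.

```java
import java.security.cert.X509Certificate;
import javax.net.ssl.X509TrustManager;

// Generic illustration of a trust-all manager; do not use where authentication matters.
final class TrustAllSketch implements X509TrustManager {
    @Override
    public void checkClientTrusted(X509Certificate[] chain, String authType) {
        // Never throws: every client certificate chain is accepted.
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String authType) {
        // Never throws: every server certificate chain is accepted.
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
        // No issuers are advertised, since nothing is actually validated.
        return new X509Certificate[0];
    }
}
```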

With this new work/PR, for existing environments the auth strictness enforcement will be 'false' after upgrade. Newer hosts/agents will still be provisioned with the new system, so CA certs etc. are stored on the agents and they get more secure SSL authentication; however, the mgmt server will not perform additional trust validation and will allow any client (as it does now).
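
One plausible way to picture the strictness switch is as the server-side "need client auth" flag on an SSLEngine. The sketch below assumes that mapping for illustration only; the class and method names are hypothetical, and the PR text does not confirm how the flag is wired internally.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Hypothetical sketch: maps an auth-strictness boolean onto a server-side SSLEngine.
final class AuthStrictnessSketch {
    static SSLEngine serverEngine(SSLContext ctx, boolean authStrictness) {
        SSLEngine engine = ctx.createSSLEngine();
        engine.setUseClientMode(false);            // act as the server side of the handshake
        engine.setNeedClientAuth(authStrictness);  // true: client must present a certificate
        return engine;
    }

    // Convenience wrapper using the JVM default context (an assumption for the demo).
    static SSLEngine defaultServerEngine(boolean authStrictness) {
        try {
            return serverEngine(SSLContext.getDefault(), authStrictness);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }
}
```

With the flag off, the handshake is one-way (only the server presents a certificate), which matches the upgrade behaviour described above.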

@wido
Contributor

wido commented Aug 21, 2017

Ok, good! Because I know many (like us) do not use SSH from the mgmt server to add a new KVM host, but simply generate an agent.properties and add it to the mgmt server.

@rohityadavcloud
Member Author

@wido okay, so for your environment just keep ca.plugin.root.auth.strictness set to false for both new and existing CloudStack environments.

@rohityadavcloud
Member Author

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-1012

@rohityadavcloud
Member Author

@blueorangutan test centos7 xenserver-65sp1

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + xenserver-65sp1) has been kicked to run smoke tests

throw new CloudRuntimeException("Failed to find default configured CA provider plugin");
}
return configuredCaProvider;
}
Contributor

What about refactoring method a little bit to something like

if (configuredCaProvider != null) return configuredCaProvider
else if (caProviderMap.containsKey(..) && caProviderMap.get(..) != null) return caProviderMap.get(..);
else throw new CloudRuntimeException("..")

Member Author

@nvazquez sure, however in the second else-if branch we want to set the field and then return it, so it would end up being the same kind of code again.
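
To make the point concrete, here is a minimal self-contained sketch of the lookup with the "set, then return" step in the middle branch. It is not the code from the PR: the provider is represented by a plain String, and IllegalStateException stands in for CloudRuntimeException so the example compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the manager class under review.
final class CaProviderLookupSketch {
    private final Map<String, String> caProviderMap = new HashMap<>();
    private final String configuredName;
    private String configuredCaProvider; // cached after the first successful lookup

    CaProviderLookupSketch(String configuredName) {
        this.configuredName = configuredName;
    }

    void register(String name, String provider) {
        caProviderMap.put(name, provider);
    }

    String getConfiguredCaProvider() {
        if (configuredCaProvider != null) {
            return configuredCaProvider;
        }
        String candidate = caProviderMap.get(configuredName);
        if (candidate != null) {
            configuredCaProvider = candidate; // the "set" step that the one-liner left out
            return configuredCaProvider;
        }
        throw new IllegalStateException("Failed to find default configured CA provider plugin");
    }
}
```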

Contributor

You are right, sorry, I omitted the set line. In that case it makes no sense to refactor, in my opinion.

Member Author

@nvazquez thanks, anyway I made a change

Contributor

@nvazquez nvazquez left a comment

Great @rhtyd, in general code LGTM

@rohityadavcloud
Member Author

Thanks @DaanHoogland @nvazquez for reviewing

Contributor

@borisstoyanov borisstoyanov left a comment

LGTM. I've tested this in an environment with KVM, Xen and VMware hosts and 3 management servers. It works as expected and there are no regressions with other types of hosts. Users are able to use the new APIs as designed.

@rohityadavcloud
Member Author

Thanks @borisstoyanov , I'll keep this open for now until we receive the xenserver smoke test results. I'll merge this on satisfactory results.

@blueorangutan

Trillian test result (tid-1420)
Environment: xenserver-65sp1 (x2), Advanced Networking with Mgmt server 7
Total time taken: 47551 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2239-t1420-xenserver-65sp1.zip
Intermitten failure detected: /marvin/tests/smoke/test_certauthority_root.py
Intermitten failure detected: /marvin/tests/smoke/test_iso.py
Intermitten failure detected: /marvin/tests/smoke/test_password_server.py
Intermitten failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermitten failure detected: /marvin/tests/smoke/test_routers_network_ops.py
Intermitten failure detected: /marvin/tests/smoke/test_routers.py
Intermitten failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermitten failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Test completed. 52 look OK, 6 have error(s)

Test Result Time (s) Test File
test_01_vpc_remote_access_vpn Failure 126.54 test_vpc_vpn.py
test_05_rvpc_multi_tiers Failure 509.32 test_vpc_redundant.py
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL Failure 471.29 test_vpc_redundant.py
test_01_RVR_Network_FW_PF_SSH_default_routes_egress_true Failure 437.19 test_routers_network_ops.py
test_04_rvpc_privategw_static_routes Failure 658.11 test_privategw_acl.py
test_05_iso_permissions Failure 0.06 test_iso.py
test_02_edit_iso Failure 0.07 test_iso.py
test_03_vpc_privategw_restart_vpc_cleanup Error 536.09 test_privategw_acl.py
runTest Error 0.00 test_certauthority_root.py
test_09_copy_delete_template Skipped 0.02 test_templates.py
test_06_copy_template Skipped 0.00 test_templates.py
test_static_role_account_acls Skipped 0.02 test_staticroles.py
test_11_ss_nfs_version_on_ssvm Skipped 0.02 test_ssvm.py
test_vm_nic_adapter_vmxnet3 Skipped 0.00 test_nic_adapter_type.py
test_nested_virtualization_vmware Skipped 0.00 test_nested_virtualization.py
test_06_copy_iso Skipped 0.00 test_iso.py
test_06_verify_guest_lspci_again Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_05_change_vm_ostype_restart Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_04_verify_guest_lspci Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_03_verify_libvirt_attach_disk Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_02_verify_libvirt_after_restart Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_01_verify_libvirt Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_deploy_vgpu_enabled_vm Skipped 0.03 test_deploy_vgpu_enabled_vm.py
test_3d_gpu_support Skipped 0.03 test_deploy_vgpu_enabled_vm.py

@rohityadavcloud
Member Author

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-1023

@rohityadavcloud
Member Author

@blueorangutan test centos7 xenserver-65sp1

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + xenserver-65sp1) has been kicked to run smoke tests

@mlsorensen
Contributor

@rhtyd it would still be possible for @wido to use the new secure system without adding hosts via the management server, correct? They could generate a CSR on the host and call issueCertificate, then install it during the process that creates agent.properties?

@rohityadavcloud
Member Author

rohityadavcloud commented Aug 25, 2017

@mlsorensen yes, if @wido wants to use the new secure system he can (1) create a keystore at /etc/cloudstack/agent/cloud.jks with a 'cloud' alias, with the password stored in the agent.properties file, (2) create a CSR using the keystore, (3) use the CSR to issue a certificate from the management server, save the certificate and CA certs and import them into the keystore, and (4) add/start the agent in a zone/cluster.
The utility scripts will be installed as part of cloudstack-common package and will be put in systemvm.iso for systemvms, and available at: /usr/share/cloudstack-common/scripts/util/{keystore-cert-import, keystore-setup}

I'll document the usage of these scripts in the admin docs or somewhere. Briefly, here's how these scripts work (command name followed by its options in <> brackets):

keystore-setup <the agent/db properties filepath> <ks/cert validity in number of days>
This script will save and output the csr, save the keystore passphrase in the properties file and also create a cloud.jks.new keystore file.

After the certificates (CA + client) are created, they are stored in the keystore file using the keystore-cert-import script:
keystore-cert-import <final keystore filename, give cloud.jks not cloud.jks.new> <application mode: ssh|agent, only in ssh mode the agent/cloud service is restarted, ex. for systemvms> <the certificate content/base-64 with newlines replaced by ^ and spaces replaced by ~> <the ca-cert content, with char-replacements same as the cert content> <the private key file path, this is useful only when csr was not used/created> <the private key content, with same char-replacement scheme>
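
The character-replacement scheme for the certificate arguments (newlines replaced by ^, spaces replaced by ~) could be sketched as follows. The class and method names are hypothetical and the PEM body used in the usage note is a placeholder, not a real certificate. The round trip is safe for PEM text because neither ^ nor ~ occurs in base-64 data or in the BEGIN/END markers.

```java
// Hypothetical helper illustrating the argument encoding described above.
final class PayloadEncodingSketch {
    // Makes multi-line PEM content safe to pass as a single shell argument.
    static String encode(String pem) {
        return pem.replace('\n', '^').replace(' ', '~');
    }

    // Reverses encode(): restores newlines and spaces.
    static String decode(String arg) {
        return arg.replace('^', '\n').replace('~', ' ');
    }
}
```

For example, encoding the placeholder text "-----BEGIN CERTIFICATE-----", "QUJD", "-----END CERTIFICATE-----" (one per line) yields a single token starting with "-----BEGIN~CERTIFICATE-----^QUJD^".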

Running the import script saves the certs in the keystore file cloud.jks.new and renames it to cloud.jks. The .new file acts as a two-phase commit, so if anything goes wrong during provisioning (for example, on a live system using the provisionCertificate API), the existing keystore is not affected. Lastly, in addition to the keystore file, the certificates and keys are also stored in the usual X.509/PEM formats, with chmod 600 applied, in the agent conf directory for use with other services.
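
The stage-then-rename idea can be sketched in Java as below. This only illustrates the pattern; the real work is done by the shell scripts named above, and the class and method names here are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of the cloud.jks.new -> cloud.jks commit step.
final class KeystoreCommitSketch {
    static Path commit(Path confDir, byte[] newKeystoreBytes) throws IOException {
        Path staged = confDir.resolve("cloud.jks.new");
        Path live = confDir.resolve("cloud.jks");
        // Phase 1: write everything to the staging file; a failure here leaves cloud.jks untouched.
        Files.write(staged, newKeystoreBytes);
        // Phase 2: replace the live keystore only after staging succeeded.
        Files.move(staged, live, StandardCopyOption.REPLACE_EXISTING);
        return live;
    }
}
```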

@mlsorensen
Contributor

Thanks. I realize the initial question was more along the lines of "will this break how we do things now?", but it's good to know that it can be incorporated into that mode of operation as well if people choose.

@blueorangutan

Trillian test result (tid-1424)
Environment: xenserver-65sp1 (x2), Advanced Networking with Mgmt server 7
Total time taken: 49204 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr2239-t1424-xenserver-65sp1.zip
Intermitten failure detected: /marvin/tests/smoke/test_certauthority_root.py
Intermitten failure detected: /marvin/tests/smoke/test_iso.py
Intermitten failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermitten failure detected: /marvin/tests/smoke/test_routers_network_ops.py
Intermitten failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Intermitten failure detected: /marvin/tests/smoke/test_vpc_vpn.py
Test completed. 52 look OK, 6 have error(s)

Test Result Time (s) Test File
test_01_vpc_remote_access_vpn Failure 126.29 test_vpc_vpn.py
test_05_rvpc_multi_tiers Failure 503.61 test_vpc_redundant.py
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL Failure 481.21 test_vpc_redundant.py
test_02_RVR_Network_FW_PF_SSH_default_routes_egress_false Failure 444.19 test_routers_network_ops.py
test_01_RVR_Network_FW_PF_SSH_default_routes_egress_true Failure 463.83 test_routers_network_ops.py
test_04_rvpc_privategw_static_routes Failure 867.03 test_privategw_acl.py
test_05_iso_permissions Failure 0.05 test_iso.py
test_02_edit_iso Failure 0.07 test_iso.py
runTest Error 0.00 test_certauthority_root.py
test_09_copy_delete_template Skipped 0.02 test_templates.py
test_06_copy_template Skipped 0.00 test_templates.py
test_static_role_account_acls Skipped 0.02 test_staticroles.py
test_11_ss_nfs_version_on_ssvm Skipped 0.02 test_ssvm.py
test_vm_nic_adapter_vmxnet3 Skipped 0.00 test_nic_adapter_type.py
test_nested_virtualization_vmware Skipped 0.00 test_nested_virtualization.py
test_06_copy_iso Skipped 0.00 test_iso.py
test_06_verify_guest_lspci_again Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_05_change_vm_ostype_restart Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_04_verify_guest_lspci Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_03_verify_libvirt_attach_disk Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_02_verify_libvirt_after_restart Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_01_verify_libvirt Skipped 0.00 test_deploy_virtio_scsi_vm.py
test_deploy_vgpu_enabled_vm Skipped 0.03 test_deploy_vgpu_enabled_vm.py
test_3d_gpu_support Skipped 0.06 test_deploy_vgpu_enabled_vm.py

@cloudmonger

ACS CI BVT Run

Sumarry:
Build Number 1149
Hypervisor xenserver
NetworkType Advanced
Passed=107
Failed=5
Skipped=12

Link to logs Folder (search by build_no): https://www.dropbox.com/sh/r2si930m8xxzavs/AAAzNrnoF1fC3auFrvsKo_8-a?dl=0

Failed tests:

  • test_list_ids_parameter.py

  • ContextSuite context=TestListIdsParams>:setup Failing since 53 runs

  • test_routers_network_ops.py

  • test_01_isolate_network_FW_PF_default_routes_egress_true Failing since 14 runs

  • test_02_isolate_network_FW_PF_default_routes_egress_false Failing since 141 runs

  • test_01_RVR_Network_FW_PF_SSH_default_routes_egress_true Failing since 136 runs

  • test_02_RVR_Network_FW_PF_SSH_default_routes_egress_false Failing since 136 runs

Skipped tests:
test_vm_nic_adapter_vmxnet3
test_01_verify_libvirt
test_02_verify_libvirt_after_restart
test_03_verify_libvirt_attach_disk
test_04_verify_guest_lspci
test_05_change_vm_ostype_restart
test_06_verify_guest_lspci_again
test_static_role_account_acls
test_11_ss_nfs_version_on_ssvm
test_nested_virtualization_vmware
test_3d_gpu_support
test_deploy_vgpu_enabled_vm

Passed test suits:
test_deploy_vm_with_userdata.py
test_affinity_groups_projects.py
test_portable_publicip.py
test_vm_snapshots.py
test_over_provisioning.py
test_global_settings.py
test_router_dnsservice.py
test_scale_vm.py
test_service_offerings.py
test_routers_iptables_default_policy.py
test_loadbalance.py
test_routers.py
test_reset_vm_on_reboot.py
test_deploy_vms_with_varied_deploymentplanners.py
test_network.py
test_router_dns.py
test_non_contigiousvlan.py
test_login.py
test_deploy_vm_iso.py
test_public_ip_range.py
test_multipleips_per_nic.py
test_metrics_api.py
test_regions.py
test_affinity_groups.py
test_network_acl.py
test_pvlan.py
test_volumes.py
test_nic.py
test_deploy_vm_root_resize.py
test_resource_detail.py
test_secondary_storage.py
test_vm_life_cycle.py
test_disk_offerings.py

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
- Upgrades bouncycastle version and uses newer classes
- Refactors SAMLUtil to use new CertUtils

Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
@rohityadavcloud
Member Author

The XenServer environment failure was caused by CentOS 7 based packaging/dependency issues, where pyOpenSSL's crypto module may not have a load_publickey method available. I've put a fix in the Marvin test to skip when the pyOpenSSL library does not export such a method; with that, all tests pass now:

=== TestName: test_issue_certificate_with_csr | Status : SUCCESS ===

=== TestName: test_issue_certificate_without_csr | Status : SUCCESS ===

=== TestName: test_list_ca_certificate | Status : SUCCESS ===

=== TestName: test_list_ca_providers | Status : SUCCESS ===

=== TestName: test_provision_certificate | Status : SUCCESS ===

=== TestName: test_revoke_certificate | Status : SUCCESS ===

With enough test results and code reviews, this is ready for merging. I'll wait for Travis to go green and then merge this feature.

@rohityadavcloud rohityadavcloud merged commit 7ce54bf into apache:master Aug 28, 2017