
Question regarding connecting cluster nodes to leader if autounseal is set to false #58

Closed
rhinck opened this issue Jul 31, 2021 · 12 comments · Fixed by #62

rhinck commented Jul 31, 2021

If the autounseal value is set to false, does that require me to SSH into every Vault node that is not the leader and join it to the leader node?

binlab (Owner) commented Jul 31, 2021

To be honest, this case was not tested well; I only did a little testing when I first started developing the module. In theory, on the first initialization it is possible to enter the unseal keys via the UI, but the ALB only routes traffic to nodes that are marked as healthy, and a sealed node is not, so you cannot reach it that way afterwards. I also can't say for sure whether Vault's Raft storage distributes the unseal keys to the entire cluster, but most likely it does not.

binlab (Owner) commented Jul 31, 2021

If testing shows that this is not possible, then in addition to SSH you could also expose the nodes publicly and unseal them directly, but this is not very secure since the traffic will not be encrypted. It is also possible to use another Vault for unsealing, but that is a chicken-and-egg problem. Perhaps it makes sense to use a KMS from another cloud such as GCP or Azure.
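
For context, using another Vault for unsealing is done through Vault's transit seal; a minimal sketch of that stanza (all values below are placeholders):

```hcl
# Sketch of a "transit" seal stanza in the Vault server config: a second Vault
# cluster holds the unseal key. All values are placeholders.
seal "transit" {
  address    = "https://unsealer.example.com:8200"
  token      = "s.xxxxxxxx"   # token with access to the transit key
  key_name   = "autounseal"
  mount_path = "transit/"
}
```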

binlab (Owner) commented Jul 31, 2021

May I ask why you needed to avoid auto-unsealing with AWS KMS?

binlab added the question label on Jul 31, 2021
rhinck (Author) commented Aug 1, 2021

Thanks for the insight!

The main reason I was asking was that before I put Vault into production use, I wanted to test if I was successfully able to do cluster backup and restores. Using autounseal = true, I'm able to do a restore if the backup is from the same cluster. If I run terraform destroy and delete the cluster, create a new one, and then try to do a restore from the old cluster's backup file, I am unable to unseal the Vault. The unseal keys from the old cluster don't work.

Using autounseal = false, I was able to restore the backup of the old cluster onto the new one and unseal it with the old cluster's sharded keys. I am experiencing intermittent issues where the Vault will randomly seal at times and is reporting that the leader is the only node in the cluster although I have specified 3 nodes in the module.

If I understand correctly, with autounseal = true I would need to have the new cluster point to the old AWS KMS key in order to use the old cluster's backup, right? Here's a comment on a vault issue where I got the info from: hashicorp/vault#6046 (comment)

If so, do you know how I could implement using an existing AWS KMS key with the module?

binlab added the enhancement label on Aug 1, 2021
binlab linked pull request #62 on Aug 1, 2021 that will close this issue
binlab (Owner) commented Aug 1, 2021

Thanks for reporting!

> The main reason I was asking was that before I put Vault into production use, I wanted to test if I was successfully able to do cluster backup and restores. Using autounseal = true, I'm able to do a restore if the backup is from the same cluster. If I run terraform destroy and delete the cluster, create a new one, and then try to do a restore from the old cluster's backup file, I am unable to unseal the Vault. The unseal keys from the old cluster don't work.

After terraform destroy, the KMS key (created inside the module) will be scheduled for deletion but can be recovered within a certain period (10 days by the module configuration; maybe I need to provide an option to make this value configurable). So you still have a chance to restore the data from Raft snapshots or EBS snapshots.
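
For reference, this is roughly how such a key with a recovery window is declared in Terraform; the resource names and values below are illustrative, not the module's actual code:

```hcl
# Illustrative sketch only, not the module's actual code: a KMS key whose
# deletion is delayed, so it can still be recovered after terraform destroy.
resource "aws_kms_key" "vault_unseal" {
  description             = "Vault auto-unseal key"
  deletion_window_in_days = 10 # key remains recoverable for 10 days
}

resource "aws_kms_alias" "vault_unseal" {
  name          = "alias/vault-unseal"
  target_key_id = aws_kms_key.vault_unseal.key_id
}
```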

> Using autounseal = false, I was able to restore the backup of the old cluster onto the new one and unseal it with the old cluster's sharded keys. I am experiencing intermittent issues where the Vault will randomly seal at times and is reporting that the leader is the only node in the cluster although I have specified 3 nodes in the module.

For a cloud deployment with more than one node, autounseal = false is not a great solution, but you could configure a "cluster" with a single node and autounseal = false and use it just to unseal another cluster. It doesn't look like a production setup, but it should work.

> If I understand correctly, with autounseal = true I would need to have the new cluster point to the old AWS KMS key in order to use the old cluster's backup, right?

Exactly, and newer versions of Vault even support seal migration; see https://www.vaultproject.io/docs/concepts/seal#seal-migration and https://support.hashicorp.com/hc/en-us/articles/360002040848-Seal-Migration. It still requires a bit of manual work, though. In theory it can be automated, but that is a rather difficult task and will take a lot of time. Still, it is quite interesting, and I will probably add it to the list of tasks.
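
As a rough sketch of what that migration involves (assuming a migration from Shamir unseal keys to AWS KMS, with placeholder values):

```hcl
# Sketch of the seal stanza added to the Vault server config when migrating
# from Shamir keys to AWS KMS auto-unseal. Region and key ARN are placeholders;
# after restarting the node, the migration is started with
# `vault operator unseal -migrate`.
seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "arn:aws:kms:us-east-1:111111111111:key/00000000-0000-0000-0000-000000000000"
}
```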

> If so, do you know how I could implement using an existing AWS KMS key with the module?

I have added the configuration and an example in PR #62; after a quick test it looks like everything is working as intended. Please test it on your side.
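
To illustrate the idea only (the actual input names are defined in #62 and may differ from this sketch), pointing the module at a pre-existing key would look roughly like:

```hcl
# Hypothetical usage sketch; the module source and variable names here are
# placeholders and may not match what PR #62 actually adds.
module "vault" {
  source = "github.com/binlab/..." # placeholder, use the real module source

  autounseal = true
  # reuse an existing KMS key instead of letting the module create a new one
  kms_key_arn = "arn:aws:kms:us-east-1:111111111111:key/00000000-0000-0000-0000-000000000000"
}
```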

binlab self-assigned this on Aug 1, 2021
rhinck (Author) commented Aug 2, 2021

I just tested #62, and it was working for me, thanks! It is nice to be able to use KMS keys instead of manual unseal with the sharded keys. The Vault cluster seems to be having fewer issues as well.

> So you still have a chance to restore the data from Raft snapshots or EBS snapshots.

I was considering making a Lambda script that would run on a certain interval and make Raft backups and store them in an encrypted S3 bucket. In your opinion, what is the best option for backup, EBS snapshots or the Raft ones? Are there any pros or cons to either approach?

rhinck (Author) commented Aug 2, 2021

[Screenshot: Screen Shot 2021-08-02 at 4.43.57 PM]

Also, if I understand correctly, the node volumes are used to install the OS and the raft volumes are used by each cluster node to store Vault data?

So if I had snapshots and needed to restore them, then I would take the snapshots, create new volumes from them, detach the existing volumes from the EC2 instances, and attach the volumes created from the snapshots to the EC2 instances?

binlab (Owner) commented Aug 5, 2021

> I just tested #62, and it was working for me, thanks! It is nice to be able to use KMS keys instead of manual unseal with the sharded keys. The Vault cluster seems to be having fewer issues as well.

Thanks, nice to hear that!

binlab (Owner) commented Aug 5, 2021

> I was considering making a Lambda script that would run on a certain interval and make Raft backups and store them in an encrypted S3 bucket. In your opinion, what is the best option for backup, EBS snapshots or the Raft ones? Are there any pros or cons to either approach?

I have been thinking about automatic snapshots to an S3 bucket since the very beginning of the module's development, but the main problem is that this requires creating a system account and policy, which in turn requires the Vault to already be initialized. So far there is no mechanism for pre-provisioning such a system user (with rights only to create snapshots). Snapshots stored in an S3 bucket are preferable in terms of cost and space savings, but they require post-configuration (as described above). But there are never too many backups 🙂
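
For reference, the snapshot-only policy mentioned above would be small; a sketch (the policy name and how the account authenticates are up to you):

```hcl
# Sketch of a minimal Vault policy for a backup-only account that can do
# nothing except download Raft snapshots.
path "sys/storage/raft/snapshot" {
  capabilities = ["read"]
}
```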

binlab added this to the v0.2.x milestone on Aug 5, 2021
binlab (Owner) commented Aug 5, 2021

> Also, if I understand correctly, the node volumes are used to install the OS and the raft volumes are used by each cluster node to store Vault data?

Yes, absolutely. A separate EBS volume is used so that the cluster state and data survive even a complete re-creation of the instances (when updating the Vault version, for example).

binlab (Owner) commented Aug 5, 2021

> So if I had snapshots and needed to restore them, then I would take the snapshots, create new volumes from them, detach the existing volumes from the EC2 instances, and attach the volumes created from the snapshots to the EC2 instances?

Yes, that's right. This is one of the possible options (though I have not tested it). There are many ways to recover the data (as long as you have it). For example, you could launch an instance, attach an external EBS volume created from the snapshot, SSH into the instance, and copy the data over to the new cluster. Or you could create a separate single-node cluster, specify the EBS snapshot as the source for recovery, and then take a Raft snapshot manually using the Vault UI or CLI.

binlab (Owner) commented Aug 5, 2021

> So if I had snapshots and needed to restore them, then I would take the snapshots, create new volumes from them, detach the existing volumes from the EC2 instances, and attach the volumes created from the snapshots to the EC2 instances?

By the way, this is a good point for creating a separate issue for this enhancement! I can investigate this method, and it may be possible to completely automate the process, since Terraform has a snapshot_id option for the aws_ebs_volume resource.
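
For illustration, restoring the Raft data volume from an EBS snapshot in Terraform would look roughly like this (names, IDs, and the zone are placeholders):

```hcl
# Illustrative sketch: create the Raft data volume from an existing EBS
# snapshot instead of an empty volume, then attach it to the Vault node.
# All names and IDs are placeholders.
resource "aws_ebs_volume" "raft" {
  availability_zone = "us-east-1a"
  snapshot_id       = "snap-0123456789abcdef0" # snapshot of the old cluster's raft volume
  type              = "gp3"
}

resource "aws_volume_attachment" "raft" {
  device_name = "/dev/xvdf"
  volume_id   = aws_ebs_volume.raft.id
  instance_id = "i-0123456789abcdef0" # the Vault node instance
}
```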
