[BUG] - Terraform provider version inconsistency within stages #2614
Comments
I think we could add a consistent version in tf_objects.py instead of in the terraform files directly. That would enforce the same version in all of the stages. |
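As a rough illustration of that idea, a centralized constraint table could look like the sketch below; the names and version strings are hypothetical, not Nebari's actual code.

```python
# Hypothetical central registry of provider version constraints.
# Importing this one table from every stage would enforce identical
# provider versions across all rendered Terraform configurations.
PROVIDER_VERSION_CONSTRAINTS = {
    "aws": {"source": "hashicorp/aws", "version": "~> 5.0"},  # placeholder pins
    "kubernetes": {"source": "hashicorp/kubernetes", "version": "~> 2.20"},
    "helm": {"source": "hashicorp/helm", "version": "~> 2.10"},
}

def required_providers_block(*names: str) -> dict:
    """Build a `terraform.required_providers` mapping for the named
    providers, ready to merge into a stage's generated tf.json."""
    return {
        "terraform": {
            "required_providers": {
                name: PROVIDER_VERSION_CONSTRAINTS[name] for name in names
            }
        }
    }

# A stage that only touches Kubernetes and Helm would request just those:
print(required_providers_block("kubernetes", "helm"))
```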
I'm also curious what issues you've seen from this. It seems like it shouldn't cause a problem to use different provider versions in different stages. |
I haven't noticed any issue directly, but keeping this inconsistency leaves room for bugs that would be difficult to track down; for example, a specific version of a provider might handle certain API requests in a particular order while a newer version does not (same with error messages), or have different internal requirements such as region, zones, etc. |
It seems like, because the stages are isolated from each other (isolated Terraform modules), differing provider versions should be okay. That said, I think we should try to keep the versions consistent between the stages, but I'm not sure I would support enforcing it (e.g. for plugins), at least not until we see an issue. |
@smokestacklightnin will be picking up this issue |
@smokestacklightnin, here's a bit more context to help bring you up to speed on this issue. Nebari has several Terraform stages, which are run sequentially because some require the output of others as input. For each stage, we have one or multiple Terraform files. It also seems we are setting providers in other files, for example: nebari/src/_nebari/stages/terraform_state/template/aws/main.tf (lines 23 to 31 in 9b1310b)
We need to ensure consistency across all stages by using the same provider versions. For now, I suggest we avoid updating to the latest available versions and instead stick to the most up-to-date versions among the ones we’re currently using. |
It seems to me that this issue could be solved by implementing #2814 |
This is also a good alternative to #2814. |
That's right. However, implementing #2814 means that all providers will get updated, and while that might be the end goal, it might also break things. Thus, in my opinion, a safer approach is to make sure providers are consistent using the versions we already have and then explore automatic upgrades to the providers.
I think it is indeed a good alternative but it would probably not work well with tools for automatic upgrades like Renovate. |
I'm looking for a way to do this with Renovate. |
It appears that it is possible to pin provider versions. |
@smokestacklightnin I think it would be best to actually pin the provider versions in tf_objects.py. We talked about this approach in our Nebari meeting this week and there was consensus that this was a good way to go here. As far as #2814, this would mean that we would need to render the Terraform and then run any scanner against the rendered files, but I think that is ok. |
@smokestacklightnin: before looking into Renovate, can you look into the suggestion from above? As Chuck mentions, it would be good to have a place where we can centralize the versions instead of having them scattered around the stages. We need to review our rendering logic to make sure we can render the version constraints from that central place. |
Both OpenTofu and Terraform require that a provider version constraint be a string literal and not, for example, a variable. This makes it cumbersome to specify the version constraints in a single central location.
Since we can't fill in the templates using native Terraform variables like the other fields that are present, we would need to template our templates (see the sketch after this comment), which doesn't sit well with me. Maybe others disagree. Since Nebari basically pulls the template files directly from their respective template directories, it seems to me that keeping the versions in the templates up to date (or pinned as desired) is the most direct way to keep the versions synchronized. Renovate would accomplish this.
Renovate has a configuration file where versions are specified.
As it currently stands, the only way to update Terraform provider versions is to do so manually, either in the templates or in the rendered files. Updates in the latter get overwritten every time the render step runs. |
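To make the string-literal constraint concrete, here is a minimal sketch of the kind of pre-render substitution ("templating our templates") this would require; the sentinel value, file layout, and function name are invented for illustration:

```python
import json
from pathlib import Path

# Placeholder constraints; in the approach discussed above these would
# live in one central Python module rather than in each template.
VERSIONS = {"aws": "~> 5.0", "kubernetes": "~> 2.20"}

def pin_versions(template_path: Path, output_path: Path) -> None:
    """Substitute literal version strings into a hypothetical
    versions.tf.json template before Terraform ever reads it.

    Terraform rejects `version = var.something` inside
    required_providers, so the substitution has to happen at render
    time, effectively templating the templates."""
    doc = json.loads(template_path.read_text())
    required = doc["terraform"]["required_providers"]
    for name, constraint in required.items():
        # Replace a sentinel such as "NEBARI_PIN" with the real pin.
        if constraint.get("version") == "NEBARI_PIN":
            constraint["version"] = VERSIONS[name]
    output_path.write_text(json.dumps(doc, indent=2))
```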
Hi @smokestacklightnin, thanks for the feedback and the ideas above. I have overall points at the end, but first I wanted to provide some context regarding the templates and the render process. As you correctly mentioned above, Nebari pulls all Terraform files from their respective stage directories, which happens during render. As you also mentioned, any manual change to the rendered files will be lost by the render step: nebari/src/_nebari/stages/base.py (lines 246 to 267 in 855aa14)
We also inject a _nebari.tf.json file that will be read by Terraform during runtime; this file contains certain information, such as the backend configuration when doing cloud deployments (since each stage references the same backend). For example:
```json
{
  "provider": {
    "kubernetes": {
      "host": "${data.aws_eks_cluster.default.endpoint}",
      "cluster_ca_certificate": "${base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)}",
      "token": "${data.aws_eks_cluster_auth.default.token}"
    },
    "helm": {
      "kubernetes": {
        "host": "${data.aws_eks_cluster.default.endpoint}",
        "cluster_ca_certificate": "${base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)}",
        "token": "${data.aws_eks_cluster_auth.default.token}"
      }
    },
    "aws": {
      "region": "us-east-1"
    }
  },
  "terraform": {
    "backend": {
      "s3": {
        "bucket": "************",
        "key": "terraform/******/03-kubernetes-initialize.tfstate",
        "region": "us-east-1",
        "encrypt": true,
        "dynamodb_table": "********"
      }
    }
  }
}
```

That said, there is already one `Provider` definition in nebari/src/_nebari/provider/terraform.py (lines 294 to 295 in 855aa14).
If we pass this object across all stages as a shared definition, we keep a single source of truth for the provider versions.
This means we would run the scans against the already rendered files as part of a CI pipeline. On the other hand, I'm curious to see what the Renovate version file would look like. |
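For reference, the injected file shown above is plain serialized JSON, so extending it with a pinned required_providers block would be mechanical. A minimal sketch, assuming placeholder version strings and a simplified structure (the exact document Nebari writes may differ):

```python
import json

# The provider section mirrors the _nebari.tf.json example above; the
# required_providers entry and its version string are placeholders.
stage_config = {
    "provider": {
        "aws": {"region": "us-east-1"},
    },
    "terraform": {
        "required_providers": {
            "aws": {"source": "hashicorp/aws", "version": "~> 5.0"},
        },
    },
}

# Nebari-style injection: write the merged document next to the
# stage's rendered Terraform files so `terraform init` picks it up.
with open("_nebari.tf.json", "w") as f:
    json.dump(stage_config, f, indent=2)
```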
I have a minimal (almost complete) working proof of concept here. The version file is complete as a proof of concept. |
From the perspective of developing the Terraform templates, it's an awkward and somewhat confusing workflow to specify which required providers are needed for each deployment platform (aws, gcp, etc.) in the Terraform-injection Python code, as opposed to the more traditional Terraform workflow of declaring the required providers in a `versions.tf` file. In other words, it seems awkward to remove `required_providers` blocks from the templates. It's also worth mentioning that, since the providers are not uniformly used across all stages, if we dynamically generate the `required_providers` blocks we would still need per-stage logic to decide which providers to include. |
@viniciusdc it seems we might already be specifying providers in nebari/src/_nebari/stages/infrastructure/__init__.py (lines 759 to 785 in 855aa14).
Do you know what the difference is between specifying the providers here and having them in a `versions.tf` file? From reading the comments in this issue, it seems to me we have two different options: centralize the provider versions in the Python code and inject them at render time, or keep the versions in the Terraform templates and use a tool like Renovate to keep them updated. |
I agree with @smokestacklightnin that the second option seems like a simpler approach. If we're already planning to use Renovate to manage other dependencies, I would favor it over the first option, especially since it seems we're not actually rendering the provider versions from Python today. Happy to read other thoughts. |
I've gotten working proofs of concept for a path forward that uses Renovate and one that doesn't. There are advantages and disadvantages to both choices. I'd like to hear others' opinions about whether one path is preferable to the other.
Path 1: Not using Renovate
I have a branch that demonstrates injecting Terraform version constraints and Terraform required providers into the Terraform configuration for the stages. Advantages of this path:
Disadvantages of this path:
Path 2: Using Renovate
I have a branch that provides a minimal configuration. One concern that @viniciusdc raised was that Renovate might overwhelm the GitHub CI with too much activity. Renovate has rules for limiting CI activity specifically for this purpose, so I believe this point can be addressed and mitigated if we go with this path. Advantages of this path:
Disadvantages of this path:
I like both paths, so I'm hoping to get others' opinions on how to proceed. |
Thanks for the comprehensive overview and the POCs, @smokestacklightnin. I'm leaning towards Path 1: not using Renovate. The main goal here is to ensure all providers use the same version across all Nebari stages. Right now, we have inconsistencies, like different versions of the same provider being used in different stages.
While this has yet to cause major issues, it could lead to problems. Path 1 addresses this by injecting Terraform version constraints and required providers directly into our config, and we're not changing much from the current setup since the injection mechanism is already in place. This approach simplifies things by moving the provider constraints into nebari/src/_nebari/constants.py (lines 3 to 5 in d272176).
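A sketch of that layout (constant names and version strings are illustrative, not the actual contents of constants.py):

```python
# Illustrative only -- not the actual contents of constants.py.
AWS_PROVIDER_VERSION = "~> 5.0"
KUBERNETES_PROVIDER_VERSION = "~> 2.20"
HELM_PROVIDER_VERSION = "~> 2.10"

# A stage would then reference the pinned constraint when building its
# provider injection, for example:
required_providers = {
    "aws": {"source": "hashicorp/aws", "version": AWS_PROVIDER_VERSION},
}
```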
Regarding the concern about reverting to the old layout later:
If we decide to use Renovate in the future, we can apply it to the rendered templates with a rule along these lines:

```json
{
  "commitMessageTopic": "Terraform {{depName}}",
  "fileMatch": [
    "\\.tf$",
    "\\.tf.json$"
  ],
  "pinDigests": false
}
```
|
@smokestacklightnin thank you for outlining these two alternatives! While I was leaning towards path 2 in the beginning, I think @viniciusdc makes a good point about centralizing the versions in one place (lines 54 to 60 in d272176).
Moving forward with path 1 does not necessarily mean we won't be using Renovate. As you already mentioned, it also tracks other versions, such as GitHub Actions and Python dependencies. Additionally, as @viniciusdc points out, there's the possibility of running Renovate against rendered Terraform files. Given that you already have a PoC for path 1 (not using Renovate) and it seems you understand what needs to be done to get it fully working, I suggest we move forward with that. |
Unless @dcmcand or @Adam-D-Lewis have any opinions in favor of using Renovate, I'll proceed with path 1: not using Renovate |
Sounds like a good plan @smokestacklightnin, thanks! |
Describe the bug
Terraform provider versions are not consistent across the different deployment stages. This discrepancy can lead to unpredictable behavior and potential issues during deployment. For example, on a recent AWS deployment, I noticed the following in the Terraform deployment logs:
Stage 01 -- Terraform State:
Stage 02:
Stage 03:
While we do set the version for the most important infrastructure resources (nebari/src/_nebari/stages/infrastructure/template/aws/versions.tf, lines 1 to 9 in a65ff53), the other stages use `terraform.Provider` to instantiate the providers across the deployment (nebari/src/_nebari/stages/terraform_state/__init__.py, lines 181 to 186 in a65ff53).
We should make sure this becomes consistent. Interestingly, after stage 3 the version is consistent across all calls; I guess that comes from the backend already being set up.
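To illustrate the unpinned vs. pinned instantiation described above, here is a hedged sketch; this Provider dataclass is invented for illustration and is not Nebari's actual terraform.Provider API:

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative only -- not the actual signature of Nebari's
# terraform.Provider. The point is the difference between an unpinned
# and a pinned provider instantiation.
@dataclass
class Provider:
    name: str
    source: str
    config: dict = field(default_factory=dict)
    version: Optional[str] = None  # unset in some stages today: latest wins

# Unpinned: each deployment may resolve a different provider release.
aws_unpinned = Provider("aws", "hashicorp/aws", {"region": "us-east-1"})

# Pinned: every stage and every deployment resolves the same release.
aws_pinned = Provider("aws", "hashicorp/aws", {"region": "us-east-1"},
                      version="~> 5.0")
```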
Expected behavior
At a minimum, the cloud provider versions should respect the versions declared in their infrastructure modules, as would be expected.
OS and architecture in which you are running Nebari
Linux
How to Reproduce the problem?
Any cloud provider deployment might lead to the same problem.
Command output
No response
Versions and dependencies used.
No response
Compute environment
AWS
Integrations
No response
Anything else?
No response