[BUG] - Terraform provider version inconsistency within stages #2614

Open
viniciusdc opened this issue Aug 7, 2024 · 23 comments · May be fixed by #2862

Comments

@viniciusdc
Contributor

Describe the bug

We are not consistent with Terraform provider versions across the different deployment stages, and we should be. This discrepancy can lead to unpredictable behavior and potential issues during deployment. For example, on a recent AWS deployment, I noticed the following in the Terraform deployment logs:

Stage 01 -- Terraform State:

  • hashicorp/aws: v5.12.0

Stage 02:

  • hashicorp/aws: v5.33.0
  • hashicorp/local: v2.5.1
  • hashicorp/tls: v4.0.5

Stage 03:

  • hashicorp/aws: v5.61.0
  • hashicorp/kubernetes: v2.20.0
  • hashicorp/helm: v2.1.2

While we do pin the provider version for the most important infrastructure resources:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "5.33.0"
    }
  }
  required_version = ">= 1.0"
}

the other stages use terraform.Provider to instantiate the providers across the deployment:

elif self.config.provider == schema.ProviderEnum.aws:
    return [
        terraform.Provider(
            "aws", region=self.config.amazon_web_services.region
        ),
    ]

We should make sure this becomes consistent. Also, the interesting thing is that after stage 3 the versions become consistent across all calls; I guess that comes from the backend already being set up.

Expected behavior

At a minimum, the cloud provider versions should respect the versions declared in their infrastructure modules, as that would be expected.

OS and architecture in which you are running Nebari

Linux

How to Reproduce the problem?

Any cloud provider deployment might lead to the same problem.

Command output

No response

Versions and dependencies used.

No response

Compute environment

AWS

Integrations

No response

Anything else?

No response

@viniciusdc added the type: bug 🐛 and needs: triage 🚦 labels on Aug 7, 2024
@Adam-D-Lewis
Member

I think we could add a consistent version in tf_objects.py instead of in the terraform files directly. That would enforce the same version in all of the stages.

@Adam-D-Lewis removed the needs: triage 🚦 label on Aug 20, 2024
@Adam-D-Lewis
Member

I'm also curious what issues you've seen from this. It seems like it shouldn't cause a problem to use different provider versions in different stages.

@viniciusdc
Contributor Author

viniciusdc commented Aug 20, 2024

I'm also curious what issues you've seen from this. It seems like it shouldn't cause a problem to use different provider versions in different stages.

I haven't noticed any issue directly, but keeping this inconsistency leaves room for bugs that would be difficult to track down; for example, a specific version of the provider might handle certain API requests in a particular order while a newer version does not (same with error messages), or have different internal requirements like region, zones, etc.

@Adam-D-Lewis
Member

Adam-D-Lewis commented Aug 20, 2024

It seems like, because the stages are isolated from each other (isolated Terraform modules), differing provider versions should be okay. That said, I think we should try to keep the versions consistent between the stages, but I'm not sure I would support enforcing it (e.g. for plugins), at least not until we see an issue.

@marcelovilla
Member

@smokestacklightnin will be picking up this issue

@marcelovilla added the good first issue label and removed the needs: follow-up 📫 label on Nov 1, 2024
@marcelovilla
Member

@smokestacklightnin, here's a bit more context to help bring you up to speed on this issue.

Nebari has several Terraform stages, which are run sequentially because some require the output of others as input. For each stage, we have one or multiple versions.tf files where Terraform provider versions are defined. For example: https://github.com/nebari-dev/nebari/blob/9b1310b33e89c2c11c3b39128ec792ca80342486/src/_nebari/stages/infrastructure/template/aws/versions.tf

It also seems we are setting provider versions in other files, for example:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "5.12.0"
    }
  }
  required_version = ">= 1.0"
}

We need to ensure consistency across all stages by using the same provider versions. For now, I suggest we avoid updating to the latest available versions and instead stick to the most up-to-date versions among the ones we’re currently using.

@smokestacklightnin
Contributor

smokestacklightnin commented Nov 4, 2024

It seems to me that this Issue could be solved by implementing #2814

@smokestacklightnin
Contributor

I think we could add a consistent version in tf_objects.py instead of in the terraform files directly. That would enforce the same version in all of the stages.

This is also a good alternative to #2814.

@marcelovilla
Member

It seems to me that this Issue could be solved by implementing #2814

That's right. However, implementing #2814 means that all providers will get updated, and while that might be the end goal, it might also break things. Thus, in my opinion, a safer approach is to make sure providers are consistent using the versions we already have and then explore automatic upgrades to the providers.

This is also a good alternative to #2814.

I think it is indeed a good alternative but it would probably not work well with tools for automatic upgrades like Renovate.

@smokestacklightnin
Contributor

However, implementing #2814 means that all providers will get updated, and while that might be the end goal, it might also break things.

I'm looking for a way with renovate to pin versions of dependencies

@smokestacklightnin
Contributor

smokestacklightnin commented Nov 4, 2024

However, implementing #2814 means that all providers will get updated, and while that might be the end goal, it might also break things.

I'm looking for a way with renovate to pin versions of dependencies

It appears that it is possible to pin provider versions.

@dcmcand
Contributor

dcmcand commented Nov 6, 2024

@smokestacklightnin I think it would be best to actually pin the provider versions in constants.py since we are generating the TF from templates. That way we have one single source of truth for what provider we are running.
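As a rough sketch of what that single source of truth could look like (the constant names below are hypothetical, not existing Nebari constants; the pins are simply the highest versions already mentioned in this thread):

# Hypothetical additions to constants.py -- names are illustrative only;
# pins taken from provider versions already mentioned in this thread.
AWS_PROVIDER_VERSION = "5.61.0"         # hashicorp/aws
KUBERNETES_PROVIDER_VERSION = "2.20.0"  # hashicorp/kubernetes
HELM_PROVIDER_VERSION = "2.1.2"         # hashicorp/helm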

We talked about this approach in our Nebari meeting this week and there was consensus that this was a good way to go here.

As far as #2814, this would mean that we would need to render the terraform and then run any scanner against the rendered files, but I think that is ok.

@marcelovilla
Member

@smokestacklightnin: before looking into Renovate, can you look into the suggestion from above? As Chuck mentions, it would be good to have a place where we can centralize the versions instead of having them scattered around the stages.

We need to review our rendering logic to make sure we can render the versions.tf files from a template, taking the provider versions from a common place such as constants.py. So before working on #2814, I'd suggest you work on this issue first and get acquainted with the logic behind how Nebari uses the Terraform files to render the actual configuration for a deployment, and how that's used in every stage of the deployment.

@smokestacklightnin
Contributor

We need to review our rendering logic to make sure we can render the versions.tf files from a template, taking the provider versions from a common place such as constants.py.

Both OpenTofu and Terraform require that the provider version constraint be a literal string and not, for example, a variable. This makes it cumbersome to specify the version constraints dynamically in .tf files.

I think it would be best to actually pin the provider versions in constants.py since we are generating the TF from templates. That way we have one single source of truth for what provider we are running.

Since we can't fill in the templates using native Terraform variables like the other fields that are present, we would need to template our templates, which doesn't sit well with me. Maybe others disagree.

Since Nebari basically pulls the template files directly from their respective template directories, it seems to me that keeping the versions in the templates up to date (or pinned as desired) is the most direct way to keep the versions synchronized. Renovate would accomplish this.

it would be good to have a place where we can centralize the versions instead of having them scattered around the stages.

Renovate has a configuration file where versions are specified.

As far as #2814, this would mean that we would need to render the terraform and then run any scanner against the rendered files, but I think that is ok.

As it currently stands, the only way to update Terraform versions is to do so manually, either in the templates or in the rendered files. Updates in the latter option get overwritten every time nebari render is run.

@viniciusdc
Contributor Author

viniciusdc commented Nov 8, 2024

Hi @smokestacklightnin, thanks for the feedback and the ideas above. I have some overall points at the end, but first I wanted to provide some context regarding the templates and the render process.

So, as you correctly mentioned above, Nebari pulls all Terraform files from their respective stage directories during render. As you also mentioned, any manual change to the rendered files will be lost on the next nebari render or nebari deploy (unless --disable-render is passed). However, as part of our internal Python wrapper for rendering the Terraform stage files:

def tf_objects(self) -> List[Dict]:
    return [NebariTerraformState(self.name, self.config)]

def render(self) -> Dict[pathlib.Path, str]:
    contents = {
        (self.stage_prefix / "_nebari.tf.json"): terraform.tf_render_objects(
            self.tf_objects()
        )
    }
    for root, dirs, filenames in os.walk(self.template_directory):
        for filename in filenames:
            root_filename = pathlib.Path(root) / filename
            with root_filename.open("rb") as f:
                contents[
                    pathlib.Path(
                        self.stage_prefix,
                        pathlib.Path.relative_to(
                            pathlib.Path(root_filename), self.template_directory
                        ),
                    )
                ] = f.read()
    return contents

we also inject a _nebari.tf.json file that will be read by Terraform at runtime; this file contains certain information, such as the backend configuration for cloud deployments (since each stage references the same backend):

{
    "provider": {
        "kubernetes": {
            "host": "${data.aws_eks_cluster.default.endpoint}",
            "cluster_ca_certificate": "${base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)}",
            "token": "${data.aws_eks_cluster_auth.default.token}"
        },
        "helm": {
            "kubernetes": {
                "host": "${data.aws_eks_cluster.default.endpoint}",
                "cluster_ca_certificate": "${base64decode(data.aws_eks_cluster.default.certificate_authority[0].data)}",
                "token": "${data.aws_eks_cluster_auth.default.token}"
            }
        },
        "aws": {
            "region": "us-east-1"
        }
    },
    "terraform": {
        "backend": {
            "s3": {
                "bucket": "************",
                "key": "terraform/******/03-kubernetes-initialize.tfstate",
                "region": "us-east-1",
                "encrypt": true,
                "dynamodb_table": "********"
            }
        }
    }
}

That said, there is already one tf_object that we are not using, which is RequiredProvider:

def RequiredProvider(_name, **kwargs):
    return {"terraform": {"required_providers": {_name: kwargs}}}

If we pass this object to all stages as a tf_object during the render above, we could correctly pin all the provider versions and control the versions we pass to it from the current constants.py.
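A minimal sketch of what that wiring could look like (illustrative only: AWS_PROVIDER_VERSION is a hypothetical constant, and calling the helper as terraform.RequiredProvider assumes it lives in the same module as terraform.Provider):

# Sketch only -- not existing Nebari code. Assumes a hypothetical AWS_PROVIDER_VERSION
# constant in constants.py, and that RequiredProvider sits alongside Provider and
# NebariTerraformState as used elsewhere in this thread.
def tf_objects(self) -> List[Dict]:
    return [
        terraform.RequiredProvider(
            "aws", source="hashicorp/aws", version=AWS_PROVIDER_VERSION
        ),
        terraform.Provider("aws", region=self.config.amazon_web_services.region),
        NebariTerraformState(self.name, self.config),
    ]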

As far as #2814, this would mean that we would need to render the terraform and then run any scanner against the rendered files, but I think that is ok.

This means we would run the scans against the already rendered files as part of a CI pipeline.

On the other hand, I am interested in this:

Renovate has a configuration file where versions are specified.

Curious to see what the Renovate version file would look like.

@smokestacklightnin
Contributor

smokestacklightnin commented Nov 9, 2024

Curious to see what the Renovate version file would look like.

I have a minimal (almost complete) working proof of concept here. The version file is complete as a proof of concept.

@smokestacklightnin
Contributor

smokestacklightnin commented Nov 10, 2024

From the perspective of developing the Terraform templates, it's an awkward and somewhat confusing workflow to be specifying which required providers are needed for each deployment platform (aws, gcp, etc.) when writing the Terraform injection Python code, as opposed to the more traditional Terraform workflow of specifying the required providers in a versions.tf or main.tf.

In other words, it seems awkward to remove versions.tf and then scatter the information it contained (which required providers per deployment platform) across the injection logic and constants.py. If there was a more consolidated way to do this, then I would find it less awkward.

It's also worth mentioning that, since the providers are not used uniformly across all stages, if we dynamically generate versions.tf we would also need to keep track of which stages use which providers. This is in contrast to, for example, the Terraform backend, which is basically the same across all stages. To be frank, I would like it if there were a simple way to do this, but because the providers are not the same across all stages, this seems like unnecessary added complexity versus using a tool like Renovate.
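As a rough illustration of the extra bookkeeping this would imply (the stage directory names here are approximate; the provider groupings are the ones listed in the issue description):

# Rough illustration only -- stage names are approximate, groupings taken from the
# provider versions listed in the issue description.
STAGE_REQUIRED_PROVIDERS = {
    "01-terraform-state": ["aws"],
    "02-infrastructure": ["aws", "local", "tls"],
    "03-kubernetes-initialize": ["aws", "kubernetes", "helm"],
}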

@marcelovilla
Member

That said, there is already one tf_object that we are not using, which is RequiredProvider:

@viniciusdc it seems we might be using terraform.Provider for that, though. One example is the one you wrote in the issue description and another one is:

def tf_objects(self) -> List[Dict]:
    if self.config.provider == schema.ProviderEnum.gcp:
        return [
            terraform.Provider(
                "google",
                project=self.config.google_cloud_platform.project,
                region=self.config.google_cloud_platform.region,
            ),
            NebariTerraformState(self.name, self.config),
        ]
    elif self.config.provider == schema.ProviderEnum.do:
        return [
            NebariTerraformState(self.name, self.config),
        ]
    elif self.config.provider == schema.ProviderEnum.azure:
        return [
            NebariTerraformState(self.name, self.config),
        ]
    elif self.config.provider == schema.ProviderEnum.aws:
        return [
            terraform.Provider(
                "aws", region=self.config.amazon_web_services.region
            ),
            NebariTerraformState(self.name, self.config),
        ]
    else:
        return []

Do you know what the difference is between specifying the providers here and having them in a versions.tf file? It seems we're currently doing both, at least for the terraform_state and the infrastructure stages. I'm unsure what defining the provider in the tf_objects file is actually accomplishing.
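For reference, based on the two helpers quoted in this thread, the rendered JSON they produce seems to differ roughly as follows (the exact output of terraform.Provider is an assumption inferred from the RequiredProvider definition above):

# Rough contrast, inferred from the helpers quoted in this thread; the Provider
# output shape is an assumption, not confirmed against the Nebari source.
provider_block = {  # terraform.Provider("aws", region="us-east-1"): configures the provider, no version pin
    "provider": {"aws": {"region": "us-east-1"}}
}
required_providers_block = {  # RequiredProvider("aws", ...): constrains which provider version gets installed
    "terraform": {
        "required_providers": {"aws": {"source": "hashicorp/aws", "version": "5.33.0"}}
    }
}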


From reading the comments in this issue, it seems to me we have two different options:

  1. Centralize the Terraform provider versions in the constants.py file and make sure we use them to render the relevant Terraform files for each stage.
  2. Use Renovate to pin the maximum allowed versions and run that against the versions.tf files we already have.

I agree with @smokestacklightnin that the second option seems like a simpler approach. If we're already planning to use Renovate to manage other dependencies, I would favor it over the first option, especially since it seems we're not actually rendering the versions.tf files at the moment.

Happy to read other thoughts.

@smokestacklightnin
Contributor

smokestacklightnin commented Nov 18, 2024

I've gotten working proofs of concept for a path forward that uses Renovate and one that doesn't. There are advantages and disadvantages to both choices. I'd like to hear others' opinions about whether one path is preferable to the other.

Path 1: Not using Renovate

I have a branch that demonstrates injecting Terraform version constraints and Terraform required providers into the Terraform configuration for the terraform_state and kubernetes_initialize stages using tf_objects. To implement this for the whole repository, we would need to go through every versions.tf or main.tf in the subdirectories of nebari/src/_nebari/stages and move the versioning information to constants.py.

Advantages of this path:

  • Uses already present native Nebari tooling for Terraform injections

  • We have full control over the implementation for tracking versions, as opposed to an external tool.

Disadvantages of this path:

  • It only applies to the Terraform versions

  • It moves the terraform blocks out of Terraform .tf files to a nonstandard location. This is awkward from a Terraform development perspective, but makes sense to do in the context of Python.

  • If we choose this path and change our mind in the future, it would take work to get back to the previous required providers layout.

Path 2: Using Renovate

I have a branch that provides a minimal configuration.

One of the concerns @viniciusdc raised was that Renovate might overwhelm the GitHub CI with too much activity. Renovate has rules for limiting CI activity specifically for this purpose, so I believe this concern can be addressed and mitigated if we go with this path.

Advantages of this path:

  • Works for tracking versions of GitHub Actions and Python packages in addition to Terraform providers.

  • One Renovate instance can, if desired, maintain versions for multiple repositories, which would allow synchronizing all Nebari repos.

Disadvantages of this path:

  • External tool

  • Yet another tool to maintain

I like both paths, so I'm hoping to get others' opinions on how to proceed.

@viniciusdc
Contributor Author

viniciusdc commented Nov 20, 2024

Thanks for the comprehensive overview and the POCs, @smokestacklightnin. I’m leaning towards Path 1: Not using Renovate.

The main goal here is to ensure all providers use the same version across all Nebari stages. Right now, we have inconsistencies, like different versions of hashicorp/aws being used:

  • v5.12.0
  • v5.33.0
  • v5.61.0

While this has yet to cause major issues, it could lead to problems. Path 1 addresses this by injecting Terraform version constraints and required providers directly into our rendered config. We're not changing much from the current setup, since the information that currently lives in versions.tf is preserved, just in a different format.

This approach simplifies things by moving the provider constraints into constants.py. It aligns well with our existing constants like HIGHEST_SUPPORTED_KUBERNETES_VERSION and others, reducing the need to manage multiple versions.tf files. You can see how it integrates here:

HELM_VERSION = "v3.15.3"
KUSTOMIZE_VERSION = "5.4.3"
OPENTOFU_VERSION = "1.8.3"

Regarding the concern about reverting to the old layout later:

If we choose this path and change our mind in the future, it would take work to get back to the previous required providers layout.

If we decide to use Renovate in the future, we can apply it to the rendered templates in _nebari.tf.json, which already contain the necessary data in a different format. Plus, Renovate’s default settings can be adjusted to handle these files:

{
  "commitMessageTopic": "Terraform {{depName}}",
  "fileMatch": [
    "\\.tf$",
    "\\.tf.json$"
  ],
  "pinDigests": false
}

c.c. @dcmcand @marcelovilla @Adam-D-Lewis

@marcelovilla
Member

@smokestacklightnin thank you for outlining these two alternatives!

While I was leaning towards path 2 in the beginning, I think @viniciusdc makes a good point about centralizing the versions in the constants.py file being consistent with the way we're specifying other dependency versions, such as different image tags and binaries that we use. Furthermore, I think we can expand on this approach in the future to also include other versions hard-coded in the Terraform files, such as Helm chart versions. For example:

resource "helm_release" "jupyterhub" {
name = "jupyterhub-${var.namespace}"
namespace = var.namespace
repository = "https://jupyterhub.github.io/helm-chart/"
chart = "jupyterhub"
version = "4.0.0-0.dev.git.6707.h109668fd"

Moving forward with path 1 does not necessarily mean we won't be using Renovate. As you already mentioned, it also tracks other versions, such as GitHub Actions and Python dependencies. Additionally, as @viniciusdc points out, there's the possibility of running Renovate against rendered Terraform files.

Given that you already have a PoC for path 1 (not using Renovate) and it seems you understand what needs to be done to get it fully working, I suggest we move forward with that.

@smokestacklightnin
Contributor

Unless @dcmcand or @Adam-D-Lewis have any opinions in favor of using Renovate, I'll proceed with path 1: not using Renovate

@dcmcand
Contributor

dcmcand commented Nov 22, 2024

Sounds like a good plan @smokestacklightnin, thanks!
