gateway group unexpectedly updated on job update #10077

Closed
tgross opened this issue Feb 23, 2021 · 3 comments · Fixed by #10080
Labels: theme/consul/connect (Consul Connect integration), type/bug

tgross commented Feb 23, 2021

An update to another task group within a job causes a gateway task group to also be updated.

To reproduce, use the following job, which is the terminating gateway example job with most of the block comments removed:

job "countdash-terminating" {
  datacenters = ["dc1"]

  # This group provides the service that exists outside of the Consul Connect
  # service mesh. It is using host networking and listening to a statically
  # allocated port.
  group "api" {
    network {
      mode = "host"
      port "port" {
        static = "9001"
      }
    }
    # This example will enable services in the service mesh to make requests
    # to this service which is not in the service mesh by making requests
    # through the terminating gateway.
    service {
      name = "count-api"
      port = "port"
    }
    task "api" {
      driver = "docker"
      config {
        image        = "hashicorpnomad/counter-api:v3"
        network_mode = "host"
      }
    }
  }

  group "gateway" {
    network {
      mode = "bridge"
    }
    service {
      name = "api-gateway"
      connect {
        gateway {
          proxy {}
          terminating {
            service {
              name = "count-api"
            }
          }
        }
      }
    }
  }

  # The dashboard service is in the service mesh, making use of bridge network
  # mode and connect.sidecar_service. When running, the dashboard should be
  # available from a web browser at localhost:9002.
  group "dashboard" {
    network {
      mode = "bridge"
      port "http" {
        static = 9002
        to     = 9002
      }
    }
    service {
      name = "count-dashboard"
      port = "9002"
      connect {
        sidecar_service {
          proxy {
            upstreams {
              destination_name = "count-api"
              local_bind_port  = 8080
            }
          }
        }
      }
    }
    task "dashboard" {
      driver = "docker"
      env {
        COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
        # SOMETHING_ELSE = 1
      }
      config {
        image = "hashicorpnomad/counter-dashboard:v3"
      }
    }
  }
}

Run the job and wait for the deployment to succeed:

$ nomad job run ./example.nomad
==> Monitoring evaluation "0dcccaa8"
    Evaluation triggered by job "countdash-terminating"
==> Monitoring evaluation "0dcccaa8"
    Evaluation within deployment: "47837fe8"
    Allocation "885d9682" created: node "ae0e7ff7", group "gateway"
    Allocation "b277bfa8" created: node "ae0e7ff7", group "api"
    Allocation "804f86fe" created: node "ae0e7ff7", group "dashboard"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "0dcccaa8" finished with status "complete"

$ nomad job status c
ID            = countdash-terminating
Name          = countdash-terminating
Submit Date   = 2021-02-23T13:42:55-05:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         1        0       0         0
dashboard   0       0         1        0       0         0
gateway     0       0         1        0       0         0

Latest Deployment
ID          = 47837fe8
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
api         1        1       1        0          2021-02-23T18:53:06Z
dashboard   1        1       1        0          2021-02-23T18:53:07Z
gateway     1        1       1        0          2021-02-23T18:53:07Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created  Modified
804f86fe  ae0e7ff7  dashboard   0        run      running  14s ago  2s ago
885d9682  ae0e7ff7  gateway     0        run      running  14s ago  2s ago
b277bfa8  ae0e7ff7  api         0        run      running  14s ago  4s ago

Then modify the job by uncommenting the SOMETHING_ELSE env var in the dashboard group, and run it again:

$ nomad job run ./example.nomad
==> Monitoring evaluation "cb88b84c"
    Evaluation triggered by job "countdash-terminating"
==> Monitoring evaluation "cb88b84c"
    Evaluation within deployment: "8a33a331"
    Allocation "89964e89" created: node "ae0e7ff7", group "dashboard"
    Allocation "a6320a7b" created: node "ae0e7ff7", group "gateway"
    Allocation "b277bfa8" modified: node "ae0e7ff7", group "api"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "cb88b84c" finished with status "complete"

$ nomad job status c
ID            = countdash-terminating
Name          = countdash-terminating
Submit Date   = 2021-02-23T13:43:14-05:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
api         0       0         1        0       0         0
dashboard   0       1         1        0       0         0
gateway     0       1         1        0       0         0

Latest Deployment
ID          = 8a33a331
Status      = running
Description = Deployment is running

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
api         1        1       0        0          2021-02-23T18:53:15Z
dashboard   1        1       0        0          2021-02-23T18:53:15Z
gateway     1        1       0        0          2021-02-23T18:53:15Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created  Modified
89964e89  ae0e7ff7  dashboard   1        run      pending  5s ago   5s ago
a6320a7b  ae0e7ff7  gateway     1        run      pending  5s ago   5s ago
804f86fe  ae0e7ff7  dashboard   0        stop     running  24s ago  5s ago
885d9682  ae0e7ff7  gateway     0        stop     running  24s ago  5s ago
b277bfa8  ae0e7ff7  api         1        run      running  24s ago  5s ago

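Note that both the dashboard and gateway groups have new allocations, even though only the dashboard group changed. The api group's allocation (b277bfa8) was modified in place.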

I tried the same thing with the api group (which runs the count-api service), and the gateway was updated again, so it doesn't appear to matter whether the changed group is a target of the gateway:

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created    Modified
d4cf445a  ae0e7ff7  gateway     2        run      pending   5s ago     5s ago
a07c79b6  ae0e7ff7  api         2        run      running   5s ago     4s ago
89964e89  ae0e7ff7  dashboard   2        run      running   6m53s ago  5s ago
a6320a7b  ae0e7ff7  gateway     1        stop     running   6m53s ago  4s ago
885d9682  ae0e7ff7  gateway     0        stop     complete  7m12s ago  6m42s ago
b277bfa8  ae0e7ff7  api         1        stop     complete  7m12s ago  4s ago
804f86fe  ae0e7ff7  dashboard   0        stop     complete  7m12s ago  6m42s ago

tgross changed the title from "terminating gateway unexpectedly updated on job update" to "gateway group unexpectedly updated on job update" Feb 23, 2021

tgross commented Feb 23, 2021

I was also able to reproduce this behavior with the ingress gateway example.

shoenig self-assigned this Feb 23, 2021
shoenig added a commit that referenced this issue Feb 23, 2021
This PR fixes a bug where tasks with Connect services could be
triggered to destructively update (i.e. placed in a new alloc)
when no update should be necessary.

Fixes #10077
tgross pushed a commit that referenced this issue Feb 23, 2021

tgross commented Feb 24, 2021

I've confirmed this is fixed by PR #10080 on the release build for 1.0.4.

github-actions bot locked this issue as resolved and limited conversation to collaborators Oct 22, 2022