Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

controller parallelism setting not working in 3.4.9 #11541

Closed
2 of 3 tasks
andef opened this issue Aug 7, 2023 · 11 comments · Fixed by #11546
Closed
2 of 3 tasks

controller parallelism setting not working in 3.4.9 #11541

andef opened this issue Aug 7, 2023 · 11 comments · Fixed by #11546
Labels
area/controller Controller issues, panics type/bug type/regression Regression from previous behavior (a specific type of bug)

Comments

@andef
Copy link

andef commented Aug 7, 2023

Pre-requisites

  • I have double-checked my configuration
  • I can confirm the issues exists when I tested with :latest
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what you expected to happen?

I have pasted a small bash script that starts minikube and installs argo with 3.4.9 and configuring parallelism to 5.
Then submit 10 workflows and watches the result. In 3.4.8 this works as expected, but not in 3.4.9 or latest.

Version

3.4.9

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

#!/usr/bin/env bash

minikube delete
minikube start
kubectl create namespace argo

# Install argo.
#kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v3.4.8/install.yaml
kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v3.4.9/install.yaml
#kubectl apply -n argo -k https://github.com/argoproj/argo-workflows/manifests/cluster-install

echo "Wait for controller to start..."
kubectl wait --for=condition=available --timeout=60s deployment/workflow-controller -n argo

# Add parallelism: 5 to configmap.
kubectl patch configmap workflow-controller-configmap -n argo -p '{"data":{"parallelism": "5"}}'

# Restart controller.
kubectl -n argo get pod|grep controller|cut -d " " -f1|xargs kubectl -n argo delete pod

echo "Wait for server to start..."
kubectl wait --for=condition=available --timeout=60s deployment/argo-server -n argo

WORKFLOW=$(cat <<EOF
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: sleep-
  namespace: argo
spec:
  entrypoint: sleep
  templates:
  - name: sleep
    container:
      image: busybox:latest
      command: [sleep]
      args: ["30"]
EOF
)

for _ in {1..10}
do
   echo "${WORKFLOW}" | argo submit -n argo - 
done

watch argo -n argo list

Logs from the workflow controller

N/A

Logs from in your workflow's wait container

N/A
@andef andef added type/bug type/regression Regression from previous behavior (a specific type of bug) labels Aug 7, 2023
@terrytangyuan
Copy link
Member

Can you try restarting the Argo controller pod after your configmap patch?

@agilgur5
Copy link

agilgur5 commented Aug 7, 2023

Can also see more info in the Slack thread

@andef
Copy link
Author

andef commented Aug 7, 2023

@terrytangyuan I am pretty sure the controller has not started when the patch is done. It works fine if you change to 3.4.8. But, I can update the script to restart the controller.

@terrytangyuan
Copy link
Member

Potentially related commit f7b3072

@terrytangyuan
Copy link
Member

@andef
Copy link
Author

andef commented Aug 7, 2023

Updated the script to restart controller. Same results.

@sonbui00
Copy link
Member

sonbui00 commented Aug 9, 2023

Potentially related commit f7b3072

Just test and confirm this. After move wfc.UpdateConfig(ctx) Throttler init without config from config map

sonbui00 added a commit to sonbui00/argo-workflows that referenced this issue Aug 9, 2023
@agilgur5
Copy link

agilgur5 commented Aug 9, 2023

Just test and confirm this. After move wfc.UpdateConfig(ctx) Throttler init without config from config map

Nice catch!

Potentially related commit f7b3072

#11178 is the PR, for back-link purposes.

The same PR seems to be the root cause for #11542. In my root cause analysis there, I suspect a couple of other things are impacted as well. Basically anything that was previously after the config update.

@tooptoop4
Copy link
Contributor

just wanted to mention that since upgrading to v3.4.9 we have seen 150 workflows move from pending to running (at the same time) even though parallelism at wfcontroller is 30 meaning I would never expect more than 30 should be running. @terrytangyuan can there be a 3.4.10 release today for this fix?

@terrytangyuan
Copy link
Member

terrytangyuan commented Aug 9, 2023

Yes we will release 3.4.1

@terrytangyuan
Copy link
Member

Let's track in #11552

terrytangyuan added a commit to terrytangyuan/argo-workflows that referenced this issue Aug 10, 2023
sarabala1979 pushed a commit that referenced this issue Aug 10, 2023
…, #11541 (#11553)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
terrytangyuan added a commit that referenced this issue Aug 11, 2023
…, #11541 (#11553)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
@agilgur5 agilgur5 changed the title workflow-controller-configmap/parallelism setting not working in 3.4.9 controller parallelism setting not working in 3.4.9 Feb 21, 2024
@agilgur5 agilgur5 added the area/controller Controller issues, panics label Feb 21, 2024
dpadhiar pushed a commit to dpadhiar/argo-workflows that referenced this issue May 9, 2024
…proj#11542, argoproj#11541 (argoproj#11553)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Dillen Padhiar <dillen_padhiar@intuit.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/controller Controller issues, panics type/bug type/regression Regression from previous behavior (a specific type of bug)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants