argo 3.3.10: terminate workflow, then retry, meets error: Workflow operation error #10285
Comments
@yeicandoit can you try on v3.4.4?
argo3.4.4-debug.mp4Hi @sarabala1979 check workflow-controller.log, it met another error: time="2023-01-09T06:44:35.880Z" level=error msg="Mark error node" error="task 'dag-diamond-1-tjz7j.A' errored: no Node found by the name of ; wf.Status.Nodes=map[:{ID: Name: DisplayName: Type: TemplateName: TemplateRef:nil TemplateScope: Phase: BoundaryID: Message: StartedAt:2023-01-09 06:44:35.869380936 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress: ResourcesDuration: PodIP: Daemoned: Inputs:nil Outputs:&Outputs{Parameters:[]Parameter{},Artifacts:[]Artifact{Artifact{Name:main-logs,Path:,Mode:nil,From:,ArtifactLocation:ArtifactLocation{ArchiveLogs:nil,S3:&S3Artifact{S3Bucket:S3Bucket{Endpoint:,Bucket:,Region:,Insecure:nil,AccessKeySecret:nil,SecretKeySecret:nil,RoleARN:,UseSDKCreds:false,CreateBucketIfNotPresent:nil,EncryptionOptions:nil,},Key:dag-diamond-1-tjz7j/dag-diamond-1-tjz7j-echo-581138205/main.log,},Git:nil,HTTP:nil,Artifactory:nil,HDFS:nil,Raw:nil,OSS:nil,GCS:nil,Azure:nil,},GlobalName:,Archive:nil,Optional:false,SubPath:,RecurseMode:false,FromExpression:,ArtifactGC:nil,Deleted:false,},},Result:nil,ExitCode:nil,} Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} dag-diamond-1-tjz7j:{ID:dag-diamond-1-tjz7j Name:dag-diamond-1-tjz7j DisplayName:dag-diamond-1-tjz7j Type:DAG TemplateName:diamond TemplateRef:nil TemplateScope:local/dag-diamond-1-tjz7j Phase:Running BoundaryID: Message: StartedAt:2023-01-09 06:44:35 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress:0/1 ResourcesDuration: PodIP: Daemoned: Inputs:nil Outputs:nil Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil} dag-diamond-1-tjz7j-581138205:{ID: Name: DisplayName: Type: TemplateName: TemplateRef:nil TemplateScope: Phase: BoundaryID: Message: StartedAt:0001-01-01 00:00:00 +0000 UTC FinishedAt:0001-01-01 00:00:00 +0000 UTC EstimatedDuration:0 Progress: 
ResourcesDuration: PodIP: Daemoned: Inputs:nil Outputs:&Outputs{Parameters:[]Parameter{},Artifacts:[]Artifact{Artifact{Name:main-logs,Path:,Mode:nil,From:,ArtifactLocation:ArtifactLocation{ArchiveLogs:nil,S3:&S3Artifact{S3Bucket:S3Bucket{Endpoint:,Bucket:,Region:,Insecure:nil,AccessKeySecret:nil,SecretKeySecret:nil,RoleARN:,UseSDKCreds:false,CreateBucketIfNotPresent:nil,EncryptionOptions:nil,},Key:dag-diamond-1-tjz7j/dag-diamond-1-tjz7j-echo-581138205/main.log,},Git:nil,HTTP:nil,Artifactory:nil,HDFS:nil,Raw:nil,OSS:nil,GCS:nil,Azure:nil,},GlobalName:,Archive:nil,Optional:false,SubPath:,RecurseMode:false,FromExpression:,ArtifactGC:nil,Deleted:false,},},Result:nil,ExitCode:nil,} Children:[] OutboundNodes:[] HostNodeName: MemoizationStatus:nil SynchronizationStatus:nil}]" namespace=enos nodeName=dag-diamond-1-tjz7j.A workflow=dag-diamond-1-tjz7j time="2023-01-09T06:44:35.880Z" level=info msg="node phase -> Error" namespace=enos workflow=dag-diamond-1-tjz7j check argo.log to get complete workflow-controller log |
@sarabala1979 please check this issue, thanks very much.
@yeicandoit It looks like an edge-case bug: the node is not initialized in the retry case. Would you like to fix this issue?
@sarabala1979 OK, I will try to fix it.
Hi @sarabala1979 @yeicandoit Do you mind sharing a bit more info on this issue? We encountered
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.
When the workflow is running, stop/terminate it, then retry it. The workflow then fails with: Workflow operation error.
I think I found the root cause:
old := woc.wf.Status.Nodes[nodeID] should be changed to
https://github.com/argoproj/argo-workflows/blob/master/workflow/controller/taskresult.go
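For anyone following along, here is a minimal sketch of why a bare map lookup like `old := woc.wf.Status.Nodes[nodeID]` could leave empty node entries such as the `ID: Name:` ones in the controller log above. It is not the change made in the pull request, since the replacement line is not shown in this thread. `NodeStatus`, `setOutputsUnchecked`, and `setOutputsChecked` are hypothetical, simplified stand-ins and not Argo's real types or functions. The only thing it relies on is standard Go behavior: indexing a map with a missing key returns the zero value.

```go
package main

import "fmt"

// NodeStatus is a hypothetical, heavily simplified stand-in for a workflow node status.
type NodeStatus struct {
	ID      string
	Name    string
	Outputs []string
}

// setOutputsUnchecked mirrors the `old := nodes[nodeID]` pattern.
// If nodeID is absent, old is a zero-valued NodeStatus, and writing it back
// creates a new entry with an empty ID and Name.
func setOutputsUnchecked(nodes map[string]NodeStatus, nodeID string, outputs []string) {
	old := nodes[nodeID] // zero value when the key is missing
	old.Outputs = outputs
	nodes[nodeID] = old
}

// setOutputsChecked uses the comma-ok form and leaves the map untouched
// when the node does not exist. It reports whether an update happened.
func setOutputsChecked(nodes map[string]NodeStatus, nodeID string, outputs []string) bool {
	old, ok := nodes[nodeID]
	if !ok {
		return false
	}
	old.Outputs = outputs
	nodes[nodeID] = old
	return true
}

func main() {
	unchecked := map[string]NodeStatus{
		"dag-diamond-1-tjz7j": {ID: "dag-diamond-1-tjz7j", Name: "dag-diamond-1-tjz7j"},
	}
	setOutputsUnchecked(unchecked, "dag-diamond-1-tjz7j-581138205", []string{"main-logs"})
	fmt.Printf("unchecked: %d entries, stray node ID=%q\n",
		len(unchecked), unchecked["dag-diamond-1-tjz7j-581138205"].ID)

	checked := map[string]NodeStatus{
		"dag-diamond-1-tjz7j": {ID: "dag-diamond-1-tjz7j", Name: "dag-diamond-1-tjz7j"},
	}
	updated := setOutputsChecked(checked, "dag-diamond-1-tjz7j-581138205", []string{"main-logs"})
	fmt.Printf("checked: %d entries, updated=%v\n", len(checked), updated)
}
```

In the unchecked variant the map gains an entry whose ID and Name are empty, which looks like what later trips "no Node found by the name of". Whether the real fix skips the node, initializes it, or does something else is an open question here.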
fix issue that stop/terminate workflow then retry meet error: Workflow operation error argoproj#10285 Signed-off-by: yeicandoit <410342333@qq.com>
@sarabala1979 I have sent pull request #10886, please check. Thanks.
Sorry, just saw this. Thank you so much for the fix!
Signed-off-by: yeicandoit <410342333@qq.com> Signed-off-by: Dillen Padhiar <dillen_padhiar@intuit.com>
Pre-requisites
:latest
What happened/what you expected to happen?
argo3.3.10-debug.mp4
When the workflow is running, stop/terminate it, then retry it. The workflow then fails with: Workflow operation error.
I expect the workflow to re-run correctly.
Version
v3.3.10
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller
Logs from your workflow's wait container