Unable to run workflows in kubernetes #1070

Open
mPyKen opened this issue Mar 24, 2023 · 2 comments
Labels
kind/bug Something isn't working

Comments


mPyKen commented Mar 24, 2023

Expected Behavior

Workflows should run on Kubernetes the same way they do locally.

Actual Behavior

The workflow never starts. The orchestrator is invoked, but the call times out after 30 seconds and is scheduled to be retried, as shown in the log:

time="2023-03-23T12:19:56.31185114Z" level=debug msg="invoking method 'CreateWorkflowInstance' on workflow actor 'asdf'" app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.wfengine type=log ver=1.10.4
time="2023-03-23T12:19:56.311875005Z" level=debug msg="asdf: loading workflow state" app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.wfengine type=log ver=1.10.4
time="2023-03-23T12:19:56.347261672Z" level=debug msg="asdf: creating 'start-4ac7fe35' reminder with DueTime = 0s" app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.wfengine type=log ver=1.10.4
time="2023-03-23T12:19:56.370525261Z" level=debug msg="starting to read reminders for actor type dapr.internal.wfengine.workflow (migrate=false), with metadata id 00000000-0000-0000-0000-000000000000 and 0 partitions" app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.actor type=log ver=1.10.4
time="2023-03-23T12:19:56.377679829Z" level=debug msg="read reminders from actors||dapr.internal.wfengine.workflow without partition: " app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.actor type=log ver=1.10.4
time="2023-03-23T12:19:56.37770212Z" level=debug msg="finished reading reminders for actor type dapr.internal.wfengine.workflow (migrate=false), with metadata id 00000000-0000-0000-0000-000000000000 and no partitions: total of 0 reminders" app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.actor type=log ver=1.10.4
time="2023-03-23T12:19:56.377719283Z" level=debug msg="saving 1 reminders in actors||dapr.internal.wfengine.workflow ..." app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.actor type=log ver=1.10.4
time="2023-03-23T12:19:56.403174193Z" level=debug msg="asdf: saving 3 keys to actor state store" app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.wfengine type=log ver=1.10.4
time="2023-03-23T12:19:56.40325296Z" level=debug msg="#operations=3,partitionkey=wfapp||dapr.internal.wfengine.workflow||asdf" app_id=wfapp component="actor-state-store (state.azure.cosmosdb/v1)" instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.contrib type=log ver=1.10.4
time="2023-03-23T12:19:56.403305396Z" level=debug msg="executing reminder start-4ac7fe35 for actor type dapr.internal.wfengine.workflow with id asdf" app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.actor type=log ver=1.10.4
time="2023-03-23T12:19:56.413475002Z" level=debug msg="Operation 0 completed with status code 201" app_id=wfapp component="actor-state-store (state.azure.cosmosdb/v1)" instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.contrib type=log ver=1.10.4
time="2023-03-23T12:19:56.413498607Z" level=debug msg="Operation 1 completed with status code 201" app_id=wfapp component="actor-state-store (state.azure.cosmosdb/v1)" instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.contrib type=log ver=1.10.4
time="2023-03-23T12:19:56.413514687Z" level=debug msg="Operation 2 completed with status code 201" app_id=wfapp component="actor-state-store (state.azure.cosmosdb/v1)" instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.contrib type=log ver=1.10.4
time="2023-03-23T12:19:56.413558719Z" level=info msg="created new workflow instance with ID 'asdf'" app_id=wfapp component="dapr (workflow.dapr/v1)" instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.contrib type=log ver=1.10.4
time="2023-03-23T12:19:56.413586441Z" level=debug msg="&{asdf}" app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.http type=log ver=1.10.4
time="2023-03-23T12:19:56.413585635Z" level=debug msg="invoking reminder 'start-4ac7fe35' on workflow actor 'asdf'" app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.wfengine type=log ver=1.10.4
time="2023-03-23T12:19:56.413677475Z" level=debug msg="orchestration-processor: processing work item: asdf (1 event(s))" app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.wfengine type=log ver=1.10.4
time="2023-03-23T12:19:56.413723611Z" level=debug msg="asdf: received work item with 1 new event(s): [ExecutionStarted]" app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.wfengine type=log ver=1.10.4
time="2023-03-23T12:19:56.413745232Z" level=debug msg="asdf: got orchestration runtime state: name=OrderProcessingWorkflow, status=PENDING, events=0, age=(new)" app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.wfengine type=log ver=1.10.4
time="2023-03-23T12:19:56.413768135Z" level=info msg="asdf: starting new 'OrderProcessingWorkflow' instance." app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.wfengine type=log ver=1.10.4
time="2023-03-23T12:19:56.413786479Z" level=debug msg="asdf: invoking orchestrator" app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.wfengine type=log ver=1.10.4
time="2023-03-23T12:20:26.414052507Z" level=warning msg="asdf: execution timed-out and will be retried later" app_id=wfapp instance=wfapp-7cb4ccd64b-fgn68 scope=dapr.runtime.wfengine type=log ver=1.10.4

Steps to Reproduce the Problem

It should be reproducible, but I don't have exact steps yet. I used Azure Kubernetes Service (AKS). Make sure to call AddDaprWorkflow(), which adds a Dapr client whose endpoint uses localhost. (AddDaprClient(), by contrast, defaults to 127.0.0.1.)

Workaround

I wrote my own AddDaprWorkflow() method whose TryGetGrpcAddress() returns an endpoint using 127.0.0.1 instead of localhost. Because AddWorkflowsAndActivitiesToRegistry() is not public, it has to be called via reflection:

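using System.Reflection;

// AddWorkflowsAndActivitiesToRegistry is non-public, so look it up via reflection
var dynMethod = typeof(WorkflowRuntimeOptions).GetMethod(
    "AddWorkflowsAndActivitiesToRegistry",
    BindingFlags.NonPublic | BindingFlags.Instance
) ?? throw new MissingMethodException(nameof(WorkflowRuntimeOptions), "AddWorkflowsAndActivitiesToRegistry");
// Register the configured workflows and activities with the worker's registry
dynMethod.Invoke(options, new object[] { registry });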

With this change, workflows run as expected.

Related issues

Contributor

cgillum commented May 15, 2023

@mPyKen any clues as to why you needed to change 127.0.0.1 to localhost to get this working? I wouldn't have expected that to be a requirement.

Also, just to make sure this isn't a separate, known issue that you're running into:

  1. How many replicas does your app have?
  2. Do you have any other workflow apps running in your cluster?

Author

mPyKen commented May 16, 2023

I had to change it the other way around, from localhost to 127.0.0.1. I have no idea why it doesn't work with localhost; I confirmed that the container's /etc/hosts has an entry for localhost.
Were you able to reproduce this issue on your end?

1. How many replicas does your app have?

Only 1 replica.

2. Do you have any other workflow apps running in your cluster?

No, this was the only one running.
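One possible explanation, unconfirmed in this thread, is that `localhost` resolves to the IPv6 loopback `::1` as well as to `127.0.0.1`, and the gRPC client may try `::1` first while the sidecar is only reachable over IPv4. The sketch below is a hypothetical diagnostic, not part of the Dapr SDK: `LocalhostProbe` and its methods are invented for illustration, and reading the port from the `DAPR_GRPC_PORT` environment variable (falling back to 50001, the Dapr default) is an assumption about the deployment.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical diagnostic helper (not part of the Dapr SDK): lists the addresses
// "localhost" resolves to inside the app container and checks whether each one
// accepts a TCP connection on the sidecar's gRPC port.
public static class LocalhostProbe
{
    // Assumption: the sidecar's gRPC port is exposed to the app as DAPR_GRPC_PORT;
    // fall back to 50001 when the variable is missing or not a number.
    public static int GrpcPort() =>
        int.TryParse(Environment.GetEnvironmentVariable("DAPR_GRPC_PORT"), out var port) ? port : 50001;

    // Returns true if a TCP connection to address:port succeeds within the timeout.
    public static bool CanConnect(IPAddress address, int port, TimeSpan timeout)
    {
        try
        {
            // Created inside try: the constructor throws if the address family is unsupported.
            using var client = new TcpClient(address.AddressFamily);
            Task connect = client.ConnectAsync(address, port);
            return connect.Wait(timeout) && client.Connected;
        }
        catch (AggregateException)
        {
            // ConnectAsync faulted, e.g. connection refused.
            return false;
        }
        catch (SocketException)
        {
            return false;
        }
    }

    // Prints one line per address that "localhost" resolves to.
    public static void Report()
    {
        int port = GrpcPort();
        foreach (IPAddress address in Dns.GetHostAddresses("localhost"))
        {
            bool ok = CanConnect(address, port, TimeSpan.FromSeconds(2));
            Console.WriteLine($"{address} ({address.AddressFamily}) port {port}: {(ok ? "reachable" : "not reachable")}");
        }
    }
}
```

Calling `LocalhostProbe.Report()` once at app startup inside the pod would show whether `::1` is listed and unreachable while `127.0.0.1` is reachable, which would be consistent with (but not proof of) the IPv6 hypothesis.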
