Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to retry function execution on invocation failures #6603

Merged
merged 4 commits into from
Sep 24, 2020

Conversation

pragnagopa
Copy link
Member

@pragnagopa pragnagopa commented Sep 4, 2020

Issue describing the changes in this PR

Functions host changes to add retry support : Azure/azure-webjobs-sdk#2463

Retry can be added in host.json for function app level or can be added in function.json for function level. If both are specified, function.json takes precedence.

"retry": {
    "strategy": "fixedDelay",
    "maxRetryCount": 2,
    "delayInterval": "00:00:10"
  }

resolves #6664

Pull request checklist

  • My changes do not require documentation changes
    • Otherwise: Documentation issue linked to PR
  • My changes should not be added to the release notes for the next release
    • Otherwise: I've added my notes to release_notes.md
  • My changes do not need to be backported to a previous version
  • I have added all required tests (Unit tests, E2E tests)

Note: Documentation for this feature is tracked internally.

@pragnagopa pragnagopa marked this pull request as ready for review September 21, 2020 22:04
@pragnagopa pragnagopa requested a review from alrod September 21, 2020 22:05
@pragnagopa
Copy link
Member Author

@mathewc / @brettsam - PR ready for review!

@jeffhollan - let me know if there is a doc I can link here.

sample/NodeRetry/host.json Outdated Show resolved Hide resolved
Copy link
Member Author

@pragnagopa pragnagopa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed CR feedback!

src/WebJobs.Script.Abstractions/Description/Retry.cs Outdated Show resolved Hide resolved
src/WebJobs.Script/WebJobs.Script.csproj Outdated Show resolved Hide resolved
@pragnagopa pragnagopa changed the title Function execution retry Add support to retry function execution on invocation failures Sep 24, 2020
@pragnagopa pragnagopa merged commit dd062a5 into dev Sep 24, 2020
@pragnagopa pragnagopa deleted the pgopa/host-retry branch September 24, 2020 17:30
@amotl
Copy link

amotl commented Oct 13, 2020

Dear @pragnagopa and @mathewc,

we are currently looking into running full integration tests with pytest through a host instance provided by azure-functions-core-tools. We are probably aiming at similar things which have been outlined within [1] and [2] and already sketched out within [3] and [4] by @jeffhollan and @Francisco-Gamino, but this time for a pure Python environment.

As we don't only want to run tests looking for successful outcomes but also want to wrap the test harness around erroneous function invocations, we have been looking for a way to get around the default setting with a retry count of 3, which terminates the host instance after trying to invoke the function three times like

Exceeded language worker restart retry count for runtime:python. Shutting down and proactively recycling the Functions Host to recover

Actually, we would like to reduce that to "1" when running within the test harness in order to speed things up.

So, we have been happy to find things like host.json or function.json just recently coming from your pen.

  "retry": {
    "strategy": "fixedDelay",
    "maxRetryCount": 2,
    "delayInterval": "00:00:03"
  }

Now - as these new attributes are not mentioned within host.json reference for Azure Functions 2.x and later yet - am I right to assume the corresponding improvements are not part of the most recent release of Azure Functions Core Tools yet? I am currently running

$ func --version
3.0.2931

Regarding this, I have two questions:

Please correct me if I am wrong on any of the details I was trying to outline here.

With kind regards,
Andreas.

[1] https://stackoverflow.com/questions/62749060/unit-and-integration-test-for-azure-function-with-servicebustrigger
[2] https://medium.com/cheranga/testing-azure-functions-part-2-86e036785f59
[3] https://github.com/jeffhollan/functions-test-helper
[4] https://github.com/Azure/azure-functions-integration-tests

@jeffhollan
Copy link

@pragnagopa to help - but I'm not sure if the retry policy here is going to apply to the retry you are showing there (which is retrying communication to a language worker, and not just retrying execution of a message). That said, to answer other questions, yes it's not yet in core tools, docs will merge sometime next week once feature is live and core tools get released. Likely some way to test out with latest WebJobs SDK and such but I'm not sure how. The release is rolling out right now I believe.

@pragnagopa
Copy link
Member Author

@amotl - @jeffhollan is right. Retry support added in this PR is for function executions. Following log message

Exceeded language worker restart retry count for runtime:python. Shutting down and proactively recycling the Functions Host to recover

is coming from built-in retries to handle process crashes - in this case if python process crashes - functions host retries upto 3 times. We do not intend to expose this as a configuration setting for end-users.

@amotl
Copy link

amotl commented Oct 13, 2020

Dear @jeffhollan and @pragnagopa,

thanks a bunch for your quick answers. What I was trying to achieve is to make the complete function host instance terminate itself by doing such a thing:

def main(events: List[func.EventHubEvent]):

    # Process a bunch of events here.
    for event in events:
        pass

    # Exit the function host when running SuT.
    # https://docs.pytest.org/en/latest/example/simple.html#pytest-current-test-environment-variable
    if "PYTEST_CURRENT_TEST" in os.environ:
        sys.exit(0)

In this manner, our idea was to spin up a function/host instance from a pytest fixture, submit a single or more messages to it by e.g. invoking

http http://0.0.0.0:7071/admin/functions/eventHubTrigger input=foobar

and then check the outcome of that what has been done when processing the message(s) (e.g. writing to a database).

After seeing that the host was shutting down itself after three attempts, we figured there would be a way to get around terminating the process from outside (the pytest fixture) after the test run has completed.

Now, after learning what the added "retry" functionality is all about, I believe this was an ill-fated idea.

With kind regards,
Andreas.

P.S.: One last question, @jeffhollan: I see that you are terminating the function host after 30 seconds like Host.StartAsync().Wait(TimeSpan.FromSeconds(30));, right? This might be exactly the spot where I was aiming at: You just lazily wait there for 30 seconds and then shut down the whole machinery?

In contrast to this, I am aiming at getting this together more tightly and terminate the function host directly after it has done its job processing all the messages coming from a single function invocation.

If you know any other people or resources having attempted similar things running full integration tests through Core Tools' Function Host which could guide me along, please let me know. We are aiming to wrap a test harness around a scenario where messages are ingested from EventHub (probably similar to your example at [1]) and submitted to both CrateDB and Kafka, see also Azure/azure-functions-kafka-extension#181.

[1] https://github.com/jeffhollan/functions-test-helper/blob/master/FunctionApp.Tests.Integration/EventEndToEndTests.cs

@pragnagopa
Copy link
Member Author

@amotl - looking at the code you shared:

if "PYTEST_CURRENT_TEST" in os.environ:
        sys.exit(0)

sys.exit(0) is causing the python process to exit - which triggers functions host to restart python language worker process.

Please open new issue on : https://github.com/Azure/azure-functions-python-worker/issues if you need more guidance how to proceed.

@amotl
Copy link

amotl commented Oct 14, 2020

Dear Pragna,

thanks for your answer and sorry for spamming this PR. I believe we will just use terminate() in order to kill the function host process from outside within the test runner.

Regarding another question to proceed where we are aiming at, I have just created Azure/azure-functions-python-worker#764.

Thanks again and with kind regards,
Andreas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for funciton execution retries
5 participants