Add support to retry function execution on invocation failures #6603

pragnagopa · 2020-09-04T19:10:52Z

Issue describing the changes in this PR

Functions host changes to add retry support : Azure/azure-webjobs-sdk#2463

Retry can be added in host.json for function app level or can be added in function.json for function level. If both are specified, function.json takes precedence.

"retry": {
    "strategy": "fixedDelay",
    "maxRetryCount": 2,
    "delayInterval": "00:00:10"
  }

resolves #6664

Pull request checklist

My changes do not require documentation changes
- Otherwise: Documentation issue linked to PR
My changes should not be added to the release notes for the next release
- Otherwise: I've added my notes to release_notes.md
My changes do not need to be backported to a previous version
- Otherwise: Backport tracked by issue/PR [V2]Add support for function execution retries #6684
I have added all required tests (Unit tests, E2E tests)

Note: Documentation for this feature is tracked internally.

src/WebJobs.Script/Description/FunctionDescriptorProvider.cs

src/WebJobs.Script/WebJobs.Script.csproj

src/WebJobs.Script.Abstractions/Description/Retry.cs

sample/Node-Retry/HttpTrigger-RetryFunctionJson/function.json

sample/Node-Retry/host.json

src/WebJobs.Script/CustomAttributeBuilderUtility.cs

src/WebJobs.Script/Host/ScriptHost.cs

test/WebJobs.Script.Tests/FunctionMetadataProviderTests.cs

pragnagopa · 2020-09-22T21:27:47Z

@mathewc / @brettsam - PR ready for review!

@jeffhollan - let me know if there is a doc I can link here.

sample/NodeRetry/HttpTrigger-RetryFunctionJson/function.json

sample/NodeRetry/host.json

src/WebJobs.Script.Abstractions/Description/FunctionMetadata.cs

src/WebJobs.Script/Config/ScriptJobHostOptions.cs

src/WebJobs.Script.Abstractions/Description/Retry.cs

pragnagopa

Addressed CR feedback!

src/WebJobs.Script.Abstractions/Description/Retry.cs

src/WebJobs.Script/WebJobs.Script.csproj

amotl · 2020-10-13T19:03:20Z

Dear @pragnagopa and @mathewc,

we are currently looking into running full integration tests with pytest through a host instance provided by azure-functions-core-tools. We are probably aiming at similar things which have been outlined within [1] and [2] and already sketched out within [3] and [4] by @jeffhollan and @Francisco-Gamino, but this time for a pure Python environment.

As we don't only want to run tests looking for successful outcomes but also want to wrap the test harness around erroneous function invocations, we have been looking for a way to get around the default setting with a retry count of 3, which terminates the host instance after trying to invoke the function three times like

Exceeded language worker restart retry count for runtime:python. Shutting down and proactively recycling the Functions Host to recover

Actually, we would like to reduce that to "1" when running within the test harness in order to speed things up.

So, we have been happy to find things like host.json or function.json just recently coming from your pen.

  "retry": {
    "strategy": "fixedDelay",
    "maxRetryCount": 2,
    "delayInterval": "00:00:03"
  }

Now - as these new attributes are not mentioned within host.json reference for Azure Functions 2.x and later yet - am I right to assume the corresponding improvements are not part of the most recent release of Azure Functions Core Tools yet? I am currently running

$ func --version
3.0.2931

Regarding this, I have two questions:

Would this be resolved with [V2]Add support for function execution retries #6684?
Will I have to wait until the next official release of Azure Functions Core Tools?
Is there a way to compile this machinery on behalf of my own or is there way to download nightly builds of https://github.com/Azure/azure-functions-core-tools/releases?

Please correct me if I am wrong on any of the details I was trying to outline here.

With kind regards,
Andreas.

[1] https://stackoverflow.com/questions/62749060/unit-and-integration-test-for-azure-function-with-servicebustrigger
[2] https://medium.com/cheranga/testing-azure-functions-part-2-86e036785f59
[3] https://github.com/jeffhollan/functions-test-helper
[4] https://github.com/Azure/azure-functions-integration-tests

jeffhollan · 2020-10-13T19:53:08Z

@pragnagopa to help - but I'm not sure if the retry policy here is going to apply to the retry you are showing there (which is retrying communication to a language worker, and not just retrying execution of a message). That said, to answer other questions, yes it's not yet in core tools, docs will merge sometime next week once feature is live and core tools get released. Likely some way to test out with latest WebJobs SDK and such but I'm not sure how. The release is rolling out right now I believe.

pragnagopa · 2020-10-13T20:58:41Z

@amotl - @jeffhollan is right. Retry support added in this PR is for function executions. Following log message

Exceeded language worker restart retry count for runtime:python. Shutting down and proactively recycling the Functions Host to recover

is coming from built-in retries to handle process crashes - in this case if python process crashes - functions host retries upto 3 times. We do not intend to expose this as a configuration setting for end-users.

amotl · 2020-10-13T22:03:19Z

Dear @jeffhollan and @pragnagopa,

thanks a bunch for your quick answers. What I was trying to achieve is to make the complete function host instance terminate itself by doing such a thing:

def main(events: List[func.EventHubEvent]):

    # Process a bunch of events here.
    for event in events:
        pass

    # Exit the function host when running SuT.
    # https://docs.pytest.org/en/latest/example/simple.html#pytest-current-test-environment-variable
    if "PYTEST_CURRENT_TEST" in os.environ:
        sys.exit(0)

In this manner, our idea was to spin up a function/host instance from a pytest fixture, submit a single or more messages to it by e.g. invoking

http http://0.0.0.0:7071/admin/functions/eventHubTrigger input=foobar

and then check the outcome of that what has been done when processing the message(s) (e.g. writing to a database).

After seeing that the host was shutting down itself after three attempts, we figured there would be a way to get around terminating the process from outside (the pytest fixture) after the test run has completed.

Now, after learning what the added "retry" functionality is all about, I believe this was an ill-fated idea.

With kind regards,
Andreas.

P.S.: One last question, @jeffhollan: I see that you are terminating the function host after 30 seconds like Host.StartAsync().Wait(TimeSpan.FromSeconds(30));, right? This might be exactly the spot where I was aiming at: You just lazily wait there for 30 seconds and then shut down the whole machinery?

In contrast to this, I am aiming at getting this together more tightly and terminate the function host directly after it has done its job processing all the messages coming from a single function invocation.

If you know any other people or resources having attempted similar things running full integration tests through Core Tools' Function Host which could guide me along, please let me know. We are aiming to wrap a test harness around a scenario where messages are ingested from EventHub (probably similar to your example at [1]) and submitted to both CrateDB and Kafka, see also Azure/azure-functions-kafka-extension#181.

[1] https://github.com/jeffhollan/functions-test-helper/blob/master/FunctionApp.Tests.Integration/EventEndToEndTests.cs

pragnagopa · 2020-10-14T15:46:19Z

@amotl - looking at the code you shared:

if "PYTEST_CURRENT_TEST" in os.environ:
        sys.exit(0)

sys.exit(0) is causing the python process to exit - which triggers functions host to restart python language worker process.

Please open new issue on : https://github.com/Azure/azure-functions-python-worker/issues if you need more guidance how to proceed.

amotl · 2020-10-14T16:36:03Z

Dear Pragna,

thanks for your answer and sorry for spamming this PR. I believe we will just use terminate() in order to kill the function host process from outside within the test runner.

Regarding another question to proceed where we are aiming at, I have just created Azure/azure-functions-python-worker#764.

Thanks again and with kind regards,
Andreas.

pragnagopa requested review from mathewc and brettsam September 4, 2020 19:11

mathewc reviewed Sep 14, 2020

View reviewed changes

src/WebJobs.Script/Description/FunctionDescriptorProvider.cs Outdated Show resolved Hide resolved

mathewc reviewed Sep 14, 2020

View reviewed changes

src/WebJobs.Script/WebJobs.Script.csproj Outdated Show resolved Hide resolved

mathewc reviewed Sep 14, 2020

View reviewed changes

src/WebJobs.Script.Abstractions/Description/Retry.cs Outdated Show resolved Hide resolved

pragnagopa mentioned this pull request Sep 16, 2020

Add support for funciton execution retries #6664

Closed