[python] Fix backport of rolling batch non-streaming non-200 error code support #2478

davidthomas426
Contributor

Description

Further testing showed that pull request #2466 was incomplete. This PR adds the remaining changes.

I tested with the model.py script below, which raises a RuntimeError on every 20th call to update_request_cache_with_output. I also ran a variant that overrides postprocess_results so that it does not set code or error, to confirm that @stop_on_any_exception works as expected.

I also verified that behavior is unchanged unless the feature flag is enabled.
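As an illustration of what the second test exercises, here is a minimal sketch of a decorator that turns any exception raised during a batch step into a per-request error result. This is not the djl_python implementation: `stop_on_any_exception_sketch`, `FakeRequest`, and `FakeRollingBatch` are hypothetical names invented for this example, and the 424 code is taken from the response shown later in this PR.

```python
import functools
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a rolling-batch request."""
    request_id: int
    error: Optional[str] = None
    code: Optional[int] = None


def stop_on_any_exception_sketch(func):
    """Hypothetical decorator: if the step raises, fail every active request."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        try:
            return func(self, *args, **kwargs)
        except Exception as exc:
            # Record the failure on each request, then clear the batch.
            for req in self.active_requests:
                req.error = str(exc)
                req.code = 424  # assumption: same code as in the transcript
            results = [{"error": r.error, "code": r.code}
                       for r in self.active_requests]
            self.active_requests = []
            return results

    return wrapper


class FakeRollingBatch:
    """Hypothetical batch whose step fails on every 20th call."""

    def __init__(self, requests):
        self.active_requests = list(requests)
        self.calls = 0

    @stop_on_any_exception_sketch
    def inference(self):
        self.calls += 1
        if self.calls % 20 == 0:
            raise RuntimeError("Uh oh")
        return [{"text": "ok"} for _ in self.active_requests]
```

In this sketch the decorator sets the error fields itself, so the result is the same whether or not a later postprocessing step sets them.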

model.py:

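import djl_python.rolling_batch.lmi_dist_rolling_batch
# Keep a reference to the original function so the wrapper can delegate to it.
_update_request_cache_with_output = djl_python.rolling_batch.lmi_dist_rolling_batch.update_request_cache_with_output

# Number of calls made to the patched function.
_ctr = 0

def update_request_cache_with_output(*args, **kwargs):
    # Fault injection: raise on every 20th call, otherwise delegate.
    global _ctr
    _ctr += 1
    if _ctr % 20 == 0:
        raise RuntimeError("Uh oh")
    return _update_request_cache_with_output(*args, **kwargs)

# Monkeypatch the module attribute so lookups through it reach the wrapper.
djl_python.rolling_batch.lmi_dist_rolling_batch.update_request_cache_with_output = update_request_cache_with_output

# Reuse the stock Hugging Face handler as this model's entry point.
import djl_python.huggingface
handle = djl_python.huggingface.handle
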
$ env | sort | grep SERVING_
SERVING_BACKPORT_FOR_NON_STREAMING_HTTP_ERROR_CODES=true
SERVING_FEATURES=vllm,lmi-dist

$ env | sort | grep OPTION_
OPTION_ENFORCE_EAGER=true
OPTION_MODEL_ID=TheBloke/Llama-2-7B-fp16
OPTION_MPI_MODE=true
OPTION_PIPELINE_PARALLEL_DEGREE=1
OPTION_TENSOR_PARALLEL_DEGREE=4
$ time awscurl -c 1 -N 1 -X POST http://127.0.0.1:8080/invocations --connect-timeout 60 -H 'Content-type: application/json' -v --data '{"inputs": "list all the names which start with the letter r","parameters":{"max_new_tokens":10,"seed":2,"do_sample":true}}'

WARN Client 0 delay: 0

timestamp: 1729579725060
input: {"inputs": "list all the names which start with the letter r","parameters":{"max_new_tokens":10,"seed":2,"do_sample":true}}
output: > POST //invocations HTTP/1.1
>
> Accept: */*
> User-Agent: awscurl/1.0.0
> Host: 127.0.0.1:8094
> Content-Type: application/json
>
< HTTP/1.1 424 Uh oh
<
< Content-Type: application/json
< x-request-id: c667f402-1df9-4134-92af-923c21071c30
< Pragma: no-cache
< Cache-Control: no-cache; no-store, must-revalidate, private
< Expires: Thu, 01 Jan 1970 00:00:00 UTC
< transfer-encoding: chunked
< connection: keep-alive
<
{"error":"Uh oh","code":424}
real    0m0.539s
user    0m0.812s
sys     0m0.054s
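
The check on this response can be automated. The following is a minimal sketch using only the Python standard library; `parse_error_response` is a hypothetical helper written for this example, and it only parses a status line and JSON body like the ones captured above.

```python
import json


def parse_error_response(status_line: str, body: str) -> dict:
    """Split an HTTP status line and decode the JSON body alongside it."""
    # A status line looks like "HTTP/1.1 424 Uh oh": version, code, reason.
    _version, code_text, reason = status_line.split(" ", 2)
    return {
        "status": int(code_text),
        "reason": reason,
        "payload": json.loads(body),
    }


# Values copied from the transcript above.
parsed = parse_error_response("HTTP/1.1 424 Uh oh",
                              '{"error":"Uh oh","code":424}')

# The HTTP status and the JSON body should agree on the error code.
assert parsed["status"] == parsed["payload"]["code"]
# In this transcript the reason phrase carries the exception message.
assert parsed["reason"] == parsed["payload"]["error"]
```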

@siddvenk merged commit c807fe0 into deepjavalibrary:0.28.0-dlc on Oct 22, 2024
1 of 6 checks passed
@davidthomas426 deleted the fix-backport-pull-2173-on-v0.28 branch on October 24, 2024 18:10