[serving][python] Support non 200 HTTP response codes for non-streami… #2173

siddvenk · 2024-07-13T02:26:59Z

…ng rolling-batch

Change non-streaming output formatters to only generate output on final token (valid and error cases)
Update InferenceRequestHandler to only publish HTTP Response when data is actually received
Update RollingBatch to properly set status code in non-streaming use-cases

Note:

Support for non-200 HTTP responses has been requested by a few users.

TODO:

I need to update the unit tests, but I have done extensive local testing and am comfortable with the core changes raised here. I'll change this from WIP to Open once I complete the unit tests.

This change is motivated by two things:

Make output formatters for non streaming use-cases easier to deal with, and more readable
Provide non-200 HTTP status code in the non-streaming error cases.

The output formatters for non-streaming use-cases are currently pretty hard to reason about. Beyond just making it easier to read and work with, this also fixes an issue:

If there is an error during generation after we have generated a few tokens, the current behavior results in a response that is not valid JSON. This change ensures that the output formatters we provide always produce valid JSON content in mid-generation error scenarios.
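A minimal sketch of the idea, not the PR's actual formatter: the class name `NonStreamingFormatter`, the method `add_token`, and the `generated_text`/`error` keys are assumptions made for illustration. Tokens are buffered and the body is serialized exactly once, on the final token or on an error, so whatever is emitted is always parseable JSON.

```python
import json
from typing import Optional


class NonStreamingFormatter:
    """Hypothetical formatter: buffers tokens and emits JSON only once."""

    def __init__(self) -> None:
        self._tokens = []

    def add_token(self, text: str, is_last: bool,
                  error: Optional[str] = None) -> str:
        """Return "" until the request finishes, then one JSON document."""
        if error is not None:
            # Mid-generation failure: report the error together with the
            # text generated so far, as a single well-formed document.
            return json.dumps({
                "generated_text": "".join(self._tokens),
                "error": error,
            })
        self._tokens.append(text)
        if not is_last:
            return ""  # nothing is emitted for intermediate tokens
        return json.dumps({"generated_text": "".join(self._tokens)})
```

Because no partial fragment such as `{"generated_text": "Hello` is ever written, an error after a few tokens cannot leave the response body half-open.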

The second part of this change is to provide non-200 status codes during non-streaming use-cases when an error occurs during inference. This is only possible with the first change to the output formatters. There are two small changes in InferenceRequestHandler and RollingBatch to enable this.
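Why the status code depends on the formatter change can be sketched roughly as follows. The helper `build_response` and the choice of 500 for the error case are hypothetical and not taken from the PR. The point is only that a status code can be chosen if nothing has been sent before the outcome of the request is known.

```python
import json
from http import HTTPStatus
from typing import Optional, Tuple


def build_response(generated_text: str,
                   error: Optional[str] = None) -> Tuple[int, str]:
    """Hypothetical helper: pick status code and body after generation ends."""
    if error is None:
        body = json.dumps({"generated_text": generated_text})
        return int(HTTPStatus.OK), body
    # Illustrative error code only; the PR may map errors differently.
    body = json.dumps({"generated_text": generated_text, "error": error})
    return int(HTTPStatus.INTERNAL_SERVER_ERROR), body
```

If the formatter had already emitted partial output, a 200 would effectively be committed before the error occurred, which is why the first change is a prerequisite.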

frankfliu · 2024-07-16T03:23:51Z

engines/python/setup/djl_python/utils.py

@@ -98,7 +98,7 @@ def rolling_batch_inference(parsed_input, inputs: Input, outputs: Output,
            outputs.add(Output.binary_encode(err), key="data", batch_index=i)
            outputs.add_property(f"batch_{i}_Content-Type", "application/json")
        else:
-            content_type = result[idx].pop("content_type")
+            content_type = result[idx].get("content_type")


Just curious: is .pop() causing any issues?

I would suggest using get here. Someone may want to access the content later, and the Python GC will collect it eventually.

pop does not cause issues currently, but it is used incorrectly here, given the subsequent check for whether content_type is None.

Either we use pop and get rid of that check, or we use get here.
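The difference being discussed is plain `dict` behavior and can be shown in isolation. The `result` dictionary below is a made-up stand-in, not the real rolling-batch result.

```python
# Made-up stand-in for one rolling-batch result entry
result = {"data": b"{}", "content_type": "application/json"}

# get() reads without mutating; a missing key yields None,
# so a following `is None` check is meaningful.
fetched = result.get("content_type")
still_present = "content_type" in result
missing_via_get = {}.get("content_type")

# pop() without a default removes the key and raises KeyError when the
# key is absent, so a following `is None` check never sees a missing key.
popped = result.pop("content_type")
removed = "content_type" not in result
try:
    {}.pop("content_type")
    missing_raises = False
except KeyError:
    missing_raises = True
```

With `pop("content_type", None)` the None check would also work, at the cost of mutating the result.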

frankfliu · 2024-07-16T03:25:55Z

engines/python/setup/djl_python/tests/test_rolling_batch.py

@@ -628,7 +607,7 @@ def test_chat_json(self):
            },
            'finish_reason': 'length'
        }]
-        assert final_json['usage'] == {
+        assert output['usage'] == {


A side topic: I noticed we use assert in production code quite a bit. I'm not a fan of assert; we should rethink how we use it.

Agreed. This is test code, but we should not be using assert in production code.

I think keeping assert in the test code is fine. Separately, I'm exploring a migration from unittest to pytest to get better test output and code coverage. pytest is built around plain assert statements.
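One concrete reason to avoid `assert` outside tests is that Python drops assert statements when run with `-O`. A small sketch of the alternative follows; the function name and the parameter being validated are hypothetical and chosen only for illustration.

```python
def validate_max_new_tokens(value: int) -> int:
    """Hypothetical production-side check that still runs under `python -O`."""
    if value <= 0:
        # An `assert value > 0` here would be skipped under -O.
        raise ValueError(f"max_new_tokens must be positive, got {value}")
    return value
```

In test code the trade-off is different, since tests are not normally run with optimizations enabled.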

lanking520 · 2024-07-16T04:54:28Z

engines/python/setup/djl_python/tests/test_rolling_batch.py

            req.set_next_token(Token(4558, " world", -0.567854), True,
                               'length')
            print(req.get_next_token(), end='')
-            assert req.get_next_token() == ' world"}'
-            req.reset_next_token()
+            assert req.get_next_token() == json.dumps(


self.assertEqual

Let me take this up as a separate change. We don't use assert/self.assert consistently across our tests, so I can fix that across all of them.
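For reference, a unittest-style version of such a check might look like the sketch below. The payload shape is an assumption, and the literal string stands in for `req.get_next_token()`. Note that the `assertEquals` spelling is a deprecated alias that was removed in Python 3.12; `assertEqual` is the supported name.

```python
import json
import unittest


class TestFinalToken(unittest.TestCase):

    def test_final_output_is_json(self):
        # Stand-in for req.get_next_token(); the payload shape is assumed
        actual = '{"generated_text": "Hello world"}'
        expected = json.dumps({"generated_text": "Hello world"})
        # On failure assertEqual reports both values, whereas a bare
        # assert under the unittest runner raises a plain AssertionError.
        self.assertEqual(actual, expected)
```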

lanking520 · 2024-07-16T04:58:56Z

engines/python/src/main/java/ai/djl/python/engine/RollingBatch.java

@@ -306,24 +304,6 @@ void addResponse(byte[] json, Map<String, String> properties) {
                output.getProperties().putAll(properties);
            }
            ++count;
-            if (json[0] == '{') {


This could break some legacy custom handlers. Maybe mark it as deprecated and emit a deprecation warning when this code path executes, for now.

I'm not sure if this condition is still valid, but I'll add it back with a deprecation warning.

We removed huggingface.parse_input and replaced it with the input formatter that @sindhuvahinis implemented.

I think this is only valid if the user JSON-encodes the outputs on the Python side before they are received on the Java side.
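The check in question is Java, and only its first line is visible in the diff, so the following Python sketch illustrates just the suggested pattern: keep a legacy branch alive but warn whenever it executes. The function `add_response` and what the branch does with the payload are assumptions, not the removed code.

```python
import json
import warnings


def add_response(payload: bytes) -> dict:
    """Hypothetical handler that keeps a legacy JSON-detection branch."""
    if payload[:1] == b"{":
        # Legacy path: output was JSON-encoded on the Python side.
        warnings.warn(
            "JSON-encoded rolling batch output is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        return json.loads(payload)
    return {"data": payload.decode("utf-8")}
```

Custom handlers that still rely on the old path keep working, and their owners get a signal to migrate before the branch is removed.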

siddvenk force-pushed the rb-non-streaming branch 4 times, most recently from 0ef29b9 to a05c246 Compare July 15, 2024 19:46

siddvenk marked this pull request as ready for review July 15, 2024 22:22

siddvenk requested review from zachgk, frankfliu and a team as code owners July 15, 2024 22:22

siddvenk force-pushed the rb-non-streaming branch from a05c246 to dd18031 Compare July 15, 2024 23:21

siddvenk marked this pull request as draft July 15, 2024 23:50

siddvenk force-pushed the rb-non-streaming branch 4 times, most recently from 4fdf2de to abb0cf8 Compare July 16, 2024 03:12

siddvenk marked this pull request as ready for review July 16, 2024 03:16

frankfliu approved these changes Jul 16, 2024

lanking520 reviewed Jul 16, 2024

[python] Support non 200 HTTP response codes for non-streaming rollin…

96c1ceb

…g-batch 1. Change non-streaming output formatters to only generate output on final token (valid and error cases) 3. Update RollingBatch to set status code in non-streaming use-cases

siddvenk force-pushed the rb-non-streaming branch from abb0cf8 to 96c1ceb Compare July 16, 2024 16:57

siddvenk merged commit a8ac267 into deepjavalibrary:master Jul 16, 2024
9 checks passed

siddvenk deleted the rb-non-streaming branch July 16, 2024 17:09

davidthomas426 added a commit to davidthomas426/djl-serving that referenced this pull request Oct 18, 2024

[serving][python] Backport for pull deepjavalibrary#2173 to v0.28, su…

c2fe7c7

…pport for non-200 HTTP codes for non-streaming rolling batch requests

davidthomas426 mentioned this pull request Oct 18, 2024

[serving][python] Backport for pull #2173 to v0.28, support for non-2… #2466

Merged

siddvenk added a commit that referenced this pull request Oct 19, 2024

[serving][python] Backport for pull #2173 to v0.28, support for non-2… (

975c50d

#2466) Co-authored-by: Siddharth Venkatesan <siddhave@amazon.com>
