-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[serving][python] Support non 200 HTTP response codes for non-streami… #2173
Conversation
0ef29b9
to
a05c246
Compare
a05c246
to
dd18031
Compare
4fdf2de
to
abb0cf8
Compare
@@ -98,7 +98,7 @@ def rolling_batch_inference(parsed_input, inputs: Input, outputs: Output, | |||
outputs.add(Output.binary_encode(err), key="data", batch_index=i) | |||
outputs.add_property(f"batch_{i}_Content-Type", "application/json") | |||
else: | |||
content_type = result[idx].pop("content_type") | |||
content_type = result[idx].get("content_type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious if .pop()
causing any issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to use get here. Maybe in the future someone would access the content in the later time. Python GC would collect it eventually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pop
does not cause issues currently, but it's incorrectly used here with the subsequent check to see if content-type is None.
Either we use pop and get rid of that check, or use get here.
@@ -628,7 +607,7 @@ def test_chat_json(self): | |||
}, | |||
'finish_reason': 'length' | |||
}] | |||
assert final_json['usage'] == { | |||
assert output['usage'] == { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A side topic, noticed we use assert
in production code quite a bit. I'm not a fan of assert
, we should rethink about usage of assert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, this is the test code but we should not be using assert in the production code.
I think keeping assert in the test code is fine - separately i'm exploring a migration to pytest instead of unittest to get some better test outputs and codecoverage. pytest only supports assert
req.set_next_token(Token(4558, " world", -0.567854), True, | ||
'length') | ||
print(req.get_next_token(), end='') | ||
assert req.get_next_token() == ' world"}' | ||
req.reset_next_token() | ||
assert req.get_next_token() == json.dumps( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.assertEquals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me take this up as a separate change. We don't have consistent usage of assert/self.assert across our tests, so i can fix on that across our tests
@@ -306,24 +304,6 @@ void addResponse(byte[] json, Map<String, String> properties) { | |||
output.getProperties().putAll(properties); | |||
} | |||
++count; | |||
if (json[0] == '{') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may potentially break some legacy custom handler. Maybe mark as deprecate and raise Deprecation warning for this code piece execution for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not sure if this condition is still valid, but i'll add it back with a deprecation warning.
We removed the huggingface.parse_input and replaced it with the input formatter @sindhuvahinis implemented.
I think this is only valid if user does json encoding on the outputs from python side that are received on java side
…g-batch 1. Change non-streaming output formatters to only generate output on final token (valid and error cases) 3. Update RollingBatch to set status code in non-streaming use-cases
abb0cf8
to
96c1ceb
Compare
…pport for non-200 HTTP codes for non-streaming rolling batch requests
#2466) Co-authored-by: Siddharth Venkatesan <siddhave@amazon.com>
…ng rolling-batch
Note:
TODO:
This change is motivated by two things:
The output formatters for non-streaming use-cases are currently pretty hard to reason about. Beyond just making it easier to read and work with, this also fixes an issue:
The second part of this change is to provide non-200 status codes during non-streaming use-cases when an error occurs during inference. This is only possible with the first change to the output formatters. There are two small changes in InferenceRequestHandler and RollingBatch to enable this.