Added a new metric: Client Processing Time #450
Conversation
Signed-off-by: saimedhi <saimedhi@amazon.com>
I've noticed inaccuracies in the calculation of 'Service Time' in OpenSearch Benchmark and will be raising an issue to address this concern. The accuracy of the Client Processing Time metric relies on the accuracy of the 'Service Time' used in its computation.
def time_func(func):
    async def advised(*args, **kwargs):
        request_context_holder.on_client_request_start()
        rsl = await func(*args, **kwargs)
What does `rsl` represent? Is there a more descriptive name we could use?
@IanHoang, changed it to response
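For context, a minimal, self-contained sketch of what the completed wrapper could plausibly look like after the rename. The RequestContextHolder stub below is an assumption added only to make the snippet runnable; the real class lives in the benchmark's client code.

```python
import time

class RequestContextHolder:
    """Stand-in for the benchmark's request context holder (assumed API)."""
    def on_client_request_start(self):
        self.request_start = time.perf_counter()

    def on_client_request_end(self):
        self.request_end = time.perf_counter()

request_context_holder = RequestContextHolder()

def time_func(func):
    async def advised(*args, **kwargs):
        # Record the timestamps immediately before and after the wrapped client call.
        request_context_holder.on_client_request_start()
        response = await func(*args, **kwargs)
        request_context_holder.on_client_request_end()
        return response
    return advised
```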
@IanHoang, please let me know if any additional corrections are needed; I'm ready to address them promptly. Let's move forward with merging. Thank you.
Signed-off-by: saimedhi <saimedhi@amazon.com>
@saimedhi Overall, this looks good. Did you run any tests, either full runs or in --test-mode? If so, could you supply an example metric document from the benchmark-metrics-* index of an external datastore?
benchmark-results-2024-02
@@ -1212,7 +1238,9 @@ def status(v):
    # we're good with any count of relocating shards.
    expected_relocating_shards = sys.maxsize

    request_context_holder.on_client_request_start()
@IanHoang @saimedhi We also have custom runners in workloads, e.g. https://github.com/opensearch-project/opensearch-benchmark-workloads/blob/9c0fdca67b0a929a7f28649c911737898609cb50/nyc_taxis/workload.py#L3
Shouldn't we update those as well?
I am not completely sure, but I tested running benchmarks against the 'nyc_taxis' workload and everything seems to be working fine.
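For illustration, a hedged sketch of what adding the same timing calls to a custom workload runner might look like; the runner name, parameters, and request body are hypothetical and not taken from the nyc_taxis workload. It reuses the request_context_holder stub from the sketch above.

```python
# Hypothetical custom runner; request_context_holder is assumed to expose the
# same on_client_request_start()/on_client_request_end() API used in this PR.
class TimedQueryRunner:
    async def __call__(self, opensearch, params):
        request_context_holder.on_client_request_start()
        response = await opensearch.search(index=params["index"], body=params["body"])
        request_context_holder.on_client_request_end()
        return {"weight": 1, "unit": "ops", "hits": response["hits"]["total"]["value"]}

    def __repr__(self, *args, **kwargs):
        return "timed-query"
```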
@saimedhi, @IanHoang there is an issue with this change. There is no error checking during the calls to the client, for instance, in the case below:
So, if an exception is thrown in that call, on_client_request_end() is never reached and the end timestamp is left unset.
This condition was suspected in a related comment. It might be best to revert this change until the fix goes in.
@govind, instead of reverting we can use the wrapper for all the runners. The wrapper (def time_func(func): shown above) is already in the code; we just need to add @time_func to the runners and remove the inline time measurement within them.
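Roughly, that proposal would look something like the sketch below; the runner is hypothetical and simply reuses the time_func wrapper from the earlier sketch instead of inline timing calls.

```python
# Hypothetical runner that delegates timing to the existing wrapper.
class BulkIndexRunner:
    @time_func
    async def __call__(self, opensearch, params):
        # The wrapper records client request start/end around this call.
        return await opensearch.bulk(body=params["body"])

    def __repr__(self, *args, **kwargs):
        return "bulk-index"
```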
@IanHoang, @gkamat Please try to make this change if you can; otherwise I will work on it on Tuesday, as I am on-call this weekend.
@saimedhi essentially, that wrapper would need to go around each of the calls to the client itself.

Furthermore, even if this were done, it does not take into account the exception handling. The call may fail within the client, leading to the same problem of an unset end timestamp.

All in all, the best and cleanest option may be to update each of the calls with the appropriate exception handling. Since you are busy right now, the most expeditious option for now is probably reverting. The change can be checked in later with additional testing. Thanks.

cc: @IanHoang
@gkamat, I will try to raise a PR again soon. Just one confirmation: when there is an error during a call to the client, both client request start and client request end should still be recorded, right?
Yes, client_request_start will already have been computed before the call. All possible exceptions should be caught, and client_request_end should then be set subsequently; otherwise, there will be an error upstream due to the unset variable.
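A minimal sketch of that pattern, assuming the same request_context_holder API as above; the helper name is hypothetical.

```python
async def timed_client_call(func, *args, **kwargs):
    """Record client request start/end around a client call, even on failure."""
    request_context_holder.on_client_request_start()
    try:
        return await func(*args, **kwargs)
    finally:
        # Always set the end timestamp so downstream metric calculations
        # never see an unset value, even when the client raises.
        request_context_holder.on_client_request_end()
```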
@gkamat please take a look at PR #462; I hope it addresses the above issue. Thank you.
@saimedhi If I understood correctly, your PR will add client request end if any runner raises an exception. However, for force merge, there won't be any exception if polling is used (see here). Did you test whether your fix will work for the case where the force merge times out on the first call and completion is only reported on a subsequent call?
Hi @VijayanB, thank you for checking out the PR! I'm not quite sure how to replicate that scenario, but your point makes sense. Maybe we should calculate client request end only once polling reports that the merge has completed (roughly as sketched below).
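To make the scenario concrete, here is a rough, hypothetical sketch of a force merge with polling; the task-list check and poll interval are assumptions for illustration, not the actual force-merge runner.

```python
import asyncio

async def force_merge_with_polling(opensearch, index, poll_interval=10):
    request_context_holder.on_client_request_start()
    try:
        # A long merge may exceed the request timeout without being an error.
        await opensearch.indices.forcemerge(index=index, request_timeout=30)
    except Exception:
        # No exception escapes here; instead, poll the tasks API until
        # no force-merge task remains.
        while True:
            tasks = await opensearch.tasks.list(actions="indices:admin/forcemerge")
            if not any(node.get("tasks") for node in tasks.get("nodes", {}).values()):
                break
            await asyncio.sleep(poll_interval)
    finally:
        # Record the end timestamp only after the merge has actually completed,
        # whether on the first call or after polling.
        request_context_holder.on_client_request_end()
```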
Looks good to me, thanks. Vector search using high dimensions with a large number of vectors takes a longer time to merge segments; I was able to reproduce this with 768-dimensional vectors, 10 M of them. Thanks.
Description
Introduced a new metric: Client Processing Time.
Client Processing Time: The delta between total request time and service time.
Total Request Time: Defined as the duration between the runner sending a request to the OpenSearch-py client and receiving the response.
Service Time: Represents the interval from the server receiving the request to the server sending the response. Note: There was a discrepancy in the documentation regarding Service Time, and I have clarified my observations in the associated PR.
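As a worked sketch of how these quantities relate, using the definitions above; the timestamps below are invented purely for illustration.

```python
# Illustrative timestamps in seconds; real values come from the benchmark's timers.
client_request_start = 0.000   # runner hands the request to the opensearch-py client
request_start = 0.004          # server receives the request
request_end = 0.254            # server sends the response
client_request_end = 0.262     # runner receives the response from the client

total_request_time = client_request_end - client_request_start   # 0.262 s
service_time = request_end - request_start                        # 0.250 s
client_processing_time = total_request_time - service_time        # 0.012 s

print(f"client processing time: {client_processing_time * 1000:.1f} ms")
```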
Issues Resolved
#432