-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Publish recall as a kpi metric #581
Publish recall as a kpi metric #581
Conversation
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
osbenchmark/metrics.py
Outdated
@@ -13,7 +13,7 @@ | |||
# not use this file except in compliance with the License. | |||
# You may obtain a copy of the License at | |||
# | |||
# http://www.apache.org/licenses/LICENSE-2.0 | |||
# http://www.apache.org/licenses/LICENSE-2.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why my code formatter keeps on changing code outside of the selected lines. I'll revert this and the other spacing changes to out-of-scope code in the next revision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend reverting for this PR, just so its more clear what functional changes are being made.
osbenchmark/metrics.py
Outdated
"operation": operation, | ||
"recall@k": recall_at_k_stats, | ||
"recall@1":recall_at_1_stats, | ||
"recall_time_ms": recall_time_ms_stats, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spoke with @VijayanB and recall_time_ms
is not a stat we want to display. I'll remove it from the kpi_metrics
method in the next revision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove from this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry not sure what "this" is referring to. Do you mean remove the add_kpi_metrics
function (and instead append directly to the kpi_metrics
list), remove the recall_time_ms
metric, or remove kpi_metrics
all together and go a different direction for passing recall from metadata to the metrics store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry meant the recall_time_ms - not sure if this is needed
name=recall_metric_name, | ||
value=sample.request_meta_data[recall_metric_name], | ||
unit="", | ||
task=sample.task.name, # todo change unit to segment count unit... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove todo comment in next revision
tests/metrics_test.py
Outdated
@@ -1436,97 +1436,105 @@ def test_store_results(self): | |||
"benchmark-version": "0.4.4", | |||
"benchmark-revision": "123abc", | |||
"environment": "unittest", | |||
"test-execution-id": OsResultsStoreTests.TEST_EXECUTION_ID, | |||
"test-execution-id": "6ebc6e53-ee20-4b0c-99b4-09697987e9f4", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why modify so much in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the ordering of the json got scrambled when I added the kpi_metrics variable. I agree though that I shouldn't have changed so much; I'll revert most of the unit test and just add a single extra element in the json array with the kpi_metrics and associated data.
osbenchmark/metrics.py
Outdated
@@ -13,7 +13,7 @@ | |||
# not use this file except in compliance with the License. | |||
# You may obtain a copy of the License at | |||
# | |||
# http://www.apache.org/licenses/LICENSE-2.0 | |||
# http://www.apache.org/licenses/LICENSE-2.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend reverting for this PR, just so its more clear what functional changes are being made.
osbenchmark/metrics.py
Outdated
@@ -1966,6 +1986,7 @@ def single_latency(self, task, operation_type, metric_name="latency"): | |||
class GlobalStats: | |||
def __init__(self, d=None): | |||
self.op_metrics = self.v(d, "op_metrics", default=[]) | |||
self.kpi_metrics = self.v(d, "kpi_metrics", default=[]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the scope for kpi metrics? Aside from recall and maybe ndcg, what else would go in here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think any performance metric for k-NN is in scope for this. Right now it seems like recall is the most major one but if k-NN adds ndcg we could also track that in OSB. (The process for adding new metrics would be to calculate/add it to the metadata in the search query runner and then pull it from the metadata into this kpi_metrics
list). Likewise if we or the anomaly detection plugin want to track false positives rate/specificity.
I mainly wanted to explicitly distinguish between algorithm metrics like recall and op_metrics
like latency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. KPI though might include latency right? It seems like a good category here would be correctness_metrics
or relevance_metrics
osbenchmark/metrics.py
Outdated
"operation": operation, | ||
"recall@k": recall_at_k_stats, | ||
"recall@1":recall_at_1_stats, | ||
"recall_time_ms": recall_time_ms_stats, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove from this PR?
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
…bug) Signed-off-by: Finn Roblin <finnrobl@amazon.com>
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
osbenchmark/metrics.py
Outdated
@@ -13,7 +13,7 @@ | |||
# not use this file except in compliance with the License. | |||
# You may obtain a copy of the License at | |||
# | |||
# http://www.apache.org/licenses/LICENSE-2.0 | |||
# http://www.apache.org/licenses/LICENSE-2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you remove unrelated formatting changes from this PR?
osbenchmark/results_publisher.py
Outdated
|
||
keys = record.keys() | ||
recall_keys_in_task_dict = "recall@1" in keys and "recall@k" in keys | ||
if recall_keys_in_task_dict and record["recall@1"] and record["recall@k"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't check for record['recall@k"], if recall is 0, it will be skipped. just check only for keys
osbenchmark/results_publisher.py
Outdated
except KeyError: | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to catch this? can you return from like 214 if keys doesn't exist?
@@ -1369,7 +1413,7 @@ def calculate_task_throughput(self, task, current_samples, bucket_interval_secs) | |||
self.task_stats[task] = ThroughputCalculator.TaskStats(bucket_interval=bucket_interval_secs, | |||
sample_type=first_sample.sample_type, | |||
start_time=first_sample.absolute_time - first_sample.time_period) | |||
current = self.task_stats[task] | |||
current = self.task_stats[task] # TaskStats object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this comment here and as part of this PR?
@@ -1190,6 +1233,7 @@ def __init__(self, start_timestamp, buffer_size=16384): | |||
def add(self, task, client_id, sample_type, meta_data, absolute_time, request_start, latency, service_time, | |||
client_processing_time, processing_time, throughput, ops, ops_unit, time_period, percent_completed, | |||
dependent_timing=None): | |||
self.logger.debug("Logging with metadata: [%s]", meta_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this?
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
sample.request_meta_data, | ||
) | ||
|
||
self.logger.debug( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this to log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this -- It's debug level so it shouldn't show up with normal logging settings. I was using it to figure out how samples worked, so this log has served its purpose and I'll remove it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@jmazanec15 Can you take a look at this PR? Thanks |
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome! great job @finnroblin !
Description
Displays recall as part of the final benchmarking stats.
i.e.:
Issues Resolved
(From workloads repo): 282
While this PR is not a general solution to #199, it does address the specific need mentioned in that issue of displaying the recall metadata as a summary.
I have added a
kpi_metrics
field to theGlobalStats
metric collector where additional metrics could be added. Users would then need to add display logic to theresults_publisher.py
file in order for the metrics to show up in the summary.Testing
I added some unit tests to assert that recall metrics were collected. I also altered the
OsResultsStoreTest
to account for the newkpi_metrics
field.I manually tested the
train-test
workload, which contains a vector search operation, and theno-train-test-index-only
workload, which does not contain a vector search operation. Both workloads ran to completion and displayed the expected statistics (recall fortrain-test
, and no recall metrics forno-train-test-index-only
).By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.