
[ML] add deployed native models to inference_stats in trained model stats response #88187

Merged

Conversation

benwtrent
Member

This adds a valid inference_stats section for deployed native models.

inference_stats is effectively a subset of deployment_stats: it is a high-level view of the overall stats of the model, while deployment_stats contains more detailed information around types of errors seen, throughput, etc.
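As a sketch of what this could look like in the trained model stats response: only inference_count and failure_count are grounded in this conversation; the surrounding shape, model ID, and values below are illustrative assumptions, not the exact API output.

```json
{
  "model_id": "my-nlp-model",
  "inference_stats": {
    "inference_count": 12845,
    "failure_count": 3
  },
  "deployment_stats": {
    "...": "more detailed per-node error and throughput information"
  }
}
```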

@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Jun 29, 2022
@elasticsearchmachine
Collaborator

Hi @benwtrent, I've created a changelog YAML for you.

public InferenceStats getOverallInferenceStats() {
    return new InferenceStats(
        0L,
        nodeStats.stream().filter(n -> n.getInferenceCount().isPresent()).mapToLong(n -> n.getInferenceCount().get()).sum(),
Member Author

inference_count is the only duplicate field between these two stats objects.

But, it seems weird to remove it from the assignment stats. Having just a single duplicate field doesn't seem that big of a deal to me.

@dimitris-athanasiou @davidkyle what say you?

Contributor


It actually isn't a duplicate, as it is the sum of inferences across all nodes. I think that's useful.

Member


It is a duplicate, but I don't think that's a problem. AssignmentStats collects various counts across all nodes.

@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@benwtrent
Member Author

@elasticmachine update branch


@dimitris-athanasiou dimitris-athanasiou left a comment


LGTM

@dimitris-athanasiou
Contributor

Btw, I think a unit test that checks we sum the node inference counts and failures might be useful to add.
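The suggested test could be sketched as follows. This is a hypothetical, self-contained stand-in: `NodeStats` here is a minimal stub, not the real Elasticsearch `AssignmentStats.NodeStats`, and `sumInferenceCounts` just mirrors the stream-summing logic from the PR so the behavior can be checked in isolation.

```java
import java.util.List;
import java.util.Optional;

public class OverallInferenceStatsSketch {

    // Minimal stand-in for a per-node stats object; a node that has not
    // performed any inference yet reports an empty count.
    record NodeStats(Optional<Long> inferenceCount) {}

    // Mirrors the summing logic from the PR: skip nodes without a count,
    // then sum the rest into the overall inference count.
    static long sumInferenceCounts(List<NodeStats> nodeStats) {
        return nodeStats.stream()
            .filter(n -> n.inferenceCount().isPresent())
            .mapToLong(n -> n.inferenceCount().get())
            .sum();
    }

    public static void main(String[] args) {
        List<NodeStats> nodes = List.of(
            new NodeStats(Optional.of(3L)),
            new NodeStats(Optional.empty()), // node with no inferences yet
            new NodeStats(Optional.of(7L))
        );
        long total = sumInferenceCounts(nodes);
        if (total != 10L) {
            throw new AssertionError("expected 10, got " + total);
        }
        System.out.println("total inference count = " + total);
    }
}
```

The same pattern would apply to failure counts: empty per-node values must be skipped rather than treated as zero-valued entries that could mask a node that has not reported yet.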


@davidkyle davidkyle left a comment


LGTM

This fixes the discrepancy where DFA models have inference stats but NLP models do not?

@benwtrent
Member Author

> This fixes the discrepancy where DFA models have inference stats but NLP models do not?

correct @davidkyle

@benwtrent benwtrent merged commit e5d1d10 into elastic:master Jul 5, 2022
@benwtrent benwtrent deleted the feature/ml-fix-pytorch-stats-response branch July 5, 2022 18:00
Labels
>enhancement :ml Machine learning Team:ML Meta label for the ML team v8.4.0