Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Make it possible to attach a PayloadProcessor to process model predictions #84

Merged
merged 45 commits into from
Mar 13, 2023

Conversation

tteofili
Copy link
Contributor

@tteofili tteofili commented Feb 16, 2023

Motivation

This PR seeks to address the model-mesh side of kserve/modelmesh-serving#284.

Modifications

It provides a PayloadProcessor interface. PayloadProcessors are picked by ModelMesh instances at startup and predictions (Payloads) are processed asynchronously at fixed timing.
A first logger implementation allows to log Payloads (at info level).

Result

A SPI for post processing model predictions.


resolves kserve/modelmesh-serving#284

@tteofili
Copy link
Contributor Author

A README is provided.

@tteofili
Copy link
Contributor Author

@njhill @ckadner would you be able to have a preliminary look ?

@ckadner ckadner requested review from njhill and removed request for animeshsingh and tedhtchang February 16, 2023 17:26
Copy link
Member

@ckadner ckadner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow @tteofili -- this is an impressively quick turn around on a prototype! And your code looks well written and well structured!

I need to delegate the actual review of the logic to @njhill and I have not worked on any Java code in almost 10 years, so I only have a few superficial comments/questions for now (please forgive my ignorance if I missed a few steps :-)

I assume you are still working to add some test cases?

src/main/java/com/ibm/watson/modelmesh/ModelMesh.java Outdated Show resolved Hide resolved
src/main/java/com/ibm/watson/modelmesh/ModelMesh.java Outdated Show resolved Hide resolved
src/main/java/com/ibm/watson/modelmesh/ModelMesh.java Outdated Show resolved Hide resolved
src/main/java/com/ibm/watson/modelmesh/ModelMesh.java Outdated Show resolved Hide resolved
@ckadner ckadner requested a review from rafvasq February 17, 2023 03:17
@ckadner ckadner added the enhancement New feature or request label Feb 17, 2023
@ckadner ckadner changed the title Make it possible to attach a PayloadProcessor to process model predictions feat: Make it possible to attach a PayloadProcessor to process model predictions Feb 17, 2023
@ckadner
Copy link
Member

ckadner commented Feb 17, 2023

Re: failed DCO check:

Commit sha: 24aad57, Author: Tommaso Teofili, Committer: Tommaso Teofili; The sign-off is missing.
Commit sha: 2e48123, Author: Tommaso Teofili, Committer: Tommaso Teofili; The sign-off is missing.

How to fix the DCO check
There are 2 commits incorrectly signed off. This means that the author(s) of these commits failed to include a `Signed-off-by` line in their commit message.

To avoid having PRs blocked in the future, always include `Signed-off-by: Author Name <authoremail@example.com>` in every commit message. You can also do this automatically by using the `-s` flag (i.e., `git commit -s`).

Here is how to fix the problem so that this code can be merged.

Rebase the branch

If you have a local git environment and meet the criteria below, one option is to rebase the branch and add your Signed-off-by lines in the new commits. Please note that if others have already begun work based upon the commits in this branch, this solution will rewrite history and may cause serious issues for collaborators ([described in the git documentation](https://git-scm.com/book/en/v2/Git-Branching-Rebasing) under "The Perils of Rebasing").

You should only do this if:

    You are the only author of the commits in this branch
    You are absolutely certain nobody else is doing any work based upon this branch
    There are no empty commits in the branch (for example, a DCO Remediation Commit which was added using --allow-empty)

To add your `Signed-off-by` line to every commit in this branch:

    Ensure you have a local copy of your branch by [checking out the pull request locally via command line](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/checking-out-pull-requests-locally).
    In your local branch, run: git rebase HEAD~4 --signoff
    Force push your changes to overwrite the branch: git push --force-with-lease origin FAI-948

@tteofili
Copy link
Contributor Author

I think I've fixed the missing Signed-off by commits

Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tteofili thanks ever so much for this! The changes look like a great start.

I made a few comments inline but here are my main thoughts:

I don't think the current calls to process the payload in the invokeModel are the best places to do this. Better would be in the ModelMeshApi class.

Specifically somewhere around here.

It would probably be good to have two separate calls, one for the request and one for the response. Since at this point the request has already been released. So the place to log the request could be around here. And we need to think about the error case, if/how to log the response in that case.

Rather than opaque Objects in the Payload we can have concrete types i.e. for the request it can be ByteBuf for the payload and Metadata (this is a grpc class). For the response it can be the ModelResponse object that contains both of those and has its own release() method.

These should be enqueued if the payload logging is enabled and that should take ownership i.e. be responsible for releasing the buffers after they have been dequeued and processed. It's important that this is watertight i.e. if the enqueue fails they must be released as they are now.

I'd suggest we make the new sub package name payload rather than processor since it's a bit less generic.

I have more comments but hopefully this is enough for a first pass!

@njhill
Copy link
Member

njhill commented Feb 18, 2023

We should probably include both the modelId and vModelId with each payload (latter may be null).

…eq/resp separately

Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
@tteofili
Copy link
Contributor Author

this is still in a work in progress state, however I hope most of the comments got addressed.

Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
…ayload processor

Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
@njhill
Copy link
Member

njhill commented Mar 8, 2023

@tteofili I have pushed a commit with some suggested changes to this branch, perhaps you could cherry pick it to your PR branch if they look ok.

Other things which I didn't address and would be good if you could have a go at are:

  • I included a Status field in the Payload class which is the grpc response status for responses and null for requests, but I didn't look at everywhere that should be changed to incorporate that (e.g. the remote payload serialization, maybe other places)
  • The AsyncPayloadProcessor uses a scheduled executor service but this should really be shut down as part of the overall shutdown procedure. This is so that shutdown isn't blocked by its non-daemon threads and also so that all payloads get logged in the case of upgrades etc. So probably we should make PayloadProcessors Closeable and have the close method shutdown the threadpool for the async processor, so that it interrupts the consumer loop and cleanly processes remaining items in the queue.

@tteofili
Copy link
Contributor Author

tteofili commented Mar 9, 2023

thanks so much @njhill for your help.

@tteofili I have pushed a commit with some suggested changes to this branch, perhaps you could cherry pick it to your PR branch if they look ok.

I don't get the logic behind the ownership concept as it is implemented right now in ModelMeshApi: payloadProcessor is called with takeOwnership=true for requests and, as a consequence, data.retain() isn't called, whereas payloadProcessor is called with takeOwnership=false and data.retain() is called, what is the rationale behind this distinction between requests and responses?
However that works nicely, hence fine by me.

Also I think it's reasonable to assume that a PayloadProcessors can't actually consume the binaries (mayTakeOwnership method), eventually making them unavailable for other processors or ModelMeshApi class, hence I was initially a bit confused about the PayloadProcessor refactoring that connects taking ownership with the result of the process call.
However if you feel like this is a better design, that's fine by me.

Other things which I didn't address and would be good if you could have a go at are:

  • I included a Status field in the Payload class which is the grpc response status for responses and null for requests, but I didn't look at everywhere that should be changed to incorporate that (e.g. the remote payload serialization, maybe other places)

ok, I've included also in the remote processor.

  • The AsyncPayloadProcessor uses a scheduled executor service but this should really be shut down as part of the overall shutdown procedure. This is so that shutdown isn't blocked by its non-daemon threads and also so that all payloads get logged in the case of upgrades etc. So probably we should make PayloadProcessors Closeable and have the close method shutdown the threadpool for the async processor, so that it interrupts the consumer loop and cleanly processes remaining items in the queue.

sure, I'll add that part too.

tteofili added 5 commits March 9, 2023 11:54
Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
@tteofili
Copy link
Contributor Author

tteofili commented Mar 9, 2023

@njhill I've included your patch, with minor changes and implemented the some other points you raised.

Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @tteofili thanks again for all your work and patience with this.

Only a few minor remaining comments which should hopefully be quick to address. I will look out for updates and get it merged quickly.

Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
@tteofili
Copy link
Contributor Author

@njhill I've included your most recent comments, please let me know if you want me to address anything else.

Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
Signed-off-by: Tommaso Teofili <tteofili@redhat.com>
@tteofili
Copy link
Contributor Author

@njhill I've also included the Base64 fix, I don't see anything else to be addressed right now; this should be good to be merged.

Copy link
Member

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tteofili thanks again for all of the great work!

There's one remaining small question about including error messages in the remote-published payloads in addition to the code, but no harm in handling that later.

@kserve-oss-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: njhill, tteofili

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@njhill
Copy link
Member

njhill commented Mar 13, 2023

/lgtm

@kserve-oss-bot kserve-oss-bot merged commit eb384db into kserve:main Mar 13, 2023
kserve-oss-bot pushed a commit to kserve/modelmesh-serving that referenced this pull request Mar 16, 2023
#339)

#### Motivation
In order to integrate with the upcoming [ModelMesh payload processing feature](kserve/modelmesh#84 (comment)), the MM_PAYLOAD_PROCESSOR environment variable within the ModelMesh image needs to point at the endpoint of the payload processing service within the cluster. By setting this as a variable set in the config.yaml, it allows for a variety of payload processing or logging services to be plugged in via the deployment manifest.

#### Modifications

- Added PayloadProcessor field to the `controllers/mmesh/modelmesh.go` Deployment struct
- Added PayloadProcessor field to the `pkg/config/config.go` Config struct
- Added default setting of this field to an empty string within `pkg/config/config.go` 
- Added functions in `controllers/suite_text.go` to load a config file that specifies the Payload processor
    - these tests ensure the default value of `MM_PAYLOAD_PROCESSOR` is empty, while it receives the correct value if such a field is present in the config yaml file 

#### Result
- Ability to set MM_PAYLOAD_PROCESSOR env variable via config.yaml
- If no such variable is set, the only effect is the ModelMesh image receives an empty env var

Signed-off-by: robgeada <rob@geada.net>
VedantMahabaleshwarkar pushed a commit to VedantMahabaleshwarkar/modelmesh-serving that referenced this pull request Mar 20, 2023
kserve#339)

#### Motivation
In order to integrate with the upcoming [ModelMesh payload processing feature](kserve/modelmesh#84 (comment)), the MM_PAYLOAD_PROCESSOR environment variable within the ModelMesh image needs to point at the endpoint of the payload processing service within the cluster. By setting this as a variable set in the config.yaml, it allows for a variety of payload processing or logging services to be plugged in via the deployment manifest.

#### Modifications

- Added PayloadProcessor field to the `controllers/mmesh/modelmesh.go` Deployment struct
- Added PayloadProcessor field to the `pkg/config/config.go` Config struct
- Added default setting of this field to an empty string within `pkg/config/config.go` 
- Added functions in `controllers/suite_text.go` to load a config file that specifies the Payload processor
    - these tests ensure the default value of `MM_PAYLOAD_PROCESSOR` is empty, while it receives the correct value if such a field is present in the config yaml file 

#### Result
- Ability to set MM_PAYLOAD_PROCESSOR env variable via config.yaml
- If no such variable is set, the only effect is the ModelMesh image receives an empty env var

Signed-off-by: robgeada <rob@geada.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved enhancement New feature or request lgtm
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Payload logging/events
4 participants