Publish 0.10 release blog (#227)
* 0.10 release blog

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Update to 0.10 blog

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Address comments

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Add contributors

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

* Add open inference protocol adoptions

Signed-off-by: Dan Sun <dsun20@bloomberg.net>

---------

Signed-off-by: Dan Sun <dsun20@bloomberg.net>
yuzisun authored Feb 19, 2023
1 parent 050b7c9 commit 20ce74f
Showing 5 changed files with 160 additions and 5 deletions.
154 changes: 154 additions & 0 deletions docs/blog/articles/2023-02-05-KServe-0.10-release.md
@@ -0,0 +1,154 @@
# Announcing: KServe v0.10.0

We are excited to announce the KServe 0.10 release. In this release we have enabled more KServe networking options,
improved KServe telemetry for supported serving runtimes, and increased support coverage for the [Open (aka v2) inference protocol](https://kserve.github.io/website/0.10/modelserving/data_plane/v2_protocol/) for both standard and ModelMesh InferenceService.

## KServe Networking Options

Istio is now optional for both [Serverless](https://kserve.github.io/website/0.10/admin/serverless/serverless/) and [RawDeployment](https://kserve.github.io/website/0.10/admin/kubernetes_deployment/) modes. Please see the [alternative networking guide](https://kserve.github.io/website/0.10/admin/serverless/kourier_networking/) for how to enable other ingress options supported by Knative in Serverless mode.
Istio users who want to turn on full service mesh mode to secure InferenceService with mutual TLS and enable traffic policies should read the [service mesh setup guideline](https://kserve.github.io/website/0.10/admin/serverless/servicemesh/).

## KServe Telemetry for Serving Runtimes

We have instrumented additional latency metrics in KServe Python ServingRuntimes for the `preprocess`, `predict` and `postprocess` handlers.
In Serverless mode we have extended the Knative `queue-proxy` to aggregate the metrics exposed by both `queue-proxy` and the `kserve-container` of each `ServingRuntime`.
Please read the [Prometheus metrics setup guideline](https://kserve.github.io/website/0.10/modelserving/observability/prometheus_metrics/) for how to enable metrics scraping and aggregation.
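
As a rough illustration of what per-handler latency instrumentation looks like, the sketch below times three stand-in handlers using only the standard library. The `timed` decorator, the `LATENCIES` store and the handler bodies are hypothetical and are not KServe's implementation, which exports Prometheus metrics.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory store: handler name -> observed latencies in seconds.
LATENCIES = defaultdict(list)


def timed(handler_name):
    """Record the wall-clock latency of every call under handler_name."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even when the handler raises.
                LATENCIES[handler_name].append(time.perf_counter() - start)
        return wrapper
    return decorator


@timed("preprocess")
def preprocess(payload):
    # Stand-in preprocessing: scale raw values.
    return [value / 255.0 for value in payload]


@timed("predict")
def predict(features):
    # Stand-in model: sum the features.
    return sum(features)


@timed("postprocess")
def postprocess(score):
    return {"score": score}


result = postprocess(predict(preprocess([0, 255, 510])))
```

Each handler gets its own latency series, which is the property that lets you tell a slow transformer apart from a slow model.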

## Open(v2) Inference Protocol Support Coverage

Adoption of the `KServe v2 Inference Protocol` is growing: the [AMD Inference ServingRuntime](https://kserve.github.io/website/0.10/modelserving/v1beta1/amd/), which
supports FPGAs, and OpenVINO, which now provides KServe-compatible [REST](https://docs.openvino.ai/latest/ovms_docs_rest_api_kfs.html) and [gRPC](https://docs.openvino.ai/latest/ovms_docs_grpc_api_kfs.html) APIs, both implement it.
We have therefore proposed in [this issue](https://github.com/kserve/kserve/issues/2663) to rename it to the `KServe Open Inference Protocol`.

In KServe 0.10, we have added Open(v2) inference protocol support for KServe custom runtimes.
You can now enable v2 REST/gRPC for both custom transformers and predictors with images built by implementing the KServe Python SDK API.
gRPC enables a high-performance inference data plane because it is built on top of HTTP/2 and binary data transport, which is more efficient to send over the wire than REST.
Please see the detailed examples for the [transformer](https://kserve.github.io/website/0.10/modelserving/v1beta1/transformer/torchserve_image_transformer/) and
[predictor](https://kserve.github.io/website/0.10/modelserving/v1beta1/custom/custom_model/).

```python
import io
from typing import Dict

import numpy as np
import torch
from PIL import Image
from torchvision import transforms

# Import paths as of KServe 0.10; verify them against your installed version.
from kserve import InferInput, InferOutput, InferRequest, InferResponse, Model
from kserve.utils.utils import generate_uuid


def image_transform(byte_array):
    # Decode raw image bytes and normalize them into a numpy tensor.
    image_processing = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    image = Image.open(io.BytesIO(byte_array))
    tensor = image_processing(image).numpy()
    return tensor


class CustomModel(Model):
    # Assumes self.model holds a PyTorch module loaded elsewhere (e.g. in load()).
    def predict(self, request: InferRequest, headers: Dict[str, str]) -> InferResponse:
        input_tensors = [image_transform(instance) for instance in request.inputs[0].data]
        input_tensors = torch.from_numpy(np.asarray(input_tensors))
        output = self.model(input_tensors)
        # Convert logits to probabilities before selecting the top 5 classes.
        output = torch.nn.functional.softmax(output, dim=1)
        values, top_5 = torch.topk(output, 5)
        result = values.flatten().tolist()
        response_id = generate_uuid()
        infer_output = InferOutput(name="output-0", shape=list(values.shape), datatype="FP32", data=result)
        infer_response = InferResponse(model_name=self.name, infer_outputs=[infer_output], response_id=response_id)
        return infer_response


class CustomTransformer(Model):
    def preprocess(self, request: InferRequest, headers: Dict[str, str]) -> InferRequest:
        input_tensors = [image_transform(instance) for instance in request.inputs[0].data]
        input_tensors = np.asarray(input_tensors)
        infer_inputs = [InferInput(name="INPUT__0", datatype='FP32', shape=list(input_tensors.shape),
                                   data=input_tensors)]
        # self.name is set by the Model base class constructor.
        infer_request = InferRequest(model_name=self.name, infer_inputs=infer_inputs)
        return infer_request
```

You can use the same Python API types, `InferRequest` and `InferResponse`, for both the REST and gRPC protocols. KServe handles the underlying decoding and encoding according to the protocol.
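
For reference, this is roughly what a client sends on the REST side of the Open(v2) inference protocol. The sketch builds the request path and JSON body with the standard library only. The helper name `build_v2_infer_request`, the model name and the tensor values are made up for illustration.

```python
import json


def build_v2_infer_request(model_name, input_name, datatype, shape, data, request_id=None):
    """Return (path, body) for an Open(v2) inference protocol REST call."""
    payload = {
        "inputs": [
            {"name": input_name, "shape": shape, "datatype": datatype, "data": data}
        ]
    }
    if request_id is not None:
        # The request id is optional in the protocol.
        payload["id"] = request_id
    path = f"/v2/models/{model_name}/infer"
    return path, json.dumps(payload)


# Hypothetical model and tensor names.
path, body = build_v2_infer_request(
    "custom-model", "input-0", "FP32", [1, 4], [0.1, 0.2, 0.3, 0.4], request_id="req-1"
)
```

The body would be sent as an HTTP `POST` to the returned path on the InferenceService host. A gRPC client carries the same fields in a protobuf message instead of JSON.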

!!! Warning
    A new `headers` argument has been added to the custom handlers to pass HTTP/gRPC headers or other metadata. You can also use it as a context dict to pass data between handlers.
    If you have an existing custom transformer or predictor, you must now add the `headers` argument to the `preprocess`, `predict` and `postprocess` handlers.
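
The handler signatures and the context-dict usage described in the warning can be sketched without KServe installed. `EchoModel` and `handle` below are plain-Python stand-ins, not KServe classes, and the sketch assumes the same `headers` dict is passed to each handler in turn.

```python
from typing import Dict, List


class EchoModel:
    """Plain-Python stand-in for a kserve.Model subclass (hypothetical)."""

    def preprocess(self, request: List[float], headers: Dict[str, str]) -> List[float]:
        # Stash request-scoped metadata so later handlers can read it.
        headers["x-input-count"] = str(len(request))
        return [value * 2 for value in request]

    def predict(self, request: List[float], headers: Dict[str, str]) -> float:
        return sum(request)

    def postprocess(self, response: float, headers: Dict[str, str]) -> Dict[str, object]:
        # Read back what preprocess stored in the shared dict.
        return {"total": response, "input_count": int(headers["x-input-count"])}


def handle(model: EchoModel, request: List[float], headers: Dict[str, str]) -> Dict[str, object]:
    """Call the three handlers in order, passing the same headers dict to each."""
    features = model.preprocess(request, headers)
    prediction = model.predict(features, headers)
    return model.postprocess(prediction, headers)


result = handle(EchoModel(), [1.0, 2.0, 3.0], {"x-request-id": "abc"})
```

When migrating an existing handler, the only required change is the extra `headers` parameter. Using it to carry state is optional.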


Please check the following matrix for supported ModelFormats and [ServingRuntimes](https://kserve.github.io/website/0.10/modelserving/v1beta1/serving_runtime/).

| Model Format | v1            | Open(v2) REST/gRPC |
| ------------ | ------------- | ------------------ |
| Tensorflow   | ✅ TFServing  | ✅ Triton          |
| PyTorch      | ✅ TorchServe | ✅ TorchServe      |
| TorchScript  | ✅ TorchServe | ✅ Triton          |
| ONNX         |               | ✅ Triton          |
| Scikit-learn | ✅ KServe     | ✅ MLServer        |
| XGBoost      | ✅ KServe     | ✅ MLServer        |
| LightGBM     | ✅ KServe     | ✅ MLServer        |
| MLFlow       |               | ✅ MLServer        |
| Custom       | ✅ KServe     | ✅ KServe          |
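
To pick the Open(v2) column of the matrix for a given model format, an InferenceService sets `protocolVersion: v2` on the predictor model. The sketch below builds such a manifest as a Python dict and serializes it to JSON, which `kubectl apply -f` also accepts. The service name and the storage URI are placeholders, and the helper `inference_service` is hypothetical.

```python
import json


def inference_service(name, model_format, storage_uri, protocol_version=None):
    """Build a v1beta1 InferenceService manifest as a plain dict."""
    model = {"modelFormat": {"name": model_format}, "storageUri": storage_uri}
    if protocol_version:
        # Omitting protocolVersion leaves the runtime's default protocol in effect.
        model["protocolVersion"] = protocol_version
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {"predictor": {"model": model}},
    }


# Placeholder name and bucket, for illustration only.
manifest = inference_service(
    "sklearn-v2-iris", "sklearn", "gs://example-bucket/models/sklearn/iris", protocol_version="v2"
)
manifest_json = json.dumps(manifest, indent=2)
```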


## Multi-Arch Image Support

KServe control plane images [kserve-controller](https://hub.docker.com/r/kserve/kserve-controller/tags),
[kserve/agent](https://hub.docker.com/r/kserve/agent/tags), [kserve/router](https://hub.docker.com/r/kserve/router/tags) are now supported
for multiple architectures: `ppc64le`, `arm64`, `amd64`, `s390x`.

## KServe Storage Credentials Support

- Currently, AWS users need to create a secret with long-term/static IAM credentials to download models stored in S3.
The security best practice is to use [IAM roles for service accounts (IRSA)](https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/),
which enables automatic credential rotation and fine-grained access control; see how to [set up IRSA](https://kserve.github.io/website/0.10/modelserving/storage/s3/s3/#create-service-account-with-iam-role).
- Added support for Azure Blobs with [managed identity](https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/how-manage-user-assigned-managed-identities?pivots=identity-mi-methods-azcli).
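
With IRSA, the credentials come from an annotated Kubernetes ServiceAccount that the InferenceService predictor then references. The sketch below expresses both pieces as Python dicts. The helper names, the account name and the role ARN are placeholders, and the linked setup guide remains the authoritative reference for any additional S3 annotations.

```python
import copy


def irsa_service_account(name, role_arn, namespace="default"):
    """ServiceAccount manifest carrying the EKS IAM role annotation."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


def with_service_account(inference_service, service_account_name):
    """Return a copy of the manifest whose predictor uses the given ServiceAccount."""
    updated = copy.deepcopy(inference_service)
    predictor = updated.setdefault("spec", {}).setdefault("predictor", {})
    predictor["serviceAccountName"] = service_account_name
    return updated


# Placeholder account id and role name.
service_account = irsa_service_account("s3-reader", "arn:aws:iam::111122223333:role/example-s3-read-role")
base = {"apiVersion": "serving.kserve.io/v1beta1", "kind": "InferenceService",
        "metadata": {"name": "example"}, "spec": {"predictor": {}}}
isvc = with_service_account(base, "s3-reader")
```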

## ModelMesh Updates

ModelMesh has continued to integrate itself as KServe's multi-model serving backend, introducing improvements and features that better align the two projects. For example, it now supports ClusterServingRuntimes, allowing use of cluster-scoped ServingRuntimes, originally introduced in KServe 0.8.

Additionally, ModelMesh introduced support for TorchServe, enabling users to serve arbitrary PyTorch models (e.g. eager-mode) in the context of distributed multi-model serving.

Other limitations have been addressed as well, such as adding support for BYTES/string type tensors when using the REST inference API for inference requests that require them.
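
A REST inference request that carries string data uses the `BYTES` datatype in the Open(v2) inference protocol. The sketch below shows what such a request body might look like. The tensor name `text` and the helper `bytes_tensor` are hypothetical.

```python
import json


def bytes_tensor(name, strings):
    """Describe a 1-D BYTES tensor holding one string per element."""
    return {
        "name": name,
        "shape": [len(strings)],
        "datatype": "BYTES",
        "data": list(strings),
    }


# Hypothetical input name; a real model defines its own tensor names.
request_body = json.dumps({"inputs": [bytes_tensor("text", ["hello", "world"])]})
```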


## Other Changes

For a complete change list, please read the release notes for [KServe v0.10](https://github.com/kserve/kserve/releases/tag/v0.10.0) and
[ModelMesh v0.10](https://github.com/kserve/modelmesh-serving/releases/tag/v0.10.0).

## Join the community

- Visit our [Website](https://kserve.github.io/website/) or [GitHub](https://github.com/kserve)
- Join the Slack ([#kserve](https://kubeflow.slack.com/?redir=%2Farchives%2FCH6E58LNP))
- Attend our community meeting by subscribing to the [KServe calendar](https://wiki.lfaidata.foundation/display/kserve/calendars).
- View our [community GitHub repository](https://github.com/kserve/community) to learn how to contribute. We are excited to work with you to make KServe better and promote its adoption!


Thanks to all the contributors who have made commits to the 0.10 release!

- [Steve Larkin](https://github.com/sel)
- [Stephan Schielke](https://github.com/stephanschielke)
- [Curtis Maddalozzo](https://github.com/cmaddalozzo)
- [Zhongcheng Lao](https://github.com/laozc)
- [Dimitris Aragiorgis](https://github.com/dimara)
- [Pan Li](https://github.com/panli889)
- [tjandy98](https://github.com/tjandy98)
- [Sukumar Gaonkar](https://github.com/sukumargaonkar)
- [Rachit Chauhan](https://github.com/rachitchauhan43)
- [Rafael Vasquez](https://github.com/rafvasq)
- [Tim Kleinloog](https://github.com/TimKleinloog)
- [Christian Kadner](https://github.com/ckadner)
- [ddelange](https://github.com/ddelange)
- [Lize Cai](https://github.com/lizzzcai)
- [sangjune.park](https://github.com/park12sj)
- [Suresh Nakkeran](https://github.com/Suresh-Nakkeran)
- [Konstantinos Messis](https://github.com/MessKon)
- [Matt Rose](https://github.com/matty-rose)
- [Alexa Griffith](https://github.com/alexagriffith)
- [Jagadeesh J](https://github.com/jagadeeshi2i)
- [Alex Lembiyeuski](https://github.com/alembiewski)
- [Yuki Iwai](https://github.com/tenzen-y)
- [Andrews Arokiam](https://github.com/andyi2it)
- [Xin Fu](https://github.com/xfu83)
- [adilhusain-s](https://github.com/adilhusain-s)
- [Pranav Pandit](https://github.com/pranavpandit1)
- [C1berwiz](https://github.com/C1berwiz)
- [dilverse](https://github.com/dilverse)
- [Yuan Tang](https://github.com/terrytangyuan)
- [Dan Sun](https://github.com/yuzisun)
- [Nick Hill](https://github.com/njhill)

The KServe Working Group
2 changes: 1 addition & 1 deletion docs/modelserving/servingruntimes.md
@@ -41,7 +41,7 @@ Several out-of-the-box _ClusterServingRuntimes_ are provided with KServe so that
| kserve-mlserver | SKLearn, XGBoost, LightGBM, MLflow |
| kserve-paddleserver | Paddle |
| kserve-pmmlserver | PMML |
| kserver-sklearnserver | SKLearn |
| kserve-sklearnserver | SKLearn |
| kserve-tensorflow-serving | TensorFlow |
| kserve-torchserve | PyTorch |
| kserve-tritonserver | TensorFlow, ONNX, PyTorch, TensorRT |
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -22,7 +22,7 @@ nav:
- V1 Inference Protocol: modelserving/data_plane/v1_protocol.md
- Open Inference Protocol (V2 Inference Protocol): modelserving/data_plane/v2_protocol.md
- Serving Runtimes: modelserving/servingruntimes.md
- Single Model Serving:
- Model Serving Runtimes:
- Supported Model Frameworks/Formats:
- Overview: modelserving/v1beta1/serving_runtime.md
- Tensorflow: modelserving/v1beta1/tensorflow/README.md
@@ -91,6 +91,7 @@ nav:
- Debugging guide: developer/debug.md
- Blog:
- Releases:
- KServe 0.10 Release: blog/articles/2023-02-05-KServe-0.10-release.md
- KServe 0.9 Release: blog/articles/2022-07-21-KServe-0.9-release.md
- KServe 0.8 Release: blog/articles/2022-02-18-KServe-0.8-release.md
- KServe 0.7 Release: blog/articles/2021-10-11-KServe-0.7-release.md
4 changes: 2 additions & 2 deletions overrides/home.html
@@ -84,12 +84,12 @@ <h2>KServe Components</h2>
<div class="org-card--body">
<div class="org-card--body-heading">
<h5 class="org-card--heading" id="org-card--heading">
<a href="./modelserving/v1beta1/serving_runtime">Single Model Serving</a>
<a href="./modelserving/v1beta1/serving_runtime">Model Serving</a>
</h5>
</div>
<div class="org-card--body-content">
<div class="org-card--body-content-wrapper">
Provides Serverless deployment of single model inference on CPU/GPU for common ML frameworks
Provides Serverless deployment for model inference on CPU/GPU with common ML frameworks
<a href="https://scikit-learn.org/">Scikit-Learn</a>, <a href="https://xgboost.readthedocs.io/">XGBoost</a>, <a href="https://www.tensorflow.org/">Tensorflow</a>, <a href="https://pytorch.org/">PyTorch</a> as well as pluggable custom model runtime.
</div>
</div>
2 changes: 1 addition & 1 deletion overrides/main.html
@@ -2,6 +2,6 @@

{% block announce %}
<h1>
<b>KServe v0.9 is Released</b>, <a href="/website/0.9/blog/articles/2022-07-21-KServe-0.9-release/">Read blog &gt;&gt;</a>
<b>KServe v0.10 is Released</b>, <a href="/website/0.10/blog/articles/2023-02-05-KServe-0.10-release/">Read blog &gt;&gt;</a>
</h1>
{% endblock %}
