Publish 0.10 release blog (#227)

* 0.10 release blog
* Update to 0.10 blog
* Address comments
* Add contributors
* Add open inference protocol adoptions

Signed-off-by: Dan Sun <dsun20@bloomberg.net>
kserve · Feb 19, 2023 · 20ce74f
1 parent 050b7c9
commit 20ce74f
Showing 5 changed files with 160 additions and 5 deletions.
diff --git a/docs/blog/articles/2023-02-05-KServe-0.10-release.md b/docs/blog/articles/2023-02-05-KServe-0.10-release.md
@@ -0,0 +1,154 @@
+# Announcing: KServe v0.10.0
+
+We are excited to announce the KServe 0.10 release. In this release we have enabled more KServe networking options,
+improved KServe telemetry for supported serving runtimes, and increased support coverage for the [Open(aka v2) inference protocol](https://kserve.github.io/website/0.10/modelserving/data_plane/v2_protocol/) for both standard and ModelMesh InferenceServices.
+
+## KServe Networking Options
+
+Istio is now optional for both [Serverless](https://kserve.github.io/website/0.10/admin/serverless/serverless/) and [RawDeployment](https://kserve.github.io/website/0.10/admin/kubernetes_deployment/) modes. Please see the [alternative networking guide](https://kserve.github.io/website/0.10/admin/serverless/kourier_networking/) to learn how to enable other ingress options supported by Knative in Serverless mode.
+Istio users who want to turn on full service mesh mode to secure InferenceServices with mutual TLS and enable traffic policies should read the [service mesh setup guideline](https://kserve.github.io/website/0.10/admin/serverless/servicemesh/).
+
+## KServe Telemetry for Serving Runtimes
+
+We have instrumented additional latency metrics in the KServe Python ServingRuntimes for the `preprocess`, `predict` and `postprocess` handlers.
+In Serverless mode, we have extended the Knative `queue-proxy` to aggregate the metrics exposed by both `queue-proxy` and the `kserve-container` of each `ServingRuntime`.
+Please read the [Prometheus metrics setup guideline](https://kserve.github.io/website/0.10/modelserving/observability/prometheus_metrics/) to learn how to enable metrics scraping and aggregation.
+
+## Open(v2) Inference Protocol Support Coverage
+
+The `KServe v2 Inference Protocol` is seeing increasing adoption, for example by the [AMD Inference ServingRuntime](https://kserve.github.io/website/0.10/modelserving/v1beta1/amd/), which
+supports FPGAs, and by OpenVINO, which now provides KServe-compatible [REST](https://docs.openvino.ai/latest/ovms_docs_rest_api_kfs.html) and [gRPC](https://docs.openvino.ai/latest/ovms_docs_grpc_api_kfs.html) APIs.
+In [this issue](https://github.com/kserve/kserve/issues/2663) we have therefore proposed renaming it to the `KServe Open Inference Protocol`.
+
+In KServe 0.10, we have added Open(v2) inference protocol support for KServe custom runtimes.
+You can now enable v2 REST/gRPC for both custom transformers and predictors, using images built with the KServe Python SDK API.
+gRPC enables a high-performance inference data plane because it is built on top of HTTP/2 and a binary data format, which is more efficient to send over the wire than REST.
+Please see the detailed examples for the [transformer](https://kserve.github.io/website/0.10/modelserving/v1beta1/transformer/torchserve_image_transformer/) and
+[predictor](https://kserve.github.io/website/0.10/modelserving/v1beta1/custom/custom_model/).
+
+```python
+import io
+from typing import Dict
+
+import numpy as np
+import torch
+from kserve import InferInput, InferOutput, InferRequest, InferResponse, Model
+from kserve.utils.utils import generate_uuid
+from PIL import Image
+from torchvision import transforms
+
+
+def image_transform(byte_array):
+    # Decode the raw image bytes and normalize them into a NumPy array.
+    image_processing = transforms.Compose([
+        transforms.ToTensor(),
+        transforms.Normalize((0.1307,), (0.3081,))
+    ])
+    image = Image.open(io.BytesIO(byte_array))
+    tensor = image_processing(image).numpy()
+    return tensor
+
+
+class CustomModel(Model):
+    def predict(self, request: InferRequest, headers: Dict[str, str]) -> InferResponse:
+        # self.model is assumed to be a torch module loaded elsewhere (e.g. in load()).
+        input_tensors = [image_transform(instance) for instance in request.inputs[0].data]
+        input_tensors = torch.from_numpy(np.asarray(input_tensors))
+        output = self.model(input_tensors)
+        # Keep the softmax result so the top-5 values are probabilities.
+        output = torch.nn.functional.softmax(output, dim=1)
+        values, top_5 = torch.topk(output, 5)
+        result = values.flatten().tolist()
+        response_id = generate_uuid()
+        infer_output = InferOutput(name="output-0", shape=list(values.shape), datatype="FP32", data=result)
+        infer_response = InferResponse(model_name=self.name, infer_outputs=[infer_output], response_id=response_id)
+        return infer_response
+
+
+class CustomTransformer(Model):
+    def preprocess(self, request: InferRequest, headers: Dict[str, str]) -> InferRequest:
+        # Convert the raw image bytes into one FP32 tensor for the predictor.
+        input_tensors = [image_transform(instance) for instance in request.inputs[0].data]
+        input_tensors = np.asarray(input_tensors)
+        infer_inputs = [InferInput(name="INPUT__0", datatype='FP32', shape=list(input_tensors.shape),
+                                   data=input_tensors)]
+        # The Model base class exposes the model name as self.name.
+        infer_request = InferRequest(model_name=self.name, infer_inputs=infer_inputs)
+        return infer_request
+```
+
+You can use the same Python API types, `InferRequest` and `InferResponse`, for both the REST and gRPC protocols. KServe handles the underlying decoding and encoding according to the protocol.
+
+!!! Warning
+    A new `headers` argument has been added to the custom handlers to pass HTTP/gRPC headers or other metadata. You can also use it as a context dict to pass data between handlers.
+    If you have an existing custom transformer or predictor, you must now add the `headers` argument to the `preprocess`, `predict` and `postprocess` handlers.
+
+
+Please check the following matrix for supported ModelFormats and [ServingRuntimes](https://kserve.github.io/website/0.10/modelserving/v1beta1/serving_runtime/).
+
+| Model Format        | v1              | Open(v2) REST/gRPC |
+| ------------------- | --------------- | ------------------ |
+| TensorFlow          | ✅ TFServing    | ✅ Triton |
+| PyTorch             | ✅ TorchServe   | ✅ TorchServe |
+| TorchScript         | ✅ TorchServe   | ✅ Triton |
+| ONNX                | ❌              | ✅ Triton |
+| Scikit-learn        | ✅ KServe       | ✅ MLServer |
+| XGBoost             | ✅ KServe       | ✅ MLServer |
+| LightGBM            | ✅ KServe       | ✅ MLServer |
+| MLflow              | ❌              | ✅ MLServer |
+| Custom              | ✅ KServe       | ✅ KServe |
+
+
+## Multi-Arch Image Support
+
+The KServe control plane images [kserve/kserve-controller](https://hub.docker.com/r/kserve/kserve-controller/tags),
+[kserve/agent](https://hub.docker.com/r/kserve/agent/tags) and [kserve/router](https://hub.docker.com/r/kserve/router/tags) now support
+multiple architectures: `ppc64le`, `arm64`, `amd64`, `s390x`.
+
+## KServe Storage Credentials Support
+
+- Currently, AWS users need to create a secret with long-term/static IAM credentials to download models stored in S3.
+  Security best practice is to use [IAM roles for service accounts (IRSA)](https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/) instead,
+  which enables automatic credential rotation and fine-grained access control. See how to [set up IRSA](https://kserve.github.io/website/0.10/modelserving/storage/s3/s3/#create-service-account-with-iam-role).
+- Added support for Azure Blobs with [managed identity](https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/how-manage-user-assigned-managed-identities?pivots=identity-mi-methods-azcli).
+
+## ModelMesh Updates
+
+ModelMesh has continued to integrate itself as KServe's multi-model serving backend, introducing improvements and features that better align the two projects. For example, it now supports ClusterServingRuntimes, which allow the use of cluster-scoped ServingRuntimes and were originally introduced in KServe 0.8.
+
+Additionally, ModelMesh introduced support for TorchServe, enabling users to serve arbitrary PyTorch models (e.g. eager-mode) in the context of distributed multi-model serving.
+
+Other limitations have been addressed as well, such as adding support for BYTES/string type tensors when using the REST inference API for inference requests that require them.
+
+
+## Other Changes
+
+For a complete change list, please read the release notes for [KServe v0.10](https://github.com/kserve/kserve/releases/tag/v0.10.0) and
+[ModelMesh v0.10](https://github.com/kserve/modelmesh-serving/releases/tag/v0.10.0).
+
+## Join the community
+
+- Visit our [Website](https://kserve.github.io/website/) or [GitHub](https://github.com/kserve)
+- Join the Slack ([#kserve](https://kubeflow.slack.com/?redir=%2Farchives%2FCH6E58LNP))
+- Attend our community meeting by subscribing to the [KServe calendar](https://wiki.lfaidata.foundation/display/kserve/calendars)
+- View our [community GitHub repository](https://github.com/kserve/community) to learn how to make contributions. We are excited to work with you to make KServe better and promote its adoption!
+
+
+Thanks to all the contributors who made commits to the 0.10 release!
+
+- [Steve Larkin](https://github.com/sel)
+- [Stephan Schielke](https://github.com/stephanschielke)
+- [Curtis Maddalozzo](https://github.com/cmaddalozzo)
+- [Zhongcheng Lao](https://github.com/laozc)
+- [Dimitris Aragiorgis](https://github.com/dimara)
+- [Pan Li](https://github.com/panli889)
+- [tjandy98](https://github.com/tjandy98)
+- [Sukumar Gaonkar](https://github.com/sukumargaonkar)
+- [Rachit Chauhan](https://github.com/rachitchauhan43)
+- [Rafael Vasquez](https://github.com/rafvasq)
+- [Tim Kleinloog](https://github.com/TimKleinloog)
+- [Christian Kadner](https://github.com/ckadner)
+- [ddelange](https://github.com/ddelange)
+- [Lize Cai](https://github.com/lizzzcai)
+- [sangjune.park](https://github.com/park12sj)
+- [Suresh Nakkeran](https://github.com/Suresh-Nakkeran)
+- [Konstantinos Messis](https://github.com/MessKon)
+- [Matt Rose](https://github.com/matty-rose)
+- [Alexa Griffith](https://github.com/alexagriffith)
+- [Jagadeesh J](https://github.com/jagadeeshi2i)
+- [Alex Lembiyeuski](https://github.com/alembiewski)
+- [Yuki Iwai](https://github.com/tenzen-y)
+- [Andrews Arokiam](https://github.com/andyi2it)
+- [Xin Fu](https://github.com/xfu83)
+- [adilhusain-s](https://github.com/adilhusain-s)
+- [Pranav Pandit](https://github.com/pranavpandit1)
+- [C1berwiz](https://github.com/C1berwiz)
+- [dilverse](https://github.com/dilverse)
+- [Yuan Tang](https://github.com/terrytangyuan)
+- [Dan Sun](https://github.com/yuzisun)
+- [Nick Hill](https://github.com/njhill)
+
+The KServe Working Group
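
Note on the blog post above: the Open (v2) inference protocol it describes sends each input tensor over REST as a JSON object with `name`, `shape`, `datatype` and `data` fields. The following is a minimal, standard-library-only sketch of building such a request body. It is not part of this commit and does not use the KServe SDK; the helper names `flatten`, `infer_shape` and `build_infer_request` are hypothetical, and the sketch assumes a rectangular (non-ragged) nested list as input.

```python
import json
import uuid
from typing import Any, Dict, List, Optional


def flatten(values: Any) -> List[Any]:
    """Flatten nested lists/tuples into one row-major list."""
    if isinstance(values, (list, tuple)):
        flat: List[Any] = []
        for item in values:
            flat.extend(flatten(item))
        return flat
    return [values]


def infer_shape(values: Any) -> List[int]:
    """Derive the tensor shape by walking the first element of each level.

    Assumes the nesting is rectangular; ragged input is not detected.
    """
    shape: List[int] = []
    while isinstance(values, (list, tuple)):
        shape.append(len(values))
        values = values[0] if values else None
    return shape


def build_infer_request(
    name: str,
    values: Any,
    datatype: str = "FP32",
    request_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a v2 inference request body holding a single input tensor."""
    return {
        "id": request_id or str(uuid.uuid4()),
        "inputs": [
            {
                "name": name,
                "shape": infer_shape(values),
                "datatype": datatype,
                "data": flatten(values),
            }
        ],
    }


# Hypothetical input name and values, for illustration only.
body = build_infer_request("input-0", [[1.0, 2.0], [3.0, 4.0]], request_id="demo")
print(json.dumps(body, indent=2))
```

A body like this would be POSTed to the model's `/v2/models/<model_name>/infer` endpoint. With the KServe Python SDK, the `InferRequest` and `InferInput` types shown in the blog post play this role, and KServe handles the REST or gRPC encoding.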
diff --git a/docs/modelserving/servingruntimes.md b/docs/modelserving/servingruntimes.md
@@ -41,7 +41,7 @@ Several out-of-the-box _ClusterServingRuntimes_ are provided with KServe so that
 | kserve-mlserver           | SKLearn, XGBoost, LightGBM, MLflow  |
 | kserve-paddleserver       | Paddle                              |
 | kserve-pmmlserver         | PMML                                |
-| kserver-sklearnserver     | SKLearn                             |
+| kserve-sklearnserver      | SKLearn                             |
 | kserve-tensorflow-serving | TensorFlow                          |
 | kserve-torchserve         | PyTorch                             |
 | kserve-tritonserver       | TensorFlow, ONNX, PyTorch, TensorRT |

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -22,7 +22,7 @@ nav:
                   - V1 Inference Protocol: modelserving/data_plane/v1_protocol.md
                   - Open Inference Protocol (V2 Inference Protocol): modelserving/data_plane/v2_protocol.md
                 - Serving Runtimes: modelserving/servingruntimes.md
-          - Single Model Serving:
+          - Model Serving Runtimes:
               - Supported Model Frameworks/Formats:
                 - Overview: modelserving/v1beta1/serving_runtime.md
                 - Tensorflow: modelserving/v1beta1/tensorflow/README.md
@@ -91,6 +91,7 @@ nav:
           - Debugging guide: developer/debug.md
     - Blog:
           - Releases:
+            - KServe 0.10 Release: blog/articles/2023-02-05-KServe-0.10-release.md
             - KServe 0.9 Release: blog/articles/2022-07-21-KServe-0.9-release.md
             - KServe 0.8 Release: blog/articles/2022-02-18-KServe-0.8-release.md
             - KServe 0.7 Release: blog/articles/2021-10-11-KServe-0.7-release.md

diff --git a/overrides/home.html b/overrides/home.html
@@ -84,12 +84,12 @@ <h2>KServe Components</h2>
             <div class="org-card--body">
               <div class="org-card--body-heading">
                 <h5 class="org-card--heading" id="org-card--heading">
-                  <a href="./modelserving/v1beta1/serving_runtime">Single Model Serving</a>
+                  <a href="./modelserving/v1beta1/serving_runtime">Model Serving</a>
                 </h5>
               </div>
               <div class="org-card--body-content">
                 <div class="org-card--body-content-wrapper">
-                  Provides Serverless deployment of single model inference on CPU/GPU for common ML frameworks
+                  Provides Serverless deployment for model inference on CPU/GPU with common ML frameworks
                   <a href="https://scikit-learn.org/">Scikit-Learn</a>, <a href="https://xgboost.readthedocs.io/">XGBoost</a>, <a href="https://www.tensorflow.org/">Tensorflow</a>, <a href="https://pytorch.org/">PyTorch</a> as well as pluggable custom model runtime.
                 </div>
               </div>

diff --git a/overrides/main.html b/overrides/main.html
@@ -2,6 +2,6 @@
 
 {% block announce %}
  <h1>
-   <b>KServe v0.9 is Released</b>, <a href="/website/0.9/blog/articles/2022-07-21-KServe-0.9-release/">Read blog &gt;&gt;</a>
+   <b>KServe v0.10 is Released</b>, <a href="/website/0.10/blog/articles/2023-02-05-KServe-0.10-release/">Read blog &gt;&gt;</a>
  </h1>
 {% endblock %}