Add KFServing Debugging guide (kubeflow#829)
* Add kfdef type on issue template

* Add KFServing debugging guide

* Fix docs

* Add request flow debugging

* Add debug commands

* Add performance debugging

* Add tracing image

* Add link to main README

* Istio ingress probe fix instruction

* Apply suggestions from code review

Co-authored-by: Animesh Singh <singhan@us.ibm.com>

* Address review comments

Co-authored-by: Animesh Singh <singhan@us.ibm.com>
yuzisun and animeshsingh authored May 19, 2020
1 parent 1dc962b commit 2e13b6a
Showing 6 changed files with 340 additions and 24 deletions.
1 change: 1 addition & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -23,6 +23,7 @@ about: Tell us about a problem you are experiencing
- Knative Version:
- KFServing Version:
- Kubeflow version:
- Kfdef:[k8s_istio/istio_dex/gcp_basic_auth/gcp_iap/aws/aws_cognito/ibm]
- Minikube version:
- Kubernetes version: (use `kubectl version`):
- OS (e.g. from `/etc/os-release`):
3 changes: 3 additions & 0 deletions README.md
@@ -110,6 +110,9 @@ curl -v -H "Host: sklearn-iris.default.example.com" http://localhost:8080/v1/mod
### KFServing API Reference
[KFServing API Docs](./docs/apis/README.md)

### KFServing Debugging Guide :star:
[Debug KFServing InferenceService](./docs/KFSERVING_DEBUG_GUIDE.md)

### Developer Guide
[Developer Guide](/docs/DEVELOPER_GUIDE.md).

24 changes: 0 additions & 24 deletions docs/DEVELOPER_GUIDE.md
@@ -316,27 +316,3 @@ It`s a red herring. To resolve it, please ensure you have logged into dockerhub
```

Please make sure not to deploy the inferenceservice in the `kfserving-system` namespace or in any other namespace that has `control-plane` as a label. The `storage-initializer` init container does not get injected for deployments in those namespaces since they do not go through the mutating webhook.

6. When you deploy the tensorflow sample, you may get an `IngressNotConfigured` error:

This often happens when Knative fails to probe the Istio ingress gateway for your inference service. You can find the HTTP error code in the Knative `networking-istio` pod logs.

If you are seeing HTTP 401 or 302, then you may have Auth turned on for the `Istio Ingress Gateway`, which blocks the Knative probes to your service.

```shell
kubectl logs -l app=networking-istio -n knative-serving
[2020-02-11T18:16:21.419Z] "GET / HTTP/1.1" 404 NR "-" "-" 0 0 0 - "10.88.0.31" "Go-http-client/1.1" "4a8bd584-2323-4f40-9230-9797d890b9fb" "helloworld-go.default:80" "-" - - 10.88.1.13:80 10.88.0.31:36237 - -
[2020-02-11T18:16:21.419Z] "GET / HTTP/1.1" 404 NR "-" "-" 0 0 0 - "10.88.0.31" "Go-http-client/1.1" "7298dbfc-58bb-430f-92c5-cf39e97f63d7" "helloworld-go.default.svc:80" "-" - - 10.88.1.13:80 10.88.0.31:36239 - -
[2020-02-11T18:16:21.420Z] "GET / HTTP/1.1" 302 UAEX "-" "-" 0 269 21 21 "10.88.0.31" "Go-http-client/1.1" "27aa43fa-ac17-4a71-8ca2-b4d9fb772219" "helloworld-go.default.example.com:80" "-" - - 10.88.1.13:80 10.88.0.31:36249 - -
```

If you are seeing HTTP 403, then you may have `Istio RBAC` turned on, which blocks the probes to your service. You can either create an Istio RBAC rule that allows probes from the `knative-serving` namespace, or disable Istio sidecar injection by adding the `sidecar.istio.io/inject: false` annotation to the inference service.

```json
{"level":"error","ts":"2020-03-26T19:12:00.749Z","logger":"istiocontroller.ingress-controller.status-manager","caller":"ingress/status.go:366",
"msg":"Probing of http://flowers-sample-predictor-default.kubeflow-jeanarmel-luce.example.com:80/ failed, IP: 10.0.0.29:80, ready: false, error: unexpected status code: want [200], got 403 (depth: 0)",
"commit":"6b0e5c6","knative.dev/controller":"ingress-controller","stacktrace":"knative.dev/serving/pkg/reconciler/ingress.(*StatusProber).processWorkItem\n\t/home/prow/go/src/knative.dev/serving/pkg/reconciler/ingress/status.go:366\nknative.dev/serving/pkg/reconciler/ingress.(*StatusProber).Start.func1\n\t/home/prow/go/src/knative.dev/serving/pkg/reconciler/ingress/status.go:268"}
```

Knative is addressing the probe issue in https://github.com/knative/serving/issues/6829 with best-effort probes. Until that fix is released, KFServing provides a temporary workaround, the KFServing Ingress Gateway in the Kubeflow manifests. Meanwhile, we are working on a proper AuthN/AuthZ story for KFServing in https://github.com/kubeflow/kfserving/issues/760.
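
The triage above (read the status code from the `networking-istio` logs, then map it to a likely cause) can be sketched as a small script. This is only an illustrative sketch, not part of KFServing or Knative: the names `LIKELY_CAUSE`, `probe_status`, and `explain` are hypothetical, and the two regular expressions assume log lines shaped like the samples shown above (an Envoy-style access log line, or a single-line JSON controller log whose `msg` contains `got <code>`).

```python
import json
import re
from typing import Optional

# Hypothetical mapping: only the status codes discussed in the text above.
LIKELY_CAUSE = {
    401: "Auth is likely enabled on the Istio ingress gateway",
    302: "Auth is likely enabled on the Istio ingress gateway",
    403: "Istio RBAC is likely blocking the probe",
}

# Assumed shapes, based on the sample log lines in this guide.
_CONTROLLER_RE = re.compile(r"got (\d{3})")   # '... want [200], got 403 ...'
_ACCESS_LOG_RE = re.compile(r'" (\d{3}) ')    # '"GET / HTTP/1.1" 302 UAEX ...'


def probe_status(line: str) -> Optional[int]:
    """Return the HTTP status code found in one log line, or None."""
    line = line.strip()
    if line.startswith("{"):
        # JSON controller log: the status code is inside the "msg" field.
        try:
            msg = json.loads(line).get("msg", "")
        except json.JSONDecodeError:
            return None
        match = _CONTROLLER_RE.search(msg)
    else:
        # Access log: the status code follows the quoted request line.
        match = _ACCESS_LOG_RE.search(line)
    return int(match.group(1)) if match else None


def explain(line: str) -> Optional[str]:
    """Map a log line to a likely cause, or None if the code is not covered."""
    status = probe_status(line)
    if status is None:
        return None
    return LIKELY_CAUSE.get(status)
```

One way to use it is to pipe the output of `kubectl logs -l app=networking-istio -n knative-serving` into a loop that calls `explain` on each line and prints the non-`None` results. Codes outside the mapping, such as the 404 lines in the sample, are intentionally left unexplained because the text above does not attribute a cause to them.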