HTTP 503 responses during load test on GKE during a node auto-scale up #1797
Comments
Woah, the commented-out area caused all labels to be added.
@ahmetb: Those labels are not set on the issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@ahmetb some 503s may be due to our throttling. Can you try changing concurrencyTarget in the autoscaling config map? If possible, can you also include the error messages in your distribution? There probably won't be too many unique messages.
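A minimal sketch of that tweak, assuming the autoscaler ConfigMap is named config-autoscaler in the knative-serving namespace and uses a container-concurrency-target-default key (both names vary across releases, so verify against the ConfigMap that ships with yours):
# Open the autoscaler ConfigMap for editing; the name, namespace, and key
# here are assumptions -- check your installed release.
kubectl -n knative-serving edit configmap config-autoscaler
# Then set the concurrency target under "data", for example:
#   data:
#     container-concurrency-target-default: "2"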
The tool I'm using isn't making it easy to get the response body. If I can find time, I'll get back to looking at the components for error/throttling messages and will try to get the response bodies as well. But if you have time, I encourage you to dig in; it's easy to repro on top of the default cluster/app.
I am running the latest Knative with a 1000 QPS test for 300 seconds, without sidecar injection, with concurrencyTarget=2, and see this breakdown of error rates:
I will run more tests with the sidecar proxy and add results here.
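For reference, a run like that can be reproduced with hey's rate-limiting flags; the worker/rate split below is an assumption (hey's -q flag limits QPS per worker, so total QPS is roughly -q times -c), and the host and IP are the placeholders from the repro steps:
# ~1000 QPS for 300 seconds: 50 workers, each limited to 20 requests/sec.
hey -z 300s -c 50 -q 20 -m GET \
  -host helloworld-go.default.example.com http://35.188.214.219/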
When running with sidecar injection and concurrencyTarget = 2, the error rate is higher. This increase is consistent with the fact that Pods are slower to start up with sidecar injection.
Thanks for taking the time to look at this.
/area autoscale
/area networking
/kind bug
I use Knative on GKE (manual install) with node autoscaling enabled. When I send a load test with n=100000 and concurrency=200 (or 1000), I sometimes see some responses failing with HTTP 503 or 504.
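For context, the node autoscaling referred to here can be enabled at cluster creation time; the cluster name, zone, and node bounds below are illustrative assumptions, not the reporter's actual setup:
# Create a GKE cluster with the node autoscaler enabled.
gcloud container clusters create knative-test \
  --zone us-central1-a \
  --num-nodes 3 \
  --enable-autoscaling --min-nodes 3 --max-nodes 10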
Expected Behavior
All requests succeed.
Actual Behavior
Some requests fail with HTTP 503 (or 504) during the node scale-up, although runs with -n=100000 -c=200 succeed with no errors many times.
Steps to Reproduce the Problem
1. Install Knative manually on a GKE cluster with node autoscaling enabled.
1b. Make sure the GKE cluster is hovering around 3-4 nodes (not scaled up yet).
2. Deploy the helloworld-go app (a manifest sketch follows these steps).
3. Install hey:
go get github.com/rakyll/hey
4. Run the load test:
hey -m GET -n 100000 -c 1000 -host helloworld-go.default.example.com http://35.188.214.219/
5. Observe: a GKE node scales up within a few seconds (gcloud compute instances list).
6. Observe: during this scale-up, kubectl get pods shows a lot of Pending pods (that don't come up fast enough until the load test completes); alternatively it shows:
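As referenced in step 2, here is a minimal sketch of the sample Service manifest; the API version and image path follow the current Knative samples and may differ from the release in use at the time:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
          env:
            - name: TARGET
              value: "Go Sample v1"
Apply it with kubectl apply -f helloworld-go.yaml and wait for the Service to report Ready before starting the load test.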