Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

admission control: improve idle CPU usage #66881

Closed
RaduBerinde opened this issue Jun 25, 2021 · 7 comments · Fixed by #69379
Closed

admission control: improve idle CPU usage #66881

RaduBerinde opened this issue Jun 25, 2021 · 7 comments · Fixed by #69379
Assignees
Labels
A-admission-control branch-master Failures and bugs on the master branch. C-performance Perf of queries or internals. Solution not expected to change functional behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.

Comments

@RaduBerinde
Copy link
Member

I have an empty local roachprod cluster running with no data and no load and each cockroach process takes about 8-10% of a CPU core. Others have observed that even with a SQL-only tenant process, we still see 5-8%.

This seems excessive to me. We should look at some profiles and see if there is any low hanging fruit for improving on this.

@RaduBerinde RaduBerinde added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Jun 25, 2021
@ajwerner
Copy link
Contributor

Is this golang/go#27707?

@RaduBerinde
Copy link
Member Author

The main culprit was determined to be the util/goschedstats loop. #66894 reduces the frequency of this significantly on 21.1. Hopefully in 21.2 we can switch to using the new /sched/latencies:seconds in runtime/metrics and collect that as needed (not with a fixed rate).

@petermattis
Copy link
Collaborator

The main culprit was determined to be the util/goschedstats loop.

Heh. The finger points back at yourself.

@RaduBerinde
Copy link
Member Author

RaduBerinde commented Jun 25, 2021

Yeah.. We tested that it has very little observable effect on workloads, but we didn't think that idle usage would be so different.

@RaduBerinde RaduBerinde changed the title investigate idle CPU usage admission control: improve idle CPU usage Jun 28, 2021
@RaduBerinde
Copy link
Member Author

#66894 fixed the problem on 21.1. This bug tracks fixing the problem on master, in a way that doesn't negatively impact the new admission control code.

The new metric in go 1.17 likely will not help because the admission control code relies on frequent updates, and the histograms are likely going to make things even slower.

One approach is to use a much reduced update frequency until and unless admission control is actually queueing things.

@andy-kimball
Copy link
Contributor

Note that this issue is a hard blocker for the Serverless launch. We'd like to launch on a beta version of 21.2 cut during the stabilization period, so we need to address this soon.

@RaduBerinde RaduBerinde added the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Aug 25, 2021
@blathers-crl
Copy link

blathers-crl bot commented Aug 25, 2021

Hi @RaduBerinde, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Aug 25, 2021
…rloaded

The kvSlotAdjuster keeps track of a history of CPULoad ticks
to decide whether sufficient fraction of that history was
underloaded or not. If underloaded we disable slot and token
enforcement in kvGranter and tokenGranter since the reduced
frequency of CPULoad could cause us to not adjust slots fast
enough. This information is fed back to goschedstats so that
it can adjust the period of the ticker.

Fixes cockroachdb#66881

Release justification: Fix for high-priority issue in new
functionality.

Release note: None
@RaduBerinde RaduBerinde added the branch-master Failures and bugs on the master branch. label Aug 25, 2021
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Aug 25, 2021
…rloaded

The goschestats makes the determination of the tick interval
every 1s, and either ticks at 1ms or 250ms. 250ms is used when
the cpu is very underloaded.
The admission control code disables slot and token enforcement
if the tick interval is greater than 1ms. This is done since
the reduced frequency of CPULoad could cause us to not adjust
slots fast enough.

Fixes cockroachdb#66881

Release justification: Fix for high-priority issue in new
functionality.

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Aug 26, 2021
…rloaded

The goschestats makes the determination of the tick interval
every 1s, and either ticks at 1ms or 250ms. 250ms is used when
the cpu is very underloaded.
The admission control code disables slot and token enforcement
if the tick interval is greater than 1ms. This is done since
the reduced frequency of CPULoad could cause us to not adjust
slots fast enough.

Fixes cockroachdb#66881

Release justification: Fix for high-priority issue in new
functionality.

Release note: None
sumeerbhola added a commit to sumeerbhola/cockroach that referenced this issue Aug 29, 2021
…rloaded

The goschestats makes the determination of the tick interval
every 1s, and either ticks at 1ms or 250ms. 250ms is used when
the cpu is very underloaded.
The admission control code disables slot and token enforcement
if the tick interval is greater than 1ms. This is done since
the reduced frequency of CPULoad could cause us to not adjust
slots fast enough.

Fixes cockroachdb#66881

Release justification: Fix for high-priority issue in new
functionality.

Release note: None
craig bot pushed a commit that referenced this issue Aug 30, 2021
69379: admission,goschedstats: reduce scheduler sampling frequency when unde… r=sumeerbhola a=sumeerbhola

…rloaded

The kvSlotAdjuster keeps track of a history of CPULoad ticks
to decide whether sufficient fraction of that history was
underloaded or not. If underloaded we disable slot and token
enforcement in kvGranter and tokenGranter since the reduced
frequency of CPULoad could cause us to not adjust slots fast
enough. This information is fed back to goschedstats so that
it can adjust the period of the ticker.

Fixes #66881

Release justification: Fix for high-priority issue in new
functionality.

Release note: None

69438: build: point people to `dev generate bazel` over `make bazel-generate` r=rail a=rickystewart

These two methods for generating Bazel files do the exact same thing,
but making this change might train people to use `dev` a little bit.

Release justification: Non-production code change
Release note: None

69494: ccl/serverccl: increase http client timeout in tests r=dhartunian a=dhartunian

Tests run under stress would sometimes timeout waiting on HTTP calls to
return. This change creates a custom HTTP client with a 15s timeout. The
original client has a default 3s timeout.

Resolves #68879

Release justification: non-production code change

Release note: None

Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
Co-authored-by: Ricky Stewart <ricky@cockroachlabs.com>
Co-authored-by: David Hartunian <davidh@cockroachlabs.com>
@craig craig bot closed this as completed in 6a8467d Aug 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-admission-control branch-master Failures and bugs on the master branch. C-performance Perf of queries or internals. Solution not expected to change functional behavior. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants