Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dramatic performance decrease in latest commit(4c6c506). #13068

Open
wilsonwang371 opened this issue Jun 2, 2021 · 15 comments
Open

Dramatic performance decrease in latest commit(4c6c506). #13068

wilsonwang371 opened this issue Jun 2, 2021 · 15 comments

Comments

@wilsonwang371
Copy link
Contributor

I was testing my changes and I found the latest main branch performance was very bad.

commit 4c6c506
Merge: c15af6d d669eb0
Author: Piotr Tabor ptab@google.com
Date: Tue Jun 1 17:19:05 2021 +0200

Merge pull request #13063 from serathius/integration-defer

integration: Use subtests to defer cluster.Terminate call

Compared with my previous data:

DATA | 0.0078 | 32 | 16 | 275.2097:35699.6517 | 275.7598:35770.8341 | 276.6643:35888.4949 | 280.6877:36410.3105 | 282.7565:36678.4655 |

Now it is:

DATA,.0078,32,16,71.3823:9259.6178,71.5968:9287.4621,71.4798:9272.2607,71.4114:9263.3926,71.8465:9319.8179

We can see in the same condition, read reduced from 275 to 71.

@ptabor @gyuho guys, please take a look.

@wilsonwang371 wilsonwang371 changed the title dramatically performance decrease in latest commit. Dramatic performance decrease in latest commit(4c6c506). Jun 2, 2021
@wilsonwang371
Copy link
Contributor Author

The columns are:
type,ratio,conn_size,value_size,iter1,iter2,iter3,iter4,iter5,comment
32 is the connection amount
16 is the value size
A:B is Read A pqs vs Write B qps

@gyuho
Copy link
Contributor

gyuho commented Jun 2, 2021

Those commits only change test files? Which tests are we running here?

@wilsonwang371
Copy link
Contributor Author

Those commits only change test files? Which tests are we running here?

I am running the mixed rw performance testing code that I committed recently. I am comparing the difference between commits 4c6c506 and 004081c

@ptabor
Copy link
Contributor

ptabor commented Jun 2, 2021

#13060 is a change that removed an optimization of count-only range-reads.
Does your tests depends on count-only range scans ?

@wilsonwang371
Copy link
Contributor Author

wilsonwang371 commented Jun 2, 2021

#13060 is a change that removed an optimization of count-only range-reads.
Does your tests depends on count-only range scans ?

My test is running mixed read/write with txn-put and txn-range. But I was using a limit in txn-range request to avoid too many data.

                               opts := []v3.OpOption{v3.WithRange(mixedTxnEndKey)}
				if rangeConsistency == "s" {
					opts = append(opts, v3.WithSerializable())
				}
				opts = append(opts, v3.WithPrefix(), v3.WithLimit(mixedTxnRangeLimit))
				req.op = v3.OpGet("", opts...)
				req.isWrite = false
				readOpsTotal++

@gyuho
Copy link
Contributor

gyuho commented Jun 2, 2021

Can we run the same test with #13060 reverted (run the test with the original @tangcong's patch)?

@gyuho
Copy link
Contributor

gyuho commented Jun 2, 2021

The original patch (which is reverted in #13060) only changed count-only range query path.

@wilsonwang371
Copy link
Contributor Author

Let me try to locate the patch that introduced this issue.

@tangcong
Copy link
Contributor

tangcong commented Jun 2, 2021

https://github.com/etcd-io/etcd/pull/11990/files#diff-61ba69e6e025cc974e9e65bc63c86947f7b6d72257032a15052cbb613a96c9b1

The performance of non-count-only scenes will also be affected(tr.s.kvindex.Revisions function ). @gyuho

@ptabor
Copy link
Contributor

ptabor commented Jun 2, 2021

@serathius

In
https://github.com/etcd-io/etcd/pull/13060/files#diff-c4842260ff90edd130afd1d3ac19cb782c4a59c98483070a7a7f817e4d2f13fdR146

in the not-count case seems we started to return total instead of 0

Before:

  return &RangeResult{KVs: nil, Count: 0, Rev: curRev}, nil

After:

  return &RangeResult{KVs: nil, Count: total, Rev: curRev}, nil

Did 3.4 was returning count for not-count queries ?

@serathius
Copy link
Member

serathius commented Jun 2, 2021

In 3.4 etcd was returning len(revpairs) as the count in both count and non count queries.

return &RangeResult{KVs: kvs, Count: len(revpairs), Rev: curRev}, nil

len(revpairs) should was always equal to count of all revisions as we didn't apply limit to Revisions

etcd/mvcc/index.go

Lines 106 to 120 in d19fbe5

func (ti *treeIndex) Revisions(key, end []byte, atRev int64) (revs []revision) {
if end == nil {
rev, _, _, err := ti.Get(key, atRev)
if err != nil {
return nil
}
return []revision{rev}
}
ti.visit(key, end, func(ki *keyIndex) {
if rev, _, _, err := ki.get(ti.lg, atRev); err == nil {
revs = append(revs, rev)
}
})
return revs
}

Proposed changes was implemented only after I have updated integration tests of Range function to test count with limit. Only after I have tested it on release-3.4 branch I have proposed it on main.

To verify if I didn't make any mistake I have just cherry-picked the tests to release-3.4 branch. serathius@31c9a27 and run integration tests https://travis-ci.com/github/serathius/etcd/builds/227769629

@ptabor Case that you have shown with Count: 0 is a early return from function in case of len(revpairs) == 0 so It should not be affect result as no results should imply that total = 0 (assuming that query with limit = 0 returns all results)

Please let me know if there is anything else I can help with.

@ptabor
Copy link
Contributor

ptabor commented Jun 3, 2021

Thank you for checking.

It seems that this optimization requires coordinated effort betweek k8s and etcd, e.g. k8s:

  • sends Range(Count=true, rev) requests first, gets cnt
  • sends Range(Count=false, rev, limit=200, NoCount=true) gets N responses...... so estimates Remaining to cnt - NThat would require etcd to introduce ano-count` mode, and k8s be able to use it when available.

@gyuho
Copy link
Contributor

gyuho commented Jun 3, 2021

For my understanding,

#11990 introduced a breaking change where Count would no longer count all keys under range, but only up to the Limit

#11990 impacted Kubernetes chunk API that relies on RangeResponse.Count because #11990 changed the semantics of treeIndex.Revisions. Previously treeIndex.Revisions returned the total rev-pairs no matter what (no limit). With #11990, treeIndex.Revisions returns rev-pairs only up to the specified limit which sets RangeResponse.Count <= limit. #13060 reverts this change to retain the previous behavior, RangeResponse.Count is always the total rev-pairs, in exchange for longer treeIndex.Revisions query time.

ref. https://github.com/kubernetes/kubernetes/blob/v1.21.1/staging/src/k8s.io/apiserver/pkg/storage/etcd3/store.go#L767-L777

var remainingItemCount *int64
// getResp.Count counts in objects that do not match the pred.
// Instead of returning inaccurate count for non-empty selectors, we return nil.
// Only set remainingItemCount if the predicate is empty.
if utilfeature.DefaultFeatureGate.Enabled(features.RemainingItemCount) {
	if pred.Empty() {
		c := int64(getResp.Count - pred.Limit)
		remainingItemCount = &c
	}
}

@ptabor @serathius @wilsonwang371 @tangcong

  1. As of etcdserver: Fix invalid count returned on Range with Limit #13060, it seems like we fix the chunk API breakage, right?
  2. Can we still make CountRevisions return when total == limit as in the original @tangcong's change mvcc: push down RangeOptions.limit argv into index tree to reduce memory overhead #11990. That way, we can make kube-apiserver side changes to send count-only query first, and do the following range queries with limit? Basically, kube-apiserver should not rely on Count field when such range query is requested with the limit field. I think we can keep @tangcong's change as long as we fix this kube-apiserver behavior?

@gyuho gyuho modified the milestones: etcd-v3.5, etcd-v3.6 Jun 3, 2021
@ptabor
Copy link
Contributor

ptabor commented Jun 3, 2021

@gyuho:

  1. Right. 3.5 does not have this improvement, but is correct. So we are good (but slow) .

  2. I don't think we should modify the wire-format in scope of v3 (i.e. semantic of requests, that is not driven by an explicit opt-in).
    In particular we need to make sure api-server e.g. 1.19 can still run on etcd-3.6+.

    Thus api-server e.g. 1.24 should be able to discover whether it runs against etcd-3.6 that supports 'enhanced limiting'
    (e.g. v3client.IsCapabilitySupported(FAST_RANGE_WITHOUT_COUNT) and if so call range in that mode.

@stale
Copy link

stale bot commented Sep 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

6 participants