Dramatic performance decrease in latest commit(4c6c506). #13068

wilsonwang371 · 2021-06-02T02:13:57Z

I was testing my changes and I found the latest main branch performance was very bad.

commit 4c6c506
Merge: c15af6d d669eb0
Author: Piotr Tabor ptab@google.com
Date: Tue Jun 1 17:19:05 2021 +0200

Merge pull request #13063 from serathius/integration-defer

integration: Use subtests to defer cluster.Terminate call

Compared with my previous data:

DATA | 0.0078 | 32 | 16 | 275.2097:35699.6517 | 275.7598:35770.8341 | 276.6643:35888.4949 | 280.6877:36410.3105 | 282.7565:36678.4655 |

Now it is:

DATA,.0078,32,16,71.3823:9259.6178,71.5968:9287.4621,71.4798:9272.2607,71.4114:9263.3926,71.8465:9319.8179

We can see in the same condition, read reduced from 275 to 71.

@ptabor @gyuho guys, please take a look.

The text was updated successfully, but these errors were encountered:

wilsonwang371 · 2021-06-02T03:04:23Z

The columns are:
type,ratio,conn_size,value_size,iter1,iter2,iter3,iter4,iter5,comment
32 is the connection amount
16 is the value size
A:B is Read A pqs vs Write B qps

gyuho · 2021-06-02T03:48:38Z

Those commits only change test files? Which tests are we running here?

wilsonwang371 · 2021-06-02T04:01:34Z

Those commits only change test files? Which tests are we running here?

I am running the mixed rw performance testing code that I committed recently. I am comparing the difference between commits 4c6c506 and 004081c

ptabor · 2021-06-02T06:43:17Z

#13060 is a change that removed an optimization of count-only range-reads.
Does your tests depends on count-only range scans ?

wilsonwang371 · 2021-06-02T06:48:34Z

#13060 is a change that removed an optimization of count-only range-reads.
Does your tests depends on count-only range scans ?

My test is running mixed read/write with txn-put and txn-range. But I was using a limit in txn-range request to avoid too many data.

                               opts := []v3.OpOption{v3.WithRange(mixedTxnEndKey)}
				if rangeConsistency == "s" {
					opts = append(opts, v3.WithSerializable())
				}
				opts = append(opts, v3.WithPrefix(), v3.WithLimit(mixedTxnRangeLimit))
				req.op = v3.OpGet("", opts...)
				req.isWrite = false
				readOpsTotal++

gyuho · 2021-06-02T06:51:04Z

Can we run the same test with #13060 reverted (run the test with the original @tangcong's patch)?

gyuho · 2021-06-02T06:51:45Z

The original patch (which is reverted in #13060) only changed count-only range query path.

wilsonwang371 · 2021-06-02T06:55:26Z

Let me try to locate the patch that introduced this issue.

tangcong · 2021-06-02T08:57:07Z

https://github.com/etcd-io/etcd/pull/11990/files#diff-61ba69e6e025cc974e9e65bc63c86947f7b6d72257032a15052cbb613a96c9b1

The performance of non-count-only scenes will also be affected（tr.s.kvindex.Revisions function ). @gyuho

ptabor · 2021-06-02T17:25:47Z

@serathius

In
https://github.com/etcd-io/etcd/pull/13060/files#diff-c4842260ff90edd130afd1d3ac19cb782c4a59c98483070a7a7f817e4d2f13fdR146

in the not-count case seems we started to return total instead of 0

Before:

  return &RangeResult{KVs: nil, Count: 0, Rev: curRev}, nil

After:

  return &RangeResult{KVs: nil, Count: total, Rev: curRev}, nil

Did 3.4 was returning count for not-count queries ?

serathius · 2021-06-02T18:24:33Z

In 3.4 etcd was returning len(revpairs) as the count in both count and non count queries.

etcd/mvcc/kvstore_txn.go

Line 171 in d19fbe5

return &RangeResult{KVs: kvs, Count: len(revpairs), Rev: curRev}, nil

len(revpairs) should was always equal to count of all revisions as we didn't apply limit to Revisions

etcd/mvcc/index.go

Lines 106 to 120 in d19fbe5

    
           func (ti *treeIndex) Revisions(key, end []byte, atRev int64) (revs []revision) { 
        
           	if end == nil { 
        
           		rev, _, _, err := ti.Get(key, atRev) 
        
           		if err != nil { 
        
           			return nil 
        
           		} 
        
           		return []revision{rev} 
        
           	} 
        
           	ti.visit(key, end, func(ki *keyIndex) { 
        
           		if rev, _, _, err := ki.get(ti.lg, atRev); err == nil { 
        
           			revs = append(revs, rev) 
        
           		} 
        
           	}) 
        
           	return revs 
        
           }

Proposed changes was implemented only after I have updated integration tests of Range function to test count with limit. Only after I have tested it on release-3.4 branch I have proposed it on main.

To verify if I didn't make any mistake I have just cherry-picked the tests to release-3.4 branch. serathius@31c9a27 and run integration tests https://travis-ci.com/github/serathius/etcd/builds/227769629

@ptabor Case that you have shown with Count: 0 is a early return from function in case of len(revpairs) == 0 so It should not be affect result as no results should imply that total = 0 (assuming that query with limit = 0 returns all results)

Please let me know if there is anything else I can help with.

ptabor · 2021-06-03T10:25:11Z

Thank you for checking.

It seems that this optimization requires coordinated effort betweek k8s and etcd, e.g. k8s:

sends Range(Count=true, rev) requests first, gets cnt
sends Range(Count=false, rev, limit=200, NoCount=true) gets N responses...... so estimates Remaining to cnt - NThat would require etcd to introduce ano-count` mode, and k8s be able to use it when available.

gyuho · 2021-06-03T11:41:52Z

For my understanding,

#11990 introduced a breaking change where Count would no longer count all keys under range, but only up to the Limit

#11990 impacted Kubernetes chunk API that relies on RangeResponse.Count because #11990 changed the semantics of treeIndex.Revisions. Previously treeIndex.Revisions returned the total rev-pairs no matter what (no limit). With #11990, treeIndex.Revisions returns rev-pairs only up to the specified limit which sets RangeResponse.Count <= limit. #13060 reverts this change to retain the previous behavior, RangeResponse.Count is always the total rev-pairs, in exchange for longer treeIndex.Revisions query time.

ref. https://github.com/kubernetes/kubernetes/blob/v1.21.1/staging/src/k8s.io/apiserver/pkg/storage/etcd3/store.go#L767-L777

var remainingItemCount *int64
// getResp.Count counts in objects that do not match the pred.
// Instead of returning inaccurate count for non-empty selectors, we return nil.
// Only set remainingItemCount if the predicate is empty.
if utilfeature.DefaultFeatureGate.Enabled(features.RemainingItemCount) {
	if pred.Empty() {
		c := int64(getResp.Count - pred.Limit)
		remainingItemCount = &c
	}
}

@ptabor @serathius @wilsonwang371 @tangcong

As of etcdserver: Fix invalid count returned on Range with Limit #13060, it seems like we fix the chunk API breakage, right?
Can we still make CountRevisions return when total == limit as in the original @tangcong's change mvcc: push down RangeOptions.limit argv into index tree to reduce memory overhead #11990. That way, we can make kube-apiserver side changes to send count-only query first, and do the following range queries with limit? Basically, kube-apiserver should not rely on Count field when such range query is requested with the limit field. I think we can keep @tangcong's change as long as we fix this kube-apiserver behavior?

ptabor · 2021-06-03T14:01:40Z

@gyuho:

Right. 3.5 does not have this improvement, but is correct. So we are good (but slow) .
I don't think we should modify the wire-format in scope of v3 (i.e. semantic of requests, that is not driven by an explicit opt-in).
In particular we need to make sure api-server e.g. 1.19 can still run on etcd-3.6+.

Thus api-server e.g. 1.24 should be able to discover whether it runs against etcd-3.6 that supports 'enhanced limiting'
(e.g. v3client.IsCapabilitySupported(FAST_RANGE_WITHOUT_COUNT) and if so call range in that mode.

stale · 2021-09-02T03:44:31Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

wilsonwang371 changed the title ~~dramatically performance decrease in latest commit.~~ Dramatic performance decrease in latest commit(4c6c506). Jun 2, 2021

This was referenced Jun 2, 2021

mvcc: push down RangeOptions.limit argv into index tree to reduce memory overhead #11990

Merged

etcdserver: Fix invalid count returned on Range with Limit #13060

Merged

gyuho modified the milestones: etcd-v3.5, etcd-v3.6 Jun 3, 2021

wilsonwang371 mentioned this issue Aug 10, 2021

server: add range request cache to treeindex #13283

Closed

stale bot added the stale label Sep 2, 2021

stale bot closed this as completed Sep 23, 2021

chaochn47 mentioned this issue May 4, 2023

Codify the the kubernetes-etcd contract into etcd robustness tests #15820

Open

15 tasks

chaochn47 reopened this May 8, 2023

stale bot removed the stale label May 8, 2023

chaochn47 added the stage/tracked label May 8, 2023

chaochn47 modified the milestones: etcd-v3.5, etcd-v3.6 May 8, 2023

chaochn47 mentioned this issue Jul 21, 2023

Documentation: add roadmap #16279

Merged

mehvaibh mentioned this issue Aug 30, 2023

Optimize RangeRequest Revision count #16510

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dramatic performance decrease in latest commit(4c6c506). #13068

Dramatic performance decrease in latest commit(4c6c506). #13068

wilsonwang371 commented Jun 2, 2021

wilsonwang371 commented Jun 2, 2021

gyuho commented Jun 2, 2021

wilsonwang371 commented Jun 2, 2021

ptabor commented Jun 2, 2021

wilsonwang371 commented Jun 2, 2021 •

edited

Loading

gyuho commented Jun 2, 2021

gyuho commented Jun 2, 2021

wilsonwang371 commented Jun 2, 2021

tangcong commented Jun 2, 2021

ptabor commented Jun 2, 2021

serathius commented Jun 2, 2021 •

edited

Loading

ptabor commented Jun 3, 2021

gyuho commented Jun 3, 2021

ptabor commented Jun 3, 2021

stale bot commented Sep 2, 2021

Dramatic performance decrease in latest commit(4c6c506). #13068

Dramatic performance decrease in latest commit(4c6c506). #13068

Comments

wilsonwang371 commented Jun 2, 2021

wilsonwang371 commented Jun 2, 2021

gyuho commented Jun 2, 2021

wilsonwang371 commented Jun 2, 2021

ptabor commented Jun 2, 2021

wilsonwang371 commented Jun 2, 2021 • edited Loading

gyuho commented Jun 2, 2021

gyuho commented Jun 2, 2021

wilsonwang371 commented Jun 2, 2021

tangcong commented Jun 2, 2021

ptabor commented Jun 2, 2021

serathius commented Jun 2, 2021 • edited Loading

ptabor commented Jun 3, 2021

gyuho commented Jun 3, 2021

ptabor commented Jun 3, 2021

stale bot commented Sep 2, 2021

wilsonwang371 commented Jun 2, 2021 •

edited

Loading

serathius commented Jun 2, 2021 •

edited

Loading