optimize BCE #34669

Open
hawkingrei opened this issue May 16, 2022 · 0 comments
Labels
type/enhancement The issue or PR belongs to an enhancement.

hawkingrei commented May 16, 2022

Enhancement

Go is a memory-safe language. For array/slice element indexing and subslice operations, the Go runtime checks whether the indexes involved are out of range. If an index is out of range, the program panics, which prevents the invalid index from doing harm. This is called a bounds check. Bounds checks make our code safe, but they also make it run a little slower.
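
As a minimal illustration of the bounds check described above, the sketch below indexes a slice with an in-range and an out-of-range index and recovers the resulting panic. The helper name `elementAt` is hypothetical and exists only for this example.

```go
package main

import "fmt"

// elementAt returns s[i]. Indexing with an out-of-range i makes the runtime
// bounds check fail and panic; the deferred function recovers that panic and
// reports it as an error instead.
func elementAt(s []int, i int) (v int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return s[i], nil
}

func main() {
	s := []int{10, 20, 30}
	fmt.Println(elementAt(s, 1)) // in range: the check passes
	fmt.Println(elementAt(s, 5)) // out of range: the check fails and panics
}
```

The check that protects `s[i]` is the per-access cost that BCE tries to remove when the compiler can prove the index is always valid.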

Since Go Toolchain 1.7, the standard Go compiler has used a new compiler backend, which is based on SSA (static single-assignment form). SSA helps the Go compiler apply optimizations such as BCE (bounds check elimination) and CSE (common subexpression elimination) effectively. BCE removes some unnecessary bounds checks and CSE removes some duplicate calculations, so the standard Go compiler can generate more efficient programs. Sometimes the effect of these optimizations is significant.

The example below shows how BCE can be applied with the standard Go compiler 1.7+.

For Go Toolchain 1.7+, we can use the -gcflags="-d=ssa/check_bce/debug=1" compiler flag to show which lines of code still need bounds checks.
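
To make the flag concrete, here is a small sketch that can be built with it. The function names are hypothetical. Which lines the compiler reports depends on the Go version, so the comments describe what is expected rather than guaranteed output.

```go
package main

import "fmt"

// sumNoHint walks two slices in lockstep. The loop condition bounds i by
// len(a) only, so the compiler cannot prove that b[i] is in range and is
// expected to keep a bounds check on b[i].
func sumNoHint(a, b []int) int {
	total := 0
	for i := 0; i < len(a); i++ {
		total += a[i] + b[i]
	}
	return total
}

// sumWithHint reslices b to len(a) before the loop. The reslice is checked
// once (it panics if cap(b) < len(a)); after it, len(b) == len(a) is known,
// so the check on b[i] inside the loop is expected to be eliminated.
// Note that reslicing may extend b up to its capacity, so this variant
// assumes len(b) >= len(a).
func sumWithHint(a, b []int) int {
	b = b[:len(a)]
	total := 0
	for i := 0; i < len(a); i++ {
		total += a[i] + b[i]
	}
	return total
}

func main() {
	a := []int{1, 2, 3}
	b := []int{10, 20, 30}
	fmt.Println(sumNoHint(a, b), sumWithHint(a, b))
}
```

Building this file with `go build -gcflags="-d=ssa/check_bce/debug=1"` should list the source positions where a check remains, which makes it possible to confirm whether a hint had the intended effect.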

Example

diff --git a/expression/builtin_time_vec.go b/expression/builtin_time_vec.go
index b6bab705d27bd..1db9d3436cc75 100644
--- a/expression/builtin_time_vec.go
+++ b/expression/builtin_time_vec.go
@@ -139,8 +139,10 @@ func (b *builtinFromUnixTime2ArgSig) vecEvalString(input *chunk.Chunk, result *c
 	result.ReserveString(n)
 	ds := buf1.Decimals()
 	fsp := b.tp.GetDecimal()
+	_ = buf1.NullBitmap[(n-1)/8]
+	_ = buf2.NullBitmap[(n-1)/8]
 	for i := 0; i < n; i++ {
-		if buf1.IsNull(i) || buf2.IsNull(i) {
+		if buf1.NullBitmap[i/8]&buf2.NullBitmap[i/8]&(1<<(uint(i)&7)) == 0 {
 			result.AppendNull()
 			continue
 		}
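
The idea in the diff above can be sketched in isolation. The `column` type below is a hypothetical stand-in, not TiDB's chunk column; it only assumes the convention the diff relies on, that bit i of the null bitmap is 0 when row i is NULL. The `n <= 0` guard is an addition for this sketch: in Go, `(0-1)/8` evaluates to 0, so the hint would index element 0 of a possibly empty bitmap.

```go
package main

import "fmt"

// column is a hypothetical stand-in for a chunk column.
// Bit i of nullBitmap is 1 when row i is NOT NULL and 0 when it is NULL.
type column struct {
	nullBitmap []byte
}

// newColumn builds a column from a per-row "not null" flag.
func newColumn(notNull []bool) *column {
	c := &column{nullBitmap: make([]byte, (len(notNull)+7)/8)}
	for i, ok := range notNull {
		if ok {
			c.nullBitmap[i/8] |= 1 << (uint(i) & 7)
		}
	}
	return c
}

// isNull reports whether row i is NULL; each call indexes the bitmap once.
func (c *column) isNull(i int) bool {
	return c.nullBitmap[i/8]&(1<<(uint(i)&7)) == 0
}

// countEitherNull is the "before" shape: two method calls per row.
func countEitherNull(a, b *column, n int) int {
	cnt := 0
	for i := 0; i < n; i++ {
		if a.isNull(i) || b.isNull(i) {
			cnt++
		}
	}
	return cnt
}

// countEitherNullFused is the "after" shape: the largest index is touched
// once before the loop as a hint, and both null tests are fused into one
// mask test. A row counts when either bit is 0, i.e. when the AND is 0.
func countEitherNullFused(a, b *column, n int) int {
	if n <= 0 {
		return 0 // guard added for this sketch; see the note above
	}
	_ = a.nullBitmap[(n-1)/8]
	_ = b.nullBitmap[(n-1)/8]
	cnt := 0
	for i := 0; i < n; i++ {
		if a.nullBitmap[i/8]&b.nullBitmap[i/8]&(1<<(uint(i)&7)) == 0 {
			cnt++
		}
	}
	return cnt
}

func main() {
	a := newColumn([]bool{true, false, true, true, true, true, true, true, true, false})
	b := newColumn([]bool{true, true, false, true, true, true, true, true, false, false})
	fmt.Println(countEitherNull(a, b, 10), countEitherNullFused(a, b, 10))
}
```

Both functions return the same count. Whether the in-loop checks are actually removed depends on what the compiler can prove about `i/8`, so it is worth confirming with the check_bce flag rather than assuming it.
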

Benchmark command

make vectorized-bench VB_FILE=Time VB_FUNC=builtinFromUnixTime2ArgSig

Before

goos: darwin
goarch: amd64
pkg: github.com/pingcap/tidb/expression
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkVectorizedBuiltinTimeFuncGenerated
BenchmarkVectorizedBuiltinTimeFuncGenerated-16    	1000000000	         0.3045 ns/op	       0 B/op	       0 allocs/op
BenchmarkVectorizedBuiltinTimeFunc
BenchmarkVectorizedBuiltinTimeFunc-16             	1000000000	         0.06184 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	github.com/pingcap/tidb/expression	7.986s

After

goos: darwin
goarch: amd64
pkg: github.com/pingcap/tidb/expression
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
BenchmarkVectorizedBuiltinTimeFuncGenerated
BenchmarkVectorizedBuiltinTimeFuncGenerated-16          1000000000               0.2601 ns/op          0 B/op          0 allocs/op
BenchmarkVectorizedBuiltinTimeFunc
BenchmarkVectorizedBuiltinTimeFunc-16                   1000000000               0.05196 ns/op         0 B/op          0 allocs/op
PASS
ok      github.com/pingcap/tidb/expression      5.989s

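Result: roughly a 17% speedup on BenchmarkVectorizedBuiltinTimeFuncGenerated (0.3045 ns/op before, 0.2601 ns/op after, a ratio of about 1.17, or about 15% less time per op). BenchmarkVectorizedBuiltinTimeFunc goes from 0.06184 ns/op to 0.05196 ns/op, a ratio of about 1.19.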
