-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance Improvement in tds and ucs2 parsing - V1-Candidate #14
Conversation
tds.go
Outdated
// step through in 8 byte chunks. | ||
for readIndex = 0; readIndex < nlen8; readIndex += 8 { | ||
|
||
// defererence directly into the array as uint64s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed that now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address the lint issues. Looks good to me!
thx
Hi, I will fix the linting issues on Monday. |
Codecov Report
@@ Coverage Diff @@
## main #14 +/- ##
=======================================
Coverage ? 72.03%
=======================================
Files ? 23
Lines ? 5593
Branches ? 0
=======================================
Hits ? 4029
Misses ? 1318
Partials ? 246 Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. When @stuartpa signs off I'll merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pull request covers some changes to improve performance in the tdsBuffer struct and ucs22str function.
buf.go
The tdsBuffer uint16 function is used heavily throughout the tds parsing code. The original implementation created a byte array and a slice
into it to be filled by a call to ReadFull. This calls through io.ReadFull back into tdsBuffer io.Reader Read function. This code path incurs
significant overhead when the buffer already has at least 2 bytes available. The new implementation checks the read position before calling
ReadFull. If we have at least two bytes available we read directly, otherwise fall back to the slow path.
uint32 and uint64 have also been changed in the same way for 4 & 8 byte reads.
uint16 - a loop calling uint16 10000 times
BenchmarkReadUint16Implementations/reference-12 4010 304633 ns/op 20045 B/op 10001 allocs/op
BenchmarkReadUint16Implementations/new_version-12 23412 50691 ns/op 32 B/op 1 allocs/op
uint32 - a loop calling uint32 5000 times
BenchmarkReadUint32Implementations/reference-12 6684 154147 ns/op 20045 B/op 5001 allocs/op
BenchmarkReadUint32Implementations/new_version-12 48321 25327 ns/op 32 B/op 1 allocs/op
uint64 - a loop calling uint64 2500 times
BenchmarkReadUint64Implementations/reference-12 3795 87688 ns/op 20045 B/op 2501 allocs/op
BenchmarkReadUint64Implementations/new_version-12 85941 14582 ns/op 32 B/op 1 allocs/op
tds.go - performance improvement on ucs22str function. It is responsible for reading 16bit ucs2 characters and producing a string.
Most of the strings we deal with are ascii so the full unicode handling path can be bypassed when they are found. We use unsafe to read the incoming bytes
in 8 byte chunks, as uint64s, masking the value to find invalid ascii. If found we fall back to a faster version of the original
implementation to deal with the unicode. Otherwise we have 8 bytes and 4 can be safely added to the output buffer. If we reach the end of the input
successfully the buffer can be converted directly to a string. This reduces allocations on the ascii path and is a significant speed improvement, up to 16x in our tests.
The faster unicode path avoids copying the byte data to a newly allocated uint16 array. It again uses unsafe, to create a uint16 slice to the original byte data.
This is safe because it is passed to directly utf16.Decode, which itself creates a copy as a []rune. This saves us an allocation, copying and uint16 conversion.
Reference Implementation
BenchmarkUcs22strReferenceAscii-12 12274105 98.6 ns/op 32 B/op 3 allocs/op
BenchmarkUcs22strReferenceMediumAscii-12 6721837 178 ns/op 96 B/op 3 allocs/op
BenchmarkUcs22strReferenceLongAscii-12 1538629 778 ns/op 448 B/op 3 allocs/op
BenchmarkUcs22strReferenceLongerAscii-12 169466 7115 ns/op 4608 B/op 3 allocs/op
BenchmarkUcs22strReferenceTrailingUnicode-12 4423173 274 ns/op 160 B/op 3 allocs/op
BenchmarkUcs22strReferenceLongEmojis-12 3166297 380 ns/op 304 B/op 3 allocs/op
New Implementation
BenchmarkUcs22strAscii-12 54690384 21.9 ns/op 3 B/op 1 allocs/op
BenchmarkUcs22strMediumAscii-12 35389251 33.4 ns/op 16 B/op 1 allocs/op
BenchmarkUcs22strLongAscii-12 17188800 70.1 ns/op 64 B/op 1 allocs/op
BenchmarkUcs22strLongerAscii-12 2864568 427 ns/op 640 B/op 1 allocs/op
BenchmarkUcs22strTrailingUnicode-12 4221280 259 ns/op 144 B/op 3 allocs/op
BenchmarkUcs22strLongEmojis-12 3549286 341 ns/op 272 B/op 3 allocs/op
buf.go
A tdsBuffer is created for each sql connection and internally holds two []byte buffers, one for reading and one for writing. They are allocated directly with
make. Each time a connection is closed, they are garbage collected. We added a sync.Pool managed pool of 64k []byte slices. These are created as required and
placed back in the pool when the tdsBuffer is closed. Each pooled slice is used for the read and write buffers by reslicing half for each.
Connections are generally long lived so this is a minor improvement to garbage collection overhead at the expense of slightly increased memory usage. Happy to discuss this further if you deem it not required.
All benchmarks done on an 8core i7, 16gb