
Performance Improvement in tds and ucs2 parsing - V1-Candidate #14

Merged 5 commits into microsoft:main on Jun 21, 2022

Conversation

PeteBassettBet365 (Collaborator) commented:

This pull request covers changes to improve performance in the tdsBuffer struct and the ucs22str function.

buf.go
The tdsBuffer uint16 function is used heavily throughout the tds parsing code. The original implementation created a byte array and a slice into it to be filled by a call to ReadFull, which goes through io.ReadFull back into the tdsBuffer's io.Reader Read method. This code path incurs significant overhead even when the buffer already has at least 2 bytes available. The new implementation checks the read position before calling ReadFull: if at least two bytes are available, we read them directly from the buffer; otherwise we fall back to the slow path.
uint32 and uint64 have also been changed in the same way for 4- and 8-byte reads.
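
As an illustration, here is a minimal sketch of the fast-path idea, assuming a simplified tdsBuffer (the rbuf/rpos/rsize field names and the readFullSlow stand-in are illustrative, not the exact code in buf.go):

package main

import (
	"encoding/binary"
	"fmt"
)

// Simplified stand-in for tdsBuffer; field names are illustrative.
type tdsBuffer struct {
	rbuf  []byte // current packet data
	rpos  int    // next unread byte
	rsize int    // number of valid bytes in rbuf
}

// readFullSlow stands in for the original io.ReadFull-based path, which also
// handles reads that cross a packet boundary.
func (r *tdsBuffer) readFullSlow(buf []byte) {
	for i := range buf {
		buf[i] = r.rbuf[r.rpos] // the real code would refill the next packet here
		r.rpos++
	}
}

// uint16 fast path: if two bytes are already buffered, decode them in place
// with no allocation; otherwise fall back to the slow path.
func (r *tdsBuffer) uint16() uint16 {
	if r.rpos+2 <= r.rsize {
		v := binary.LittleEndian.Uint16(r.rbuf[r.rpos:])
		r.rpos += 2
		return v
	}
	var b [2]byte
	r.readFullSlow(b[:])
	return binary.LittleEndian.Uint16(b[:])
}

func main() {
	r := &tdsBuffer{rbuf: []byte{0x34, 0x12, 0x78, 0x56}, rsize: 4}
	fmt.Printf("0x%04x 0x%04x\n", r.uint16(), r.uint16()) // 0x1234 0x5678
}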

uint16 - a loop calling uint16 10000 times
BenchmarkReadUint16Implementations/reference-12 4010 304633 ns/op 20045 B/op 10001 allocs/op
BenchmarkReadUint16Implementations/new_version-12 23412 50691 ns/op 32 B/op 1 allocs/op

uint32 - a loop calling uint32 5000 times
BenchmarkReadUint32Implementations/reference-12 6684 154147 ns/op 20045 B/op 5001 allocs/op
BenchmarkReadUint32Implementations/new_version-12 48321 25327 ns/op 32 B/op 1 allocs/op

uint64 - a loop calling uint64 2500 times
BenchmarkReadUint64Implementations/reference-12 3795 87688 ns/op 20045 B/op 2501 allocs/op
BenchmarkReadUint64Implementations/new_version-12 85941 14582 ns/op 32 B/op 1 allocs/op

tds.go
Performance improvement to the ucs22str function, which is responsible for reading 16-bit UCS-2 characters and producing a string. Most of the strings we deal with are ASCII, so the full Unicode handling path can be bypassed for them. We use unsafe to read the incoming bytes in 8-byte chunks, as uint64s, masking each value to detect non-ASCII code units. If any are found, we fall back to a faster version of the original implementation to handle the Unicode. Otherwise the chunk holds four ASCII characters whose low bytes can be safely appended to the output buffer. If we reach the end of the input this way, the buffer can be converted directly to a string. This reduces allocations on the ASCII path and is a significant speed improvement, up to 16x in our tests.
The faster Unicode fallback avoids copying the byte data into a newly allocated uint16 array. It again uses unsafe, creating a uint16 slice over the original byte data. This is safe because the slice is passed directly to utf16.Decode, which itself produces a copy as a []rune. This saves an allocation, a copy, and the uint16 conversion.
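
The following is a minimal sketch of the approach, not the code in the PR: it uses encoding/binary rather than unsafe to read the 8-byte chunks, and the slowPath helper copies into a []uint16 where the real change reslices the original bytes with unsafe. All names (ucs22strSketch, slowPath, nonASCIIMask) are illustrative.

package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// In little-endian UCS-2 an ASCII character is a 0x00XX pair, so any set bit
// under this mask in an 8-byte chunk means the chunk is not pure ASCII.
const nonASCIIMask = 0xFF80FF80FF80FF80

func ucs22strSketch(s []byte) string {
	out := make([]byte, 0, len(s)/2)
	i := 0
	// Step through complete 8-byte chunks (four UCS-2 code units at a time).
	for ; i+8 <= len(s); i += 8 {
		chunk := binary.LittleEndian.Uint64(s[i:])
		if chunk&nonASCIIMask != 0 {
			return slowPath(s, out, i)
		}
		// Pure ASCII: keep the low byte of each of the four code units.
		out = append(out, s[i], s[i+2], s[i+4], s[i+6])
	}
	// Tail of fewer than four code units.
	for ; i+2 <= len(s); i += 2 {
		if s[i+1] != 0 || s[i] >= 0x80 {
			return slowPath(s, out, i)
		}
		out = append(out, s[i])
	}
	return string(out)
}

// slowPath decodes the remaining input as UTF-16. The real code avoids this
// copy by reslicing the original bytes as []uint16 with unsafe and handing
// that straight to utf16.Decode.
func slowPath(s []byte, out []byte, from int) string {
	u := make([]uint16, 0, (len(s)-from)/2)
	for i := from; i+2 <= len(s); i += 2 {
		u = append(u, binary.LittleEndian.Uint16(s[i:]))
	}
	return string(out) + string(utf16.Decode(u))
}

func main() {
	// "Hi☺" encoded as little-endian UCS-2.
	fmt.Println(ucs22strSketch([]byte{'H', 0, 'i', 0, 0x3A, 0x26}))
}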

Reference Implementation
BenchmarkUcs22strReferenceAscii-12 12274105 98.6 ns/op 32 B/op 3 allocs/op
BenchmarkUcs22strReferenceMediumAscii-12 6721837 178 ns/op 96 B/op 3 allocs/op
BenchmarkUcs22strReferenceLongAscii-12 1538629 778 ns/op 448 B/op 3 allocs/op
BenchmarkUcs22strReferenceLongerAscii-12 169466 7115 ns/op 4608 B/op 3 allocs/op
BenchmarkUcs22strReferenceTrailingUnicode-12 4423173 274 ns/op 160 B/op 3 allocs/op
BenchmarkUcs22strReferenceLongEmojis-12 3166297 380 ns/op 304 B/op 3 allocs/op

New Implementation
BenchmarkUcs22strAscii-12 54690384 21.9 ns/op 3 B/op 1 allocs/op
BenchmarkUcs22strMediumAscii-12 35389251 33.4 ns/op 16 B/op 1 allocs/op
BenchmarkUcs22strLongAscii-12 17188800 70.1 ns/op 64 B/op 1 allocs/op
BenchmarkUcs22strLongerAscii-12 2864568 427 ns/op 640 B/op 1 allocs/op
BenchmarkUcs22strTrailingUnicode-12 4221280 259 ns/op 144 B/op 3 allocs/op
BenchmarkUcs22strLongEmojis-12 3549286 341 ns/op 272 B/op 3 allocs/op

buf.go
A tdsBuffer is created for each SQL connection and internally holds two []byte buffers, one for reading and one for writing. They are allocated directly with make, and each time a connection is closed they are garbage collected. We added a sync.Pool-managed pool of 64KB []byte slices. These are created as required and placed back in the pool when the tdsBuffer is closed. Each pooled slice serves both the read and write buffers, with half resliced for each.
Connections are generally long lived, so this is a minor improvement to garbage collection overhead at the expense of slightly increased memory usage. Happy to discuss this further if you deem it not required.
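
A minimal sketch of the pooling scheme, assuming illustrative names (bufPool, poolSize, newTdsBuffer) rather than the identifiers actually used in buf.go:

package main

import (
	"fmt"
	"sync"
)

const poolSize = 1 << 16 // 64KB, split between the read and write buffers

var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, poolSize)
		return &b
	},
}

type tdsBuffer struct {
	pooled *[]byte // retained so Close can return the slice to the pool
	rbuf   []byte  // first half: read buffer
	wbuf   []byte  // second half: write buffer
}

func newTdsBuffer() *tdsBuffer {
	b := bufPool.Get().(*[]byte)
	return &tdsBuffer{
		pooled: b,
		rbuf:   (*b)[:poolSize/2],
		wbuf:   (*b)[poolSize/2:],
	}
}

// Close returns the pooled slice so the next connection can reuse it instead
// of allocating a fresh 64KB.
func (t *tdsBuffer) Close() {
	bufPool.Put(t.pooled)
	t.pooled, t.rbuf, t.wbuf = nil, nil, nil
}

func main() {
	t := newTdsBuffer()
	fmt.Println(len(t.rbuf), len(t.wbuf)) // 32768 32768
	t.Close()
}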

All benchmarks were run on an 8-core i7 with 16GB of RAM.

ghost commented May 4, 2022:

CLA assistant check
All CLA requirements met.

PeteBassettBet365 changed the title from "Performance Improvement in tds and ucs2 parsing" to "Performance Improvement in tds and ucs2 parsing - V1-Candidate" on May 4, 2022
shueybubbles requested a review from stuartpa on May 11, 2022 17:57
shueybubbles added the enhancement (New feature or request) and v1-candidate (Nominate PR or Issue for V1 project) labels on May 11, 2022
shueybubbles requested reviews from gambtho and imiller31 on May 11, 2022 18:10
shueybubbles requested a review from kenvanhyning on May 26, 2022 14:57
tds.go Outdated
// step through in 8 byte chunks.
for readIndex = 0; readIndex < nlen8; readIndex += 8 {

// defererence directly into the array as uint64s
stuartpa (Collaborator) commented on Jun 15, 2022:

defererence

nit: typo #Resolved

PeteBassettBet365 (Collaborator, Author) replied:

fixed that now

shueybubbles (Collaborator) left a comment:

Please address the lint issues. Looks good to me!
thx

PeteBassettBet365 (Collaborator, Author) commented on Jun 17, 2022:

Hi, I will fix the linting issues on Monday.

codecov-commenter commented on Jun 17, 2022:

Codecov Report

❗ No coverage uploaded for pull request base (main@2c7e9d5).
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main      #14   +/-   ##
=======================================
  Coverage        ?   72.03%           
=======================================
  Files           ?       23           
  Lines           ?     5593           
  Branches        ?        0           
=======================================
  Hits            ?     4029           
  Misses          ?     1318           
  Partials        ?      246           


buf.go (review thread resolved)
shueybubbles (Collaborator) left a comment:

Looks good to me. When @stuartpa signs off I'll merge.

stuartpa (Collaborator) left a comment:

:shipit:

shueybubbles merged commit f386d5f into microsoft:main on Jun 21, 2022
Labels: enhancement (New feature or request), v1-candidate (Nominate PR or Issue for V1 project)

4 participants