
RingDelayBuffer: ring buffer for delay handling #4852

Merged
merged 43 commits into from
Aug 29, 2022


davidchocholaty
Contributor

This PR introduces a new ring buffer. Its main use case is delay
handling, but it can also be used as a classic ring buffer with the
option to jump with the read position. It is based on the well-known
classic ring buffer, with two extensions. First, the ring delay buffer
allows the read position to be moved, subject to certain rules.
Second, unlike a classic ring buffer, it can return zero values that
were never written via the write method and the write position. Both
properties are needed for cross-fading between two delays when the
delay changes.
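For illustration, the two extensions can be sketched as a minimal single-threaded ring buffer. This is a hypothetical sketch, not the RingDelayBuffer API from this PR: the class name `DelayRing`, the use of `float` samples and the unchecked `moveReadPositionBy` are assumptions, and the availability bookkeeping and jump rules of the real class are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not Mixxx's RingDelayBuffer.
// Single-threaded ring buffer whose read position may be moved freely.
// Slots that were never written hold 0.0f because std::vector
// value-initializes its elements, so reading them yields silence.
class DelayRing {
  public:
    explicit DelayRing(std::size_t capacity)
            : m_buffer(capacity) {
        assert(capacity > 0);
    }

    // Writes numItems samples, wrapping around the end of the storage.
    void write(const float* src, std::size_t numItems) {
        for (std::size_t i = 0; i < numItems; ++i) {
            m_buffer[m_writePos] = src[i];
            m_writePos = next(m_writePos);
        }
    }

    // Reads numItems samples starting at the current read position.
    void read(float* dst, std::size_t numItems) {
        for (std::size_t i = 0; i < numItems; ++i) {
            dst[i] = m_buffer[m_readPos];
            m_readPos = next(m_readPos);
        }
    }

    // Moves the read position by jump (negative = to the left), wrapping.
    // The real class restricts the jump size; this sketch does not.
    void moveReadPositionBy(std::ptrdiff_t jump) {
        const auto size = static_cast<std::ptrdiff_t>(m_buffer.size());
        std::ptrdiff_t pos = (static_cast<std::ptrdiff_t>(m_readPos) + jump) % size;
        if (pos < 0) {
            pos += size; // % truncates toward zero, so fix negative results
        }
        m_readPos = static_cast<std::size_t>(pos);
    }

  private:
    std::size_t next(std::size_t pos) const {
        return pos + 1 == m_buffer.size() ? 0 : pos + 1;
    }

    std::vector<float> m_buffer;
    std::size_t m_readPos = 0;
    std::size_t m_writePos = 0;
};
```

Moving the read position to the left past the first written sample lands on slots that were never written, which read back as zeros because the storage starts zero-initialized.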

This commit adds a new ring buffer for handling delay. It was created
specifically for its first use case, EngineEffectsDelay, which handles
the delay of the effects (in a chain). The ring delay buffer allows
the read position to be moved, subject to certain rules. Another
difference from the classic ring buffer is that the ring delay buffer
can return zero values that were not written using the write method
and the write position. Both properties are required for cross-fading
between two delays when the delay changes, so the classic ring buffer
cannot be used.
The commit adds tests for the ring buffer for handling delay.
The test set includes tests for RingDelayBuffer::isEmpty,
RingDelayBuffer::isFull and RingDelayBuffer::clear, tests checking
the number of items available for reading and writing, and finally
tests for reading and writing, both without skipping with the read
position and with skipping on both sides, in both variants
(circling around / not circling around).
The commit adds the original license of the ring buffer from the
Portable Audio I/O Library. The ring buffer in pa_ringbuffer.c was
used as a template for some parts of the code in RingDelayBuffer,
so the original license is added to the cpp file and the header.
The header describes which functions were modified and added.
This commit adds clearing of m_jumpLeftAroundMask.
When RingDelayBuffer::clear is called, this masking variable,
which is used when a jump to the left crosses the left side
of the delay buffer, is set to zero.
This commit allows the size of a jump to the right with the read
position to be equal to the number of items available for reading.
This commit fixes the maximum size of a jump to the left for the
read position. Accordingly, the comments for the ASSERTs are updated
and a zero-size jump is handled separately.
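One way to express the jump limits from the last two commits is sketched below. It assumes that `readAvailable` counts the samples between the read and the write position and never exceeds the capacity; the helper `isJumpAllowed` is hypothetical, and bounding a left jump by the free space is an assumption that may differ from the PR's actual limit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not taken from the PR. Assumes readAvailable is the
// number of samples between the read and the write position and that
// 0 <= readAvailable <= capacity always holds.
bool isJumpAllowed(std::int64_t jump, std::int64_t readAvailable, std::int64_t capacity) {
    assert(0 <= readAvailable && readAvailable <= capacity);
    if (jump == 0) {
        // Zero-size jump: nothing moves, so it is always valid.
        return true;
    }
    if (jump > 0) {
        // Right jump: may skip at most all readable items, up to the write position.
        return jump <= readAvailable;
    }
    // Left jump: makes -jump more items readable; the total must fit in the buffer.
    return -jump <= capacity - readAvailable;
}
```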
The commit adds benchmarks for RingDelayBuffer. They test
RingDelayBuffer::write, RingDelayBuffer::read and
RingDelayBuffer::moveReadPositionBy for three cases: without skipping,
with a jump to the left without circling around, and with a jump
to the left circling around.
@Swiftb0y
Member

Thanks. Before I go into a detailed review, it would make sense to first introduce the std::/gsl::span shim we discussed.

src/util/ringdelaybuffer.cpp (outdated, resolved)
src/util/ringdelaybuffer.h (outdated, resolved)
Member

@daschuer daschuer left a comment


I have added some thoughts

```cpp
class RingDelayBufferTest : public MixxxTest {
  protected:
    void SetUp() override {
        m_pRingDelayBuffer = new RingDelayBuffer(m_ringDelayBufferSize);
```
Member


This can become a unique_ptr.
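A minimal sketch of the suggested change, using a hypothetical stand-in type `FakeRingDelayBuffer` and a plain class instead of GoogleTest's fixture machinery (`MixxxTest` is not reproduced here). Owning the buffer through `std::unique_ptr` removes the need for a matching `delete`. Assumes C++14 or later for `std::make_unique`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical buffer size used by the sketch.
constexpr std::size_t kBufferSize = 32;

// Hypothetical stand-in for RingDelayBuffer; only construction matters here.
struct FakeRingDelayBuffer {
    explicit FakeRingDelayBuffer(std::size_t size)
            : data(size) {
    }
    std::vector<float> data;
};

// Mirrors a test fixture's SetUp() without GoogleTest or MixxxTest.
class FixtureSketch {
  public:
    void SetUp() {
        // make_unique replaces `new`; any previous buffer is released here.
        m_pRingDelayBuffer = std::make_unique<FakeRingDelayBuffer>(kBufferSize);
    }

    // No TearDown() with `delete` is needed: the unique_ptr frees the buffer
    // when the fixture is destroyed.
    FakeRingDelayBuffer* buffer() const {
        assert(m_pRingDelayBuffer && "SetUp() must be called first");
        return m_pRingDelayBuffer.get();
    }

  private:
    std::unique_ptr<FakeRingDelayBuffer> m_pRingDelayBuffer;
};
```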

```cpp
        m_ringMask(bufferSize - 1),
        m_jumpLeftAroundMask(0),
        m_buffer(bufferSize) {
    // TODO(davidchocholaty) Handle to allow only power of two for the size of the ring delay buffer.
```
Member


You probably already know this function:

```cpp
inline int roundUpToPowerOf2(int v) {
```

Member


IMO it's not needed; there is also std::bit_ceil in C++20's <bit> header. It's constexpr and also likely faster.

Member


The benchmark suggests it's at least as fast, though I guess the performance of bit_ceil depends very much on what's available on the target architecture (which you can't specify on Quick Bench). https://quick-bench.com/q/3FOpFZ_ZaAfJOtWWl1Xx7MFC7JQ
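For reference, a portable round-up-to-power-of-two can be written with the classic bit-smearing trick. This is a hypothetical `uint32_t` variant, not Mixxx's `roundUpToPowerOf2(int)`; with C++20, `std::bit_ceil` from `<bit>` gives the same result for inputs up to 2^31. Assumes C++14 or later for the multi-statement `constexpr` body.

```cpp
#include <cstdint>

// Hypothetical uint32_t variant (Mixxx's helper takes an int). Returns the
// smallest power of two >= v, and 1 for v == 0, like std::bit_ceil.
// For v > 2^31 the result wraps to 0, whereas std::bit_ceil is undefined there.
constexpr std::uint32_t roundUpToPowerOf2(std::uint32_t v) {
    if (v <= 1) {
        return 1;
    }
    --v;
    // Copy the highest set bit into every lower bit position.
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    return v + 1;
}
```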

Contributor Author


Thank you both for your recommendations. In any case, after refactoring the code to remove the bitwise operations, the "power of two" condition is no longer required.
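Dropping the power-of-two requirement means the index can no longer be wrapped with a bit mask such as `m_readPos & m_ringMask`. The refactored code itself is not shown in this thread; the sketch below merely contrasts the two wrapping strategies using hypothetical helper names. The compare-based version works for any size, as long as the position advances by at most one buffer length before it is wrapped.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: requires size to be a power of two,
// because the mask simply keeps the low bits of pos.
std::size_t wrapWithMask(std::size_t pos, std::size_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    return pos & (size - 1);
}

// Hypothetical helper: works for any size, provided pos < 2 * size
// (e.g. pos = index + step with index < size and step <= size).
std::size_t wrapWithCompare(std::size_t pos, std::size_t size) {
    assert(size != 0 && pos < 2 * size);
    return pos >= size ? pos - size : pos;
}
```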


```cpp
    // Set the ring delay buffer items to 0.
    CSAMPLE* bufferData = m_buffer.data();
    memset(bufferData, 0, sizeof(*bufferData) * bufferSize);
```
Member


This should not be necessary. The buffer should never return uninitialized zeros. In an out-of-memory situation, the calling code needs to fade to zero to avoid clicks.
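As an illustration of "fade to zero", a linear gain ramp applied to the last block before silence avoids the abrupt jump that causes a click. This is a hypothetical helper, not a Mixxx function; it assumes interleaved `float` samples and steps the gain once per frame so that all channels of a frame receive the same gain.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of Mixxx: multiplies interleaved samples by a
// gain that falls linearly per frame, from (n - 1) / n for the first frame to
// 0 for the last, so the signal reaches silence without a discontinuity.
void fadeOutLinear(float* samples, std::size_t numFrames, std::size_t channels) {
    assert(channels > 0);
    if (numFrames == 0) {
        return;
    }
    for (std::size_t frame = 0; frame < numFrames; ++frame) {
        const float gain = static_cast<float>(numFrames - 1 - frame) /
                static_cast<float>(numFrames);
        for (std::size_t ch = 0; ch < channels; ++ch) {
            samples[frame * channels + ch] *= gain;
        }
    }
}
```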

Contributor Author


The zero initialization was added because m_readPos can be moved to the left past the left bound of the delay buffer. In the current code version, this is allowed even at the beginning, so the read position can read values that were never written via m_writePos. Thinking about it, that is not good practice. It would be possible to allow skipping to the left only over items that were written via m_writePos, enforced with VERIFY_OR_DEBUG_ASSERT.

```cpp
        itemsToRead = available;
    }

    const SINT position = m_readPos & m_ringMask;
```
Member


I am not sure whether the concept of a member variable m_readPos is still suitable.

As I understand it, the current sample is at m_writePos - 1 and the delayed samples precede it, so the actual read position is m_writePos - "current delay" - itemsToRead.

We need to make sure this does not hit the end of the buffer.

Idea: Pass the delay along to the read function?
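The suggestion can be sketched as a delay line without an `m_readPos` member, where every read is addressed relative to the write position: the newest sample is at `m_writePos - 1`, so a read of `n` samples with delay `d` starts at `m_writePos - d - n`, modulo the buffer size. The class `DelayLine` and the `read(dst, numItems, delay)` signature are hypothetical; the assert stands in for the "must not hit the end of the buffer" check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the reviewer's idea: no read position is stored;
// each read is computed from the write position and the requested delay.
class DelayLine {
  public:
    explicit DelayLine(std::size_t capacity)
            : m_buffer(capacity) {
        assert(capacity > 0);
    }

    void write(const float* src, std::size_t numItems) {
        for (std::size_t i = 0; i < numItems; ++i) {
            m_buffer[m_writePos] = src[i];
            m_writePos = (m_writePos + 1) % m_buffer.size();
        }
    }

    // Reads numItems samples whose last one lies `delay` samples before the
    // newest written sample. Unwritten slots read as zero.
    void read(float* dst, std::size_t numItems, std::size_t delay) const {
        const std::size_t size = m_buffer.size();
        // The requested range must fit in the buffer.
        assert(delay + numItems <= size);
        // Adding size first avoids unsigned underflow in the subtraction.
        std::size_t pos = (m_writePos + size - delay - numItems) % size;
        for (std::size_t i = 0; i < numItems; ++i) {
            dst[i] = m_buffer[pos];
            pos = (pos + 1) % size;
        }
    }

  private:
    std::vector<float> m_buffer;
    std::size_t m_writePos = 0;
};
```

With 8 samples written and 4 read at zero delay, this version returns the last four written samples (indices 4 to 7), which is exactly the behavior difference raised in the following reply.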

Contributor Author


Thank you so much for this idea. It sounds great, and I think it cleans up the code a lot. I have just one concern, regarding two cases: with the proposed version, the ring delay buffer will behave a little differently.

  1. If, for example, 8 samples are written into the ring delay buffer and then a different number of samples is read with zero delay, say 4 samples, the previous version reads the first half of the written samples (starting at index 0), whereas the new version reads the second half (8 - 0 - 4).

  2. If the write method is called twice in a row without calling the read method in between, the situation is similar to the first case.

For our future use case in EngineEffectsDelay, this cannot happen. However, considering RingDelayBuffer on its own, it may be possible. What do you think? I don't want to dismiss this idea; I really like it and would like to use it. I'm only thinking through the cases that could happen.

Contributor Author


I would like to reopen the discussion about this idea, regarding one thing I noticed. With the new version, many checks of the delay value would have to run on every read() call, probably all of the code in moveReadPositionBy(). What is your view on this?

@Swiftb0y
Member

Another inspiration for a singlethreaded Ringbuffer: https://hg.sr.ht/~breakfastquay/rubberband/browse/src/common/SingleThreadRingBuffer.h?rev=tip

This commit removes the unnecessary explicitly declared virtual destructor.
The commit removes the variable storing the size of the delay buffer.
Instead, SampleBuffer::size is used, because the delay buffer
is of type mixxx::SampleBuffer.
This commit improves the const correctness of a few functions and parameters.
const is added to the functions RingDelayBuffer::isFull,
RingDelayBuffer::getReadAvailable and RingDelayBuffer::getWriteAvailable.
In addition, the numItems parameter of RingDelayBuffer::read and
RingDelayBuffer::write and the jumpSize parameter of
RingDelayBuffer::moveReadPositionBy are now const.
The commit removes the note that the ring buffer is safe
for single-thread only. This information is a rewrite before the class.
This commit replaces the use of RingDelayBuffer* with unique_ptr.
@davidchocholaty
Contributor Author

> Another inspiration for a singlethreaded Ringbuffer: https://hg.sr.ht/~breakfastquay/rubberband/browse/src/common/SingleThreadRingBuffer.h?rev=tip

Thank you for this tip.

@davidchocholaty
Contributor Author

davidchocholaty commented Jul 19, 2022

I would like to share some information about the choice between using plain memcpy in RingDelayBuffer and using SampleUtil::copy instead (which uses a vectorized loop for SSE on 32-bit builds). Because my system is 64-bit (Ubuntu 22.04 LTS) and the CPU (Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz) supports SSE and SSE2, SampleUtil::copy also ends up calling memcpy. I ran the benchmarks with standard deviation. The statistics, including the standard deviation, can be computed with `./mixxx-test --benchmark --benchmark_repetitions=20 --benchmark_filter=BM_WriteReadWholeBuffer`, which runs only the RingDelayBuffer benchmarks, here with 20 repetitions. As I expected, on my system the SampleUtil::copy version takes slightly more time (function calls, evaluation of conditions, etc.). Another option would be std::copy, but IMO it can't be used with the current RingDelayBuffer loop, and according to the benchmark notes in SampleUtil::copy, it is the slowest of the three. Which version do you prefer? @Swiftb0y and I discussed that we should also take the standard deviation of the benchmark results into account, but I can't test the vectorized loop version. For reference, here are the benchmark results:
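Independent of whether memcpy or SampleUtil::copy is used, a write that wraps around the end of the ring can be split into at most two contiguous copies. A minimal sketch with plain `std::memcpy` follows; the function name `ringWriteMemcpy` and its signature are hypothetical, and a different copy routine could be substituted for each segment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch: copies numItems samples into a ring of `size` floats,
// starting at writePos. One memcpy fills up to the end of the ring, a second
// one (only if needed) continues from index 0. Returns the new write position.
std::size_t ringWriteMemcpy(float* ring, std::size_t size, std::size_t writePos,
        const float* src, std::size_t numItems) {
    assert(writePos < size && numItems <= size);
    const std::size_t firstPart = std::min(numItems, size - writePos);
    std::memcpy(ring + writePos, src, firstPart * sizeof(float));
    const std::size_t secondPart = numItems - firstPart;
    if (secondPart > 0) {
        std::memcpy(ring, src + firstPart, secondPart * sizeof(float));
    }
    return (writePos + numItems) % size;
}
```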

memcpy:

----------------------------------------------------------------------------------------------
Benchmark                                                    Time             CPU   Iterations
----------------------------------------------------------------------------------------------
BM_WriteReadWholeBufferNoSkip/64                           517 ns          523 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           517 ns          523 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          520 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          520 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          520 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          519 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           513 ns          519 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           515 ns          521 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           516 ns          522 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          520 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           515 ns          520 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          519 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          520 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          519 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          520 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           513 ns          519 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          520 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           513 ns          519 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          519 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           515 ns          521 ns      1317352
BM_WriteReadWholeBufferNoSkip/64_mean                      514 ns          520 ns           20
BM_WriteReadWholeBufferNoSkip/64_median                    514 ns          520 ns           20
BM_WriteReadWholeBufferNoSkip/64_stddev                   1.23 ns         1.14 ns           20
BM_WriteReadWholeBufferNoSkip/64_cv                       0.24 %          0.22 %            20
BM_WriteReadWholeBufferNoSkip/512                          589 ns          593 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          596 ns          600 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          594 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          590 ns          594 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          593 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          593 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          590 ns          595 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          593 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          593 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          592 ns          596 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          590 ns          595 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          594 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          594 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          594 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          594 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          594 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          594 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          591 ns          596 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          593 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          590 ns          595 ns      1180062
BM_WriteReadWholeBufferNoSkip/512_mean                     590 ns          594 ns           20
BM_WriteReadWholeBufferNoSkip/512_median                   589 ns          594 ns           20
BM_WriteReadWholeBufferNoSkip/512_stddev                  1.58 ns         1.58 ns           20
BM_WriteReadWholeBufferNoSkip/512_cv                      0.27 %          0.27 %            20
BM_WriteReadWholeBufferNoSkip/4096                        1205 ns         1242 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1204 ns         1241 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1224 ns         1262 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1207 ns         1244 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1210 ns         1247 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1206 ns         1243 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1207 ns         1244 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1211 ns         1248 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1207 ns         1244 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1220 ns         1258 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1221 ns         1259 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1219 ns         1257 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1225 ns         1263 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1224 ns         1263 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1220 ns         1258 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1221 ns         1259 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1219 ns         1257 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1200 ns         1238 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1199 ns         1237 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1200 ns         1238 ns       560769
BM_WriteReadWholeBufferNoSkip/4096_mean                   1213 ns         1250 ns           20
BM_WriteReadWholeBufferNoSkip/4096_median                 1210 ns         1247 ns           20
BM_WriteReadWholeBufferNoSkip/4096_stddev                 9.00 ns         9.21 ns           20
BM_WriteReadWholeBufferNoSkip/4096_cv                     0.74 %          0.74 %            20
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 529 ns          534 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          523 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          523 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 519 ns          525 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          523 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          523 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          523 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          523 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          523 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          523 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 519 ns          524 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 516 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64_mean            518 ns          523 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/64_median          517 ns          523 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/64_stddev         2.83 ns         2.77 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/64_cv             0.55 %          0.53 %            20
BM_WriteReadWholeBufferSkipLeftNoCircle/512                582 ns          587 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                595 ns          600 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          589 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                582 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                582 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                582 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                582 ns          587 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                582 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                586 ns          592 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          589 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          589 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          589 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                582 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                582 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                596 ns          602 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                581 ns          586 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                581 ns          586 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512_mean           584 ns          589 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/512_median         583 ns          588 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/512_stddev        4.12 ns         4.10 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/512_cv            0.71 %          0.70 %            20
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1163 ns         1202 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1173 ns         1210 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1177 ns         1217 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1168 ns         1208 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1162 ns         1202 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1164 ns         1203 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1164 ns         1203 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1162 ns         1202 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1162 ns         1202 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1193 ns         1231 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1196 ns         1234 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1193 ns         1230 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1203 ns         1241 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1195 ns         1233 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1197 ns         1235 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1197 ns         1234 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1194 ns         1231 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1162 ns         1202 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1163 ns         1202 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1162 ns         1202 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096_mean         1177 ns         1216 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/4096_median       1170 ns         1209 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/4096_stddev       16.0 ns         15.1 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/4096_cv           1.36 %          1.24 %            20
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   531 ns          537 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   520 ns          526 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   520 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   527 ns          532 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   527 ns          533 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   520 ns          526 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   521 ns          527 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   522 ns          529 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          526 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          526 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   520 ns          526 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64_mean              521 ns          527 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/64_median            519 ns          526 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/64_stddev           3.32 ns         3.27 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/64_cv               0.64 %          0.62 %            20
BM_WriteReadWholeBufferSkipLeftCircle/512                  591 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  599 ns          605 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  592 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  593 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  593 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  592 ns          597 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  594 ns          599 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  591 ns          597 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  592 ns          597 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  591 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  593 ns          599 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  592 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  592 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  592 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  591 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  591 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  591 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  594 ns          600 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  591 ns          597 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  591 ns          597 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512_mean             592 ns          598 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/512_median           592 ns          598 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/512_stddev          1.84 ns         1.83 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/512_cv              0.31 %          0.31 %            20
BM_WriteReadWholeBufferSkipLeftCircle/4096                1210 ns         1255 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1216 ns         1260 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1224 ns         1268 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1215 ns         1260 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1210 ns         1254 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1209 ns         1254 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1209 ns         1254 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1210 ns         1255 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1210 ns         1255 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1289 ns         1334 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1289 ns         1333 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1288 ns         1332 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1301 ns         1345 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1289 ns         1333 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1289 ns         1333 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1289 ns         1333 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1288 ns         1332 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1210 ns         1254 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1210 ns         1254 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1210 ns         1255 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096_mean           1243 ns         1288 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/4096_median         1216 ns         1260 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/4096_stddev         39.7 ns         39.4 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/4096_cv             3.19 %          3.06 %            20

SampleUtil::copy:

----------------------------------------------------------------------------------------------
Benchmark                                                    Time             CPU   Iterations
----------------------------------------------------------------------------------------------
BM_WriteReadWholeBufferNoSkip/64                           516 ns          520 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           521 ns          525 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           522 ns          526 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           516 ns          520 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           517 ns          521 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           516 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           516 ns          520 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          520 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           516 ns          520 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64_mean                      516 ns          520 ns           20
BM_WriteReadWholeBufferNoSkip/64_median                    515 ns          519 ns           20
BM_WriteReadWholeBufferNoSkip/64_stddev                   1.99 ns         1.92 ns           20
BM_WriteReadWholeBufferNoSkip/64_cv                       0.38 %          0.37 %            20
BM_WriteReadWholeBufferNoSkip/512                          580 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          588 ns          592 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          582 ns          586 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          581 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          581 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          581 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          581 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          583 ns          587 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          581 ns          586 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          582 ns          586 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          580 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          580 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          580 ns          584 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          580 ns          584 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          580 ns          584 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          580 ns          584 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          580 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          582 ns          586 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          581 ns          584 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          581 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512_mean                     581 ns          585 ns           20
BM_WriteReadWholeBufferNoSkip/512_median                   581 ns          585 ns           20
BM_WriteReadWholeBufferNoSkip/512_stddev                  1.83 ns         1.83 ns           20
BM_WriteReadWholeBufferNoSkip/512_cv                      0.31 %          0.31 %            20
BM_WriteReadWholeBufferNoSkip/4096                        1219 ns         1248 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1224 ns         1254 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1245 ns         1274 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1224 ns         1254 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1221 ns         1251 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1218 ns         1247 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1217 ns         1246 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1220 ns         1249 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1217 ns         1246 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1207 ns         1238 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1207 ns         1238 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1206 ns         1237 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1211 ns         1241 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1204 ns         1235 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1201 ns         1232 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1204 ns         1235 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1203 ns         1234 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1217 ns         1246 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1219 ns         1248 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1217 ns         1246 ns       559876
BM_WriteReadWholeBufferNoSkip/4096_mean                   1215 ns         1245 ns           20
BM_WriteReadWholeBufferNoSkip/4096_median                 1217 ns         1246 ns           20
BM_WriteReadWholeBufferNoSkip/4096_stddev                 10.2 ns         9.63 ns           20
BM_WriteReadWholeBufferNoSkip/4096_cv                     0.84 %          0.77 %            20
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          523 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 531 ns          535 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 521 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 519 ns          523 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 521 ns          525 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 521 ns          525 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 521 ns          525 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64_mean            521 ns          525 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/64_median          520 ns          524 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/64_stddev         2.46 ns         2.44 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/64_cv             0.47 %          0.46 %            20
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                590 ns          593 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                585 ns          588 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          588 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          588 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                585 ns          589 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          588 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                585 ns          588 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                585 ns          589 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          588 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512_mean           584 ns          588 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/512_median         584 ns          587 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/512_stddev        1.39 ns         1.34 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/512_cv            0.24 %          0.23 %            20
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1189 ns         1216 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1194 ns         1221 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1201 ns         1228 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1191 ns         1218 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1188 ns         1215 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1188 ns         1216 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1188 ns         1216 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1190 ns         1217 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1189 ns         1217 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1191 ns         1221 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1190 ns         1221 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1193 ns         1224 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1193 ns         1223 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1194 ns         1225 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1187 ns         1218 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1188 ns         1218 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1187 ns         1218 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1189 ns         1216 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1188 ns         1215 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1189 ns         1215 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096_mean         1190 ns         1219 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/4096_median       1189 ns         1218 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/4096_stddev       3.28 ns         3.80 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/4096_cv           0.28 %          0.31 %            20
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          528 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   532 ns          536 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   525 ns          530 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          528 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   525 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          528 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   526 ns          531 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   525 ns          530 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          528 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64_mean              525 ns          529 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/64_median            524 ns          529 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/64_stddev           1.89 ns         1.77 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/64_cv               0.36 %          0.33 %            20
BM_WriteReadWholeBufferSkipLeftCircle/512                  607 ns          611 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  616 ns          621 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  607 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  608 ns          613 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          611 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          611 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          611 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  611 ns          616 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  608 ns          613 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  609 ns          614 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512_mean             607 ns          613 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/512_median           606 ns          612 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/512_stddev          2.35 ns         2.27 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/512_cv              0.39 %          0.37 %            20
BM_WriteReadWholeBufferSkipLeftCircle/4096                1242 ns         1284 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1247 ns         1289 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1271 ns         1313 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1246 ns         1288 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1240 ns         1282 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1243 ns         1285 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1241 ns         1283 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1242 ns         1284 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1241 ns         1283 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1234 ns         1276 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1235 ns         1277 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1234 ns         1276 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1239 ns         1281 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1237 ns         1279 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1233 ns         1275 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1233 ns         1275 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1236 ns         1278 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1241 ns         1283 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1243 ns         1285 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1241 ns         1283 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096_mean           1241 ns         1283 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/4096_median         1241 ns         1283 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/4096_stddev         8.25 ns         8.29 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/4096_cv             0.66 %          0.65 %            20

@davidchocholaty
Contributor Author

davidchocholaty commented Jul 19, 2022

I think I should summarize some of your reviews and the current state of the code. I would like to discuss the underlying problem that most of the review points indirectly relate to. The reason the current code is in some cases a little bit over-complicated is that, to follow the same behaviour as EngineEffectsDelay, reading uninitialized (zero) values has to be allowed. This situation may arise, for example, in the following workflow:

Precondition: the ring delay buffer is empty, filled with uninitialized (zero) values, and both the read and write positions are zero.

  1. Write 8 samples into the ring delay buffer and read them:
index:                            0,  1,  2,  3,  4,  5,  6,  7,  8
-  -  -  -  -  -  -  -  -  -  - -----------------------------------------------
... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 0 | 0 | ...
-  -  -  -  -  -  -  -  -  -  - -----------------------------------------------
                                                                  ^
                                                                  readPos, writePos
  2. Write another 8 samples and read with a delay of, for example, 12 samples:
index:                            0,  1,  2,  3,  4,  5,  6,  7,  8,  9,   10,  11,  ...
-  -  -  -  -  -  -  -  -  -  - ---------------------------------------------------------
... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | ...
-  -  -  -  -  -  -  -  -  -  - ---------------------------------------------------------
                  ^                                                                   ^
                  readPos                                                             writePos

In this situation, the uninitialized (zero) values have to be read. In EngineEffectsDelay, this situation can arise when crossfading between two delays.

There are two options:

  1. Allow the mentioned situation:
  • After allocating the ring delay buffer, all values have to be explicitly set to zero.
  • Slightly more complicated read position handling is needed, but the current code can still be refactored a lot.
  • The ring delay buffer follows the same delay buffer workflow as EngineEffectsDelay and EngineFilterDelay (where this situation is possible too).
  2. Disallow the mentioned situation:
  • The ring delay buffer does not handle this situation correctly, so VERIFY_OR_DEBUG_ASSERT has to be used.
  • Simpler read position calculations.

Now, I would like to open a short discussion about this behaviour.

@Swiftb0y
Member

I don't understand how it can happen that the read position is ahead of the write position. That would mean that you are trying to read more samples than given. When used in the filter delay, that would mean that there was a negative delay. Can you elaborate on the exact circumstances that would lead to this?

Irregardless of this, I still don't understand why the read position calculation has to be as complicated as it currently is.

@davidchocholaty
Contributor Author

Irregardless of this, I still don't understand why the read position calculation has to be as complicated as it currently is.

The read position calculation can of course be simplified by using a modulo operation instead of a bitwise AND. I only hinted at that when I wrote that the current code can still be refactored a lot.
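To illustrate the modulo-versus-AND point, here is a minimal sketch with hypothetical helper names (wrapModulo, wrapMask and isPowerOfTwo are not from the PR). Both helpers compute writePos - delay wrapped into [0, size); adding size before subtracting keeps the operand non-negative. The bitwise AND only gives the same result when size is a power of two, so a size such as 192000 would need the modulo form.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if n is a positive power of two.
bool isPowerOfTwo(std::int64_t n) {
    return n > 0 && (n & (n - 1)) == 0;
}

// Hypothetical helper: wraps (writePos - delay) into [0, size) for any size.
// Assumes 0 <= writePos < size and 0 <= delay <= size, so the dividend
// (writePos + size - delay) is never negative.
std::int64_t wrapModulo(std::int64_t writePos, std::int64_t delay, std::int64_t size) {
    return (writePos + size - delay) % size;
}

// Hypothetical helper: the same wrap with a bit mask. Only valid when size
// is a power of two, because then (size - 1) has all lower bits set.
std::int64_t wrapMask(std::int64_t writePos, std::int64_t delay, std::int64_t size) {
    assert(isPowerOfTwo(size));
    return (writePos + size - delay) & (size - 1);
}
```

The mask avoids an integer division but constrains the buffer size to powers of two; the modulo form works for any size.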

The reason I dwell so much on this situation is that any simplification must take it into account. Based on your and daschuer's reviews, the code can of course be simplified a lot, but IMO it makes sense to settle this situation before simplifying, because it is not at all common for a ring buffer as usually known, and it doesn't make sense to reimplement something without a clear vision of what it can and can't do.

@davidchocholaty
Contributor Author

davidchocholaty commented Jul 21, 2022

I don't understand how it can happen that the read position is ahead of the write position. That would mean that you are trying to read more samples than given. When used in the filter delay, that would mean that there was a negative delay. Can you elaborate on the exact circumstances that would lead to this?

Yeah, I understand that this situation is not easy to follow without all the surrounding cases. I think the best approach is an example with calculations from the EngineEffectsDelay::process method. Let's assume the example from the previous drawing. It may help to have the engineeffectsdelay.cpp code open, since I will follow it.

Just for info, the EngineEffectsDelay::process code doesn't map one-to-one onto RingDelayBuffer, but if this situation is possible there, a RingDelayBuffer used in that method has to handle it as well.

For the following examples, I will assume kMaxDelay = 192000, but the exact value doesn't matter here.

So, the first process call (zero delay, write 8 samples, then read 8 samples):

int delaySourcePos =
            (m_delayBufferWritePos + kMaxDelay - m_currentDelaySamples) %
            kMaxDelay;

delaySourcePos = (0 + 192000 - 0) % 192000 = 0

int oldDelaySourcePos =
                (m_delayBufferWritePos + kMaxDelay - m_prevDelaySamples) %
                kMaxDelay;

oldDelaySourcePos = (0 + 192000 - 0) % 192000 = 0

The second process call (a delay of 12 samples, which is possible but should not be common; write another 8 samples and read 8 samples):

int delaySourcePos =
            (m_delayBufferWritePos + kMaxDelay - m_currentDelaySamples) %
            kMaxDelay;

delaySourcePos = (8 + 192000 - 12) % 192000 = 191996 % 192000 = 191996

int oldDelaySourcePos =
                (m_delayBufferWritePos + kMaxDelay - m_prevDelaySamples) %
                kMaxDelay;

oldDelaySourcePos = (8 + 192000 - 0) % 192000 = 8

This is the situation I have in mind: starting with a cleared buffer, no data has been written at index 191996. Please let me know if it is a little bit clearer now. I would like to know whether we are on the same page with this problem.
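The two process calls above can be replayed in a small standalone model. This is a sketch under stated assumptions, not the EngineEffectsDelay code: the ring is a zero-prefilled std::vector<float> of kMaxDelay samples, processDelay is a hypothetical name, the read position uses the same modulo formula as delaySourcePos, and the input is written before reading. With a 12-sample delay, the second call reads four never-written zeros (indices 191996 to 191999) followed by the first four input samples.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kMaxDelay = 192000;

// Hypothetical model of one process() call on a zero-prefilled ring:
// write the input at writePos, then read input.size() samples starting
// delaySamples behind the write position that was current before writing.
// Assumes 0 <= delaySamples <= kMaxDelay and ring.size() == kMaxDelay.
std::vector<float> processDelay(std::vector<float>& ring,
        int& writePos,
        const std::vector<float>& input,
        int delaySamples) {
    // Same formula as delaySourcePos above.
    const int readPos = (writePos + kMaxDelay - delaySamples) % kMaxDelay;

    for (float sample : input) {
        ring[writePos] = sample;
        writePos = (writePos + 1) % kMaxDelay;
    }

    std::vector<float> output;
    output.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        // Indices that were never written still hold the prefilled zeros.
        output.push_back(ring[(static_cast<std::size_t>(readPos) + i) % kMaxDelay]);
    }
    return output;
}
```

Because the ring is prefilled with zeros, the out-of-range part of the read simply yields silence instead of requiring special handling.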

@Swiftb0y
Member

Ah, I think I understand. Yes, in that case we should just have the buffer pre-filled with zeros IMO.

@davidchocholaty
Contributor Author

Okay, perfect. Thank you. Now that we have agreed to allow this special case, I will simplify all the calculations asap.

@daschuer
Member

daschuer commented Jul 23, 2022

When it happens that you read out the initial zeros, you will hear a click sound when the real samples start. This can be avoided by fading in the first input buffer.
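A minimal sketch of such a fade-in, assuming a mono float buffer and a linear gain ramp; fadeIn is a hypothetical helper, not a Mixxx API. The gain starts at 0 for the first sample, so the output rises smoothly out of the preceding zeros instead of jumping straight to full level.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: multiplies the buffer by a linear gain ramp that
// starts at 0 and rises towards 1 (the last sample gets (n - 1) / n).
// Assumes a mono buffer of numSamples floats.
void fadeIn(float* pBuffer, std::size_t numSamples) {
    if (numSamples == 0) {
        return;
    }
    const float step = 1.0f / static_cast<float>(numSamples);
    for (std::size_t i = 0; i < numSamples; ++i) {
        pBuffer[i] *= step * static_cast<float>(i);
    }
}
```

For interleaved stereo, the gain would have to advance per frame (pair of samples) rather than per sample, so both channels receive the same gain.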

@davidchocholaty
Contributor Author

When it happens that you read out the initial zeros, you will hear a click sound when the real samples start. This can be avoided by fading in the first input buffer.

Good point. Thank you for this tip.

This commit removes the invasive manual size handling
and memsetting. Instead, the fill method is used.
This commit replaces the manual handling with memcpy
by using SampleUtil::copy, which uses a vectorized loop
for 32-bit SSE instead of memcpy (based on benchmarking).
Member

@daschuer daschuer left a comment

Cool, thank you.
I have added some suggestions for improvements.

m_buffer.fill(0);
}

void RingDelayBuffer::copy(const ReadableSlice pSourceBuffer,
Member

Since this is a free function now, it can go into an anonymous namespace or, if we think we can use it elsewhere, it can be moved to sample.cpp.
This is not a plain copy; it is a copyRing() or similar.

Does it work if both sourcePos AND destPos are not 0? I think not. We may either assert that or add the case with three copy calls.

Contributor Author

@davidchocholaty davidchocholaty Aug 22, 2022

The mentioned problems should be fixed, to summarise:

  • copy() renamed to copyRing()
  • copyRing() is moved into an anonymous namespace and serves as a helper function only

Does it work if both sourcePos AND destPos are not 0? I think not. We may either assert that or add the case with three copy calls.

I think I understand what you mean. IMO, if a huge numItems value were provided, many times greater than the size of both the source and destination buffers, many more than three copies would be needed. This cannot occur with the current usage, but it may be possible. Since multiple copies are not used now, I would prefer the assert solution.

Member

I have another (unused) case in mind for three copies: copying from a ring to a ring.

Source:
123456789

Destination
123456789

Copy 8 samples
Read pointer at 5, write pointer at 7
Copy
5..7 to 7..9 (3)
8..9 to 1..2 (2)
1..3 to 3..5 (3)

This can't happen if one of the pointers points to the start.
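The three-copy case can be handled generically by repeatedly copying the largest chunk that fits before either ring wraps. The sketch is illustrative only: copyRingToRing is a hypothetical name, it uses std::vector and 0-based indices instead of the Slice/span types discussed in this PR, and it is not the PR's copyRing. For the ring-to-ring example above (size 9, read pointer 5, write pointer 7, 8 samples, 1-based) it needs three chunk copies, while starting from position 0 in the source needs only two.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the PR's copyRing): copies numItems samples from
// a source ring starting at sourcePos into a destination ring starting at
// destPos, wrapping each ring independently. Returns how many contiguous
// chunk copies were needed.
// Assumes both rings are non-empty, sourcePos < source.size() and
// destPos < dest.size().
int copyRingToRing(const std::vector<float>& source,
        std::size_t sourcePos,
        std::vector<float>& dest,
        std::size_t destPos,
        std::size_t numItems) {
    int chunks = 0;
    while (numItems > 0) {
        // Largest chunk that fits before either ring reaches its end.
        const std::size_t chunk = std::min({numItems,
                source.size() - sourcePos,
                dest.size() - destPos});
        std::copy_n(source.begin() + static_cast<std::ptrdiff_t>(sourcePos),
                chunk,
                dest.begin() + static_cast<std::ptrdiff_t>(destPos));
        sourcePos = (sourcePos + chunk) % source.size();
        destPos = (destPos + chunk) % dest.size();
        numItems -= chunk;
        ++chunks;
    }
    return chunks;
}
```

Because the loop runs until numItems is exhausted, it also covers a numItems larger than both rings at the cost of extra chunk copies; the thread settled on asserting instead, since the current usage never needs it.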

Member

Asserting that this does not happen works for me.

Member

Oh, I missed that you already did it.
Thank you.

SINT RingDelayBuffer::read(CSAMPLE* pBuffer, const SINT itemsToRead, const SINT delayItems) {
const SINT shift = itemsToRead + delayItems;

if (shift > m_buffer.size()) {
Member

Suggested change
if (shift > m_buffer.size()) {
VERIFY_OR_DEBUG_ASSERT(shift <= m_buffer.size()) {

}

SINT RingDelayBuffer::write(const CSAMPLE* pBuffer, const SINT itemsToWrite) {
if (itemsToWrite > m_buffer.size()) {
Member

Suggested change
if (itemsToWrite > m_buffer.size()) {
VERIFY_OR_DEBUG_ASSERT(itemsToWrite <= m_buffer.size()) {
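The suggested changes swap a plain if for Mixxx's VERIFY_OR_DEBUG_ASSERT(cond) { ... } pattern, where the block runs when cond does not hold and debug builds additionally flag the violation. To show the control flow in isolation, here is a sketch with a hypothetical stand-in macro, VERIFY_OR_REPORT, that only prints the failed condition; the real macro is defined in Mixxx's own assert utilities and its debug-build behaviour is not reproduced. What read() and write() actually return on failure is not shown in this thread, so returning 0 here is an assumption.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for VERIFY_OR_DEBUG_ASSERT: when cond is false it
// prints the condition, then executes the block that follows the macro.
#define VERIFY_OR_REPORT(cond) \
    if (!(cond) && (std::fprintf(stderr, "VERIFY failed: %s\n", #cond), true))

// Hypothetical read guard mirroring the first suggestion: the requested
// items plus the delay must fit into the buffer.
long acceptedReadCount(long itemsToRead, long delayItems, long bufferSize) {
    const long shift = itemsToRead + delayItems;
    VERIFY_OR_REPORT(shift <= bufferSize) {
        return 0;  // assumption of this sketch: nothing is read
    }
    return itemsToRead;
}

// Hypothetical write guard mirroring the second suggestion.
long acceptedWriteCount(long itemsToWrite, long bufferSize) {
    VERIFY_OR_REPORT(itemsToWrite <= bufferSize) {
        return 0;  // assumption of this sketch: nothing is written
    }
    return itemsToWrite;
}
```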

src/util/ringdelaybuffer.cpp Outdated
This commit replaces the plain if statements that check
for invalid item counts with VERIFY_OR_DEBUG_ASSERT.
This commit renames the RingDelayBuffer::copy function
to RingDelayBuffer::copyRing.
The commit moves RingDelayBuffer::copyRing into an anonymous
namespace and removes the function from the RingDelayBuffer class,
so the function serves only as a file-local helper.
The commit adds a branch for the situation in which the number
of items to copy forces both buffers to cross their upper bounds
and wrap around at least once. The current usage
of RingDelayBuffer::copyRing never requires this, so the case
is guarded by VERIFY_OR_DEBUG_ASSERT.
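A hypothetical sketch of a copyRing-style helper as described in these commits. It lives in an anonymous namespace, copies in at most two contiguous parts, and rejects the case where both rings would wrap. It uses `std::vector` and a plain `return 0` in place of Mixxx's buffer types and VERIFY_OR_DEBUG_ASSERT, and it is not the merged implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

namespace {

// Hypothetical sketch, not the merged Mixxx code. Copies `numItems` samples
// from a source ring into a destination ring with at most two contiguous
// copies. Assumes sourcePos < source.size() and destPos < dest.size().
// Returns the number of items copied, or 0 if the request is rejected.
std::size_t copyRingSketch(const std::vector<float>& source,
        std::size_t sourcePos,
        std::vector<float>& dest,
        std::size_t destPos,
        std::size_t numItems) {
    if (numItems > source.size() || numItems > dest.size()) {
        return 0;
    }
    const std::size_t sourceTail = source.size() - sourcePos;
    const std::size_t destTail = dest.size() - destPos;
    // Both rings would have to wrap around: the case the PR guards with
    // VERIFY_OR_DEBUG_ASSERT instead of handling it.
    if (numItems > sourceTail && numItems > destTail) {
        return 0;
    }
    // First part: up to the end of whichever ring ends first.
    const std::size_t first = std::min({numItems, sourceTail, destTail});
    std::copy_n(source.begin() + sourcePos, first, dest.begin() + destPos);
    // Second part (possibly empty): continues after the wrap.
    const std::size_t second = numItems - first;
    std::copy_n(source.begin() + (sourcePos + first) % source.size(),
            second,
            dest.begin() + (destPos + first) % dest.size());
    return numItems;
}

} // namespace
```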
src/util/ringdelaybuffer.cpp Outdated
Comment on lines 51 to 59
using ReadableSlice = mixxx::SampleBuffer::ReadableSlice;
using WritableSlice = mixxx::SampleBuffer::WritableSlice;

namespace {
SINT copyRing(const ReadableSlice pSourceBuffer,
SINT sourcePos,
const WritableSlice pDestBuffer,
SINT destPos,
const SINT numItems) {
Member

I'm planning to deprecate mixxx::SampleBuffer::*Slice. Why not use std::span instead?

Contributor Author

@davidchocholaty Aug 23, 2022

Of course, std::span can be used. I thought I should work with spans through SampleBuffer and forgot that I can call mixxx::spanutil::spanFromPtrLen() directly.

Member

Right, you just use spanFromPtrLen as an adapter, so to speak, and then try to use std::span as much as possible in any APIs that take the usual (pointer, size) pair.

src/util/ringdelaybuffer.cpp Outdated
src/util/ringdelaybuffer.cpp
src/util/ringdelaybuffer.cpp Outdated
This commit assigns a default value to the copiedItems variable
to avoid reading potentially uninitialized memory.
This commit avoids the uninitialized variable caused by declaring
it before the if-else statement without assigning a default value.
One solution would be to assign a default value; however, some IDEs
may then warn about an unused value, and, more importantly, a default
would hide real errors, such as the variable not being set
in one branch. Therefore, the variable is initialized by a lambda instead.
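A sketch of the lambda approach, assuming it is the common immediately invoked lambda pattern. `itemsBeforeWrap` and its read/write logic are hypothetical and only give `copiedItems` something to compute. The variable is `const`, is initialized exactly once, and every branch must return a value, so no dummy default is needed.

```cpp
#include <cstddef>

using SINT = std::ptrdiff_t; // assumption: signed size type like Mixxx's SINT

// Hypothetical example: number of readable items before the ring wraps.
SINT itemsBeforeWrap(SINT bufferSize, SINT readPos, SINT writePos) {
    // Immediately invoked lambda: no branch can leave copiedItems unset,
    // and the compiler rejects a path that forgets to return a value.
    const SINT copiedItems = [&]() -> SINT {
        if (readPos <= writePos) {
            // Readable data is contiguous: [readPos, writePos).
            return writePos - readPos;
        } else {
            // Readable data wraps: stop at the end of the buffer.
            return bufferSize - readPos;
        }
    }();
    return copiedItems;
}
```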
This commit replaces the mixxx::SampleBuffer::ReadableSlice
and mixxx::SampleBuffer::WritableSlice with std::span
and the mixxx::spanutil::spanFromPtrLen helper function.
This commit renames pSourceBuffer and pDestBuffer in the copyRing
function to sourceBuffer and destBuffer, dropping the 'p' prefix,
which is used for pointers.
This commit introduces std::span in RingDelayBufferTest.
The span is primarily used
in RingDelayBufferTest::AssertIdenticalBufferEquals
and in all calls of this function.
@davidchocholaty
Contributor Author

Just a short summary: to make this PR ready, the last thing left on my side is to finish the documentation and descriptive comments, provided the previous changes pass your review. One more thing on my mind: the licence notice for mixxx/lib/portaudio/pa_ringbuffer.c is likely no longer required because of the major changes and the new implementation created in this PR. What do you think?

This commit renames m_firstInputBuffer to m_firstInputChunk
because the new name better describes the situation: in general,
user code does not have to pass the whole input buffer
to RingDelayBuffer::write at once, but can split it
into smaller chunks.
This commit adds documentation comments for the RingDelayBuffer
class and descriptive comments to parts of the code
to make them easier to understand.
This commit improves const-correctness by declaring
RingDelayBuffer::size constexpr, so that its return value
can be evaluated at compile time.
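A small illustration of what `constexpr` on a size accessor allows. `FixedRing` is a hypothetical class, not RingDelayBuffer. A `constexpr` member function is only guaranteed to be evaluated at compile time where a constant expression is required, such as in `static_assert`, and only when the object itself is a constant. For a buffer whose size is chosen at run time, `size()` is an ordinary call.

```cpp
#include <cstddef>

// Hypothetical fixed-capacity ring, used only to illustrate constexpr.
class FixedRing {
  public:
    constexpr explicit FixedRing(std::size_t capacity)
            : m_capacity(capacity) {
    }

    // Usable in constant expressions when the object is a constant.
    constexpr std::size_t size() const {
        return m_capacity;
    }

  private:
    std::size_t m_capacity;
};

constexpr FixedRing kRing(1024);
static_assert(kRing.size() == 1024, "size() evaluated at compile time");
```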
@Swiftb0y
Member

IANAL, but I don't think it's required anymore...

This commit introduces std::span in the RingDelayBuffer API.
RingDelayBuffer::read now takes only the destination buffer
(as a span) and the number of delay items, and RingDelayBuffer::write
takes a single parameter, the source buffer (also a span). The dependent
tests are updated to pass spans as well.
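The read/write semantics discussed in this PR can be illustrated with a minimal delay ring. This is a sketch under several assumptions. It uses a fixed capacity and `std::vector` in place of the `std::span` parameters of the real API. `DelayRingSketch` is a hypothetical name, and it omits the cross-fading logic. Following the `shift = itemsToRead + delayItems` check quoted earlier, `read()` starts `shift` items behind the write position, and slots that have never been written read back as zeros.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch, NOT the Mixxx RingDelayBuffer implementation.
class DelayRingSketch {
  public:
    explicit DelayRingSketch(std::size_t capacity)
            : m_buffer(capacity, 0.0f) { // never-written slots stay zero
    }

    // Appends `source` at the write position, wrapping around the end.
    void write(const std::vector<float>& source) {
        for (float sample : source) {
            m_buffer[m_writePos] = sample;
            m_writePos = (m_writePos + 1) % m_buffer.size();
        }
    }

    // Fills `dest` with samples delayed by `delayItems` relative to the most
    // recent write. Returns false (leaving `dest` untouched) when
    // dest.size() + delayItems exceeds the capacity.
    bool read(std::vector<float>& dest, std::size_t delayItems) const {
        const std::size_t shift = dest.size() + delayItems;
        if (shift > m_buffer.size()) {
            return false;
        }
        // Start `shift` items behind the write position, modulo capacity.
        std::size_t readPos = (m_writePos + m_buffer.size() - shift) % m_buffer.size();
        for (float& sample : dest) {
            sample = m_buffer[readPos];
            readPos = (readPos + 1) % m_buffer.size();
        }
        return true;
    }

  private:
    std::vector<float> m_buffer;
    std::size_t m_writePos = 0;
};
```

Because `read()` computes its start position from the write position on every call, changing `delayItems` between calls simply moves the reading position. This is the kind of jump the PR description refers to.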
This commit removes the licence notice
for mixxx/lib/portaudio/pa_ringbuffer.c. The licence is no
longer required because of the major previous changes and the creation
of a new implementation.
Member

@Swiftb0y left a comment

Sorry for the nitpicks again...

src/test/ringdelaybuffer_test.cpp Outdated
This commit avoids creating a span from the SampleBuffer's
data via SampleBuffer::data and uses the existing SampleBuffer::span
function instead.
@davidchocholaty
Contributor Author

Absolutely no problem; this is how I should have written it the first time, I just did it too mechanically. It should be fixed now. I kept the indentation to make it a little more readable.

Member

@Swiftb0y left a comment

Thanks. I'll go ahead and merge now so you can build on this.

@Swiftb0y merged commit c9d9581 into mixxxdj:main Aug 29, 2022
@davidchocholaty
Contributor Author

Perfect, thank you very much for merging.

4 participants