RingDelayBuffer: ring buffer for delay handling #4852

davidchocholaty · 2022-07-12T13:57:57Z

This PR introduces a new ring buffer. Its main use case is delay
handling, but it can also be used as a classic ring buffer whose
reading position can jump. It is based on the classic, well-known ring
buffer, with two extensions. First, the ring delay buffer allows the
reading position to be moved, subject to certain rules. Second, unlike
the classic ring buffer, the ring delay buffer can return zero values
that were never written through the write method and write position.
Both properties are needed for cross-fading between two delay values
when the delay changes.

This commit adds a new ring buffer for handling a delay. It was created specifically for its first use case, EngineEffectsDelay, which handles the delay of the effects (in a chain). The ring delay buffer allows the reading position to be moved, subject to certain rules. Another difference from the classic ring buffer is that the ring delay buffer can return zero values that were not written through the write method and write position. Both properties are required for cross-fading between two delay values, which is why the classic ring buffer cannot be used.

The commit adds tests for the ring delay buffer. The test set covers RingDelayBuffer::isEmpty, RingDelayBuffer::isFull and RingDelayBuffer::clear, then the number of items available for reading and writing, and finally reading and writing both without moving the read position and with the read position skipped in both directions, in both variants (circling around / not circling around).

The commit adds the original license for the ring buffer from the Portable Audio I/O Library. The ring buffer in pa_ringbuffer.c was used as a template for parts of the code in RingDelayBuffer, so the original license is added to the cpp file and the header. The header describes the modified and added functions.

This commit adds clearing of m_jumpLeftAroundMask. When RingDelayBuffer::clear is called, this variable, which is used for masking when a jump to the left crosses the left bound of the delay buffer, is set to zero.

This commit allows the size of a jump of the reading position to the right to be equal to the number of items available for reading.

This commit fixes the maximum size of a jump to the left for the reading position. As part of that, the comments for the ASSERTs are updated and a zero-size jump is handled separately.

The commit adds benchmarks for RingDelayBuffer. They exercise RingDelayBuffer::write, RingDelayBuffer::read and RingDelayBuffer::moveReadPositionBy in three cases: without skipping, with a jump to the left without circling around, and with a jump to the left with circling around.

Swiftb0y · 2022-07-12T14:29:27Z

Thanks. Before I go into detail reviewing this, it would make sense to first introduce the std::/gsl::span shim we discussed.

src/util/ringdelaybuffer.cpp

src/util/ringdelaybuffer.h

daschuer

I have added some thoughts

daschuer · 2022-07-14T05:54:49Z

src/test/ringdelaybuffer_test.cpp

+class RingDelayBufferTest : public MixxxTest {
+  protected:
+    void SetUp() override {
+        m_pRingDelayBuffer = new RingDelayBuffer(m_ringDelayBufferSize);


This can become a unique_ptr.
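A minimal sketch of how this suggestion could look. FakeRingDelayBuffer and Fixture are hypothetical stand-ins so that the snippet compiles without Mixxx or GoogleTest; in the PR the fixture derives from MixxxTest and holds a real RingDelayBuffer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for RingDelayBuffer, only here to keep the
// sketch self-contained.
class FakeRingDelayBuffer {
  public:
    explicit FakeRingDelayBuffer(std::size_t size)
            : m_buffer(size) {
    }
    std::size_t size() const {
        return m_buffer.size();
    }

  private:
    std::vector<float> m_buffer;
};

// Hypothetical fixture. With a unique_ptr member, no TearDown() with an
// explicit delete is needed: the buffer is released when the fixture is
// destroyed or when SetUp() assigns a new one.
struct Fixture {
    void SetUp() {
        m_pRingDelayBuffer =
                std::make_unique<FakeRingDelayBuffer>(m_ringDelayBufferSize);
    }

    const std::size_t m_ringDelayBufferSize = 8;
    std::unique_ptr<FakeRingDelayBuffer> m_pRingDelayBuffer;
};
```

Calling SetUp() twice on the same fixture does not leak here, because assigning to the unique_ptr destroys the previously owned buffer.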

daschuer · 2022-07-14T05:58:32Z

src/util/ringdelaybuffer.cpp

+          m_ringMask(bufferSize - 1),
+          m_jumpLeftAroundMask(0),
+          m_buffer(bufferSize) {
+    // TODO(davidchocholaty) Handle to allow only power of two for the size of the ring delay buffer.


You probably already know this function:

mixxx/src/util/math.h

Line 41 in 9b9fbaa

inline int roundUpToPowerOf2(int v) {

IMO not needed; there is also std::bit_ceil in C++20's <bit> header. It's constexpr and also likely faster.

A benchmark supports that it's at least equal, though I guess the performance of bit_ceil depends very much on what's available on the target architecture (which you can't specify on Quickbench). https://quick-bench.com/q/3FOpFZ_ZaAfJOtWWl1Xx7MFC7JQ

Thank you both for your recommendations. However, after the refactoring, which removes the binary operations, the "power of two" condition is no longer required.
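For context, here is a small sketch of why the mask-based indexing in the diff (m_ringMask(bufferSize - 1)) depends on a power-of-two size. The helper names are hypothetical and this is not the Mixxx implementation of roundUpToPowerOf2; with C++20, std::bit_ceil could take the place of the loop.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest power of two that is >= v.
// Precondition (not checked): 1 <= v <= 2^31, otherwise the shift overflows.
constexpr std::uint32_t roundUpToPow2(std::uint32_t v) {
    std::uint32_t p = 1;
    while (p < v) {
        p <<= 1;
    }
    return p;
}

// Wraps a position with a bit mask. Only equivalent to the modulo when
// size is a power of two.
constexpr std::uint32_t wrapWithMask(std::uint32_t pos, std::uint32_t size) {
    return pos & (size - 1);
}

// Wraps a position with the modulo operator. Works for any size > 0.
constexpr std::uint32_t wrapWithModulo(std::uint32_t pos, std::uint32_t size) {
    return pos % size;
}
```

For a size of 64, position 70 wraps to 6 with both functions. For a size of 48, the mask still yields 6 while the modulo yields 22, which is why dropping the binary operations also drops the size restriction.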

daschuer · 2022-07-14T06:00:25Z

src/util/ringdelaybuffer.cpp

+
+    // Set the ring delay buffer items to 0.
+    CSAMPLE* bufferData = m_buffer.data();
+    memset(bufferData, 0, sizeof(*bufferData) * bufferSize);


This should not be necessary. The buffer should never return uninitialized zeros. In an out-of-memory situation the calling code needs to fade to zero to avoid clicks.

The zero initialization was added because m_readPos can be moved to the left past the left bound of the delay buffer. In the current version of the code this is also allowed at the beginning. As a result, the read position can reach values that were never written via m_writePos. Thinking about it, that is not good practice. It would be possible to allow skipping to the left only over items that were written via m_writePos, checked with VERIFY_OR_DEBUG_ASSERT.

daschuer · 2022-07-14T06:12:37Z

src/util/ringdelaybuffer.cpp

+        itemsToRead = available;
+    }
+
+    const SINT position = m_readPos & m_ringMask;


I am not sure if the concept of a member variable m_readPos is still suitable.

As I understand it, we have the current sample at m_writePos - 1 and the delayed samples before that, so the actual read position is m_writePos - "current delay" - itemsToRead.

We need to make sure this does not hit the end of the buffer.

Idea: Pass the delay along to the read function?

Thank you so much for this idea. It sounds good and I think it cleans up the code a lot. I have just one concern, which covers two cases: with the proposed version, the ring delay buffer will behave a little differently.

Suppose, for example, that 8 samples are written into the ring delay buffer and then a different number of samples, say 4, is read with zero delay. The previous version reads the first half of the written samples, starting at index 0, whereas the new version reads the second half of the written samples, starting at index 8 - 0 - 4 = 4.

If the write method is called twice in a row without calling the read method in between, the situation is quite similar to the first one.

In our future use case, EngineEffectsDelay, this cannot happen. However, looking at RingDelayBuffer on its own, it is possible. What do you think about that? I don't want to dismiss this idea; I really like it and would like to use it. I'm only thinking through the cases that can occur.

I would like to reopen the discussion about this idea because of one thing I noticed. With the new version, many checks of the delay value have to run on every read() call, probably all of the code in moveReadPositionBy(). What is your view on this?
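To make the discussed behavior concrete, here is a minimal sketch of a buffer whose read function takes the delay as a parameter and derives the start position as m_writePos - delay - numItems. DelayLineSketch and its signatures are hypothetical, use plain float instead of CSAMPLE, and are not the code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: there is no stored read position; every read is
// located relative to the write position and the requested delay.
class DelayLineSketch {
  public:
    explicit DelayLineSketch(std::size_t size)
            : m_buffer(size, 0.0f),
              m_writePos(0) {
    }

    // Appends numItems samples, wrapping around at the end of the buffer.
    void write(const float* pSrc, std::size_t numItems) {
        assert(numItems <= m_buffer.size());
        for (std::size_t i = 0; i < numItems; ++i) {
            m_buffer[m_writePos] = pSrc[i];
            m_writePos = (m_writePos + 1) % m_buffer.size();
        }
    }

    // Copies the numItems samples that end `delay` samples before the
    // write position. This is the single range check needed per call.
    void read(float* pDst, std::size_t numItems, std::size_t delay) const {
        const std::size_t size = m_buffer.size();
        assert(delay + numItems <= size);
        // Adding size first keeps the unsigned subtraction from underflowing.
        std::size_t pos = (m_writePos + size - delay - numItems) % size;
        for (std::size_t i = 0; i < numItems; ++i) {
            pDst[i] = m_buffer[pos];
            pos = (pos + 1) % size;
        }
    }

  private:
    std::vector<float> m_buffer; // zero-initialized on construction
    std::size_t m_writePos;      // index of the next sample to write
};
```

With a buffer of 16 items, writing the samples 1 to 8 and then reading 4 items with a delay of 0 returns 5, 6, 7, 8, which is the "second half" case described above. A delay of 4 returns 1, 2, 3, 4, and a delay of 8 reaches the part that was never written and returns zeros.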

Swiftb0y · 2022-07-14T21:11:39Z

Another inspiration for a singlethreaded Ringbuffer: https://hg.sr.ht/~breakfastquay/rubberband/browse/src/common/SingleThreadRingBuffer.h?rev=tip

This commit removes the unnecessary, explicitly declared virtual destructor.

The commit removes the variable that stored the size of the delay buffer. SampleBuffer::size is used instead, since the delay buffer is a mixxx::SampleBuffer.

This commit improves the const correctness of a few functions and parameters. RingDelayBuffer::isFull, RingDelayBuffer::getReadAvailable and RingDelayBuffer::getWriteAvailable are now const. The numItems parameter of RingDelayBuffer::read and RingDelayBuffer::write is now const, as is the jumpSize parameter of RingDelayBuffer::moveReadPositionBy.

The commit removes the note that the ring buffer is safe for single-threaded use only. This information is rewritten in the comment before the class.

This commit replaces the use of RingDelayBuffer* with unique_ptr.

davidchocholaty · 2022-07-15T12:14:03Z

Another inspiration for a singlethreaded Ringbuffer: https://hg.sr.ht/~breakfastquay/rubberband/browse/src/common/SingleThreadRingBuffer.h?rev=tip

Thank you for this tip.

davidchocholaty · 2022-07-19T08:16:21Z

I would like to share some information about the choice between using plain memcpy in RingDelayBuffer and using SampleUtil::copy instead (a vectorized loop for SSE on 32-bit). Because my system is 64-bit (Ubuntu 22.04 LTS) and the CPU (Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz) offers SSE and SSE2, SampleUtil::copy ends up using memcpy too. I ran the benchmarks with standard deviation. The statistics, including the standard deviation, can be computed with, for example, ./mixxx-test --benchmark --benchmark_repetitions=20 --benchmark_filter=BM_WriteReadWholeBuffer, which runs only the RingDelayBuffer benchmarks with 20 repetitions. As I expected, on my system the SampleUtil::copy version takes slightly more time (function calls, evaluation of conditions, etc.). Another option could be std::copy, but IMO it can't be used with the current shape of RingDelayBuffer, and according to the survey in SampleUtil::copy it takes the most time. Which version do you prefer from your point of view? I discussed with @Swiftb0y that we should also take the standard deviation of the benchmark results into account, but I can't test the vectorized loop version. For reference, the benchmark results:

memcpy:

----------------------------------------------------------------------------------------------
Benchmark                                                    Time             CPU   Iterations
----------------------------------------------------------------------------------------------
BM_WriteReadWholeBufferNoSkip/64                           517 ns          523 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           517 ns          523 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          520 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          520 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          520 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          519 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           513 ns          519 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           515 ns          521 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           516 ns          522 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          520 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           515 ns          520 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          519 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          520 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          519 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          520 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           513 ns          519 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          520 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           513 ns          519 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           514 ns          519 ns      1317352
BM_WriteReadWholeBufferNoSkip/64                           515 ns          521 ns      1317352
BM_WriteReadWholeBufferNoSkip/64_mean                      514 ns          520 ns           20
BM_WriteReadWholeBufferNoSkip/64_median                    514 ns          520 ns           20
BM_WriteReadWholeBufferNoSkip/64_stddev                   1.23 ns         1.14 ns           20
BM_WriteReadWholeBufferNoSkip/64_cv                       0.24 %          0.22 %            20
BM_WriteReadWholeBufferNoSkip/512                          589 ns          593 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          596 ns          600 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          594 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          590 ns          594 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          593 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          593 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          590 ns          595 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          593 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          593 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          592 ns          596 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          590 ns          595 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          594 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          594 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          594 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          594 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          594 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          594 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          591 ns          596 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          589 ns          593 ns      1180062
BM_WriteReadWholeBufferNoSkip/512                          590 ns          595 ns      1180062
BM_WriteReadWholeBufferNoSkip/512_mean                     590 ns          594 ns           20
BM_WriteReadWholeBufferNoSkip/512_median                   589 ns          594 ns           20
BM_WriteReadWholeBufferNoSkip/512_stddev                  1.58 ns         1.58 ns           20
BM_WriteReadWholeBufferNoSkip/512_cv                      0.27 %          0.27 %            20
BM_WriteReadWholeBufferNoSkip/4096                        1205 ns         1242 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1204 ns         1241 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1224 ns         1262 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1207 ns         1244 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1210 ns         1247 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1206 ns         1243 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1207 ns         1244 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1211 ns         1248 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1207 ns         1244 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1220 ns         1258 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1221 ns         1259 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1219 ns         1257 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1225 ns         1263 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1224 ns         1263 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1220 ns         1258 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1221 ns         1259 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1219 ns         1257 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1200 ns         1238 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1199 ns         1237 ns       560769
BM_WriteReadWholeBufferNoSkip/4096                        1200 ns         1238 ns       560769
BM_WriteReadWholeBufferNoSkip/4096_mean                   1213 ns         1250 ns           20
BM_WriteReadWholeBufferNoSkip/4096_median                 1210 ns         1247 ns           20
BM_WriteReadWholeBufferNoSkip/4096_stddev                 9.00 ns         9.21 ns           20
BM_WriteReadWholeBufferNoSkip/4096_cv                     0.74 %          0.74 %            20
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 529 ns          534 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          523 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          523 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 519 ns          525 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          523 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          523 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          523 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          523 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          523 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          523 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 519 ns          524 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 516 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 517 ns          522 ns      1340491
BM_WriteReadWholeBufferSkipLeftNoCircle/64_mean            518 ns          523 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/64_median          517 ns          523 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/64_stddev         2.83 ns         2.77 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/64_cv             0.55 %          0.53 %            20
BM_WriteReadWholeBufferSkipLeftNoCircle/512                582 ns          587 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                595 ns          600 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          589 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                582 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                582 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                582 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                582 ns          587 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                582 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                586 ns          592 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          589 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          589 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          589 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                582 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                582 ns          588 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                596 ns          602 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                581 ns          586 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512                581 ns          586 ns      1191962
BM_WriteReadWholeBufferSkipLeftNoCircle/512_mean           584 ns          589 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/512_median         583 ns          588 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/512_stddev        4.12 ns         4.10 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/512_cv            0.71 %          0.70 %            20
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1163 ns         1202 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1173 ns         1210 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1177 ns         1217 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1168 ns         1208 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1162 ns         1202 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1164 ns         1203 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1164 ns         1203 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1162 ns         1202 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1162 ns         1202 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1193 ns         1231 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1196 ns         1234 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1193 ns         1230 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1203 ns         1241 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1195 ns         1233 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1197 ns         1235 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1197 ns         1234 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1194 ns         1231 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1162 ns         1202 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1163 ns         1202 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1162 ns         1202 ns       582032
BM_WriteReadWholeBufferSkipLeftNoCircle/4096_mean         1177 ns         1216 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/4096_median       1170 ns         1209 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/4096_stddev       16.0 ns         15.1 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/4096_cv           1.36 %          1.24 %            20
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   531 ns          537 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   520 ns          526 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   520 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   527 ns          532 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   527 ns          533 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   520 ns          526 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   521 ns          527 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   522 ns          529 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          526 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          526 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   520 ns          526 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64                   519 ns          525 ns      1333871
BM_WriteReadWholeBufferSkipLeftCircle/64_mean              521 ns          527 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/64_median            519 ns          526 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/64_stddev           3.32 ns         3.27 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/64_cv               0.64 %          0.62 %            20
BM_WriteReadWholeBufferSkipLeftCircle/512                  591 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  599 ns          605 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  592 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  593 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  593 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  592 ns          597 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  594 ns          599 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  591 ns          597 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  592 ns          597 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  591 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  593 ns          599 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  592 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  592 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  592 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  591 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  591 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  591 ns          598 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  594 ns          600 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  591 ns          597 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512                  591 ns          597 ns      1171693
BM_WriteReadWholeBufferSkipLeftCircle/512_mean             592 ns          598 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/512_median           592 ns          598 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/512_stddev          1.84 ns         1.83 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/512_cv              0.31 %          0.31 %            20
BM_WriteReadWholeBufferSkipLeftCircle/4096                1210 ns         1255 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1216 ns         1260 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1224 ns         1268 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1215 ns         1260 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1210 ns         1254 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1209 ns         1254 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1209 ns         1254 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1210 ns         1255 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1210 ns         1255 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1289 ns         1334 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1289 ns         1333 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1288 ns         1332 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1301 ns         1345 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1289 ns         1333 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1289 ns         1333 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1289 ns         1333 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1288 ns         1332 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1210 ns         1254 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1210 ns         1254 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096                1210 ns         1255 ns       557719
BM_WriteReadWholeBufferSkipLeftCircle/4096_mean           1243 ns         1288 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/4096_median         1216 ns         1260 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/4096_stddev         39.7 ns         39.4 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/4096_cv             3.19 %          3.06 %            20

SampleUtil::copy:

----------------------------------------------------------------------------------------------
Benchmark                                                    Time             CPU   Iterations
----------------------------------------------------------------------------------------------
BM_WriteReadWholeBufferNoSkip/64                           516 ns          520 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           521 ns          525 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           522 ns          526 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           516 ns          520 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           517 ns          521 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           516 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           516 ns          520 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          520 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           516 ns          520 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64                           515 ns          519 ns      1305947
BM_WriteReadWholeBufferNoSkip/64_mean                      516 ns          520 ns           20
BM_WriteReadWholeBufferNoSkip/64_median                    515 ns          519 ns           20
BM_WriteReadWholeBufferNoSkip/64_stddev                   1.99 ns         1.92 ns           20
BM_WriteReadWholeBufferNoSkip/64_cv                       0.38 %          0.37 %            20
BM_WriteReadWholeBufferNoSkip/512                          580 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          588 ns          592 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          582 ns          586 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          581 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          581 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          581 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          581 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          583 ns          587 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          581 ns          586 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          582 ns          586 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          580 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          580 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          580 ns          584 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          580 ns          584 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          580 ns          584 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          580 ns          584 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          580 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          582 ns          586 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          581 ns          584 ns      1198103
BM_WriteReadWholeBufferNoSkip/512                          581 ns          585 ns      1198103
BM_WriteReadWholeBufferNoSkip/512_mean                     581 ns          585 ns           20
BM_WriteReadWholeBufferNoSkip/512_median                   581 ns          585 ns           20
BM_WriteReadWholeBufferNoSkip/512_stddev                  1.83 ns         1.83 ns           20
BM_WriteReadWholeBufferNoSkip/512_cv                      0.31 %          0.31 %            20
BM_WriteReadWholeBufferNoSkip/4096                        1219 ns         1248 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1224 ns         1254 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1245 ns         1274 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1224 ns         1254 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1221 ns         1251 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1218 ns         1247 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1217 ns         1246 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1220 ns         1249 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1217 ns         1246 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1207 ns         1238 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1207 ns         1238 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1206 ns         1237 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1211 ns         1241 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1204 ns         1235 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1201 ns         1232 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1204 ns         1235 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1203 ns         1234 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1217 ns         1246 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1219 ns         1248 ns       559876
BM_WriteReadWholeBufferNoSkip/4096                        1217 ns         1246 ns       559876
BM_WriteReadWholeBufferNoSkip/4096_mean                   1215 ns         1245 ns           20
BM_WriteReadWholeBufferNoSkip/4096_median                 1217 ns         1246 ns           20
BM_WriteReadWholeBufferNoSkip/4096_stddev                 10.2 ns         9.63 ns           20
BM_WriteReadWholeBufferNoSkip/4096_cv                     0.84 %          0.77 %            20
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          523 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 531 ns          535 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 521 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 519 ns          523 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 521 ns          525 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 521 ns          525 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 521 ns          525 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64                 520 ns          524 ns      1319954
BM_WriteReadWholeBufferSkipLeftNoCircle/64_mean            521 ns          525 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/64_median          520 ns          524 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/64_stddev         2.46 ns         2.44 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/64_cv             0.47 %          0.46 %            20
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                590 ns          593 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                585 ns          588 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          588 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          588 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                585 ns          589 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          588 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                585 ns          588 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                585 ns          589 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                584 ns          588 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512                583 ns          587 ns      1192974
BM_WriteReadWholeBufferSkipLeftNoCircle/512_mean           584 ns          588 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/512_median         584 ns          587 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/512_stddev        1.39 ns         1.34 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/512_cv            0.24 %          0.23 %            20
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1189 ns         1216 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1194 ns         1221 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1201 ns         1228 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1191 ns         1218 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1188 ns         1215 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1188 ns         1216 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1188 ns         1216 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1190 ns         1217 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1189 ns         1217 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1191 ns         1221 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1190 ns         1221 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1193 ns         1224 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1193 ns         1223 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1194 ns         1225 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1187 ns         1218 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1188 ns         1218 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1187 ns         1218 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1189 ns         1216 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1188 ns         1215 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096              1189 ns         1215 ns       573397
BM_WriteReadWholeBufferSkipLeftNoCircle/4096_mean         1190 ns         1219 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/4096_median       1189 ns         1218 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/4096_stddev       3.28 ns         3.80 ns           20
BM_WriteReadWholeBufferSkipLeftNoCircle/4096_cv           0.28 %          0.31 %            20
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          528 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   532 ns          536 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   525 ns          530 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          528 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   525 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          528 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   526 ns          531 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   525 ns          530 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          528 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64                   524 ns          529 ns      1324806
BM_WriteReadWholeBufferSkipLeftCircle/64_mean              525 ns          529 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/64_median            524 ns          529 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/64_stddev           1.89 ns         1.77 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/64_cv               0.36 %          0.33 %            20
BM_WriteReadWholeBufferSkipLeftCircle/512                  607 ns          611 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  616 ns          621 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  607 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  608 ns          613 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          611 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          611 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          611 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  611 ns          616 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  608 ns          613 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  609 ns          614 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512                  606 ns          612 ns      1146359
BM_WriteReadWholeBufferSkipLeftCircle/512_mean             607 ns          613 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/512_median           606 ns          612 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/512_stddev          2.35 ns         2.27 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/512_cv              0.39 %          0.37 %            20
BM_WriteReadWholeBufferSkipLeftCircle/4096                1242 ns         1284 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1247 ns         1289 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1271 ns         1313 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1246 ns         1288 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1240 ns         1282 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1243 ns         1285 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1241 ns         1283 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1242 ns         1284 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1241 ns         1283 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1234 ns         1276 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1235 ns         1277 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1234 ns         1276 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1239 ns         1281 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1237 ns         1279 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1233 ns         1275 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1233 ns         1275 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1236 ns         1278 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1241 ns         1283 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1243 ns         1285 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096                1241 ns         1283 ns       546739
BM_WriteReadWholeBufferSkipLeftCircle/4096_mean           1241 ns         1283 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/4096_median         1241 ns         1283 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/4096_stddev         8.25 ns         8.29 ns           20
BM_WriteReadWholeBufferSkipLeftCircle/4096_cv             0.66 %          0.65 %            20

davidchocholaty · 2022-07-19T10:47:59Z

I think I should summarize some of your reviews and the current state of the code. I would like to discuss the problem that most of the reviews indirectly point to. The current code is in some cases a little over-complicated because, to follow the same behaviour as in EngineEffectsDelay, reading uninitialized (zero) values has to be enabled. The situation may arise, for example, in the following workflow:

Precondition: ring delay buffer is empty with uninitialized (zero) values and the read and the write positions are zero.

Write 8 samples into the ring delay buffer and read them:

index:                            0,  1,  2,  3,  4,  5,  6,  7,  8
-  -  -  -  -  -  -  -  -  -  - -----------------------------------------------
... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 0 | 0 | ...
-  -  -  -  -  -  -  -  -  -  - -----------------------------------------------
                                                                  ^
                                                                  readPos, writePos

Write another 8 samples and read with a delay of, for example, 12 samples:

index:                            0,  1,  2,  3,  4,  5,  6,  7,  8,  9,   10,  11,  ...
-  -  -  -  -  -  -  -  -  -  - ---------------------------------------------------------
... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | ...
-  -  -  -  -  -  -  -  -  -  - ---------------------------------------------------------
                  ^                                                                   ^
                  readPos                                                             writePos

In the shown situation, the uninitialized (zero) values have to be read. In EngineEffectsDelay this situation can arise for crossfading between two delays.

Now, here are two options:

Allow the mentioned situation:

After ring delay buffer allocation all values have to be explicitly set to zero.
A little bit more complicated read position handling has to be used, but still the current code look can be refactorized a lot.
Follow the delay buffer workflow as in EngineEffectsDelay and EngineFilterDelay (here is this situation possible too).

Not allow the mentioned situation:

The ring delay buffer doesn't behave properly for this situation and the VERIFY_OR_DEBUG_ASSERT has to be used.
Easier calculations for reading position.

Now, I would like to open a small discussion about this behaviour problem.

Swiftb0y · 2022-07-20T14:54:19Z

I don't understand how it can happen that the read position is ahead of the write position. That would mean that you are trying to read more samples than given. When used in the filter delay, that would mean that there was a negative delay. Can you elaborate on the exact circumstances that would lead to this?

Irregardless of this, I still don't understand why the read position calculation has to be as complicated as it currently is.

davidchocholaty · 2022-07-21T13:38:26Z

Irregardless of this, I still don't understand why the read position calculation has to be as complicated as it currently is.

The read position calculation can of course be simplified by using the modulo operation instead of the AND operation. I didn't stress that strongly enough with the phrase "but still the current code look can be refactorized a lot".
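To illustrate the difference between the two wrapping strategies, here is a minimal sketch. The helper names `wrapWithMask` and `wrapWithModulo` are hypothetical and not taken from the PR. The AND variant is only correct when the buffer size is a power of two, while the modulo variant works for any non-zero size:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers for illustration only; not the PR's actual code.

// Wraps an index using a bit mask. Valid only if size is a power of two.
inline std::size_t wrapWithMask(std::size_t index, std::size_t size) {
    return index & (size - 1);
}

// Wraps an index using modulo. Valid for any size > 0.
inline std::size_t wrapWithModulo(std::size_t index, std::size_t size) {
    return index % size;
}
```

For a size such as 192000, which is not a power of two, only the modulo variant gives the expected wrapped index.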

The reason why I dwell so much on the mentioned situation is that any simplification must take it into account. Based on your and daschuer's reviews, the code can of course be simplified a lot, but IMO it makes sense to start the simplification only after a final decision on the mentioned situation, because it is not at all common for a classic ring buffer, and it doesn't make sense to reimplement something without a clear vision of what it can and can't do.

davidchocholaty · 2022-07-21T13:48:21Z

I don't understand how it can happen that the read position is ahead of the write position. That would mean that you are trying to read more samples than given. When used in the filter delay, that would mean that there was a negative delay. Can you elaborate on the exact circumstances that would lead to this?

Yeah, I understand that it is not easy to follow this situation without the surrounding context. I think the best explanation is an example with calculations from the EngineEffectsDelay::process method. Let's assume the example from the previous drawing. It may help to have the engineeffectsdelay.cpp code open; I will follow it.

Just for info, the EngineEffectsDelay::process code doesn't map one-to-one onto RingDelayBuffer, but if this situation is possible there, then a RingDelayBuffer used in that method should handle it.

For the following examples, I will assume kMaxDelay = 192000, but it doesn't matter now.

So, the first process call (zero delay, write 8 samples, then read 8 samples):

int delaySourcePos =
            (m_delayBufferWritePos + kMaxDelay - m_currentDelaySamples) %
            kMaxDelay;

delaySourcePos = (0 + 192000 - 0) % 192000 = 0

int oldDelaySourcePos =
                (m_delayBufferWritePos + kMaxDelay - m_prevDelaySamples) %
                kMaxDelay;

oldDelaySourcePos = (0 + 192000 - 0) % 192000 = 0

The second process call (delay 12 samples (a possible situation, but shouldn't be so common), write another 8 samples and read 8 samples):

int delaySourcePos =
            (m_delayBufferWritePos + kMaxDelay - m_currentDelaySamples) %
            kMaxDelay;

delaySourcePos = (8 + 192000 - 12) % 192000 = 191996 % 192000 = 191996

int oldDelaySourcePos =
                (m_delayBufferWritePos + kMaxDelay - m_prevDelaySamples) %
                kMaxDelay;

oldDelaySourcePos = (8 + 192000 - 0) % 192000 = 8

This is the situation I have in mind: starting with a cleared buffer, no data has been written at index 191996. Please let me know if it is a little clearer now. I would like to know whether we are on the same page with this problem.
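The arithmetic above can be checked with a small sketch. The function name `delaySourcePos` and its shortened parameter names are assumptions made for this example; the formula itself is the one quoted from EngineEffectsDelay::process, and kMaxDelay = 192000 is the value assumed in the example:

```cpp
#include <cassert>

// Assumed value from the example above.
constexpr int kMaxDelay = 192000;

// Mirrors the quoted read position formula. Adding kMaxDelay before the
// subtraction keeps the intermediate value non-negative, so the modulo
// result is always a valid index.
constexpr int delaySourcePos(int writePos, int delaySamples) {
    return (writePos + kMaxDelay - delaySamples) % kMaxDelay;
}
```

With a write position of 8 and a delay of 12 samples, the result wraps to the end of the buffer, which has never been written.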

Swiftb0y · 2022-07-21T18:30:41Z

Ah, I think I understand. Yes, in that case we should just have the buffer pre-filled with zeros IMO.
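A minimal sketch of what pre-filling with zeros buys us, assuming a simplified stand-in class. `ZeroFilledDelayRing` and its interface are hypothetical and much smaller than the real RingDelayBuffer; the read position is derived from `numItems + delayItems`, as in the drawing earlier in this thread:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the ring delay buffer discussed here.
class ZeroFilledDelayRing {
  public:
    // All slots start as zero, so reading "before" the first written
    // sample yields silence instead of undefined data.
    explicit ZeroFilledDelayRing(std::size_t size)
            : m_buffer(size, 0.0f), m_writePos(0) {
    }

    void write(const std::vector<float>& input) {
        for (float sample : input) {
            m_buffer[m_writePos] = sample;
            m_writePos = (m_writePos + 1) % m_buffer.size();
        }
    }

    // Reads numItems samples ending delayItems before the write position.
    // Assumes numItems + delayItems <= buffer size.
    std::vector<float> read(std::size_t numItems, std::size_t delayItems) const {
        const std::size_t size = m_buffer.size();
        const std::size_t shift = numItems + delayItems;
        std::size_t readPos = (m_writePos + size - shift) % size;
        std::vector<float> output(numItems);
        for (std::size_t i = 0; i < numItems; ++i) {
            output[i] = m_buffer[readPos];
            readPos = (readPos + 1) % size;
        }
        return output;
    }

  private:
    std::vector<float> m_buffer;
    std::size_t m_writePos;
};
```

After writing 16 samples and reading 8 samples with a delay of 12, the read starts four slots before the first written sample, so the output begins with four zeros followed by the first four written samples.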

davidchocholaty · 2022-07-22T14:17:16Z

Okay, perfect. Thank you. So now that we have agreed to allow the mentioned special case, I will simplify all the calculations asap.

daschuer · 2022-07-23T03:17:08Z

When it happens that you read out the initial zeros, you will hear a click sound when the real samples start. This can be avoided by fading in the first input buffer.
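As a rough sketch of the idea, assuming a simple linear ramp (the helper name `fadeInFirstChunk` is hypothetical and the actual ramp shape is a design choice):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: scales the chunk with a linear gain ramp that starts
// at 0.0 and rises towards 1.0, so the transition from the initial zeros
// to real samples has no abrupt step.
void fadeInFirstChunk(std::vector<float>& chunk) {
    const std::size_t n = chunk.size();
    for (std::size_t i = 0; i < n; ++i) {
        const float gain = static_cast<float>(i) / static_cast<float>(n);
        chunk[i] *= gain;
    }
}
```

Only the first chunk written after the buffer was cleared needs this treatment; later chunks continue at full gain.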

davidchocholaty · 2022-07-29T05:39:13Z

When it happens that you read out the initial zeros, you will hear a click sound when the real samples start. This can be avoided by fading in the first input buffer.

Good point. Thank you for this tip.

This commit removes the invasive manual size handling and memset calls. Instead, the fill method is used.

This commit replaces the manual memcpy handling with SampleUtil::copy, which provides a vectorized loop for 32-bit SSE instead of memcpy (based on benchmarking).

daschuer

Cool, thank you.
I have added some suggestions for improvements.

daschuer · 2022-08-21T18:03:51Z

src/util/ringdelaybuffer.cpp

+    m_buffer.fill(0);
+}
+
+void RingDelayBuffer::copy(const ReadableSlice pSourceBuffer,


Since this is a free function now, it can go into an anonymous namespace or, if we think we can use it elsewhere, it can be moved to sample.cpp.
This is not a plain copy; it is a copyRing() or such.

Does it work if the source sourcePos AND destPos are not 0? I think not. We may either assert that or add the case with three copy calls.

The mentioned problems should be fixed, to summarise:

copy() renamed to copyRing()

copyRing() is moved into an anonymous namespace and serves as a helper function only

Does it work if the source sourcePos AND destPos are not 0? I think not. We may either assert that or add the case with three copy calls.

I think I understand what you mean. IMO, if a numItems value many times greater than the sizes of both the source and destination buffers were provided, many more than three copies would be needed. That cannot occur in the current usage, but it is possible in general. Since multiple copies would not be used now, I would prefer the assert-based solution.

I have another (unused) case in mind for three copies: copying from a ring to a ring.

Source:
123456789

Destination
123456789

Copy 8 samples
Read pointer at 5, write pointer at 7
Copy
5..7 to 7..9 (3)
8..9 to 1..2 (2)
1..3 to 3..5 (3)

This can't happen if one of the pointers points to the start.
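For reference, a ring-to-ring copy that handles this case could look roughly like the sketch below. `copyRingToRing` is a hypothetical name and this is not the PR's copyRing; it assumes both rings are non-empty, both positions are valid indices, and numItems does not exceed either ring size:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical generalization for illustration only.
// Copies numItems from a source ring to a destination ring, splitting the
// copy wherever either ring wraps. Returns the number of contiguous copies.
int copyRingToRing(const std::vector<float>& source,
        std::size_t sourcePos,
        std::vector<float>& dest,
        std::size_t destPos,
        std::size_t numItems) {
    int numCopies = 0;
    while (numItems > 0) {
        // Largest run that stays contiguous in both rings.
        const std::size_t chunk = std::min({numItems,
                source.size() - sourcePos,
                dest.size() - destPos});
        std::copy_n(source.begin() + static_cast<std::ptrdiff_t>(sourcePos),
                chunk,
                dest.begin() + static_cast<std::ptrdiff_t>(destPos));
        sourcePos = (sourcePos + chunk) % source.size();
        destPos = (destPos + chunk) % dest.size();
        numItems -= chunk;
        ++numCopies;
    }
    return numCopies;
}
```

With two nine-element rings, the read pointer at the fifth element and the write pointer at the seventh, copying 8 samples takes three contiguous copies. If either pointer starts at the beginning of its ring, at most two are needed.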

Asserting that this does not happen works for me.

Oh I missed that you did it already.
Thank you.

daschuer · 2022-08-21T18:14:20Z

src/util/ringdelaybuffer.cpp

+SINT RingDelayBuffer::read(CSAMPLE* pBuffer, const SINT itemsToRead, const SINT delayItems) {
+    const SINT shift = itemsToRead + delayItems;
+
+    if (shift > m_buffer.size()) {


Suggested change

if (shift > m_buffer.size()) {

VERIFY_OR_DEBUG_ASSERT(shift <= m_buffer.size()) {

daschuer · 2022-08-21T18:15:03Z

src/util/ringdelaybuffer.cpp

+}
+
+SINT RingDelayBuffer::write(const CSAMPLE* pBuffer, const SINT itemsToWrite) {
+    if (itemsToWrite > m_buffer.size()) {


Suggested change

if (itemsToWrite > m_buffer.size()) {

VERIFY_OR_DEBUG_ASSERT(itemsToWrite <= m_buffer.size()) {
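Note that the suggested condition is inverted compared to the plain if: the block after VERIFY_OR_DEBUG_ASSERT runs when the condition does not hold. A minimal sketch with a simplified stand-in macro (`SKETCH_VERIFY_OR` and `checkedItemsToWrite` are hypothetical names; as far as I know, the real Mixxx macro additionally triggers a debug assertion in debug builds):

```cpp
#include <cassert>

// Simplified stand-in written only for this sketch: the block that follows
// the macro runs when the condition is false.
#define SKETCH_VERIFY_OR(cond) if (!(cond))

// Hypothetical check in the style of the write() precondition:
// returns the number of items accepted.
int checkedItemsToWrite(int itemsToWrite, int bufferSize) {
    SKETCH_VERIFY_OR(itemsToWrite <= bufferSize) {
        // Invalid request: bail out gracefully instead of overrunning.
        return 0;
    }
    return itemsToWrite;
}
```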

src/util/ringdelaybuffer.cpp

This commit replaces the basic if statements for testing the invalid amount of item values with VERIFY_OR_DEBUG_ASSERT.

This commit renames the RingDelayBuffer::copy function to RingDelayBuffer::copyRing.

The commit moves the RingDelayBuffer::copyRing into an anonymous namespace and removes the function from the RingDelayBuffer class. So, the function serves as a helper function only.

The commit adds a handling structure for the situation in which the number of items to copy forces both buffers to cross their upper bounds and wrap around at least once. The current RingDelayBuffer::copyRing usage does not require this situation, so a VERIFY_OR_DEBUG_ASSERT is added.

src/util/ringdelaybuffer.cpp

Swiftb0y · 2022-08-23T15:52:12Z

src/util/ringdelaybuffer.cpp

+using ReadableSlice = mixxx::SampleBuffer::ReadableSlice;
+using WritableSlice = mixxx::SampleBuffer::WritableSlice;
+
+namespace {
+SINT copyRing(const ReadableSlice pSourceBuffer,
+        SINT sourcePos,
+        const WritableSlice pDestBuffer,
+        SINT destPos,
+        const SINT numItems) {


I'm planning to deprecate mixxx::SampleBuffer::*Slice. Why not use std::span instead?

Of course, std::span can be used. I thought that I should rather work with span through SampleBuffer and forgot that I can call mixxx::spanutil::spanFromPtrLen() directly.

Right, you just use spanFromPtrLen as an adapter so to speak and then try to use std::span as much as possible in any APIs that take the usual (pointer, size) pair.

src/util/ringdelaybuffer.cpp

This commit adds the default value assignment for copiedItems variable. The initialization is added to avoid potentially uninitialized memory.

This commit tries to avoid uninitialized memory caused by declaring the variable before the if-else statement without a default value. One solution would be to assign a default value, but some IDEs can warn that the assigned value is unused and, more importantly, it would hide real errors such as the variable not being set in one branch. Based on that, the lambda solution is used.
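The lambda approach can be sketched as follows. The function and parameter names are hypothetical and only illustrate the pattern, not the actual RingDelayBuffer code:

```cpp
#include <cassert>

// Hypothetical example of initializing a variable from an if-else via an
// immediately invoked lambda.
int copiedItemsFor(bool wrapsAround, int itemsBeforeWrap, int itemsAfterWrap) {
    // Every branch of the lambda has to return a value, so copiedItems
    // can be const and is never left uninitialized.
    const int copiedItems = [&]() {
        if (wrapsAround) {
            return itemsBeforeWrap + itemsAfterWrap;
        } else {
            return itemsBeforeWrap;
        }
    }();
    return copiedItems;
}
```

If one branch forgot to return a value, the compiler could report it, whereas a default-initialized variable would silently keep its default.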

This commit replaces the mixxx::SampleBuffer::ReadableSlice and mixxx::SampleBuffer::WritableSlice with std::span and the mixxx::spanutil::spanFromPtrLen helper function.

This commit renames pSourceBuffer and pDestBuffer in the copyRing function to sourceBuffer and destBuffer, dropping the 'p' prefix, which is used for pointers.

This commit introduces the std::span in RingDelayBufferTest. The span is primarily used in the RingDelayBufferTest::AssertIdenticalBufferEquals and for all calls of this function.

davidchocholaty · 2022-08-27T20:06:31Z

Just a little sum up. To make this PR ready, the last thing from my side is to finish the documentation and descriptive comments, if the previous changes will pass your reviews. The last thing, that I have on my mind is, that the mentioned licence for mixxx/lib/portaudio/pa_ringbuffer.c is likely no longer required due to major previous changes and the creation of a new implementation during this PR. What do you think?

This commit renames m_firstInputBuffer to m_firstInputChunk because the new name better describes the situation. In general, the user code does not have to pass the whole input buffer to RingDelayBuffer::write at once but can break it into small chunks.

This commit adds the documentation comments for the RingDelayBuffer class and adds some description comments for the code parts for better understanding.

This commit improves the const-correctness by making the RingDelayBuffer::size function returned value as a constant expression due to it can be evaluated in the compile time.

src/util/ringdelaybuffer.h

Swiftb0y · 2022-08-29T14:09:29Z

Just a short summary. To make this PR ready, the last thing from my side is to finish the documentation and descriptive comments, provided the previous changes pass your reviews. One more thing on my mind: the mentioned licence for mixxx/lib/portaudio/pa_ringbuffer.c is likely no longer required, given the major changes so far and the new implementation created during this PR. What do you think?

IANAL, but I don't think it's required anymore...

This commit introduces std::span in the RingDelayBuffer API. RingDelayBuffer::read now takes only the destination buffer (as a span) and the delay in items. RingDelayBuffer::write takes a single parameter, the source buffer (also a span). The dependent tests are updated to use spans in their calls as well.

This commit removes the licence of mixxx/lib/portaudio/pa_ringbuffer.c. The licence is no longer required because of the major changes made so far and the creation of a new implementation.

Swiftb0y

sorry for the nitpicks again....

src/test/ringdelaybuffer_test.cpp

This commit no longer creates a span from the SampleBuffer's data via SampleBuffer::data; the already implemented SampleBuffer::span function is used instead.

davidchocholaty · 2022-08-29T19:04:53Z

sorry for the nitpicks again....

Absolutely no problem, this is how I should have written it the first time, I just did it too much automatically. Now, it should be fixed. The indent was kept to make it a little bit more readable.

Swiftb0y

Thanks. I'll go ahead and merge now so you can build off this.

davidchocholaty · 2022-08-29T20:06:01Z

Perfect, thank you very much for merging.

davidchocholaty added 7 commits July 12, 2022 15:44

RingDelayBuffer: clearing of m_jumpLeftAroundMask

785f636

This commit adds clearing of m_jumpLeftAroundMask. When RingDelayBuffer::clear is called, this variable, which masks the case where a left jump crossed the left edge of the delay buffer, is set to zero.

RingDelayBuffer: allow equality for the right jump

a155157

This commit allows the size of a right jump of the read position to equal the number of items available for reading.

RingDelayBuffer: solve jump left maximum size

eeb6e6d

This commit resolves the maximum size of a left jump of the read position. Accordingly, the comments for the ASSERTs are updated and the zero-size jump is handled separately.

github-actions bot added the build and code quality labels Jul 12, 2022

Swiftb0y requested changes Jul 13, 2022

src/util/ringdelaybuffer.cpp (outdated)

src/util/ringdelaybuffer.h (outdated)

daschuer reviewed Jul 14, 2022


davidchocholaty added 5 commits July 15, 2022 07:45

RingDelayBuffer: remove unnecessary destructor

1dd8830

This commit removes the unnecessary, explicitly declared virtual destructor.

RingDelayBuffer: use the size of SampleBuffer

8f26438

This commit removes the variable that stored the size of the delay buffer. Since the delay buffer is a mixxx::SampleBuffer, the SampleBuffer::size function is used instead.

RingDelayBuffer: add single-threaded only comment

248f38e

This commit removes the old note that the ring buffer is only safe for single-threaded use and rewrites this information as a comment before the class.

RingDelayBufferTest: use of unique_ptr

4e6a1ef

This commit replaces the use of RingDelayBuffer* with unique_ptr.

davidchocholaty added 2 commits July 29, 2022 08:13

RingDelayBuffer: replace zeros memset with fill

d6565e7

This commit removes the manual size arithmetic and memset calls used for zeroing. Instead, the fill method is used.
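As a rough illustration of the difference, the sketch below zeroes part of a buffer with std::fill on a std::vector<float>. Both are stand-ins chosen for this example; the commit itself refers to a fill method whose exact signature is not shown here, and `zeroRange` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Zeroes `count` samples starting at `offset`. Unlike memset, no byte-size
// arithmetic (count * sizeof(float)) is needed, and the value is a typed
// 0.0f rather than a byte pattern.
void zeroRange(std::vector<float>& buffer, std::size_t offset, std::size_t count) {
    assert(offset <= buffer.size() && count <= buffer.size() - offset);
    const auto first = buffer.begin() + static_cast<std::ptrdiff_t>(offset);
    std::fill(first, first + static_cast<std::ptrdiff_t>(count), 0.0f);
}
```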

RingDelayBuffer: replace with SampleUtil::copy

0bfd168

This commit replaces the manual memcpy handling with SampleUtil::copy. This uses a vectorized loop for 32-bit SSE instead of memcpy (based on benchmarking).

davidchocholaty requested a review from daschuer August 21, 2022 14:35

daschuer requested changes Aug 21, 2022


davidchocholaty added 4 commits August 22, 2022 15:27

RingDelayBuffer: change to VERIFY_OR_DEBUG_ASSERT

e3638b4

This commit replaces the plain if statements that check for an invalid number of items with VERIFY_OR_DEBUG_ASSERT.
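For readers unfamiliar with the pattern, here is a simplified stand-in. `VERIFY_OR_DEBUG_ASSERT_SKETCH`, `reportFailedCheck` and `clampedWrite` are hypothetical names for this example; the real Mixxx macro differs in detail. Note that the condition is inverted compared to the plain if: the block runs when the check fails.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Reports the failed condition and returns true so that the recovery
// block after the macro is executed.
inline bool reportFailedCheck(const char* condition) {
    std::fprintf(stderr, "DEBUG ASSERT: \"%s\"\n", condition);
    return true;
}

// Simplified stand-in (assumption): if the condition does not hold,
// report it and run the block that follows.
#define VERIFY_OR_DEBUG_ASSERT_SKETCH(cond) \
    if (!(cond) && reportFailedCheck(#cond))

// Hypothetical caller: refuses requests that exceed the capacity.
std::size_t clampedWrite(std::size_t itemsToWrite, std::size_t capacity) {
    VERIFY_OR_DEBUG_ASSERT_SKETCH(itemsToWrite <= capacity) {
        return 0; // recover gracefully instead of writing out of bounds
    }
    return itemsToWrite;
}
```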

RingDelayBuffer: rename copy function to copyRing

b95d6be

This commit renames the RingDelayBuffer::copy function to RingDelayBuffer::copyRing.

RingDelayBuffer: move copyRing into a namespace

b73a2b5

This commit moves copyRing out of the RingDelayBuffer class into an anonymous namespace, so the function serves as a helper function only.
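A sketch of what such a file-local helper can look like. `copyRingSketch` and `readWrapped` are hypothetical names with made-up signatures, and std::copy_n stands in for whatever copy routine the real code uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace {

// File-local helper: copies `count` items out of `ring`, starting at
// `start` and wrapping around the end of the ring if necessary.
std::size_t copyRingSketch(const std::vector<float>& ring,
        std::size_t start,
        float* dest,
        std::size_t count) {
    assert(start < ring.size() && count <= ring.size());
    // First segment: from start up to the end of the ring (or less).
    const std::size_t firstPart = std::min(count, ring.size() - start);
    std::copy_n(ring.begin() + static_cast<std::ptrdiff_t>(start), firstPart, dest);
    // Second segment: the remainder, continuing from the ring's beginning.
    std::copy_n(ring.begin(), count - firstPart, dest + firstPart);
    return count;
}

} // namespace

// Public entry point that uses the helper; only this symbol is visible
// outside the translation unit.
std::vector<float> readWrapped(const std::vector<float>& ring,
        std::size_t start,
        std::size_t count) {
    std::vector<float> out(count);
    copyRingSketch(ring, start, out.data(), count);
    return out;
}
```

Keeping the helper in an anonymous namespace gives it internal linkage, so it does not appear in the class interface or clash with other translation units.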

Swiftb0y reviewed Aug 23, 2022


RingDelayBuffer: add default value assignment

0f4729f


daschuer approved these changes Aug 25, 2022


davidchocholaty added 5 commits August 26, 2022 08:26

Merge remote-tracking branch 'upstream/main' into ring_delay_buffer

0828535

RingDelayBuffer: replace slices by std::span

6e212af


RingDelayBuffer: rename without pointer prefix

ac54694


RingDelayBufferTest: introduce std::span

6ee6603


davidchocholaty requested a review from Swiftb0y August 27, 2022 20:06

davidchocholaty added 4 commits August 28, 2022 09:44

RingDelayBuffer: add documentation comments

74b52ea


Merge remote-tracking branch 'upstream/main' into ring_delay_buffer

9addec2

RingDelayBuffer: size returned value as constexpr

8076e29


Swiftb0y reviewed Aug 29, 2022

src/util/ringdelaybuffer.h (outdated)

davidchocholaty added 2 commits August 29, 2022 18:24

RingDelayBuffer: remove the unnecessary licence

6c22c40


Swiftb0y reviewed Aug 29, 2022

src/test/ringdelaybuffer_test.cpp (outdated)

RingDelayBuffer: use SampleBuffer's span

b686546


Swiftb0y approved these changes Aug 29, 2022


Swiftb0y merged commit c9d9581 into mixxxdj:main Aug 29, 2022

-	if (shift > m_buffer.size()) {
+	VERIFY_OR_DEBUG_ASSERT(shift <= m_buffer.size()) {

-	if (itemsToWrite > m_buffer.size()) {
+	VERIFY_OR_DEBUG_ASSERT(itemsToWrite <= m_buffer.size()) {
