-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add missing Rubber Band padding, preventing it from eating the initial transient #11120
Add missing Rubber Band padding, preventing it from eating the initial transient #11120
Conversation
These functions still use raw pointers so there's a bunch of stuff that's difficult to catch that can still go wrong, but this at least removes the need for having to manually deallocate these structures. And this class didn't adhere to the rule of five, so it was very easy to misuse it and cause double frees and use after frees.
This needs to be done or else Rubber Band will eat up the initial transient by introducing a short fade in/2048 sample low-pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, the changes seem very sensible to me so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM. I'm not comfortable with merging until someone can test whether the R2 stretcher still sounds good.
To avoid looking up virtual function calls in case this class is used as a superclass for something else.
7016e0f
to
1769fea
Compare
true, my fault for suggesting it wrong. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking care. I think this needs to be tweaked a bit. See my comments. My findings are from the V2stetcher.
@@ -19,7 +17,7 @@ namespace { | |||
// TODO (XXX): this should be removed. It is only needed to work around | |||
// a Rubberband 1.3 bug. | |||
// This is the default increment from RubberBand 1.8.1. | |||
size_t kRubberBandBlockSize = 256; | |||
constexpr size_t kRubberBandBlockSize = 256; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you able to update the comment?
Our oldest supported version is 1.8.2 from Ubuntu Focal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in, completely remove the block size limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
I have double checked, the bug has been fixed in Rubberband 1.7
breakfastquay/rubberband@c26dc1d
By the way, Rubberband will never request more than getPreferredStartPad() * 2
https://github.com/breakfastquay/rubberband/blob/de56cd114a678003dfef17abbd74ebd9203964eb/src/finer/R3Stretcher.cpp#L565
Maybe we can use this info to improve our buffers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#else | ||
// This _should_ be equal to the latency in older Rubber Band versions: | ||
// https://github.com/breakfastquay/rubberband/blob/c5f99d5ff2cba2f4f1def6c38c7843bbb9ac7a78/main/main.cpp#L652 | ||
size_t remaining_padding = m_pRubberBand->getLatency(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input and output padding is different by the playback rate.
The interesting commit is this:
breakfastquay/rubberband@72654b0
So we need to do here something similar:
int toPad = int(round(toDrop * frequencyshift));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah good catch. Looking at the changes to the actual time stretchers it seems like that's how you get the new getStartPad()
behavior in terms of getLatency()
. For the R3 time stretcher the getStartDelay()
value is different from the old getLatency()
value, so I'll emulate that and then push the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or well maybe the value was incorrect in that screenshot? Looking at https://github.com/breakfastquay/rubberband/blob/v1.8.2/src/StretcherImpl.cpp#L826-L831 Never mind, now I'm confusing myself. For the R2 algorithm they're the same, with the R3 algorithm they're different but I'm not sure if that was intentional. And that would only affect beta versions of librubberband anyways so it's not a concern.getLatency()
should indeed be identical to getStartDelay()
(like the documentation suggests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
#else | ||
size_t padding_to_drop = m_pRubberBand->getLatency(); | ||
#endif | ||
while (padding_to_drop > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to call while m_pRubberBand->available() before this loop.
In this regards, I am not sure if we can pull out all padding immediately, or if it needs to be stay in until we have provided the real samples of the same size.
At least in the rubberband main, this is done not immediately after filling in the padding.
I am afraid dropping at this point is not correct, because we likely need to keep the rubber band buffers filled with at least the startDelay() to perform windowing around the current sample.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also also concerned how big the internal buffer of rubberband is, since I see here memory allocation:
https://github.com/breakfastquay/rubberband/blob/9001b5459c1c424a71be0829c8988a25a9edf490/src/faster/StretcherProcess.cpp#L487
Which will likely block our real time thread.
The buffer is initial m_sWindowSize * 2;
4096 @ 48 kHz
The default start pads is m_aWindowSize / 2;
1024 @ 48 kHz
All this makes kind of sense. To retrieve 1 sample, we need half of a window samples before the sample and half a window silence after it.
In case of Mixxx, and after seeking within a track, it is not true that these samples are zero. So I think it is required to use the real samples before the start instead of zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The confusing point is that we also have the available() function. In theory there must not bee any output available before not receive more than a whole m_aWindowSize. Is this the case?
I think the initial getSamplesRequired() return m_aWindowSize.
It would be interesting how many samples are available after passing the requested amount.
If it is 0 or m_aWindowSize / 2?
In the later case it is clear why a padding of m_aWindowSize / 2 is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also also concerned how big the internal buffer of rubberband is, since I see here memory allocation:
https://github.com/breakfastquay/rubberband/blob/9001b5459c1c424a71be0829c8988a25a9edf490/src/faster/StretcherProcess.cpp#L487
Which will likely block our real time thread.
The padding is half of the window size, so that should definitely fit in Rubber Band's buffers. And thanks to the m_pRubberBand->setTimeRatio(2.0)
call the buffer should be able to fit at least two of those windows before reallocating.
In case of Mixxx, and after seeking within a track, it is not true that these samples are zero. So I think it is required to use the real samples before the start instead of zero.
That was part of my experiments in solving #11125, but it's a bit more difficult than it seems. Especially because the read ahead manager is shared by multiple things and reading data through it invokes a bunch of side effects. Older samples also may not be cached (I've noticed this causing issues when beat jumping somewhere for the first time that weren't present when not reading past samples). I have not yet had time to dig into this further, so if you have suggestions for how to compensate for the latency here without messing with the rest of the deck's state (so switching back and forth between linear interpolation and Rubber Band time stretching should be seamless and not shift any timings around) then that would be great.
The confusing point is that we also have the available() function. In theory there must not bee any output available before not receive more than a whole m_aWindowSize. Is this the case?
Yes, like I mentioned before, they're using an STFT (Short-time Fourier Transform). It's an overlap-add process where you a window function to a chunk of the input data, take the discrete Fourier transform of that, process the frequency domain signal, take the inverse discrete Fourier transform of that, apply another window function, and then add that to the output buffer, partially overlapping the previous output. The reason why the required padding is half of the input buffer is because most window functions used for these things are raised cosine windows, like the Hann/Hanning function I linked below. Because you apply (= multiple the chunk of audio with) the window twice, once before the DFT, and once after the IDFT, you've effectively applied the square of the window function to the signal if you didn't do anything to the signal in the frequency domain. And if you try adding up squared Hann functions with a quarter period difference between them, you'll notice that after a period the resulting function forms a flat line. That explains why the required padding is half a window, and why you get that smooth fade in if you don't add the padding.
Some wikipedia inks if you're curious about how and why this all works:
https://en.wikipedia.org/wiki/Short-time_Fourier_transform
https://en.wikipedia.org/wiki/Overlap%E2%80%93add_method
https://en.wikipedia.org/wiki/Hann_function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But yeah I guess you are right, we shouldn't be reading and dropping samples immediately. We should be doing the same thing the rubberband cli is doing, and process the audio as normal but drop the first getStartDelay()
samples of it. That also aligns with the STFT primer I posted above heh.
e1742ec
to
4ebe88f
Compare
This of course didn't work correctly before. We should drop the first n samples from the output, but we can't do that before processing more input. This also fixes the uncompensated latency/clicks issue from mixxxdj#11125 in the process.
@daschuer I fixed the way padding is dropped (wasn't thinking straight heh). Like I explained in the STFT primer in #11120 (comment), the scaled silence samples should be dropped from the output. You can't just drop them beforehand (I think I wrote this PR when it was already late at night heh). The padding is now correctly dropped, fixing both the transient eating problem and also fixing #11125 (which was sort of caused by the previous incorrect padding dropping behavior). With these changes scratching while having keylock enabled is finally smooth. As is beat jumping to audio that is already cached. Please give this PR another try with these changes! |
I have investigated this a bit more and the issue is that we have no reset() call in case of looping. |
SINT frames_available = m_pRubberBand->available(); | ||
SINT frames_to_read = math_min(frames_available, frames); | ||
const SINT frames_available = m_pRubberBand->available(); | ||
const SINT frames_to_read = math_min(frames_available, frames); | ||
SINT received_frames = static_cast<SINT>(m_pRubberBand->retrieve( | ||
m_bufferPtrs.data(), frames_to_read)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to read here frames_to_read + frame_offset;
Even though the buffer is more than big enough, it would be correct to check for its size.
We may consider to use a way smaller buffer than MAX_BUFFER_LEN b.t.w.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, did you miss to push a commit?
The function expects frames
and we need to discard frame_offset
from the retrieved buffer. So we need to retrieve frames + frame_offset
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See 755a4a4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linked code makes that the function returns less frames than it might could have read. So we should not clamp the value to frames , but to frames + frame_offset in line 132
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we should not. I can't check what m_pRubberBand->available()
returns here, but logically it would return 1024 if it's doing overlap-add with four times overlap. At a 1.0 pitch ratio/when the time stretcher should not be making any adjustments, we need to throw away half a window's worth of samples, which is 2048 in this case. So in that situation the first two calls to this function must read and return 0 samples or else we've been reading the processed padding that we're trying to throw away here.
The linked code makes that the function returns less frames than it might could have read.
It can only read a small window of output at a time. Presumably this is 1/4th of the FFT window size. So 1024 samples. Which means that the first two calls to this function (each producing 1024 samples of garbage output) should be thrown away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a bit debug code to shows the issue after a seek:
qDebug() << "padding = " << m_remainingPaddingInOutput << "frames = " << frames << "available = " << frames_available;
debug [Engine] padding = 1025 frames = 256 available = 0
debug [Engine] padding = 1025 frames = 256 available = 512
debug [Engine] padding = 769 frames = 256 available = 768
debug [Engine] padding = 513 frames = 256 available = 1024
debug [Engine] padding = 257 frames = 256 available = 1280
debug [Engine] padding = 1 frames = 256 available = 1536
debug [Engine] padding = 0 frames = 1 available = 1792
debug [Engine] padding = 0 frames = 256 available = 1791
debug [Engine] padding = 0 frames = 256 available = 1535
debug [Engine] padding = 0 frames = 256 available = 1279
In the second call, we read only 256 frames from the available 512. The available frames are stacked up before they are reduced again. We need to read all available frames in that case. This needs to be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah I see what you mean now! Fixed in d3aa210.
Somewhat related, would there be any interest in exposing the Rubberband R3 + short window option? I added it for myself here: robbert-vdh@af570f4 Its performance is in line with R2 (and also no huge spikes when searching), and quality wise it's more or less a sidegrade to R2. It doesn't have R2's time domain transient preservation, but it also doesn't introduce spurious transients and it does still sound better than R2 without transient preservation. Hmm maybe not. It usually does better than R2, but it sometimes also transforms kickdrums into smeary messes.
// before using it. Otherwise it will eat add a short fade-in, destroying | ||
// the initial transient. | ||
size_t remaining_padding = getPreferredStartPad(); | ||
std::fill_n(m_buffers[0].span().begin(), kRubberBandBlockSize, 0.0f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment that this filling zeros is the correct solution here, because we always fade in from zero after seek to avoid click sounds. This means that the first half of the buffer needs to be zero anyway.
I have verified it with a recording in Audacity that this looks good.
My fist assumption to use here real samples is correct but they have zero gain = 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the idea of the zero padding is that if the STFT process is working correctly, then those samples will not have any impact on the rest of the output. See my comment above. I added a link to the comment in the code.
@@ -19,7 +17,7 @@ namespace { | |||
// TODO (XXX): this should be removed. It is only needed to work around | |||
// a Rubberband 1.3 bug. | |||
// This is the default increment from RubberBand 1.8.1. | |||
size_t kRubberBandBlockSize = 256; | |||
constexpr size_t kRubberBandBlockSize = 256; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
I have double checked, the bug has been fixed in Rubberband 1.7
breakfastquay/rubberband@c26dc1d
By the way, Rubberband will never request more than getPreferredStartPad() * 2
https://github.com/breakfastquay/rubberband/blob/de56cd114a678003dfef17abbd74ebd9203964eb/src/finer/R3Stretcher.cpp#L565
Maybe we can use this info to improve our buffers.
This mostly reverts adb034b as this bug should in theory only affect librubberband 1.3 and the oldest supported version is currently 1.8.2.
SINT frames_available = m_pRubberBand->available(); | ||
SINT frames_to_read = math_min(frames_available, frames); | ||
const SINT frames_available = m_pRubberBand->available(); | ||
const SINT frames_to_read = math_min(frames_available, frames); | ||
SINT received_frames = static_cast<SINT>(m_pRubberBand->retrieve( | ||
m_bufferPtrs.data(), frames_to_read)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, did you miss to push a commit?
The function expects frames
and we need to discard frame_offset
from the retrieved buffer. So we need to retrieve frames + frame_offset
.
This was comparing against the wrong thing in 755a4a4.
Instead of reading his in small blocks and then throwing away all output a couple times until we threw away all of the padding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM Thank you.
We have one big pending issue, that is not touched by this PR: In case of seeks we do the following in one callback:
This means with the default 1024 Buffer size This is roughly three times the CPU. So it seems the cross fading during a seek of the stretched samples is probably a bad idea due to the high risk of buffer underruns effects all Tracks and the tempo. It would be great if we could move the cross fading to the input side and keep rubber band settled. This will delay seeks by extra 1025 though ... mmmm. Ideas? |
:-/ Oh That's odd. I was wrong. The additional processing of the Padding does make this CPU load issue worse ... |
Not quite. Keep in mind that Rubber Band processes the input in windows of n samples at a time. With the R2 algorithm (and default window size settings) that's 2048 samples, and with the R3 algorithm that's 4096 samples. If the Rubberband process function needs to produce, say, 128 samples, then it always needs to run 2048/4096 samples through the algorithm before it starts producing output. So when you add the padding the next window after the first one will be processed a little sooner, but after a reset the time stretcher will still only process a single window's worth of samples. |
return received_frames; | ||
} | ||
|
||
void EngineBufferScaleRubberBand::deinterleaveAndProcess( | ||
const CSAMPLE* pBuffer, SINT frames, bool flush) { | ||
DEBUG_ASSERT(frames <= static_cast<ssize_t>(m_buffers[0].size())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I try to compile this locally on Windows with VS2022, it fails with an error, that identifier ssize_t is undefined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I didn't realize ssize_t
isn't from the standard library, and it compiles fine with MSVC on the CI. Probably through some transitive dependency. I guess ptrdiff_t
would be the portable alternative that comes closest to ssize_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Afaik its part of the C standard library. I guess you could also #include <cstddef>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't appear to be. Libusb and the other libraries Mixxx uses that also use ssize_t
seem to explicitly define it on Windows.
`ssize_t` is only officially defined on POSIX, and apparently this results in compiler errors with Visual Studio 2022: mixxxdj#11120 (comment) `ptrdiff_t` is the closest type defined by the standard.
…and-padding Add missing Rubber Band padding, preventing it from eating the initial transient
…and-padding Add missing Rubber Band padding, preventing it from eating the initial transient
…-rubberband-padding" This reverts commit cec1978.
…and-padding Add missing Rubber Band padding, preventing it from eating the initial transient
…-rubberband-padding" This reverts commit cec1978.
…and-padding Add missing Rubber Band padding, preventing it from eating the initial transient
…-rubberband-padding" This reverts commit cec1978.
…and-padding Add missing Rubber Band padding, preventing it from eating the initial transient
…-rubberband-padding" This reverts commit cec1978.
This fixes #11093. See #11093 (comment) for more information on what's going on here. In short, both the docs and the faq mention that you should add the requested amount of padding (and then drop that padding again) after initializing or resetting the time stretching engine. If you don't do this, then any initial transient will be replaced by a short fade in. With the r3/finer algorithm this is 2048 samples, which becomes very audible. This PR adds the required padding routine, fixing the problem See the linked issue for example videos of how this behaved before and after the change. I also replaced the manual memory management with STL data structures while I was at it and fixed some memory safety issues by explicitly deleting the copy and move constructors/assignment operators. Previously invoking these by accident would result in double frees and use after frees.