[Bug] Spleeter adding small padding to output audio files #437

Closed
geraldoramos opened this issue Jul 1, 2020 · 6 comments
Labels
bug (Something isn't working), invalid (This doesn't seem right)

Comments

@geraldoramos

geraldoramos commented Jul 1, 2020

Description

While trying to reduce Spleeter's memory footprint by splitting input files into 30-second chunks (discussed on this thread), we noticed that Spleeter adds a tiny amount of padding after each output stem file, which creates a small gap when the 30-second chunks are stitched back into a single stem. Sometimes the gap is unnoticeable, but when a song is processed and mixed back together, the hiccup is easy to spot. The waveform also shows clearly that Spleeter adds a gap:

image

To make sure the problem is related to Spleeter, I tried splitting and stitching other files that were not processed by Spleeter, and the stitching was flawless. Throughout the experiment I used only lossless (WAV) files, to avoid the padding issues that some lossy formats can cause.

Here is the file that generated the waveform above. If you listen carefully, you can notice a hiccup (gap) every 30 seconds.

Steps to reproduce

1 - Take an example WAV file that is longer than 30 seconds and split it into 30-second chunks using FFmpeg or Sox. You can rename your file to myfile.wav to reuse the commands below:

FFmpeg: ffmpeg -i myfile.wav -f segment -segment_time 30 -c copy myfile-%03d.wav
Sox: sox myfile.wav myfile-.wav trim 0 30 : newfile : restart

2 - Process all the chunks using Spleeter:

spleeter separate -i myfile-* -p spleeter:2stems -B tensorflow -o out

3 - Move the first two accompaniment stems into the same directory for stitching:

mv ./out/myfile-002/accompaniment.wav ./out/myfile-001/accompaniment2.wav
cd ./out/myfile-001

4 - Stitch accompaniment.wav and accompaniment2.wav using Sox or FFmpeg:

FFmpeg: ffmpeg -f concat -safe 0 -i <(for f in ./accompaniment*.wav; do echo "file '$PWD/$f'"; done) -c copy output.wav
Sox: sox accompaniment.wav accompaniment2.wav output.wav

5 - Listen to output.wav and notice the hiccup during the transition at ~30s.

You can also use this shell script by @amo13

Environment

OS: Linux using Docker
Installation type: Conda
RAM available: 6GB
Hardware spec: Docker using 8 CPUs

Additional context

Stitching discussion

geraldoramos added the bug and invalid labels on Jul 1, 2020
geraldoramos changed the title from "[Bug] Spleeter adding small padding to output audio file" to "[Bug] Spleeter adding small padding to output audio files" on Jul 1, 2020
@romi1502
Member

romi1502 commented Jul 1, 2020

Hi @geraldoramos,
thank you for the detailed issue.
Yes, there is indeed an issue at the beginning of reconstructed signals. It is due to a strange behavior of TensorFlow's STFT that Spleeter does not compensate for: the first window of the STFT starts at the first sample, whereas it should be centered on the first sample. To compensate, we should pad the beginning of the STFT input with zeros and remove the padded portion after separation.
This is usually not a big deal if you process a full track (songs commonly already have a bit of silence at the beginning). I'll look into a quick fix for this aspect of the issue.

That being said, even after this first problem is solved, you will still have trouble at the borders if you process chunked segments of audio without overlap: this is inherent to STFT processing with overlap-add reconstruction. The result of the first chunk can actually leak a bit into the next chunk, and if you don't take this into account, you may still get glitches. So if you want no glitches, you need a bit of overlap between your chunks (which, by the way, will solve the first problem too).
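
The zero-padding compensation described in this comment can be sketched roughly as follows. This is only an illustration on plain Python lists, not Spleeter's actual code: `separate_with_front_padding` and `toy_separate` are hypothetical names, and `toy_separate` is a stand-in that simply silences the first few samples to mimic a start-of-signal artifact.

```python
from typing import Callable, List, Sequence

Samples = List[float]


def separate_with_front_padding(
    samples: Sequence[float],
    pad: int,
    separate: Callable[[Samples], Samples],
) -> Samples:
    """Prepend `pad` zeros, run `separate`, then drop the first `pad` output samples."""
    padded = [0.0] * pad + list(samples)
    processed = separate(padded)
    return list(processed[pad:])


def toy_separate(samples: Samples, edge: int = 4) -> Samples:
    """Hypothetical stand-in for a separator (NOT Spleeter): an identity that
    silences the first `edge` samples to mimic an artifact at the signal start."""
    return [0.0] * min(edge, len(samples)) + list(samples[edge:])


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Without padding, the artifact eats the first samples of the real signal.
print(toy_separate(signal))
# With padding, the artifact falls on the zeros, which are then discarded.
print(separate_with_front_padding(signal, 4, toy_separate))
```

In this toy setup the padded call returns the original signal unchanged, because the damaged region coincides with the prepended zeros. How much padding a real STFT needs depends on its window length, which is not specified in this thread.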

@geraldoramos
Author

geraldoramos commented Jul 2, 2020

@romi1502 The overlap idea worked like a charm; I just need to find a good way to automate it. Thanks a lot for the tip. I'm not sure whether you want to keep this open, since you said there is actually an issue here. Feel free to close it if that makes sense =)

For those interested in this issue: I don't have code to share at the moment. I just used sox to split an audio file with a one-second overlap, removed the additional second after processing with Spleeter, and then concatenated the trimmed stems.
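
The split, trim, and concatenate procedure described above might look roughly like this when expressed on sample lists. It is a sketch under assumptions, not the commands actually used: the function names are hypothetical, the overlap is added on both sides of each chunk (the comment does not say which side was used), and `damage_edges` is a toy stand-in for a separator that corrupts chunk borders.

```python
from typing import Callable, List, Sequence, Tuple

Samples = List[float]


def overlapped_spans(total: int, chunk: int, overlap: int) -> List[Tuple[int, int, int, int]]:
    """Return (ext_start, ext_end, lead, core_len) for each chunk.

    The core region is the part that is kept; the extended region adds up to
    `overlap` samples on each side, clipped to the bounds of the file.
    """
    spans = []
    for core_start in range(0, total, chunk):
        core_end = min(core_start + chunk, total)
        ext_start = max(0, core_start - overlap)
        ext_end = min(total, core_end + overlap)
        spans.append((ext_start, ext_end, core_start - ext_start, core_end - core_start))
    return spans


def separate_in_chunks(
    samples: Sequence[float],
    chunk: int,
    overlap: int,
    separate: Callable[[Samples], Samples],
) -> Samples:
    """Process overlapping chunks, keep only each core region, and concatenate."""
    out: Samples = []
    for ext_start, ext_end, lead, core_len in overlapped_spans(len(samples), chunk, overlap):
        processed = separate(list(samples[ext_start:ext_end]))
        out.extend(processed[lead:lead + core_len])
    return out


def damage_edges(samples: Samples, edge: int = 2) -> Samples:
    """Hypothetical stand-in for a separator (NOT Spleeter): an identity that
    silences `edge` samples at both ends of whatever chunk it is given."""
    out = list(samples)
    for i in range(min(edge, len(out))):
        out[i] = 0.0
        out[-1 - i] = 0.0
    return out


signal = [float(i) for i in range(1, 21)]
naive = separate_in_chunks(signal, 10, 0, damage_edges)   # no overlap
smooth = separate_in_chunks(signal, 10, 2, damage_edges)  # 2-sample overlap
print(naive[8:12], smooth[8:12])  # samples around the seam at index 10
```

With real audio the sizes would be in samples, for example a one-second overlap at 44.1 kHz is 44100 samples. Note that the very start and end of the file have no neighbouring audio to overlap with, so this sketch does nothing for them.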

@amo13

amo13 commented Jul 2, 2020

I do have an automation of this overlapping process. I'll drop another variant of my script here when I get home.
Basically, it processes the input audio twice: the first pass uses 30s chunks throughout, while the second pass starts with one 15-second chunk and then uses 30s chunks for the rest. It then takes 3s around each crack in the first pass from the second pass and puts everything back together. It's probably not ideal, but maybe someone will have a good idea for improving it.
I'll post it here later when I get back to my PC.
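
The timing of this two-pass scheme can be worked out with a little arithmetic. The sketch below is not taken from the script: `chunk_seams` and `patch_windows` are hypothetical names, times are in seconds, and the 3s patch is assumed to be centered on each seam (1.5s on either side), which the comment does not state explicitly.

```python
from typing import List, Optional, Tuple


def chunk_seams(total: int, chunk: int = 30, first: Optional[int] = None) -> List[int]:
    """Times (s) where one chunk ends and the next begins.

    `first` is the length of the first chunk; it defaults to `chunk`.
    """
    first = chunk if first is None else first
    return list(range(first, total, chunk))


def patch_windows(total: int, chunk: int = 30, width: float = 3.0) -> List[Tuple[float, float]]:
    """Windows (s) around each first-pass seam, to be taken from the second pass.

    Assumes the window is centered on the seam.
    """
    half = width / 2
    return [
        (max(0.0, seam - half), min(float(total), seam + half))
        for seam in chunk_seams(total, chunk)
    ]


total = 100  # hypothetical track length in seconds
first_pass = chunk_seams(total, 30)             # seams of the plain 30s pass
second_pass = chunk_seams(total, 30, first=15)  # seams shifted by the 15s first chunk
print(first_pass, second_pass, patch_windows(total))
```

Because the second pass is shifted by 15s, each of its seams sits 15s away from every first-pass seam, so each 3s patch comes from the interior of a second-pass chunk rather than from one of its borders.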

@amo13

amo13 commented Jul 3, 2020

separate-overlap.zip --> see below
This is the modified script that automates the overlapping at the junctions. Please try it out and compare the results. I wrote it a few weeks ago and don't remember the details right now. Unfortunately, I also can't look into this more deeply in the next few days, but feel free to ask about it or to try modifying it!

@redbar0n

redbar0n commented May 6, 2021

Hey @amo13, thanks for your great work on separate-overlap.sh!

Would you mind pushing it to a repo? I have improved it and fixed some bugs. It should now also be compatible with older Bash (< 4) and macOS versions. (Specifically Bash 3.2, which Apple shipped with all versions up until macOS Catalina. Only then did they start encouraging people to manually switch to zsh as the shell instead.) I'd be happy to fork your repo and send you a pull request there.

(Alternatively, I could upload my version as a gist and post a link to it here, together with the full changelog. But I'm not sure the comment section of this closed issue would still be the right place for that.)

@amo13

amo13 commented May 6, 2021

Good call! I thought about it a while back and wasn't sure if anyone would use this anyway...

Here it is: https://github.com/amo13/spleeter-wrapper

5 participants