[Discussion] Here is a bash script separating longer files with limited RAM #391
Comments
@amo13 thanks for sharing 😄 I encountered the same issue and asked about it here: #162 (comment). One of the Spleeter collaborators kindly answered:
I haven't tried the suggested solution yet as I haven't worked on my project that uses Spleeter for a while. Unfortunately I'm also not familiar enough with audio processing to judge how difficult it would be to implement this. |
Yes, this idea also came to my mind, but it would mean processing everything twice while splitting at different offsets and then programmatically doing a lot of cutting and joining and praying that the pieces will fit. Thanks for the hint though! |
Since I couldn't get this out of my mind, I went through the tedious fight with bash and ffmpeg... The result is: How it works:
Downside:
Use the script by putting your audio file in the same folder and calling
|
@amo13 thank you so much for staying on this and sharing your solution! 😄 Regarding
I remember coming across an issue that mentioned differing input and output durations. Perhaps the answer given there might also apply here?
I haven't inspected the source code, but it could be that Spleeter performs some format conversion implicitly. Are the input files you tested MP3? The difference in duration documented in #96 was about 30ms, which is quite a bit less than 200ms. Still, perhaps splitting the audio exacerbated the effect. This is just a semi-educated guess, I'm no expert 😉 In any case, thanks again! |
I wonder if we could minimize the duplicate work that has to be performed without significantly degrading the final quality. Given that the script only replaces 3 seconds of audio around the cracks, could we make Spleeter re-do just those 3 seconds for each crack, with a few more seconds for context? |
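A rough sketch of that idea in Python, only to make the arithmetic concrete. It assumes 30-second chunks and a 3-second patch centred on each junction; the centring, the `context` value and the name `crack_windows` are my own assumptions, not something taken from the script. It only computes which time windows would have to be re-separated and does not call Spleeter or ffmpeg.

```python
def crack_windows(duration, chunk=30.0, patch=3.0, context=5.0):
    """Return (start, end) windows, in seconds, around each chunk junction.

    duration: total length of the input audio in seconds
    chunk:    length of each split part (30 s assumed)
    patch:    audio replaced around each junction (3 s assumed, centred)
    context:  hypothetical extra audio on each side of the patch
    """
    windows = []
    boundary = chunk
    # A junction exists wherever one part ends and the next begins.
    while boundary < duration:
        start = max(0.0, boundary - patch / 2 - context)
        end = min(duration, boundary + patch / 2 + context)
        windows.append((start, end))
        boundary += chunk
    return windows


print(crack_windows(100.0))
# [(23.5, 36.5), (53.5, 66.5), (83.5, 96.5)]
```

With these assumed values, a 100-second file would need 39 seconds of extra separation instead of a full second pass. Whether a few seconds of context are enough for Spleeter to produce a clean patch is untested here.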
Linking #155 for reference. There's some interesting discussion there that also touches on splitting/stitching the audio. |
Yes of course, you are absolutely right. I should have thought of this! Also, I didn't compare the processing times with and without the script, but I'm kind of curious. |
You are right, it fixes the issue. Thank you. |
Ok, so here is the new version of the script using the TensorFlow backend. It processes the input only once and has no cracks in the sound. Edit: The script has seen many improvements in the meantime (special thanks go to @redbar0n) and is now available in its own repo here! |
Hi all, I had to modify
because it deleted the separate files.
I consider it a redundant operation. Sorry for my bad English. |
Nice work @amo13 . If I can make one suggestion, rather than doing the work of splitting up the input files, one can use the spleeter/spleeter/commands/__init__.py Lines 54 to 60 in 243b323
This way, you can process a single file iteratively using spleeter, rather than splitting it up manually beforehand. |
@amo13 Hey there, thanks a lot for providing your stitching code example. It gives good results, but the transition between the 30-second blocks is noticeable. It looks like FFmpeg is adding a small gap between concatenations. Do you experience this as well, or is this something related to my setup? Here is an example: I used your code to separate the tracks, then mixed all the tracks back together using Logic. [Screenshot: Logic interface showing the small gap every ~30 s] Here is an audio sample demonstrating the gap (you should notice it between 2 and 3 seconds): https://www.dropbox.com/s/1f0qz92yaoqedhl/gap.mp3?dl=0 PS: I've used your most updated code example. Thanks! |
Without testing your example on my setup, I can remember that I also had tiny bits of something added at every junction. The split, separated and joined audio was always a tiny bit longer than the original. |
@geraldoramos I just listened to your audio sample and I can confirm that I do not get gaps like that with my script. I have no idea why you would get those though... |
@geraldoramos are the files you're concatenating MP3s? If so, these answers on Stack Overflow may explain what's happening: That first answer also links to this page, which goes into more detail (emphasis mine):
If this is in fact what's happening, I think you should be able to eliminate the gaps by using a lossless format (e.g. FLAC) or a lossy format with "gapless encoding" support for the segments and concatenating those. If I'm reading that page right, I also think you can safely convert the file to MP3 after concatenating, if that's the output format you want. |
I also use MP3 as the input format, just stuff downloaded with youtube-dl to MP3. The script converts the input MP3 to WAV and works with WAV from there on: splitting into parts, separating with Spleeter, and putting the pieces back together. Only at the very end does it convert back to the input format, e.g. MP3. |
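For anyone who wants to check the split/join step on its own, here is a minimal Python sketch that uses only the standard `wave` module instead of ffmpeg. This is not what the script does (the script uses ffmpeg); `split_wav` and `join_wavs` are hypothetical helper names, and the sketch assumes uncompressed PCM WAV input.

```python
import wave


def split_wav(src, part_seconds, prefix):
    """Split a PCM WAV file into consecutive parts; return the part paths."""
    paths = []
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        frames_per_part = int(part_seconds * reader.getframerate())
        index = 0
        while True:
            data = reader.readframes(frames_per_part)
            if not data:  # end of file (or a zero-length part size)
                break
            path = f"{prefix}-{index:03d}.wav"
            with wave.open(path, "wb") as writer:
                writer.setparams(params)  # frame count is corrected on close
                writer.writeframes(data)
            paths.append(path)
            index += 1
    return paths


def join_wavs(paths, dst):
    """Concatenate WAV parts that share the same format into one file."""
    with wave.open(paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)
        for path in paths:
            with wave.open(path, "rb") as reader:
                writer.writeframes(reader.readframes(reader.getnframes()))
```

Splitting a file with `split_wav("input.wav", 30, "part")` and joining the parts again should give back exactly the same frames, since nothing is re-encoded. That makes it a useful baseline when hunting for gaps introduced elsewhere in the pipeline.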
Thanks, guys! @amo13 and @mickdekkers I've replaced FFmpeg with I've tried mp3, wav, flac, and all end up with the same issue. It's super weird that @amo13 is not having the same issue with his setup. I'm using docker using miniconda I will keep digging and will let you guys know if I make any progress. @amo13 if you can send an output example from your end it would be awesome, so I can analyze the audio as well and compare it with mine. |
@geraldoramos I hope it helps figuring out the issue =) |
Hi @amo13, after examining your output audio (vocals.zip), it seems like it also has the small gap of a few milliseconds (which I can notice if I listen carefully). Please take a look at the waveform with a lot of zooming: in the transition from 30 to 31 seconds, the audio has a hiccup. This is exactly the same gap I found in my experiments. It's more noticeable if it's music and if you mix all the processed stems and play them like the original. It looks like this extra space is being added by Spleeter. I've tried splitting a song into 30-second chunks without the Spleeter part and connecting them back using sox and FFmpeg, using WAV files. For those, I see perfect stitching without this mini gap. This makes me think that Spleeter processing is somehow adding this very small padding at the end of each stem. I'm always using lossless files (WAV) everywhere during these tests, so that rules out the things @mickdekkers pointed out. I've tried the latest Spleeter version as well as 1.4.8, and it is happening on both. Let me know what you think. Geraldo |
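One way to put a number on that padding, sketched in Python with the standard `wave` module: compare the total frame count of the separated parts with the frame count of the original. The helper names are hypothetical, and the sketch assumes that all files are PCM WAV with the same sample rate; if Spleeter writes its stems at a different rate than the input, the comparison would have to be done in seconds per file instead.

```python
import wave


def wav_info(path):
    """Return (frame count, sample rate) of a PCM WAV file."""
    with wave.open(path, "rb") as reader:
        return reader.getnframes(), reader.getframerate()


def extra_milliseconds(original, parts):
    """Milliseconds by which the parts together exceed the original.

    Assumes every file shares the original's sample rate.
    A positive result means audio was added somewhere along the way.
    """
    original_frames, rate = wav_info(original)
    part_frames = sum(wav_info(path)[0] for path in parts)
    return 1000.0 * (part_frames - original_frames) / rate
```

Running this once on the raw split parts and once on the separated stems of the same parts would show whether the extra length appears before or after the Spleeter step.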
@geraldoramos , good that you take a closer look, I guess you are right. Using only wav is smart and certainly prevents potential mp3 issues with padding. As for the added padding time, I agree, the only possible conclusion is that spleeter is adding it somehow. |
You might want to open a new issue and reference the last parts of this discussion here. Would you please link to this issue if you do so? |
Sounds good, will do it shortly! |
The latest version of this bash script can now be found at: https://github.com/amo13/spleeter-wrapper |
I've thought a bit about this, and it looks like this would be sub-optimal:
and against Spleeter's recommendation to batch process:
Quotes from: Correct me if I'm wrong. |
I present to you a bash script to separate an audio file by splitting it into parts, separating them one by one and finally joining them back together:
Here was the first version of the script which would lead to cracks every 30s in the resulting stems. Find the new version of the script in the post below.
The script is for Linux and assumes that you have (mini)conda installed.
Feel free to use it and to modify it to your needs!
The following problem is now solved:
Still, I need help in figuring out the last piece of the puzzle: When joining the separated parts back together to full-length stems, you can hear a crack in the audio where the original audio has been split and the parts have been joined together at the end.
Here is the visualized symptom:
Does anyone have an idea how to fix this?