[Discussion] How can you extend the length beyond 10 minutes? #114

Closed
all-the-good-ones-are-gone opened this issue Nov 18, 2019 · 15 comments
Labels: question (Further information is requested), RTMP (Read The Manual Please)

@all-the-good-ones-are-gone

I'm working with live recordings, and the results look promising so far, but I haven't worked out which file needs tweaking to process audio beyond 600.0 seconds.

I'd hate to have to break everything into a bunch of 10-minute segments, process them, and merge them back together.

all-the-good-ones-are-gone added the question (Further information is requested) label Nov 18, 2019
mmoussallam added the RTMP (Read The Manual Please) label Nov 18, 2019
@all-the-good-ones-are-gone (author)

I would love to, but I can't locate anything relevant, which is why I opened the ticket.

Can you link to the part of the documentation that explains what needs changing?

@greatfinders

When you use the separate command, add -d 9000 (replace 9000 with the duration you need, in seconds).
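Since -d caps how many seconds are processed (600 by default, hence the 10-minute limit), one way to avoid guessing a value is to derive it from the file's actual length. A minimal sketch, assuming the input is an uncompressed PCM WAV (Python's `wave` module reads nothing else) and the Spleeter 2.x CLI, where input files are positional arguments; `wav_duration_seconds` and `separate_command` are hypothetical helper names:

```python
import math
import wave


def wav_duration_seconds(path):
    """Length of a PCM WAV file in seconds (the stdlib `wave` module reads only PCM WAV)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def separate_command(input_path, output_dir, model="spleeter:2stems"):
    """Build a `spleeter separate` argument list whose -d covers the whole input.

    Assumes the Spleeter 2.x CLI (input files are positional arguments).
    """
    duration = math.ceil(wav_duration_seconds(input_path))  # round up so the tail isn't cut
    return ["spleeter", "separate", "-p", model, "-o", output_dir,
            "-d", str(duration), input_path]
```

You would run it with `subprocess.run(separate_command("live.wav", "output"), check=True)`. Note that a large -d can still exhaust memory, as the rest of this thread shows.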

@all-the-good-ones-are-gone (author)

Awesome, @greatfinders, thank you!
Once I rebuilt everything (I had been tweaking random files that contained the value 600), it works, sort of.

There are no errors or warnings, and I can't find any logs, but when the duration is too large it simply doesn't write any output.

Setting -d 700 gave me a pair of 117.8 MiB WAV files, but -d 3600 produces nothing but wasted cycles.
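The 117.8 MiB figure is consistent with 16-bit stereo PCM at 44.1 kHz (an assumption about the output format, inferred from the arithmetic: 700 s × 44,100 samples/s × 2 channels × 2 bytes ≈ 123.5 MB). A quick sketch of that calculation, using a hypothetical `pcm_wav_size_mib` helper, suggests that an hour-long stem would be only about 606 MiB on disk, so output size alone shouldn't stop a 3600 s run:

```python
def pcm_wav_size_mib(duration_s, sample_rate=44100, channels=2, bytes_per_sample=2,
                     header_bytes=44):
    """Approximate size of an uncompressed PCM WAV file in MiB (2**20 bytes)."""
    data_bytes = duration_s * sample_rate * channels * bytes_per_sample
    return (data_bytes + header_bytes) / 2**20


print(round(pcm_wav_size_mib(700), 1))   # 117.8 (matches the two files above)
print(round(pcm_wav_size_mib(3600), 1))  # 605.6 per stem for one hour
```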

@greatfinders commented Nov 20, 2019


I didn't really try it myself, but no result means the task was probably killed for lack of resources (it requires a lot of RAM and CPU). As far as I remember (it's in the FAQ), you can only set a maximum duration limit; they will add a start offset later. So if you get no result, just split the input into 600 s files, process each one, and then join them back together.
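Splitting into 600 s files can be done without extra tools if the input is a PCM WAV. A minimal sketch using only Python's standard `wave` module; `split_wav` is a hypothetical name, and the `_0000`, `_0001`, ... suffix is just one way to keep the parts in order:

```python
import os
import wave


def split_wav(path, out_dir, chunk_seconds=600):
    """Split a PCM WAV into consecutive chunk_seconds-long files; return their paths."""
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(path))[0]
    paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # end of input
                break
            out_path = os.path.join(out_dir, f"{base}_{index:04d}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # the frame count in the header is corrected on write
                dst.writeframes(frames)
            paths.append(out_path)
            index += 1
    return paths
```

Each part can then go through `spleeter separate` on its own, and the zero-padded suffix keeps the outputs sortable for re-joining.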

@all-the-good-ones-are-gone (author)

Huh. That appears to be true, but it doesn't make sense. Last night I could use -d 999, but -d 1000 would fail, even though it was using at most 21.8 GiB of RAM. I have 32 GiB installed, and during testing about 3.9 GiB was reported as inactive when it failed. It's definitely not CPU-bound either: it never went above 300%, and I have nearly 1200% available when idle.

I don't suppose setting up a swap file would necessarily help. I guess I'll have to split the files into 15-minute chunks before processing.

@all-the-good-ones-are-gone (author)

FYI, everything's good now. I added a 16 GiB swap file and was able to process longer durations.

When I tried an hour-long file, it appears to have written a 29-minute file, which makes me happy: at least I know how far it got and can enlarge the swap file accordingly.
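If you know how far a run got, another option is to process only the remainder. This assumes a Spleeter release that exposes the start offset as `-s`/`--offset` alongside `-d` (such as 2.x). The sketch below uses a hypothetical `windows` helper and a placeholder file name `recording.wav` to generate offset/duration pairs that resume at 29 minutes. Each run gets its own `-o` folder because, by default, Spleeter writes to `<output>/<filename>/<stem>.wav`, so runs on the same input would otherwise overwrite each other:

```python
def windows(total_seconds, chunk_seconds=600, start_seconds=0):
    """Yield (offset, duration) pairs covering [start_seconds, total_seconds)."""
    offset = start_seconds
    while offset < total_seconds:
        yield offset, min(chunk_seconds, total_seconds - offset)
        offset += chunk_seconds


# Example: a one-hour recording whose first 29 minutes were already written.
for i, (offset, duration) in enumerate(windows(3600, 600, start_seconds=29 * 60)):
    cmd = ["spleeter", "separate", "-p", "spleeter:2stems",
           "-o", f"output/part_{i:02d}",  # separate output folder per run
           "-s", str(offset), "-d", str(duration),
           "recording.wav"]
    print(" ".join(cmd))
```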

@redbar0n

Why can't Spleeter split a large file into 10-minute segments, process them separately, and then merge them back into a single output file?

It would solve the length issue without using up all the memory or resorting to potentially hacky manipulation of the swap file size (which can't be done on macOS anyway).
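The split-process-merge flow suggested here is straightforward to express. Below is a minimal sketch of the control flow, where `separate_fn` is a hypothetical stand-in for the real separator. With Spleeter 2.x's Python API it would presumably be `Separator('spleeter:2stems').separate`, which takes a `(samples, channels)` NumPy array, so the list concatenation below would become `np.concatenate`. Only one chunk at a time passes through the separator:

```python
from typing import Callable, Dict, List, Sequence

Stems = Dict[str, List[float]]


def separate_in_chunks(samples: Sequence[float], chunk_len: int,
                       separate_fn: Callable[[List[float]], Stems]) -> Stems:
    """Feed fixed-size chunks to separate_fn and concatenate each stem's outputs in order."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    stems: Stems = {}
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        for name, part in separate_fn(chunk).items():
            stems.setdefault(name, []).extend(part)
    return stems


# Toy separator for illustration only: splits every sample evenly between two stems.
def fake_separate(chunk):
    return {"vocals": [x / 2 for x in chunk], "accompaniment": [x / 2 for x in chunk]}


result = separate_in_chunks([float(x) for x in range(10)], chunk_len=4,
                            separate_fn=fake_separate)
print(result["vocals"])  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
```

Peak memory inside the separator is bounded by `chunk_len`, but the concatenated stems still grow with the input. Writing each chunk's output to disk instead keeps memory flat.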

@all-the-good-ones-are-gone (author)

@redbar0n

I get that. But maybe Spleeter should remove the noise spikes at the start of each segment instead?

It seems unreasonable to force people to tinker with or upgrade their RAM just to make it work on longer files.
How did you get zram working on macOS, anyway? I thought it was Linux-only.
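On the noise spikes at segment starts: one common mitigation is to cut the chunks with a small overlap and crossfade across it when joining. This is an assumption here; the thread doesn't say Spleeter does this itself. Below is a minimal linear-crossfade sketch on plain sample lists, using a hypothetical `crossfade_concat` helper:

```python
def crossfade_concat(a, b, overlap):
    """Join two sample lists whose last/first `overlap` samples cover the same audio,
    blending linearly from a to b across that overlap."""
    if overlap <= 0:
        return list(a) + list(b)
    if overlap > min(len(a), len(b)):
        raise ValueError("overlap is longer than one of the segments")
    head = list(a[:-overlap])
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises from near 0 to near 1
        blended.append((1 - w) * a[len(a) - overlap + i] + w * b[i])
    return head + blended + list(b[overlap:])


print(crossfade_concat([1.0] * 5, [0.0] * 5, 3))  # [1.0, 1.0, 0.75, 0.5, 0.25, 0.0, 0.0]
```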

@all-the-good-ones-are-gone (author)

@redbar0n

Seems like this bash script might help:
#391 (comment)

Although it should ideally be a part of Spleeter itself.

@redbar0n

I recently contributed to the aforementioned bash script, which might help manage both RAM and HDD space concerns in general: https://github.com/amo13/spleeter-wrapper

@shabeepk

> When you use the separate command, add -d 9000 (replace 9000 with the duration you need, in seconds).

This worked for me. I do think, however, that the default should be larger and should scale automatically with available CPU and RAM.
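Here is a sketch of what scaling the default with available RAM could look like. It rests on two assumptions. The first is that peak memory grows roughly linearly with -d. The second is a per-second cost you calibrate yourself: the thread's one data point (at most about 21.8 GiB with -d around 1000 on CPU) works out to roughly 22 MiB per second of audio. The conservative safety factor reflects that the same setup failed at -d 1000 with RAM still free. `max_duration_for` and `available_ram_bytes` are hypothetical helpers:

```python
import os

GIB = 2**30

# Rough, thread-derived calibration: ~21.8 GiB peak with -d around 1000 on CPU.
# Measure your own setup; linear scaling with duration is an assumption.
THREAD_BYTES_PER_SECOND = 21.8 * GIB / 1000


def max_duration_for(available_bytes, bytes_per_second, safety=0.5):
    """Largest -d value (seconds) whose estimated peak memory fits in safety * available_bytes."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return int(available_bytes * safety // bytes_per_second)


def available_ram_bytes():
    """Currently available physical RAM. SC_AVPHYS_PAGES exists on Linux, not on macOS."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_AVPHYS_PAGES")
```

On Linux, `max_duration_for(available_ram_bytes(), THREAD_BYTES_PER_SECOND)` would give a starting value for -d. On a machine with 32 GiB free, the estimate is 733 seconds.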

@DoodleBears

I'm using a GPU to handle a large .wav file (4 hours long, ~3 GB), and even the default -d 600 runs out of memory.
GPU: RTX 3060 Laptop (6 GB VRAM)

  1. I set the step to 60 and used auditok to split the file into small 60-second .wav files: 0-60.wav, 60-120.wav, ...
  2. Passed them to spleeter -> 0-60_vocals.wav, 60-120_vocals.wav, ...
  3. Used ffmpeg to combine the wav files into one vocals.wav.

Hope this helps.
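For step 3, ffmpeg's concat demuxer can join same-format WAVs without re-encoding. The minimal sketch below writes the list file and returns the command. `write_concat_list` and the default output name are hypothetical, and the clips must be passed in playback order, because plain string sorting puts `120-180` before `60-120`:

```python
import os


def write_concat_list(wav_paths, list_path, output_path="vocals.wav"):
    """Write an ffmpeg concat-demuxer list for wav_paths (in playback order); return the command."""
    with open(list_path, "w", encoding="utf-8") as f:
        for path in wav_paths:
            # Each line is: file '<path>'; a literal ' is written as '\'' inside the quotes.
            quoted = os.path.abspath(path).replace("'", "'\\''")
            f.write(f"file '{quoted}'\n")
    # -safe 0 permits absolute paths; -c copy joins without re-encoding (formats must match).
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path,
            "-c", "copy", output_path]
```

Pass the returned list to `subprocess.run(..., check=True)` to perform the join.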

Here is the code that handles multiple audio files:

import logging
import os

import auditok

logger = logging.getLogger(__name__)

# NOTE: This is the body of a method: self.audio_file_paths (the input files, in order),
# clip_path (a working folder), get_time_point_name(), self.separate_speech_from_audio()
# and self.merge_speeches() are defined elsewhere.
speech_file_paths = []
audio_region_paths = []

self.step_seconds = 60  # split the audio into 60-second clips
seconds = 0  # running offset across all input files, used in clip file names
file_index = 0
audio_region = None
all_audio_length = 0.0
while True:
    # NOTE: Read the next audio file, unless audio_region already holds the leftover
    # (shorter than one step) from the previous file.
    if audio_region is None:
        logger.warning(f"Reading file {file_index+1}: {self.audio_file_paths[file_index]}")
        audio_region = auditok.load(self.audio_file_paths[file_index])
        file_index += 1
        all_audio_length += audio_region.duration

    # NOTE: If the audio is shorter than one step and more files remain, append the next one(s).
    while audio_region.duration < self.step_seconds and file_index < len(self.audio_file_paths):
        logger.warning(f"Reading file {file_index+1}: {self.audio_file_paths[file_index]}")
        next_audio_region = auditok.load(self.audio_file_paths[file_index])
        all_audio_length += next_audio_region.duration
        audio_region += next_audio_region
        file_index += 1

    current_file_seconds = 0
    while True:
        audio_region_path = os.path.join(clip_path, f"{get_time_point_name(seconds)}-{get_time_point_name(seconds+self.step_seconds)}.wav")
        speech_file_path = os.path.join(clip_path, f"{get_time_point_name(seconds)}-{get_time_point_name(seconds+self.step_seconds)}_vocals.wav")
        # NOTE: auditok (https://github.com/amsehili/auditok) lets you slice a region by seconds
        clip_region: auditok.AudioRegion = audio_region.seconds[current_file_seconds:current_file_seconds+self.step_seconds]
        is_end_of_clip = clip_region.duration < self.step_seconds

        if not is_end_of_clip:
            # Record every full clip, including ones saved by an earlier run,
            # so that all of them are separated and merged.
            audio_region_paths.append(audio_region_path)
            speech_file_paths.append(speech_file_path)
            if os.path.exists(audio_region_path):
                logger.debug(f"Audio segment already exists: {audio_region_path}")
            else:
                clip_region.save(audio_region_path)
                logger.debug(f"Saved {self.step_seconds}-second segment: {audio_region_path}")
            seconds += self.step_seconds
            current_file_seconds += self.step_seconds
        else:
            # NOTE: A clip shorter than one step is the leftover; carry it over so it
            # is joined with the next audio file.
            audio_region = clip_region
            break

    # NOTE: Once all audio files have been read, save the leftover (if any) and stop.
    if file_index == len(self.audio_file_paths):
        if audio_region.duration > 0:
            audio_region_paths.append(audio_region_path)
            audio_region.save(audio_region_path)
            logger.debug(f"Saved final segment: {audio_region_path}")
            speech_file_paths.append(speech_file_path)
        break

logger.warning(f"Start separating vocals. Total audio length: {get_time_point_name(all_audio_length)}")
self.separate_speech_from_audio(input_paths=audio_region_paths, output_folder=clip_path)
logger.warning(f"Separation completed. Total audio length: {get_time_point_name(all_audio_length)}")

# NOTE: Merge the separated vocal clips into one complete file.
self.merge_speeches(audio_files=speech_file_paths)
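The snippet calls helpers that aren't shown in the comment. Below are minimal sketches of two of them. They assume that `get_time_point_name` only needs to produce a sortable, filename-safe label, and that `merge_speeches` receives PCM WAVs with identical formats. Both implementations are hypothetical, not the commenter's code; `merge_wavs` would back `merge_speeches`:

```python
import wave


def get_time_point_name(seconds):
    """Hypothetical: format an offset as a sortable, filename-safe 'HH-MM-SS' label."""
    seconds = int(seconds)
    return f"{seconds // 3600:02d}-{seconds % 3600 // 60:02d}-{seconds % 60:02d}"


def merge_wavs(input_paths, output_path):
    """Hypothetical backing for merge_speeches: append PCM WAVs that share one format."""
    if not input_paths:
        raise ValueError("nothing to merge")
    with wave.open(output_path, "wb") as out:
        for i, path in enumerate(input_paths):
            with wave.open(path, "rb") as src:
                if i == 0:
                    out.setparams(src.getparams())
                elif src.getparams()[:3] != out.getparams()[:3]:
                    # nchannels, sampwidth and framerate must match to concatenate raw frames
                    raise ValueError(f"format mismatch: {path}")
                out.writeframes(src.readframes(src.getnframes()))
```

Zero-padded labels also make the clip files sort correctly by name, which the unpadded `0-60`, `60-120` naming does not.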

@all-the-good-ones-are-gone (author)

6 participants