pyannote/speaker-diarization-3.0 runs slower than pyannote/speaker-diarization@2.1 #499
Comments
Brilliant, thank you. I thought I was crazy. Your fix worked for me. Diarization went from around 8 minutes to 30 seconds on a 2-speaker, ~45-minute audio file.
thank you!
Oh, man, thank you. I thought I was going crazy too. Not long ago everything was working fast, and now it's very, very slow...
hey @kaihe-stori, I tried to use your approach, but I still get the error for onnxruntime
Any suggestions on how I can deal with that?
Great find @kaihe-stori, you could send a PR to the README if you want
Did you get the error during package installation or when running code? Run my commands only after you install whisperx (and thus faster-whisper).
Sure, happy to do it. Is "Limitations" a good section to put this in?
I think Setup is best! https://github.com/m-bain/whisperX#setup-%EF%B8%8F
thanks.. that worked :)
Thanks! I was investigating it and did not understand what was happening.
Hey! I noticed this problem while monitoring GPU usage during execution. I tried this approach and am still getting 0% GPU usage at the diarization step. Can you explain further at which point you are executing the 3 lines of code you mentioned?
I was able to get the old 2.1 model working fine with the GPU, but for whatever reason the workaround for the newer model isn't working. Ty for bringing light to this issue! Context/TLDR:
Off-the-cuff question, but is there any reason to believe that the newer "3.0" versions of pyannote segmentation/diarization are worse than "2.1" for WhisperX diarization quality (not speed, in this case)? I just made a couple of transcripts with 3.0 for the first time, and I wasn't happy with the quality of the speaker segmentation and thus speaker recognition. I've been quite pleased in the past with the previous models with WhisperX. Just anecdotal, I haven't investigated this.
It should not. pyannote 3.0 integrates a new model that is supposed to give better results, especially on overlapping speech. You can see their results on public datasets in the release notes here. On the other subject, uninstalling and reinstalling does not work for me either, and that is a big problem.
Yes, the workaround is intended to be run right after installing whisperx to alter the dependencies, before running your application code. Not sure about Mac, as my experience is on AWS (a GPU instance with a pytorch+cuda container). Sorry.
np, ty for the response! will be setting up a proper environment shortly, figured I would see if I was understanding correctly 👍
:( The current setup.py change breaks Mac, which sucks. I'm not sure I understood it correctly, but I tried the solution people pointed out for the slower pyannote, and no matter what I do there is no onnxruntime available for Mac, as @dylorr pointed out. So I don't think the solution really helps Mac users.
This seems to be about the pyannote dependency rather than whisperx itself, so I created a Docker container following the advice in the pyannote issue, which said to use 3.0.0 or lower, by cloning whisperx and modifying setup.py. For those on Mac, here is the repository / Docker image if you just want a plug-and-go solution.
I tried to get setup.py to detect the environment, but I ran into some issues and didn't try too hard, so I hard-coded it to 3.0.0 for now. Detecting the OS can be a future thing. Link to the modified repo: I didn't test the Docker image too hard, I just made sure the below worked:
I also made sure that import whisperx works in Python, and since the CLI in my Docker container is just a bash shell passing the arguments to the Python function, I assume it works for any Python script too.
Still no solution to make it work directly on ONNX GPU without having to uninstall and force-reinstall?
It's not working for me either, even after updating to
I've got CUDA 11.8 and cuDNN 8.7.0.84
It's not working for me either with GPU. But if I remember correctly, PyTorch 2.1 requires CUDA 12+ to work.
It has builds for lower CUDA versions. I'm using 11.8 here
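For anyone unsure which CUDA version their installed PyTorch build targets, a quick check along these lines may help. This is only a sketch: `torch_cuda_summary` is a made-up helper name, and it relies on nothing beyond the standard `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()` attributes.

```python
def torch_cuda_summary():
    """Report the installed torch version, its CUDA build, and GPU visibility.

    Hypothetical helper for illustration; returns placeholders if torch is missing.
    """
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda_build": None, "cuda_available": False}
    return {
        "torch": torch.__version__,            # e.g. a "+cu118" suffix on CUDA wheels
        "cuda_build": torch.version.cuda,      # None when the build has no CUDA support
        "cuda_available": bool(torch.cuda.is_available()),
    }


summary = torch_cuda_summary()
for key, value in summary.items():
    print(f"{key}: {value}")
```

If `cuda_build` comes back as `None`, the installed wheel was most likely built without CUDA, in which case diarization cannot use the GPU regardless of the onnxruntime setup.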
How did you fix it? I ran the commands only after installing whisperx, and this error message popped up both during installation and when running code. The error stopped showing after reinstalling onnxruntime, but the performance issues continue.
Same
Here are some changes that can fix pyannote performance; you can try them (already merged). (I described what helped me.)
Is this problem solved? It's really hectic. How can we go back to the previous version of whisperx, which was running without problems on the GPU?
It seems it is not solved. Right now the whole lib is kind of useless without GPU acceleration for diarization. I have not found a way to solve it for now... @m-bain submitted changes, but it is still not using the GPU.
Pyannote just released 3.1 without onnx.
Unfortunately, this issue seems to be back for me. I had no problems whatsoever, but then I upgraded to the latest version this week and now diarization takes ages to complete, with high CPU and RAM load. Maybe this is related to this issue?
Try increasing OMP_NUM_THREADS, it works for me
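The comment above does not say which value helped, so the following is only a sketch of one way to set the variable from Python. `configure_omp_threads` is a hypothetical helper, and clamping to the machine's CPU count is an assumption of this example, not something recommended in the thread. The variable generally needs to be set before torch or numpy is imported, since the OpenMP runtime reads it at start-up.

```python
import os


def configure_omp_threads(requested=None):
    """Set OMP_NUM_THREADS and return the value actually used.

    Hypothetical helper: clamps the request to the range [1, CPU count].
    Call it before importing torch, numpy, or whisperx.
    """
    cpu_count = os.cpu_count() or 1
    if requested is None:
        requested = cpu_count
    threads = max(1, min(int(requested), cpu_count))
    os.environ["OMP_NUM_THREADS"] = str(threads)
    return threads


threads = configure_omp_threads(4)  # 4 is the value one commenter says they are trying
print(f"OMP_NUM_THREADS={threads}")
```

Exporting the variable in the shell before launching the script has the same effect and avoids any import-order concerns.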
Thanks for the suggestion. To what value did you increase it? Currently trying 4.
It's just weird, because it seemed to run seamlessly and complete within a few minutes before. Now it's pretty much stuck, using similar files.
Same here, two days ago it became too slow.
That's weird, the only thing that changed is the usage of torchaudio>=2.2
So after discovering that it didn't work, I just cleaned my whole environment and started from scratch, following the instructions in the readme file:
Awesome, thanks for the hint! It seems I fixed it by reinstalling torchaudio et al., so now my GPU is running at full capacity again during diarization. It still takes very long, though. Is the new diarization model that much more resource-hungry? Also, maybe we should edit the README, then? I ran the following to reinstall torch-2.0.0+cu118 and torchaudio-2.0.1+cu118 (Windows, CUDA 11.8):
We can update the requirements.txt file to avoid that.
Fresh install, diarization painfully slow (1 hour for 1h12m of audio on an M2 Pro). Not sure how to debug this further...
I'm not sure if this is the same issue, but if I change embedded_batch_size and segmentation_batch_size both to 8 (the default was 32 for both), I get a very fast diarization. Take a look at this issue, it might be the same: #688
whisperX/setup.py
Line 22 in 07fafa3
Currently pyannote.audio is pinned to 3.0.0, but it has been reported that it performed slower because the embeddings model ran on CPU. As a result, a new release, 3.0.1, fixed it by replacing onnxruntime with onnxruntime-gpu.
It makes sense for whisperX to update pyannote.audio to 3.0.1; however, there is a conflict with faster_whisper on onnxruntime, as discussed here. Until it is resolved on the faster_whisper side, installing both will end up with onnxruntime still in CPU mode and thus slower performance.
My current workaround is running the following commands post installation
Alternatively, use the old 2.1 model.
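To see whether an environment has ended up in the state described above (both onnxruntime packages installed, or no CUDA provider visible), a check like the following might be useful. It is a sketch, not part of the workaround: `installed_versions` and `onnx_report` are hypothetical helper names, and the only library call assumed is `onnxruntime.get_available_providers()`.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def onnx_report():
    """Return (package versions, execution providers) for the current environment."""
    versions = installed_versions(["onnxruntime", "onnxruntime-gpu"])
    try:
        import onnxruntime
        providers = list(onnxruntime.get_available_providers())
    except ImportError:
        providers = []  # neither package is importable
    return versions, providers


versions, providers = onnx_report()
print("Installed:", versions)
print("CUDA provider available:", "CUDAExecutionProvider" in providers)
if all(versions.values()):
    print("Both onnxruntime and onnxruntime-gpu are installed; they may conflict.")
```

If `CUDAExecutionProvider` is missing from the provider list, the embedding model would be expected to fall back to CPU, which matches the slowdown reported here.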