I am not all that sure about silero-vad, as the Number Detector and Language Classifier make it a bit 'fat' for just VAD.
Maybe there are simpler and easier ways to chunk spoken audio to fit the beam search lengths of incoming realtime audio?
Z-yq, I haven't looked much, but a simpler, lower-parameter model than silero could likely be used.
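To make the "simpler chunking" idea a bit more concrete, here is a rough sketch of a plain energy-threshold chunker with a hard cap on chunk length. It is not silero-vad and not anything from this repo; the function names (`frame_rms`, `chunk_speech`) and all the default values (20 ms frames, 0.02 RMS threshold, 200 ms of silence to close a chunk, 10 s maximum chunk) are assumptions picked for illustration, and it expects float samples in the range -1.0 to 1.0.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of each non-overlapping frame (a trailing partial frame is dropped)."""
    n_frames = len(samples) // frame_len
    out = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        out.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return out


def chunk_speech(
    samples: Sequence[float],
    sample_rate: int = 16000,      # assumed input rate
    frame_ms: int = 20,            # assumed frame size
    threshold: float = 0.02,       # assumed RMS threshold, needs tuning per mic
    min_silence_ms: int = 200,     # silence needed to close a chunk
    max_chunk_ms: int = 10000,     # hard cap so chunks fit the decoder
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of chunks, end exclusive."""
    frame_len = sample_rate * frame_ms // 1000
    min_silence = max(1, min_silence_ms // frame_ms)
    max_frames = max(1, max_chunk_ms // frame_ms)

    energies = frame_rms(samples, frame_len)
    chunks: List[Tuple[int, int]] = []
    start = None   # frame index where the open chunk began
    silence = 0    # consecutive quiet frames inside the open chunk

    for i, rms in enumerate(energies):
        if rms >= threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence:
                # Close the chunk right after the last loud frame.
                chunks.append((start * frame_len, (i - silence + 1) * frame_len))
                start, silence = None, 0
        # Force a cut if the open chunk has reached the maximum length.
        if start is not None and (i + 1 - start) >= max_frames:
            chunks.append((start * frame_len, (i + 1) * frame_len))
            start, silence = None, 0

    if start is not None:
        # Audio ended mid-chunk: trim any trailing quiet frames.
        chunks.append((start * frame_len, (len(energies) - silence) * frame_len))
    return chunks
```

A fixed threshold like this will struggle with noise and far-field input, which is where a small learned VAD would probably still earn its keep, but the forced cut at `max_chunk_ms` is the part that keeps chunks within whatever length the beam search can handle.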
Also, I think far-field and BSS/beamforming will likely end up as wireless distributed arrays with the ASR kept central, because of the diverse uses that zonal systems could have.