another server, based on stream #1418
Interesting! I will look further, but first I need to make some long-pending updates (#1422). After that I will come back to this example.
I like it, but so far it only does the modularization; it would be nice to write another WebSocket server on top of it. I would like to do this work. Do you have plans to add a WebSocket server? I also found that recording audio at a sample rate of 48 kHz and then converting it to 16 kHz gives better recognition. I would also like to add support for
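As a rough illustration of the 48 kHz to 16 kHz conversion mentioned above, here is a minimal sketch that averages each group of three input samples into one output sample. The function name `downsample_48k_to_16k` is hypothetical and not part of this PR, and averaging is only a crude low-pass filter; a production resampler would more likely use a proper FIR filter before decimating.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): downsample mono float PCM
// from 48 kHz to 16 kHz by averaging each group of 3 samples.
std::vector<float> downsample_48k_to_16k(const std::vector<float> & in) {
    constexpr std::size_t factor = 3; // 48000 / 16000
    std::vector<float> out;
    out.reserve(in.size() / factor);
    // Trailing samples that do not fill a complete group of 3 are dropped.
    for (std::size_t i = 0; i + factor <= in.size(); i += factor) {
        out.push_back((in[i] + in[i + 1] + in[i + 2]) / 3.0f);
    }
    return out;
}
```

The output could then be handed to whatever component expects 16 kHz mono float samples.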
@litongjava Well, if this approach gets integrated, then yes, please write a WebSocket server. I actually want something different: a ROS-like pub-sub node. We can all have the server we want, with the encoding we want.
It'd be great if the server didn't "have to" capture the audio. Then this stream transcription approach could be incorporated into other apps with real-time audio/video data streams. The host app would convert to 16 kHz audio and keep pushing it to the stream server.
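One way to picture the "host app pushes audio" idea is a small thread-safe buffer that the host fills and the transcription loop drains. This is only a sketch under assumptions: `PushAudioBuffer` and its methods are hypothetical names, not classes from this PR, and the samples are assumed to be mono floats already converted to 16 kHz.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical push-style audio buffer (not part of the PR).
class PushAudioBuffer {
public:
    // Called by the host app with mono float samples already at 16 kHz.
    void push(const float * samples, std::size_t n) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.insert(pending_.end(), samples, samples + n);
    }

    // Called by the transcription loop: returns everything queued so far
    // and leaves the buffer empty.
    std::vector<float> drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<float> out;
        out.swap(pending_);
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<float> pending_;
};
```

With something like this, audio capture stays entirely in the host app, and the transcription side only ever sees samples.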
@codesoda Yes, me too. Hence the separate LocalSDLMicrophone class. I also want the final result to be encoded into some protocol other than JSON, hence the WhisperOutput/WhisperEncoder classes.
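To make the output/encoder separation concrete, here is a minimal sketch of a pluggable encoder interface. The types `TranscriptSegment`, `SegmentEncoder`, and `JsonSegmentEncoder` are hypothetical and chosen for illustration; the PR's actual WhisperOutput and WhisperEncoder classes may be shaped quite differently.

```cpp
#include <string>

// Hypothetical transcription result (not the PR's WhisperOutput).
struct TranscriptSegment {
    long long t0_ms;   // segment start, milliseconds
    long long t1_ms;   // segment end, milliseconds
    std::string text;  // recognized text
};

// Hypothetical encoder interface: one implementation per wire format.
class SegmentEncoder {
public:
    virtual ~SegmentEncoder() = default;
    virtual std::string encode(const TranscriptSegment & seg) const = 0;
};

// One possible encoding: a single JSON object per segment.
class JsonSegmentEncoder : public SegmentEncoder {
public:
    std::string encode(const TranscriptSegment & seg) const override {
        return "{\"t0\":" + std::to_string(seg.t0_ms) +
               ",\"t1\":" + std::to_string(seg.t1_ms) +
               ",\"text\":\"" + escape(seg.text) + "\"}";
    }

private:
    // Escapes only quotes and backslashes; a real encoder would also
    // need to handle control characters.
    static std::string escape(const std::string & s) {
        std::string out;
        for (char c : s) {
            if (c == '"' || c == '\\') out += '\\';
            out += c;
        }
        return out;
    }
};
```

A protobuf or MessagePack encoder would then be another subclass, and the server code would depend only on the interface.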
I have completed the WebSocket service. Because it was quite complex, I created a separate project for it, which can be found at https://github.com/litongjava/whisper-cpp-server. After compiling, run the following command to start the server:
Then, navigate to the web directory at https://github.com/litongjava/whisper-cpp-server/tree/main/web and open the
However, my tests indicate that the speech recognition results are not very satisfactory.
Looks like there are several attempts to make servers with this code, and here is another! This approach doesn't actually make a server; it simply factors the 'stream' example into components. This would support many ways of creating servers and data encodings: JSON/HTTP, protobuf/gRPC, FlexBuffers, MessagePack, etc.
Happy to make changes and/or fold in with another server-ization branch.