
another server, based on stream #1418

Closed · wants to merge 8 commits

Conversation

colinator

Looks like there are several attempts to make servers with this code - here is another! This approach doesn't actually make a server - it simply factors the 'stream' example into components. This would support many ways of creating servers and data encodings: JSON/HTTP, protobuf/gRPC, FlexBuffers, MessagePack, etc.

Happy to make changes and/or fold this in with another server-ization branch.

@ggerganov (Owner)

Interesting! I'll look at this further, but first I need to make some long-pending updates (#1422); after that I'll come back to this example.

@litongjava (Contributor)

I like it, but so far it only does the modularization; it would be nice to also write a WebSocket server. I would like to do this work. Do you have plans to add a WebSocket server?

Also, I found that recording audio at a sample rate of 48 kHz and then converting it to 16 kHz gives better recognition. I would also like to add support for
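Since 48 kHz is an exact integer multiple of 16 kHz, a minimal conversion can average each group of three samples. A rough C++ sketch (the function name `downsample_48k_to_16k` is made up here, and a real implementation should use a proper low-pass filter or a resampling library such as libsamplerate rather than this crude averaging):

```cpp
#include <cstddef>
#include <vector>

// Naive 48 kHz -> 16 kHz downsampler for mono float PCM.
// whisper.cpp expects 16 kHz mono float samples, so each group of
// three input samples is averaged into one output sample. Averaging
// is only a crude anti-aliasing approximation.
std::vector<float> downsample_48k_to_16k(const std::vector<float> & in) {
    std::vector<float> out;
    out.reserve(in.size() / 3);
    for (size_t i = 0; i + 2 < in.size(); i += 3) {
        out.push_back((in[i] + in[i + 1] + in[i + 2]) / 3.0f);
    }
    return out;
}
```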

@colinator (Author)

@litongjava Well, if this approach gets integrated, then yes, please write a WebSocket server. I actually want something different: a ROS-like pub/sub node. That way we can all have the server we want, with the encoding we want.

@codesoda (Contributor) commented Nov 7, 2023

It'd be great if the server didn't have to capture the audio itself. Then this stream-transcription approach could be incorporated into other apps that have real-time audio/video streams: the host app would convert the audio to 16 kHz and keep feeding it to the stream server.
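As a rough illustration of this decoupling, here is a hedged C++ sketch against the whisper.h C API: the `StreamTranscriber` type, its `push_audio` method, and the 5-second windowing policy are all invented for this example; only the `whisper_*` calls are the library's actual API.

```cpp
#include "whisper.h"
#include <cstddef>
#include <cstdio>
#include <vector>

// A transcriber core that does NOT capture audio itself. The host app
// pushes already-converted 16 kHz mono float PCM chunks; we accumulate
// a window and run whisper_full on it. Windowing policy is simplified.
struct StreamTranscriber {
    whisper_context * ctx;
    std::vector<float> window;

    explicit StreamTranscriber(const char * model_path)
        : ctx(whisper_init_from_file(model_path)) {}

    ~StreamTranscriber() { whisper_free(ctx); }

    // Called by the host app with 16 kHz samples from any source.
    void push_audio(const float * samples, size_t n) {
        window.insert(window.end(), samples, samples + n);

        // Run inference roughly every 5 seconds of audio.
        if (window.size() >= (size_t) (5 * WHISPER_SAMPLE_RATE)) {
            whisper_full_params params =
                whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
            params.print_progress = false;

            if (whisper_full(ctx, params, window.data(), (int) window.size()) == 0) {
                for (int i = 0; i < whisper_full_n_segments(ctx); ++i) {
                    printf("%s\n", whisper_full_get_segment_text(ctx, i));
                }
            }
            window.clear();
        }
    }
};
```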

@colinator (Author)

@codesoda Yes, me too; hence the LocalSDLMicrophone class separation. I also want the final result encoded in some protocol other than JSON, hence the WhisperOutput/WhisperEncoder classes.
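The PR's class definitions aren't shown in this thread, so the following is only a guess at what such a separation might look like. The `AudioSource` interface and every signature here are assumptions; only the LocalSDLMicrophone, WhisperOutput, and WhisperEncoder names come from the comment above.

```cpp
#include <string>
#include <vector>

// An audio source abstraction: LocalSDLMicrophone would be one
// implementation; a network socket or a host app pushing buffers
// could be another.
struct AudioSource {
    virtual ~AudioSource() = default;
    virtual bool read(std::vector<float> & out_16khz_mono) = 0;
};

// A transcription result, decoupled from any wire format.
struct WhisperOutput {
    std::vector<std::string> segments;
};

// An encoder abstraction: one implementation per wire format
// (JSON, protobuf, FlexBuffers, MessagePack, ...).
struct WhisperEncoder {
    virtual ~WhisperEncoder() = default;
    virtual std::string encode(const WhisperOutput & out) = 0;
};
```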

@ggerganov mentioned this pull request on Nov 16, 2023
@litongjava (Contributor)

I have completed the WebSocket service. As it was quite complex, I created a separate project for it: https://github.com/litongjava/whisper-cpp-server.

After compiling, run the following command to start the server:

```sh
./cmake-build-debug/whisper_server_base_on_uwebsockets -m models/ggml-base.en.bin
```

Then, navigate to the web directory at https://github.com/litongjava/whisper-cpp-server/tree/main/web and open the index.html file.

However, my tests indicate that the speech-recognition results are not very satisfactory.

@colinator closed this by deleting the head repository on Dec 7, 2023