This project is a test page to demonstrate Automatic Speech Recognition (ASR) using OpenAI's Whisper model running locally. It consists of a simple webpage that captures audio from the user's microphone, sends it to a custom endpoint, and displays the transcribed text and the time it took to render the transcription.
- Start and stop recording with a button
- Auto-end recording after a specified duration of silence
- Utilizes a Docker container to run the ASR webservice locally
- Uses a proxy to avoid CORS issues
- Node.js
- Docker
- Clone this repository:
git clone https://github.com/voiceflow-gallagan/whisper-asr-demo.git
- Change to the project directory:
cd whisper-asr-demo
- Install the required dependencies:
npm install
- Pull and run the Docker container for the ASR webservice. See the openai-whisper-asr-webservice repo for more details or to load a different model.
docker run -d -p 9000:9000 -e ASR_MODEL=base.en onerahmet/openai-whisper-asr-webservice:latest
- Start the proxy server:
nodemon proxy.js
The proxy server should now be running at http://localhost:3000. The proxy is needed to avoid CORS issues when the demo page sends requests to the ASR webservice while running locally.
- Open the index.html file in your browser to use the ASR demo.
- Click the "Start Recording" button to start capturing audio from your microphone.
- Speak into the microphone, and the application will automatically transcribe your speech.
- The recording will stop automatically after a specified duration of silence, or you can manually stop it by clicking the "Stop Recording" button.
- The transcribed text and the time it took to render the transcription will be displayed on the page.
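The request-and-timing part of the flow above can be sketched roughly as follows. This is not the code from app.js: the function names are hypothetical, and the `/asr` path, the `audio_file` form field, the `task` and `output` query parameters, and the `text` field of the JSON response are assumptions based on the openai-whisper-asr-webservice API. It also assumes the proxy forwards the request path unchanged.

```javascript
// Sketch: send recorded audio to the ASR endpoint and time the round trip.
// Assumed (not confirmed by this README): "/asr" path, "audio_file" field,
// "task"/"output" query parameters, and a JSON response with a "text" field.

// Build the request URL for the ASR service behind the proxy.
function buildAsrUrl(baseUrl, { task = "transcribe", output = "json" } = {}) {
  const url = new URL("/asr", baseUrl);
  url.searchParams.set("task", task);
  url.searchParams.set("output", output);
  return url.toString();
}

// Upload one recording and return the transcript plus elapsed time in ms.
async function transcribe(audioBlob, baseUrl = "http://localhost:3000") {
  const form = new FormData();
  form.append("audio_file", audioBlob, "recording.webm");

  const startedAt = performance.now();
  const response = await fetch(buildAsrUrl(baseUrl), { method: "POST", body: form });
  if (!response.ok) {
    throw new Error(`ASR request failed with status ${response.status}`);
  }
  const result = await response.json();
  const elapsedMs = performance.now() - startedAt;

  return { text: result.text, elapsedMs };
}
```

In a page, `audioBlob` would typically come from a `MediaRecorder` once recording stops, and the returned `text` and `elapsedMs` would be written into the page.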
You can set the duration of silence after which the recording will stop by changing the silenceDuration variable in the app.js file.
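To make the role of silenceDuration concrete, here is a minimal sketch of one way silence-based auto-stop can work. It is an illustration, not the logic in app.js: the threshold value and every name other than silenceDuration are hypothetical, and it assumes the duration is expressed in milliseconds.

```javascript
// Sketch of silence-based auto-stop. Names and the 0.01 threshold are
// hypothetical; silenceDuration is assumed to be in milliseconds.

// Root-mean-square level of one frame of samples in the range [-1, 1].
function rms(samples) {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (const s of samples) sumSquares += s * s;
  return Math.sqrt(sumSquares / samples.length);
}

// Returns a function that takes one level per frame and reports true once
// the level has stayed below the threshold for at least silenceDuration ms.
function createSilenceDetector(silenceDuration, threshold = 0.01) {
  let silentSince = null; // timestamp at which the current quiet run began
  return function shouldStop(level, nowMs) {
    if (level >= threshold) {
      silentSince = null; // sound detected, so reset the timer
      return false;
    }
    if (silentSince === null) silentSince = nowMs;
    return nowMs - silentSince >= silenceDuration;
  };
}
```

In a browser, the samples for `rms` could be read on each animation frame with `AnalyserNode.getFloatTimeDomainData`, and the recorder stopped when `shouldStop` returns true.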
You can change the endpoint that the demo will send the audio to by changing the whisperEndpoint variable in the app.js file.
If you change the endpoint, you will also need to change the target variable in the proxy.js file.
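For readers wondering how the page, the proxy, and the target fit together, the following is a rough sketch of a forwarding proxy that adds CORS headers, written with only Node's built-in http module. It is not the repository's proxy.js, which may use a different library or structure. Only the name target and the ports 3000 and 9000 come from this README; the helper names are hypothetical.

```javascript
// Sketch of a CORS-friendly forwarding proxy (not the repo's proxy.js).
const http = require("http");

const target = "http://localhost:9000"; // where the ASR webservice listens

// Headers that let a page served from another origin call the proxy.
function corsHeaders() {
  return {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
  };
}

// Create (but do not start) a server that forwards requests to targetUrl.
function createProxy(targetUrl) {
  const { hostname, port, host } = new URL(targetUrl);
  return http.createServer((req, res) => {
    if (req.method === "OPTIONS") {
      // Answer the browser's preflight request directly.
      res.writeHead(204, corsHeaders());
      res.end();
      return;
    }
    const options = {
      hostname,
      port: port || 80,
      path: req.url,
      method: req.method,
      headers: { ...req.headers, host },
    };
    const upstream = http.request(options, (upstreamRes) => {
      // Relay the upstream status and headers, adding the CORS headers.
      res.writeHead(upstreamRes.statusCode, { ...upstreamRes.headers, ...corsHeaders() });
      upstreamRes.pipe(res);
    });
    upstream.on("error", () => {
      if (!res.headersSent) res.writeHead(502, corsHeaders());
      res.end("Upstream ASR service unavailable");
    });
    req.pipe(upstream); // stream the uploaded audio through unchanged
  });
}

// To run it: createProxy(target).listen(3000);
```

With this layout, whisperEndpoint in the page points at the proxy, while target points at wherever the ASR webservice actually runs, which is why the two settings usually need to change together.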
You can host and run the custom ASR webservice on your own server and use it in your next Voiceflow Voice Assistant integration.