DIY-Astra is a Flask application that utilizes computer vision and natural language processing to create an interactive AI assistant. The application captures live video feed from a webcam, analyzes the captured images using the Google AI API, and generates text responses based on the visual input. The generated text responses are then converted to audio using the ElevenLabs API and played back to the user.
- Live video feed capture from the webcam
- Image analysis using the Google AI API
- Text generation based on visual input
- Text-to-speech conversion using the ElevenLabs API
- Real-time audio playback of generated responses
- Web-based user interface for interaction and control
To run the DIY-Astra application, you need to have the following dependencies installed:
- Python 3.x
- Flask
- Flask-SocketIO
- OpenCV (cv2)
- Pydub
- Google Generative AI Client Library
- Pillow (PIL)
- Requests
You also need to have valid API keys for the following services:
- Google AI API (GOOGLE_API_KEY)
- ElevenLabs API (ELEVENLABS_API_KEY)
-
Clone the repository:
git clone https://github.com/your-username/diy-astra.git
-
Navigate to the project directory:
cd diy-astra
-
Install the required dependencies:
pip install -r requirements.txt
-
Set up the API keys:
- Replace
GOOGLE_API_KEY
inapp.py
with your Google AI API key. - Replace
ELEVENLABS_API_KEY
inapp.py
with your ElevenLabs API key.
- Replace
-
Run the application:
python app.py
-
Open your web browser and navigate to
http://localhost:5001
to access the DIY-Astra interface.
- Make sure your webcam is connected and accessible.
- Launch the DIY-Astra application by running
python app.py
. - The application will open in your default web browser.
- The live video feed from your webcam will be displayed in the interface.
- DIY-Astra will continuously capture images, analyze them using the Google AI API, and generate text responses based on the visual input.
- The generated text responses will be displayed in the text container below the video feed.
- The text responses will also be converted to audio using the ElevenLabs API and played back in real-time.
- You can stop the application by clicking the "Stop" button in the interface. To resume, click the "Resume" button.
app.py
: The main Flask application file containing the server-side logic.templates/index.html
: The HTML template for the user interface.static/css/styles.css
: The CSS stylesheet for styling the user interface.static/js/script.js
: The JavaScript file for client-side interactions and socket communication.requirements.txt
: The list of required Python dependencies.
Contributions are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
This project is licensed under the MIT License.