A FastAPI-based REST API that provides text-to-speech (TTS) capabilities for Swahili text using fine-tuned Hugging Face models.
- Converts Swahili text to speech using two different models:
  - Benny model: `Benjamin-png/swahili-mms-tts-finetuned`
  - Briget model: `Benjamin-png/swahili-mms-tts-Briget_580_clips-finetuned`
- Automatic Swahili language detection
- CORS support for cross-origin requests
- Efficient model caching
- Returns audio in WAV format
To install the dependencies:

```bash
pip install -r requirements.txt
```
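The contents of `requirements.txt` are not shown here; judging from the libraries referenced in this README, it would include roughly the following (exact packages and version pins are assumptions):

```text
fastapi
uvicorn
torch
transformers
scipy
numpy
langdetect
```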
To start the server:

```bash
python tts_linkedin.py
```

The server will run on `http://0.0.0.0:8000`.
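The README does not show the contents of `tts_linkedin.py`; a minimal sketch of the setup it describes (FastAPI with permissive CORS, served on port 8000) might look like the following. The module layout and variable names are assumptions, not the actual implementation:

```python
# Minimal sketch of the app setup described above; names are illustrative.
import uvicorn
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI(title="Swahili TTS API")

# CORS is described as allowing all origins, methods, and headers.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

if __name__ == "__main__":
    # `python tts_linkedin.py` starts the server on 0.0.0.0:8000.
    uvicorn.run(app, host="0.0.0.0", port=8000)
```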
Endpoint: `POST /tts/benny`

Request Body:

```json
{
  "text": "Your Swahili text here"
}
```
Endpoint: `POST /tts/briget`

Request Body:

```json
{
  "text": "Your Swahili text here"
}
```
Both endpoints return:

- Content-Type: `audio/wav`
- Binary audio data in WAV format
- `400 Bad Request`: returned if the provided text is not in Swahili
- Standard HTTP error codes for other failure cases
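Putting these pieces together, a hedged sketch of what one of these endpoints might look like internally is shown below. The `TTSRequest` model name, the `synthesize` helper, and the exact detection logic are assumptions for illustration, not the actual implementation:

```python
from fastapi import FastAPI, HTTPException, Response
from langdetect import detect
from pydantic import BaseModel

app = FastAPI()


class TTSRequest(BaseModel):
    # Matches the {"text": "..."} request bodies shown above.
    text: str


@app.post("/tts/briget")
def tts_briget(request: TTSRequest) -> Response:
    # Reject text that does not detect as Swahili ("sw") with a 400.
    if detect(request.text) != "sw":
        raise HTTPException(status_code=400, detail="Text is not in Swahili")

    # Hypothetical helper that runs the model and returns WAV bytes
    # (see the model-loading sketch further down).
    wav_bytes = synthesize(request.text)
    return Response(content=wav_bytes, media_type="audio/wav")
```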
Using curl:

```bash
curl -X POST "http://localhost:8000/tts/briget" \
  -H "Content-Type: application/json" \
  -d '{"text":"Kijana huyu ni msataarabu sana sana"}' \
  --output output.wav
```
Using Python requests:

```python
import requests

response = requests.post(
    "http://localhost:8000/tts/briget",
    json={"text": "Kijana huyu ni msataarabu sana sana"}
)

with open("output.wav", "wb") as f:
    f.write(response.content)
```
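Since non-Swahili input returns a `400 Bad Request` rather than audio, in practice it is worth calling `response.raise_for_status()` (or checking `response.status_code`) before writing the file.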
- Uses PyTorch for model inference
- Models are automatically loaded on first use and cached
- Runs on GPU if available, falls back to CPU
- Audio is generated at the model's native sampling rate
- Output is converted to 16-bit PCM WAV format
- The API uses `lru_cache` for efficient model loading (see the sketch after this list)
- Language detection is performed using the `langdetect` library
- CORS is configured to allow all origins, methods, and headers
- Audio processing uses scipy for WAV file generation
- The server uses FastAPI's automatic request validation with Pydantic models
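As a sketch of how the cached model loading and audio generation described above could fit together (and of the hypothetical `synthesize` helper used in the endpoint sketch earlier): the MMS TTS checkpoints are VITS-style models in `transformers`, so loading them with `VitsModel` is a reasonable assumption, but the function names and details here are illustrative only:

```python
import io
from functools import lru_cache

import numpy as np
import torch
from scipy.io import wavfile
from transformers import AutoTokenizer, VitsModel

# Run on GPU if available, fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"


@lru_cache(maxsize=None)
def load_model(model_name: str):
    # Loaded once per model name on first use, then cached for later requests.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = VitsModel.from_pretrained(model_name).to(device)
    return tokenizer, model


def synthesize(text: str, model_name: str = "Benjamin-png/swahili-mms-tts-finetuned") -> bytes:
    tokenizer, model = load_model(model_name)
    inputs = tokenizer(text, return_tensors="pt").to(device)
    with torch.no_grad():
        waveform = model(**inputs).waveform[0].cpu().numpy()

    # Convert the float waveform to 16-bit PCM at the model's native sampling rate.
    pcm = (waveform * 32767).astype(np.int16)
    buffer = io.BytesIO()
    wavfile.write(buffer, model.config.sampling_rate, pcm)
    return buffer.getvalue()
```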
You can also run the API using Docker:
```bash
# Build the Docker image
docker build -t swahili-tts .

# Run the container
docker run -d -p 8000:8000 swahili-tts
```