# Whisper Transcription Web App
A user-friendly web application for transcribing audio and video files using OpenAI's Whisper model, powered by Gradio and faster-whisper.
## Features
- 🎙️ Transcribe audio and video files
- 🚀 GPU acceleration support
- 🌐 Multiple language support
- 📱 Responsive and modern UI
- 🔄 Multiple model options (tiny to large-v3)
- ⚙️ Configurable settings via config.ini
## Requirements
- Python 3.8+
- CUDA-capable GPU (recommended)
- FFmpeg (for audio/video processing)
## Installation

- Clone this repository:

  ```bash
  git clone <repository-url>
  cd whisperapp
  ```

- Create a virtual environment and activate it:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install uv (recommended package installer):

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Install the required packages using uv:

  ```bash
  uv pip install -r requirements.txt
  ```
## Configuration

The application is configured through the `config.ini` file. The available settings are:
### Whisper Settings

- `default_model`: Default Whisper model to use
- `device`: Device to use (cuda/cpu)
- `compute_type`: Computation type (float16/float32)
- `beam_size`: Beam size for transcription
- `vad_filter`: Enable/disable voice activity detection
### App Settings

- `max_duration`: Maximum audio duration in seconds
- `server_name`: Server hostname
- `server_port`: Server port
- `share`: Enable/disable public sharing
### Models and Languages

- `available_models`: Comma-separated list of available models
- `available_languages`: Comma-separated list of supported languages
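Putting the settings above together, a `config.ini` might look like the following. This is an illustrative sketch: the section names and values here are assumptions, not the project's shipped defaults.

```ini
[whisper]
default_model = base
device = cuda
compute_type = float16
beam_size = 5
vad_filter = true

[app]
max_duration = 3600
server_name = 0.0.0.0
server_port = 7860
share = false

[models]
available_models = tiny,base,small,medium,large-v1,large-v2,large-v3
available_languages = en,es,fr,de,it,ja,zh
```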
## Usage

- Start the application:

  ```bash
  python app.py
  ```

- Open your web browser and navigate to http://localhost:7860
- Upload an audio or video file and select your preferred model and language settings
- Click "Transcribe" and wait for the results
## Model Options
- tiny: Fastest, lowest accuracy
- base: Good balance of speed and accuracy
- small: Better accuracy, moderate speed
- medium: High accuracy, slower
- large-v1/v2/v3: Highest accuracy, slowest
## Tips
- For better accuracy, use larger models (medium, large)
- Processing time increases with model size
- GPU is recommended for faster processing
- Maximum audio duration is configurable in config.ini
- Use uv for faster package installation and dependency resolution
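As a sketch of how the configurable `max_duration` limit could be enforced before transcription, the snippet below reads the setting with the standard-library `configparser` and measures a WAV clip's length with the `wave` module. The config section and key names are assumptions for illustration, not taken from the project's code.

```python
import configparser
import io
import wave

# Illustrative config text; the [app] section and key name are
# assumptions, not necessarily the project's config.ini layout.
CONFIG_TEXT = """
[app]
max_duration = 3600
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
max_duration = config.getfloat("app", "max_duration")

def wav_duration_seconds(path_or_file):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path_or_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()

def check_duration(path_or_file, limit=max_duration):
    """Raise ValueError if the clip exceeds the configured limit."""
    duration = wav_duration_seconds(path_or_file)
    if duration > limit:
        raise ValueError(f"Clip is {duration:.1f}s, limit is {limit:.0f}s")
    return duration

# Build a 2-second silent WAV in memory to demonstrate the check.
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)       # 16-bit samples
    wav.setframerate(16000)   # 16 kHz, common for speech models
    wav.writeframes(b"\x00\x00" * 32000)  # 32000 frames = 2 seconds
buf.seek(0)

print(check_duration(buf))  # → 2.0
```

A real implementation would need FFmpeg (or a library such as ffprobe bindings) to measure compressed audio and video formats; `wave` handles only uncompressed WAV.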
## License
MIT License