# Whisper Transcription Web App A user-friendly web application for transcribing audio and video files using OpenAI's Whisper model, powered by Gradio and faster-whisper. ## Features - 🎙️ Transcribe audio and video files - 🚀 GPU acceleration support - 🌐 Multiple language support - 📱 Responsive and modern UI - 🔄 Multiple model options (tiny to large-v3) - ⚙️ Configurable settings via config.ini ## Requirements - Python 3.8+ - CUDA-capable GPU (recommended) - FFmpeg (for audio/video processing) ## Installation 1. Clone this repository: ```bash git clone cd whisperapp ``` 2. Create a virtual environment and activate it: ```bash python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate ``` 3. Install uv (recommended package installer): ```bash curl -LsSf https://astral.sh/uv/install.sh | sh ``` 4. Install the required packages using uv: ```bash uv pip install -r requirements.txt ``` ## Configuration The application can be configured through the `config.ini` file. Here are the available settings: ### Whisper Settings - `default_model`: Default Whisper model to use - `device`: Device to use (cuda/cpu) - `compute_type`: Computation type (float16/float32) - `beam_size`: Beam size for transcription - `vad_filter`: Enable/disable voice activity detection ### App Settings - `max_duration`: Maximum audio duration in seconds - `server_name`: Server hostname - `server_port`: Server port - `share`: Enable/disable public sharing ### Models and Languages - `available_models`: Comma-separated list of available models - `available_languages`: Comma-separated list of supported languages ## Usage 1. Start the application: ```bash python app.py ``` 2. Open your web browser and navigate to `http://localhost:7860` 3. Upload an audio or video file and select your preferred model and language settings 4. Click "Transcribe" and wait for the results ## Model Options - tiny: Fastest, lowest accuracy - base: Good balance of speed and accuracy - small: Better accuracy, moderate speed - medium: High accuracy, slower - large-v1/v2/v3: Highest accuracy, slowest ## Tips - For better accuracy, use larger models (medium, large) - Processing time increases with model size - GPU is recommended for faster processing - Maximum audio duration is configurable in config.ini - Use uv for faster package installation and dependency resolution ## License MIT License