Mirror of https://github.com/tcsenpai/whisperapp.git (synced 2025-06-05 14:45:20 +00:00)
Whisper Transcription Web App
A user-friendly web application for transcribing audio and video files using OpenAI's Whisper model, powered by Gradio and faster-whisper.
Features
- 🎙️ Transcribe audio and video files
- 🚀 GPU acceleration support
- 🌐 Multiple language support
- 📱 Responsive and modern UI
- 🔄 Multiple model options (tiny to large-v3)
- ⚙️ Configurable settings via config.ini
- 📺 YouTube video support with subtitle extraction
Requirements
- Python 3.10+
- CUDA-capable GPU (recommended)
- FFmpeg (for audio/video processing)
- uv package manager
Installation
- Clone this repository:
git clone <repository-url>
cd whisperapp
- Install uv (installing packages with plain pip may break your existing environment):
curl -LsSf https://astral.sh/uv/install.sh | sh
- Create a venv with uv:
uv venv --python=3.10
- Install the required packages using uv:
uv pip install -r requirements.txt
Configuration
The application can be configured through the config.ini file. Here are the available settings:
Whisper Settings
- default_model: Default Whisper model to use
- device: Device to use (cuda/cpu)
- compute_type: Computation type (float16/float32)
- beam_size: Beam size for transcription
- vad_filter: Enable/disable voice activity detection
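As a rough illustration of how the Whisper settings could reach faster-whisper, the sketch below maps them onto the `WhisperModel` constructor and its `transcribe()` method. The `WhisperSettings` class, the helper names, and the default values are assumptions made for this example; the app's own code may be organized differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WhisperSettings:
    # Field names follow the config.ini keys; the defaults are illustrative
    # assumptions, not necessarily the app's real defaults.
    default_model: str = "base"
    device: str = "cuda"
    compute_type: str = "float16"
    beam_size: int = 5
    vad_filter: bool = True


def model_kwargs(settings: WhisperSettings) -> dict:
    """Build the arguments for the faster_whisper.WhisperModel constructor."""
    return {
        "model_size_or_path": settings.default_model,
        "device": settings.device,
        "compute_type": settings.compute_type,
    }


def transcribe_kwargs(settings: WhisperSettings, language: Optional[str] = None) -> dict:
    """Build the arguments for WhisperModel.transcribe()."""
    return {
        "beam_size": settings.beam_size,
        "vad_filter": settings.vad_filter,
        "language": language,  # None lets the model detect the language
    }


def transcribe(path: str, settings: WhisperSettings, language: Optional[str] = None) -> str:
    # Imported lazily so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(**model_kwargs(settings))
    segments, _info = model.transcribe(path, **transcribe_kwargs(settings, language))
    # `segments` is a generator; iterating over it performs the transcription.
    return " ".join(segment.text.strip() for segment in segments)
```

On a machine without a GPU, something like `WhisperSettings(device="cpu", compute_type="float32")` would be the matching combination of the two options listed above.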
App Settings
- max_duration: Maximum audio duration in seconds
- server_name: Server hostname
- server_port: Server port
- share: Enable/disable public sharing
Models and Languages
- available_models: Comma-separated list of available models
- available_languages: Comma-separated list of supported languages
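To make the settings concrete, here is a minimal sketch of reading such a file with Python's standard `configparser`. The section names (`[whisper]`, `[app]`, `[models]`, `[languages]`) and all values are assumptions for illustration; check the config.ini shipped with the repository for the real layout.

```python
import configparser

# Hypothetical config.ini contents: section names and values are assumptions.
SAMPLE_CONFIG = """
[whisper]
default_model = base
device = cuda
compute_type = float16
beam_size = 5
vad_filter = true

[app]
max_duration = 3600
server_name = 0.0.0.0
server_port = 7860
share = false

[models]
available_models = tiny,base,small,medium,large-v1,large-v2,large-v3

[languages]
available_languages = en,es,fr,de,it
"""


def split_csv(value: str) -> list:
    """Turn a comma-separated string into a list of trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_settings(text: str) -> dict:
    """Parse INI text and convert each value to a suitable Python type."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    whisper, app = parser["whisper"], parser["app"]
    return {
        "default_model": whisper.get("default_model"),
        "device": whisper.get("device"),
        "compute_type": whisper.get("compute_type"),
        "beam_size": whisper.getint("beam_size"),
        "vad_filter": whisper.getboolean("vad_filter"),
        "max_duration": app.getint("max_duration"),
        "server_name": app.get("server_name"),
        "server_port": app.getint("server_port"),
        "share": app.getboolean("share"),
        "available_models": split_csv(parser["models"]["available_models"]),
        "available_languages": split_csv(parser["languages"]["available_languages"]),
    }


settings = load_settings(SAMPLE_CONFIG)
```

To read an actual file instead of the sample string, `parser.read("config.ini")` can replace `parser.read_string(text)`.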
Usage
- Start the application:
python app.py
- Open your web browser and navigate to http://localhost:7860
- Choose between two tabs:
  - Local File: Upload and transcribe audio/video files
  - YouTube: Process YouTube videos with subtitle extraction
Local File Tab
- Upload an audio or video file
- Select your preferred model and language settings
- Click "Transcribe" and wait for the results
YouTube Tab
- Enter a YouTube URL (youtube.com, youtu.be, and Invidious URLs are supported)
- Select your preferred model and language settings
- Click "Process Video"
- The app will:
  - First try to extract available subtitles
  - If no subtitles are available, download and transcribe the video
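The subtitles-first behavior described above can be sketched as a simple fallback. The function name `process_video` and the two callables passed into it are hypothetical placeholders for whatever the app uses to fetch subtitles and to download and transcribe audio; only the order of the two steps comes from the description.

```python
from typing import Callable, Optional, Tuple


def process_video(
    url: str,
    fetch_subtitles: Callable[[str], Optional[str]],
    download_and_transcribe: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (text, source), where source is "subtitles" or "transcription"."""
    subtitles = fetch_subtitles(url)
    # Treat missing or whitespace-only subtitles as "no subtitles available".
    if subtitles and subtitles.strip():
        return subtitles, "subtitles"
    # Fall back to the slower path: download the audio and run Whisper on it.
    return download_and_transcribe(url), "transcription"


# Stub callables stand in for the real subtitle and transcription steps.
text, source = process_video(
    "https://youtu.be/VIDEO_ID",
    fetch_subtitles=lambda url: None,
    download_and_transcribe=lambda url: "transcribed text",
)
```

Passing the two steps in as callables keeps the fallback order easy to test without network access or a loaded model.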
Model Options
- tiny: Fastest, lowest accuracy
- base: Good balance of speed and accuracy
- small: Better accuracy, moderate speed
- medium: High accuracy, slower
- large-v1/v2/v3: Highest accuracy, slowest
Tips
- For better accuracy, use larger models (medium, large)
- Processing time increases with model size
- GPU is recommended for faster processing
- Maximum audio duration is configurable in config.ini
- Use uv for faster package installation and dependency resolution
- YouTube videos will first try to use available subtitles
- If no subtitles are available, the video will be transcribed
License
MIT License