mirror of
https://github.com/tcsenpai/whisperapp.git
synced 2025-06-06 15:05:23 +00:00
96 lines
2.4 KiB
Markdown
96 lines
2.4 KiB
Markdown
# Whisper Transcription Web App
|
|
|
|
A user-friendly web application for transcribing audio and video files using OpenAI's Whisper model, powered by Gradio and faster-whisper.
|
|
|
|
## Features
|
|
|
|
- 🎙️ Transcribe audio and video files
|
|
- 🚀 GPU acceleration support
|
|
- 🌐 Multiple language support
|
|
- 📱 Responsive and modern UI
|
|
- 🔄 Multiple model options (tiny to large-v3)
|
|
- ⚙️ Configurable settings via config.ini
|
|
|
|
## Requirements
|
|
|
|
- Python 3.8+
|
|
- CUDA-capable GPU (recommended)
|
|
- FFmpeg (for audio/video processing)
|
|
|
|
## Installation
|
|
|
|
1. Clone this repository:
|
|
```bash
|
|
git clone <repository-url>
|
|
cd whisperapp
|
|
```
|
|
|
|
2. Create a virtual environment and activate it:
|
|
```bash
|
|
python -m venv venv
|
|
source venv/bin/activate # On Windows: venv\Scripts\activate
|
|
```
|
|
|
|
3. Install uv (recommended package installer):
|
|
```bash
|
|
curl -LsSf https://astral.sh/uv/install.sh | sh
|
|
```
|
|
|
|
4. Install the required packages using uv:
|
|
```bash
|
|
uv pip install -r requirements.txt
|
|
```
|
|
|
|
## Configuration
|
|
|
|
The application can be configured through the `config.ini` file. Here are the available settings:
|
|
|
|
### Whisper Settings
|
|
- `default_model`: Default Whisper model to use
|
|
- `device`: Device to use (cuda/cpu)
|
|
- `compute_type`: Computation type (float16/float32)
|
|
- `beam_size`: Beam size for transcription
|
|
- `vad_filter`: Enable/disable voice activity detection
|
|
|
|
### App Settings
|
|
- `max_duration`: Maximum audio duration in seconds
|
|
- `server_name`: Server hostname
|
|
- `server_port`: Server port
|
|
- `share`: Enable/disable public sharing
|
|
|
|
### Models and Languages
|
|
- `available_models`: Comma-separated list of available models
|
|
- `available_languages`: Comma-separated list of supported languages
|
|
|
|
## Usage
|
|
|
|
1. Start the application:
|
|
```bash
|
|
python app.py
|
|
```
|
|
|
|
2. Open your web browser and navigate to `http://localhost:7860`
|
|
|
|
3. Upload an audio or video file and select your preferred model and language settings
|
|
|
|
4. Click "Transcribe" and wait for the results
|
|
|
|
## Model Options
|
|
|
|
- tiny: Fastest, lowest accuracy
|
|
- base: Good balance of speed and accuracy
|
|
- small: Better accuracy, moderate speed
|
|
- medium: High accuracy, slower
|
|
- large-v1/v2/v3: Highest accuracy, slowest
|
|
|
|
## Tips
|
|
|
|
- For better accuracy, use larger models (medium, large)
|
|
- Processing time increases with model size
|
|
- GPU is recommended for faster processing
|
|
- Maximum audio duration is configurable in config.ini
|
|
- Use uv for faster package installation and dependency resolution
|
|
|
|
## License
|
|
|
|
MIT License |