
Whisper Transcription Web App

A user-friendly web application for transcribing audio and video files using OpenAI's Whisper model, powered by Gradio and faster-whisper.

Features

  • 🎙️ Transcribe audio and video files
  • 🚀 GPU acceleration support
  • 🌐 Multiple language support
  • 📱 Responsive and modern UI
  • 🔄 Multiple model options (tiny to large-v3)
  • ⚙️ Configurable settings via config.ini

Requirements

  • Python 3.8+
  • CUDA-capable GPU (recommended)
  • FFmpeg (for audio/video processing)

Installation

  1. Clone this repository:
git clone <repository-url>
cd whisperapp
  2. Create a virtual environment and activate it:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install uv (recommended package installer):
curl -LsSf https://astral.sh/uv/install.sh | sh
  4. Install the required packages using uv:
uv pip install -r requirements.txt

Configuration

The application can be configured through the config.ini file. Here are the available settings:

Whisper Settings

  • default_model: Whisper model to load on startup
  • device: Inference device (cuda or cpu)
  • compute_type: Computation precision (float16/float32)
  • beam_size: Beam size for decoding; larger values can improve accuracy at the cost of speed
  • vad_filter: Enable/disable voice activity detection to skip silent passages
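The Whisper settings above map onto faster-whisper's WhisperModel constructor and its transcribe() call. A minimal sketch of loading them with the standard-library configparser (the section and option names here are assumptions based on the list above; check them against the actual config.ini):

```python
import configparser

def load_whisper_settings(path="config.ini"):
    """Read the [whisper] section into plain Python values.

    The section/option names are assumed to match the settings
    documented above; adjust them to your actual config.ini.
    """
    cfg = configparser.ConfigParser()
    cfg.read(path)
    w = cfg["whisper"]
    return {
        "default_model": w.get("default_model", "base"),
        "device": w.get("device", "cuda"),
        "compute_type": w.get("compute_type", "float16"),
        "beam_size": w.getint("beam_size", 5),
        "vad_filter": w.getboolean("vad_filter", True),
    }

# These values would then feed faster-whisper, roughly:
#   from faster_whisper import WhisperModel
#   s = load_whisper_settings()
#   model = WhisperModel(s["default_model"],
#                        device=s["device"],
#                        compute_type=s["compute_type"])
#   segments, info = model.transcribe("audio.wav",
#                                     beam_size=s["beam_size"],
#                                     vad_filter=s["vad_filter"])
```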

App Settings

  • max_duration: Maximum audio duration in seconds
  • server_name: Server hostname
  • server_port: Server port
  • share: Enable/disable public sharing

Models and Languages

  • available_models: Comma-separated list of available models
  • available_languages: Comma-separated list of supported languages
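Putting the three groups together, a config.ini might look like the following (the section names and example values are an assumption based on the settings listed above; adapt them to the file shipped with the repository):

```ini
[whisper]
default_model = base
device = cuda
compute_type = float16
beam_size = 5
vad_filter = true

[app]
max_duration = 600
server_name = 0.0.0.0
server_port = 7860
share = false

[models]
available_models = tiny, base, small, medium, large-v3
available_languages = en, de, fr, es, it
```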

Usage

  1. Start the application:
python app.py
  2. Open your web browser and navigate to http://localhost:7860

  3. Upload an audio or video file and select your preferred model and language settings

  4. Click "Transcribe" and wait for the results

Model Options

  • tiny: Fastest, lowest accuracy
  • base: Good balance of speed and accuracy
  • small: Better accuracy, moderate speed
  • medium: High accuracy, slower
  • large-v1/v2/v3: Highest accuracy, slowest

Tips

  • For better accuracy, use larger models (medium, large)
  • Processing time increases with model size
  • GPU is recommended for faster processing
  • Maximum audio duration is configurable in config.ini
  • Use uv for faster package installation and dependency resolution
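Since max_duration caps the input length, uploads need a duration check before transcription. For WAV input this can be done with nothing but the standard library; other audio/video formats would need FFmpeg/ffprobe, as noted in the requirements. The function name and the 600-second default below are illustrative assumptions, not part of the app's API:

```python
import wave

def check_duration(path, max_duration=600.0):
    """Return the duration of a WAV file in seconds,
    raising ValueError if it exceeds max_duration.

    Only handles WAV; other formats would need ffprobe.
    """
    with wave.open(path, "rb") as wf:
        duration = wf.getnframes() / wf.getframerate()
    if duration > max_duration:
        raise ValueError(
            f"Audio is {duration:.1f}s, exceeding the "
            f"{max_duration:.0f}s limit from config.ini"
        )
    return duration
```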

License

MIT License
