tcsenpai/whisperapp

mirror of https://github.com/tcsenpai/whisperapp.git synced 2025-07-28 13:52:46 +00:00

Latest commit: tcsenpai 4ad72ffe8d "switched to whisperX" (2025-05-23 11:45:36 +02:00)

File                 Last commit                               Date
.gradio              first test                                2025-05-23 10:11:30 +02:00
.gitignore           first test                                2025-05-23 10:11:30 +02:00
app.py               switched to whisperX                      2025-05-23 11:45:36 +02:00
config.ini.example   updated prompt and generation technique   2025-05-23 11:30:19 +02:00
ollama_handler.py    quickfix                                  2025-05-23 11:33:14 +02:00
README.md            switched to whisperX                      2025-05-23 11:45:36 +02:00
requirements.txt     switched to whisperX                      2025-05-23 11:45:36 +02:00
youtube_handler.py   fixed regex                               2025-05-23 10:17:52 +02:00

README.md

Whisper Transcription Web App

A user-friendly web application for transcribing audio and video files using OpenAI's Whisper model, powered by Gradio and faster-whisper.

Features

🎙️ Transcribe audio and video files
🚀 GPU acceleration support
🌐 Multiple language support
📱 Responsive and modern UI
🔄 Multiple model options (tiny to large-v3)
⚙️ Configurable settings via config.ini
📺 YouTube video support with subtitle extraction

Requirements

Python 3.10+
CUDA-capable GPU (recommended)
FFmpeg (for audio/video processing)
uv package manager
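
The requirements above can be verified before the first launch. The following is a minimal sketch using only the standard library; the function name check_requirements is hypothetical and not part of this repository, and GPU availability is deliberately not checked.

```python
import shutil
import sys


def check_requirements(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        found = sys.version.split()[0]
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ is required, found {found}"
        )
    # shutil.which returns None when the executable is not on PATH.
    for tool in ("ffmpeg", "uv"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems
```

Calling check_requirements() and printing each returned entry gives a quick summary of what is still missing.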

Installation

Clone this repository:

git clone <repository-url>
cd whisperapp

Install uv (installing the dependencies with plain pip might break your existing Python environment):

curl -LsSf https://astral.sh/uv/install.sh | sh

Create a venv with uv:

uv venv --python=3.10

Install the required packages using uv:

uv pip install -r requirements.txt

Configuration

The application is configured through the config.ini file (the repository includes config.ini.example as a starting point). The available settings are:

Whisper Settings

default_model: Default Whisper model to use
device: Device to use (cuda/cpu)
compute_type: Computation type (float16/float32)
beam_size: Beam size for transcription
vad_filter: Enable/disable voice activity detection
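
To show where these settings could end up, here is a hedged sketch that passes them to faster-whisper, the backend this README names. The app's actual code may differ (the latest commit message mentions whisperX). The helper names and the example values are assumptions; WhisperModel and its transcribe method are part of faster-whisper.

```python
def model_kwargs(settings):
    """Arguments used once, when the model is loaded."""
    return {"device": settings["device"], "compute_type": settings["compute_type"]}


def transcribe_kwargs(settings, language=None):
    """Arguments used per transcription; language=None lets the model auto-detect."""
    return {
        "beam_size": settings["beam_size"],
        "vad_filter": settings["vad_filter"],
        "language": language,
    }


def transcribe(audio_path, settings, language=None):
    # Imported here so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(settings["default_model"], **model_kwargs(settings))
    segments, _info = model.transcribe(audio_path, **transcribe_kwargs(settings, language))
    # segments is a lazy generator; decoding happens as it is consumed.
    return " ".join(segment.text.strip() for segment in segments)


# Example values only (assumptions, not the repository's defaults).
EXAMPLE_SETTINGS = {
    "default_model": "base",
    "device": "cpu",
    "compute_type": "float32",
    "beam_size": 5,
    "vad_filter": True,
}
```

On a machine without a CUDA GPU, device "cpu" with compute_type "float32" is the conservative combination; float16 is generally intended for GPU use.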

App Settings

max_duration: Maximum audio duration in seconds
server_name: Server hostname
server_port: Server port
share: Enable/disable public sharing
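
As an illustration of how the app settings might be applied, the sketch below forwards server_name, server_port, and share to Gradio's launch call and enforces max_duration before transcription. The function names and the placeholder interface are hypothetical; only the launch parameters themselves are Gradio's. The settings are assumed to be already converted to the right types.

```python
def launch_kwargs(app_settings):
    """Pick out the settings that Gradio's launch() accepts directly."""
    return {
        "server_name": app_settings["server_name"],
        "server_port": app_settings["server_port"],
        "share": app_settings["share"],
    }


def check_duration(duration_seconds, max_duration):
    """Raise ValueError when an input is longer than the configured limit."""
    if duration_seconds > max_duration:
        raise ValueError(
            f"Audio is {duration_seconds:.0f}s long; the limit is {max_duration}s"
        )


def main(app_settings):
    # Imported here so the helpers above can be used without Gradio installed.
    import gradio as gr

    with gr.Blocks() as demo:
        gr.Markdown("Whisper Transcription Web App")  # placeholder UI
    demo.launch(**launch_kwargs(app_settings))
```

Setting server_name to 0.0.0.0 makes the server reachable from other machines on the network, while share asks Gradio to create a temporary public link.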

Models and Languages

available_models: Comma-separated list of available models
available_languages: Comma-separated list of supported languages
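
The settings listed above could be read with Python's standard configparser module. This is a sketch under stated assumptions: the section names ([whisper], [app], [models], [languages]) and all values in the sample are invented for illustration, so check config.ini.example for the real layout.

```python
import configparser

# Illustrative sample only; section names and values are assumptions.
SAMPLE_CONFIG = """
[whisper]
default_model = base
device = cuda
compute_type = float16
beam_size = 5
vad_filter = true

[app]
max_duration = 3600
server_name = 0.0.0.0
server_port = 7860
share = false

[models]
available_models = tiny, base, small, medium, large-v3

[languages]
available_languages = en, it, es, fr, de
"""


def split_csv(value):
    """Turn 'a, b, c' into ['a', 'b', 'c'], ignoring empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_settings(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "default_model": parser.get("whisper", "default_model"),
        "beam_size": parser.getint("whisper", "beam_size"),
        "vad_filter": parser.getboolean("whisper", "vad_filter"),
        "max_duration": parser.getint("app", "max_duration"),
        "server_port": parser.getint("app", "server_port"),
        "share": parser.getboolean("app", "share"),
        "available_models": split_csv(parser.get("models", "available_models")),
        "available_languages": split_csv(parser.get("languages", "available_languages")),
    }
```

Using getint and getboolean keeps type conversion in one place, so the rest of the code never has to compare against strings such as "true".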

Usage

Start the application (from the virtual environment created during installation):

python app.py

Open your web browser and navigate to http://localhost:7860 (or the server_port set in config.ini)
Choose between two tabs:
- Local File: Upload and transcribe audio/video files
- YouTube: Process YouTube videos with subtitle extraction

Local File Tab

Upload an audio or video file
Select your preferred model and language settings
Click "Transcribe" and wait for the results

YouTube Tab

Enter a YouTube URL (supports youtube.com, youtu.be, and Invidious URLs)
Select your preferred model and language settings
Click "Process Video"
The app will:
- First try to extract available subtitles
- If no subtitles are available, download and transcribe the video
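
The subtitle-first flow above can be sketched as follows. This is not the code in youtube_handler.py (whose commit history mentions a regex); it uses urllib.parse instead, and the two callables passed to process_video are hypothetical stand-ins for the real subtitle and transcription steps.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url):
    """Return the video ID from a watch-style or short URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
        return candidate or None
    # youtube.com and Invidious instances both use /watch?v=<id>.
    if parsed.path == "/watch":
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None
    return None


def process_video(url, fetch_subtitles, download_and_transcribe):
    """Prefer existing subtitles; fall back to download and transcription."""
    video_id = extract_video_id(url)
    if video_id is None:
        raise ValueError(f"Unsupported URL: {url}")
    subtitles = fetch_subtitles(video_id)
    if subtitles:
        return subtitles, "subtitles"
    return download_and_transcribe(url), "transcription"
```

Passing the two steps in as functions keeps the fallback logic easy to test without network access, since stubs can stand in for the real handlers.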

Model Options

tiny: Fastest, lowest accuracy
base: Good balance of speed and accuracy
small: Better accuracy, moderate speed
medium: High accuracy, slower
large-v1/v2/v3: Highest accuracy, slowest

Tips

For better accuracy, use larger models (medium, large)
Processing time increases with model size
GPU is recommended for faster processing
Maximum audio duration is configurable in config.ini
Use uv for faster package installation and dependency resolution
YouTube videos use existing subtitles when available and are transcribed only when none are found

License

MIT License