tcsenpai/whisperapp

mirror of https://github.com/tcsenpai/whisperapp.git synced 2025-06-05 14:45:20 +00:00

Audio/Video Transcription Web App

A web application for transcribing audio and video files using WhisperX, with support for YouTube videos and optional summarization using Ollama.

Features

Transcribe local audio/video files
Process YouTube videos (with subtitle extraction when available)
Automatic language detection
Multiple WhisperX model options
Optional text summarization using Ollama
Modern web interface with Gradio
Configurable settings via config.ini

Requirements

Python 3.8+
CUDA-compatible GPU (recommended)
FFmpeg installed on your system
Ollama (optional, for summarization)

Installation

Clone the repository:

git clone <repository-url>
cd whisperapp

Install the required packages:

pip install -r requirements.txt

Install FFmpeg (if not already installed):

Ubuntu/Debian:

sudo apt update && sudo apt install ffmpeg

macOS:

brew install ffmpeg

Windows: Download a build from the FFmpeg website (https://ffmpeg.org/download.html) and add it to your PATH

Copy the example configuration file:

cp config.ini.example config.ini

Edit the configuration files:

.env: Set your environment variables
config.ini: Configure WhisperX, Ollama, and application settings
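
Before the first run, it can help to confirm that the prerequisites listed under Requirements are in place. The following is a minimal sketch using only the Python standard library; the function name check_prerequisites is hypothetical and not part of this repository.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []

    # The README asks for Python 3.8 or newer.
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ is required")

    # FFmpeg must be reachable through PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```

An empty list means both checks passed. GPU availability is not checked here because it depends on how PyTorch was installed.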

Configuration

Application Settings (config.ini)

[whisper]
default_model = base
device = cuda
compute_type = float32
batch_size = 16
vad = true

[app]
max_duration = 3600
server_name = 0.0.0.0
server_port = 7860
share = true

[models]
available_models = tiny,base,small,medium,large-v1,large-v2,large-v3

[languages]
available_languages = en,es,fr,de,it,pt,nl,ja,ko,zh

[ollama]
enabled = false
url = http://localhost:11434
default_model = mistral
summarize_prompt = Please provide a comprehensive yet concise summary of the following text. Focus on the main points, key arguments, and important details while maintaining accuracy and completeness. Here's the text to summarize:
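
To illustrate how these settings can be consumed, here is a small sketch that parses a trimmed copy of the file with Python's standard configparser module. The helper name load_settings and the returned dictionary keys are assumptions made for this example; the application's own loading code may differ.

```python
import configparser

# A trimmed copy of the example configuration shown above.
SAMPLE = """
[whisper]
default_model = base
device = cuda
compute_type = float32
batch_size = 16

[app]
max_duration = 3600
server_name = 0.0.0.0
server_port = 7860
share = true

[models]
available_models = tiny,base,small,medium,large-v1,large-v2,large-v3

[ollama]
enabled = false
url = http://localhost:11434
default_model = mistral
"""


def load_settings(text):
    """Parse INI text and convert values to typed Python objects."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    models = parser.get("models", "available_models").split(",")
    return {
        "model": parser.get("whisper", "default_model"),
        "batch_size": parser.getint("whisper", "batch_size"),
        "max_duration": parser.getint("app", "max_duration"),
        "port": parser.getint("app", "server_port"),
        "share": parser.getboolean("app", "share"),
        "models": [name.strip() for name in models],
        "ollama_enabled": parser.getboolean("ollama", "enabled"),
        "ollama_url": parser.get("ollama", "url"),
    }


settings = load_settings(SAMPLE)
```

To read the real file instead of the sample string, parser.read("config.ini") can replace parser.read_string(text).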

Usage

Start the application:

python app.py

Open your web browser and navigate to:

http://localhost:7860

Use the interface to:
- Upload and transcribe local audio/video files
- Process YouTube videos
- Generate summaries (if Ollama is configured)

Features in Detail

Local File Transcription

Supports various audio and video formats
Automatic language detection
Multiple WhisperX model options
Optional summarization with Ollama
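
For reference, a transcription call with WhisperX might look like the sketch below. It follows the usage shown in the WhisperX README (load_model, load_audio, transcribe); the wrapper names transcribe_file and segments_to_text are hypothetical, and the defaults simply mirror the [whisper] section of config.ini. It is not necessarily how app.py is written.

```python
def segments_to_text(segments):
    """Join the text of WhisperX segments into one string, skipping blanks."""
    parts = (segment["text"].strip() for segment in segments)
    return " ".join(part for part in parts if part)


def transcribe_file(path, model_name="base", device="cuda",
                    compute_type="float32", batch_size=16):
    """Transcribe one audio/video file; return (detected_language, text)."""
    # Imported lazily so segments_to_text can be used without WhisperX installed.
    import whisperx

    model = whisperx.load_model(model_name, device, compute_type=compute_type)
    audio = whisperx.load_audio(path)  # decoding relies on FFmpeg
    result = model.transcribe(audio, batch_size=batch_size)
    return result["language"], segments_to_text(result["segments"])
```

On a machine without a CUDA GPU, passing device="cpu" (usually together with compute_type="int8") is the common alternative, at the cost of speed.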

YouTube Video Processing

Supports youtube.com, youtu.be, and Invidious URLs
Automatically extracts subtitles if available
Falls back to transcription if no subtitles found
Optional summarization with Ollama
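
Handling all three URL styles comes down to extracting the video ID. The sketch below shows one way to do that with urllib.parse; it is an illustration only, not the code in youtube_handler.py. The function name extract_video_id is hypothetical, and the 11-character ID pattern is an assumption based on how YouTube IDs currently look.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: video IDs are 11 characters from [A-Za-z0-9_-].
VIDEO_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")


def extract_video_id(url):
    """Return the video ID from a YouTube-style URL, or None if none is found."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    segments = [part for part in parsed.path.split("/") if part]

    candidate = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = segments[0] if segments else None
    elif parsed.path == "/watch":
        # youtube.com and Invidious instances share the /watch?v=<id> form.
        candidate = parse_qs(parsed.query).get("v", [None])[0]
    elif len(segments) >= 2 and segments[0] in ("embed", "shorts"):
        candidate = segments[1]

    if candidate and VIDEO_ID.match(candidate):
        return candidate
    return None
```

Because Invidious instances run on arbitrary hostnames, the sketch matches on the path rather than on a list of known hosts.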

Summarization

Uses Ollama for text summarization
Configurable model selection
Customizable prompt
Available for both local files and YouTube videos
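
As a rough illustration, a summary can be requested from Ollama's /api/generate endpoint with a non-streaming POST. The sketch below uses only the standard library; build_summary_request and summarize are hypothetical names, the prompt is shortened from the one in config.ini, and ollama_handler.py may do this differently.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # matches [ollama] url in config.ini
PROMPT = "Please provide a comprehensive yet concise summary of the following text."


def build_summary_request(text, model="mistral", prompt=PROMPT, base_url=OLLAMA_URL):
    """Build (but do not send) a POST request for Ollama's /api/generate."""
    payload = {
        "model": model,
        "prompt": f"{prompt}\n\n{text}",
        "stream": False,  # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(text, timeout=120, **kwargs):
    """Send the request and return the generated summary text."""
    request = build_summary_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Calling summarize(transcript) requires a running Ollama server with the chosen model already pulled (for example via ollama pull mistral).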

Notes

For better accuracy, use larger models (medium, large-v1, large-v2, large-v3)
Processing time increases with model size
GPU is recommended for faster processing
Maximum audio duration is configurable via max_duration in config.ini (default: 3600 seconds, i.e. 60 minutes)
YouTube videos will first try to use available subtitles
If no subtitles are available, the video will be transcribed
Ollama summarization is optional and requires Ollama to be running
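
The subtitle-first behaviour described in these notes can be sketched as a small function that takes the two strategies as callables. Everything here is hypothetical (the name get_text, its parameters, and the return shape), and applying the duration limit only to the transcription path is an assumption, not confirmed behaviour of the app.

```python
def get_text(url, duration_s, fetch_subtitles, transcribe, max_duration=3600):
    """Return (text, source), preferring existing subtitles over transcription.

    fetch_subtitles(url) should return the subtitle text or None.
    transcribe(url) should return the transcribed text.
    """
    subtitles = fetch_subtitles(url)
    if subtitles:
        return subtitles, "subtitles"

    # Assumption: the limit only matters when audio must actually be transcribed.
    if duration_s > max_duration:
        raise ValueError(
            f"Audio is {duration_s}s long; the configured limit is {max_duration}s"
        )
    return transcribe(url), "transcription"
```

Passing the strategies in as arguments keeps the decision logic easy to test without network access or a GPU.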

License

This project is licensed under the MIT License - see the LICENSE file for details.