# YouLama
A powerful web application for transcribing and summarizing YouTube videos and local media files using faster-whisper and Ollama.
## Features
- 🎥 YouTube video transcription with subtitle extraction
- 🎙️ Local audio/video file transcription
- 🤖 Automatic language detection
- 📝 Multiple Whisper model options
- 📚 AI-powered text summarization using Ollama
- 🎨 Modern web interface with Gradio
- 🐳 Docker support with CUDA
- ⚙️ Configurable settings via config.ini
## Requirements
- Docker and Docker Compose
- NVIDIA GPU with CUDA support
- NVIDIA Container Toolkit
- Ollama installed locally (optional, for summarization)
## Installation

- Clone the repository:

  ```bash
  git clone <repository-url>
  cd youlama
  ```
- Install the NVIDIA Container Toolkit (if not already installed):

  ```bash
  # Add the NVIDIA package repositories
  distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
  curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
  curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

  # Install the nvidia-docker2 package
  sudo apt-get update
  sudo apt-get install -y nvidia-docker2

  # Restart the Docker daemon
  sudo systemctl restart docker
  ```
- Install Ollama locally (optional, for summarization):

  ```bash
  curl https://ollama.ai/install.sh | sh
  ```
- Copy the example configuration file:

  ```bash
  cp .env.example .env
  ```
- Edit the configuration files:

  - `.env`: Set your environment variables
  - `config.ini`: Configure Whisper, Ollama, and application settings
## Running the Application

- Start Ollama locally (if you want to use summarization):

  ```bash
  ollama serve
  ```

- Build and start the YouLama container:

  ```bash
  docker-compose up --build
  ```

- Open your web browser and navigate to:

  ```
  http://localhost:7860
  ```
## Configuration

### Environment Variables (.env)

```env
# Server configuration
SERVER_NAME=0.0.0.0
SERVER_PORT=7860
SHARE=true
```
### Application Settings (config.ini)

```ini
[whisper]
default_model = base
device = cuda
compute_type = float16
beam_size = 5
vad_filter = true

[app]
max_duration = 3600
server_name = 0.0.0.0
server_port = 7860
share = true

[models]
available_models = tiny,base,small,medium,large-v1,large-v2,large-v3

[languages]
available_languages = en,es,fr,de,it,pt,nl,ja,ko,zh

[ollama]
enabled = false
url = http://host.docker.internal:11434
default_model = mistral
summarize_prompt = Please provide a comprehensive yet concise summary of the following text. Focus on the main points, key arguments, and important details while maintaining accuracy and completeness. Here's the text to summarize:
```
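The `config.ini` settings above can be loaded with Python's standard `configparser`. The following is a minimal sketch, not the repo's actual loader; the function name and returned keys are illustrative, with the fallbacks mirroring the defaults shown above:

```python
import configparser

def load_whisper_settings(path="config.ini"):
    """Read the [whisper] section, falling back to the documented defaults."""
    cfg = configparser.ConfigParser()
    cfg.read(path)
    return {
        "model": cfg.get("whisper", "default_model", fallback="base"),
        "device": cfg.get("whisper", "device", fallback="cuda"),
        "compute_type": cfg.get("whisper", "compute_type", fallback="float16"),
        "beam_size": cfg.getint("whisper", "beam_size", fallback=5),
        "vad_filter": cfg.getboolean("whisper", "vad_filter", fallback=True),
    }
```

Missing keys simply fall back to defaults, so a partial `config.ini` still yields a complete settings dictionary.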
## Features in Detail
### YouTube Video Processing
- Supports youtube.com, youtu.be, and invidious URLs
- Automatically extracts subtitles if available
- Falls back to transcription if no subtitles found
- Optional AI-powered summarization with Ollama
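The URL handling described above can be sketched with a small helper. This is a hypothetical function, not the repo's actual parser; it covers `youtu.be` short links plus the `/watch?v=<id>` form shared by `youtube.com` and Invidious instances:

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Return the video ID from a youtube.com, youtu.be, or
    Invidious-style URL, or None if the URL is not recognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Short links put the ID directly in the path: youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    # youtube.com and Invidious instances both use /watch?v=<id>
    if parsed.path == "/watch":
        return parse_qs(parsed.query).get("v", [None])[0]
    return None
```

Anything that returns `None` here would be treated as a local file or rejected, rather than sent down the YouTube pipeline.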
### Local File Transcription
- Supports various audio and video formats
- Automatic language detection
- Multiple Whisper model options
- Optional AI-powered summarization with Ollama
### AI Summarization
- Uses locally running Ollama for text summarization
- Configurable model selection
- Customizable prompt
- Available for both local files and YouTube videos
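A summarization call of this kind can be sketched against Ollama's documented `/api/generate` endpoint. The helper names and the short default prompt below are illustrative (the app's real prompt lives in `summarize_prompt` in `config.ini`), and the URL matches the `host.docker.internal` value shown there:

```python
import json
import urllib.request

OLLAMA_URL = "http://host.docker.internal:11434"  # from config.ini

def build_request(text, model="mistral", prompt="Summarize the following text:"):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": f"{prompt}\n\n{text}", "stream": False}

def summarize(text, model="mistral"):
    """POST the text to the local Ollama instance and return the summary."""
    body = json.dumps(build_request(text, model)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

With `stream = False`, Ollama returns a single JSON object whose `response` field holds the generated summary, which keeps the client side to one request/response pair.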
## Tips
- For better accuracy, use larger models (medium, large)
- Processing time increases with model size
- GPU is recommended for faster processing
- Maximum audio duration is configurable (default: 60 minutes)
- YouTube videos will first try to use available subtitles
- If no subtitles are available, the video will be transcribed
- Ollama summarization is optional and requires Ollama to be running locally
- The application runs in a Docker container with CUDA support
- Models are downloaded and cached in the `models` directory
- The container connects to the local Ollama instance using `host.docker.internal`
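The configurable duration limit mentioned above amounts to a simple guard before transcription starts. A sketch, with a hypothetical function name and the 3600-second default from the `[app]` section of `config.ini`:

```python
def check_duration(duration_seconds, max_duration=3600):
    """Reject audio longer than the configured [app] max_duration (seconds)."""
    if duration_seconds > max_duration:
        raise ValueError(
            f"Audio is {duration_seconds}s long; the configured maximum is "
            f"{max_duration}s. Raise max_duration in config.ini to allow it."
        )
    return True
```

Failing fast here avoids loading a Whisper model only to abandon a file that exceeds the limit.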
## License
This project is licensed under the MIT License - see the LICENSE file for details.