DialogDetective

Stop guessing which episode is which. Let AI listen to the dialogue and figure it out for you.

AI-Powered Identification

Uses modern LLMs like Gemini or Claude to intelligently match dialogue to the correct episode. No manual guesswork required.

Dialogue Analysis

A unique approach: extracts audio, transcribes speech using Whisper models, then identifies episodes by what characters actually say.

Physical Media Made Easy

Finally organize those DVD and Blu-ray rips. Turn cryptic filenames like TITLE_01.mkv into properly named episode files.

Quick Start

cargo install dialog_detective

Available on crates.io

Download from GitHub Releases

Pre-built binaries for macOS, Linux, and Windows

macOS: Metal GPU acceleration • Linux/Windows: CPU-only (build from source for GPU support)

git clone https://github.com/jakobwesthoff/DialogDetective.git
cd DialogDetective
cargo build --release
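
Cargo places the optimized binary in its standard release directory; assuming the binary keeps the crate name (as the usage examples below suggest), you can run it from there directly:

# Run the freshly built binary
./target/release/dialog_detective --list-models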

FFmpeg

Required for audio extraction. See ffmpeg.org for downloads.

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Windows: Download from https://ffmpeg.org/download.html
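
A quick way to confirm FFmpeg is on your PATH before running DialogDetective:

ffmpeg -version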

AI CLI (install one)

Gemini CLI (recommended) or Claude Code

Whisper Models

Downloaded automatically on first run from HuggingFace.

Requirements

DialogDetective requires external tools to function:

  • FFmpeg - for audio extraction from video files
  • Gemini CLI or Claude Code - for AI-powered episode matching

See the Prerequisites tab above for installation instructions.

Basic Usage

# Dry run - preview what would happen
dialog_detective ./videos "Breaking Bad" -s 1

# Rename files in place
dialog_detective ./videos "Breaking Bad" --mode rename -s 1

# Copy to organized directory
dialog_detective ./videos "Breaking Bad" --mode copy -o ./organized -s 1

# List available Whisper models
dialog_detective --list-models

Documentation

DialogDetective takes the guesswork out of organizing TV series rips. Point it at a directory of video files, tell it the show name, and let AI do the detective work.

The process is simple: extract audio from each video using FFmpeg, transcribe the dialogue using Whisper, fetch episode metadata from TVMaze, then use an LLM to match what was said to the correct episode. Finally, rename or copy the files with proper episode information.
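
For intuition, here is roughly what that pipeline looks like done by hand. The specific commands, flags, and query below are illustrative stand-ins, not DialogDetective's internal calls:

# 1. Extract mono 16 kHz audio (the format Whisper expects)
ffmpeg -i TITLE_01.mkv -vn -ac 1 -ar 16000 audio.wav

# 2. Transcribe the dialogue (whisper.cpp's CLI shown purely as an example)
whisper-cli -m ggml-base.bin -f audio.wav -otxt

# 3. Fetch the episode list from TVMaze
curl "https://api.tvmaze.com/singlesearch/shows?q=Breaking%20Bad&embed=episodes"

# 4. Hand the transcript and episode list to an LLM, ask which episode it is, then rename the file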

CLI Usage

dialog_detective <VIDEO_DIR> <SHOW_NAME> [OPTIONS]

Options

Option                    Default    Description
<VIDEO_DIR>               Required   Directory to scan for video files
<SHOW_NAME>               Required   TV series name for metadata lookup
-s, --season <N>          All        Filter to specific season(s), repeatable
--model <NAME>            base       Whisper model (tiny/base/small/medium/large)
--model-path <PATH>       -          Custom Whisper model file path
-m, --matcher <BACKEND>   gemini     AI backend: gemini or claude
--mode <MODE>             dry-run    Operation: dry-run, rename, or copy
-o, --output-dir <DIR>    -          Output directory (required for copy mode)
--format <PATTERN>        See below  Custom filename template
--list-models             -          List available Whisper models
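
For example, several of these options combined in one invocation:

# Season 1 discs, a larger Whisper model, Claude as matcher, copied into a library folder
dialog_detective ./videos "Breaking Bad" -s 1 --model small --matcher claude \
  --mode copy -o ./organized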

Operation Modes

DialogDetective supports three operation modes, controlled by the --mode option:

Mode     Description
dry-run  Default. Shows what would happen without modifying any files. Always run this first to verify the matches are correct.
rename   Renames files in place with proper episode information.
copy     Copies files to a new location (requires --output-dir). Original files remain untouched.

# Preview changes (always do this first)
dialog_detective ./videos "Breaking Bad" -s 1

# Rename files in place
dialog_detective ./videos "Breaking Bad" -s 1 --mode rename

# Copy to organized directory
dialog_detective ./videos "Breaking Bad" -s 1 --mode copy -o ./organized

Season Filtering

Highly Recommended

Always use --season when you know which season your files belong to:

  • Dramatically improves matching accuracy
  • Reduces LLM context size (fewer episodes to choose from)
  • Saves API tokens
  • Faster processing

Since you're typically processing a single season at a time when ripping discs, specifying the correct season makes the tool much more effective.

# Process only season 1 files
dialog_detective ./videos "Breaking Bad" -s 1

# Process multiple seasons
dialog_detective ./videos "Breaking Bad" -s 1 -s 2

Warning

The season filter limits the matching scope. If you specify -s 1 and a video file is actually from season 2, it will likely be mismatched to a season 1 episode. Only use season filtering when you know all your video files belong to the specified season(s).
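
If you are not sure which seasons a batch contains, omit the filter entirely; the default is to consider every season of the show:

# No season filter: match against all seasons
dialog_detective ./videos "Breaking Bad"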

Filename Templates

Use --format to customize output filenames. The default template is:

{show} - S{season:02}E{episode:02} - {title}.{ext}
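
With that default, a Breaking Bad season 1 file might come out looking like this (the episode title is taken from TVMaze):

Breaking Bad - S01E07 - A No-Rough-Stuff-Type Deal.mkv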

Available Variables

Variable                  Description
{show}                    Series name
{season} / {season:02}    Season number (use :02 for zero-padding, e.g., "01")
{episode} / {episode:02}  Episode number (use :02 for zero-padding, e.g., "07")
{title}                   Episode title
{ext}                     Original file extension (without dot)

# Custom format example
dialog_detective ./videos "The Flash" -s 1 \
  --format "{show} S{season:02}E{episode:02} {title}.{ext}"

Whisper Models

DialogDetective uses Whisper for speech-to-text transcription. Models are automatically downloaded from HuggingFace on first use.

Model Selection

Choose a model based on your needs:

Model           Size    Speed    Accuracy  Notes
tiny            ~39MB   Fastest  Lower     Good for testing
base            ~142MB  Fast     Good      Default. Best balance of speed and accuracy
small           ~466MB  Medium   Better    Good for non-English content
medium          ~1.5GB  Slower   High      Recommended if base struggles
large-v3        ~2.9GB  Slowest  Highest   Best accuracy, requires more RAM
large-v3-turbo  ~809MB  Medium   High      Good compromise for large model quality

English-only variants (tiny.en, base.en, etc.) are slightly more accurate for English content. Quantized variants (-q5_0, -q5_1, -q8_0) are smaller but slightly less accurate.
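
The variants are selected with the same --model flag; for example, assuming the name matches what --list-models reports:

# English-only base model
dialog_detective ./videos "Show" -s 1 --model base.en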

# List all available models and see which are cached
dialog_detective --list-models

# Use a specific model
dialog_detective ./videos "Show" -s 1 --model large-v3-turbo

GPU Acceleration

DialogDetective uses whisper-rs for speech-to-text, which supports various GPU backends for faster transcription.

Pre-built Binaries

Platform  GPU Backend  Notes
macOS     Metal        Apple GPU acceleration enabled by default
Linux     CPU-only     Build from source for GPU support
Windows   CPU-only     Build from source for GPU support

Building with GPU Support

If you have the required GPU frameworks installed on Linux or Windows, you can build with GPU acceleration:

# NVIDIA CUDA (requires CUDA toolkit)
cargo build --release --features cuda

# Vulkan (requires Vulkan SDK)
cargo build --release --features vulkan

# AMD ROCm/hipBLAS (requires ROCm)
cargo build --release --features hipblas

See the whisper-rs documentation for detailed requirements for each GPU backend.

AI Backend

DialogDetective uses external CLI tools for LLM access. You must have one of the following installed and authenticated:

  • Gemini CLI (recommended, the default backend)
  • Claude Code

The CLI must be working independently before DialogDetective can use it. Test with gemini or claude in your terminal.

# Use Gemini (default)
dialog_detective ./videos "Show" -s 1

# Use Claude
dialog_detective ./videos "Show" -s 1 --matcher claude

The interface is abstracted to easily support direct API access in the future. Contributions welcome!

Cache & Storage

DialogDetective caches various data to avoid redundant processing and speed up repeated runs.

Cache Location

All cached data is stored in a platform-specific cache directory:

  • macOS: ~/Library/Caches/de.westhoffswelt.dialogdetective/
  • Linux: ~/.cache/dialogdetective/
  • Windows: %LOCALAPPDATA%\dialogdetective\

What Gets Cached

Data             Directory     TTL        Why Cached
Whisper Models   models/       Permanent  Models are large (39MB - 2.9GB) and don't change. Downloaded once from HuggingFace on first use.
Series Metadata  metadata/     24 hours   Episode lists from TVMaze rarely change. Caching reduces API calls and speeds up repeated runs on the same show.
Transcripts      transcripts/  24 hours   Whisper transcription is CPU/GPU intensive. Caching by video file hash means re-running on the same files skips transcription entirely.
Match Results    matching/     24 hours   LLM matching costs tokens and time. Results are cached by a composite key (video hash + show + seasons + matcher), so identical queries return instantly.

The 24-hour TTL balances freshness with efficiency. If you need to force a refresh (e.g., after TVMaze updates episode data), simply delete the relevant cache subdirectory.
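
For example, to force fresh TVMaze metadata and re-matching on Linux (the other platforms use the cache paths listed above):

rm -rf ~/.cache/dialogdetective/metadata/
rm -rf ~/.cache/dialogdetective/matching/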

Temporary Files

During processing, DialogDetective extracts audio to temporary WAV files in your system's temp directory (/tmp, /var/folders/..., or %TEMP%). These files are automatically cleaned up when processing completes or if the program is interrupted.

Managing Cache

To clear all cached data:

# macOS
rm -rf ~/Library/Caches/de.westhoffswelt.dialogdetective/

# Linux
rm -rf ~/.cache/dialogdetective/

To clear only models (to free disk space):

# macOS
rm -rf ~/Library/Caches/de.westhoffswelt.dialogdetective/models/

# Linux
rm -rf ~/.cache/dialogdetective/models/

Use dialog_detective --list-models to see which models are currently cached and their sizes.