DialogDetective

Stop guessing which episode is which. Let AI listen to the dialogue and figure it out for you.

AI-Powered Identification

Uses modern LLMs like Gemini or Claude to intelligently match dialogue to the correct episode. No manual guesswork required.

Dialogue Analysis

A unique approach: extracts audio, transcribes speech using Whisper models, then identifies episodes by what characters actually say.

Physical Media Made Easy

Finally organize those DVD and Blu-ray rips. Turn cryptic filenames like TITLE_01.mkv into properly named episode files.

Quick Start

cargo install dialog_detective

Available on crates.io

Download from GitHub Releases

Pre-built binaries for macOS, Linux, and Windows

macOS: Metal GPU acceleration • Linux/Windows: CPU-only (build from source for GPU support)

git clone https://github.com/jakobwesthoff/DialogDetective.git
cd DialogDetective
cargo build --release
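
Cargo places the optimized binary in its standard release directory; assuming the binary keeps the crate name (as the usage examples below suggest), you can run it from there directly:

# Run the freshly built binary
./target/release/dialog_detective --list-models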

FFmpeg

Required for audio extraction. See ffmpeg.org for downloads.

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Windows: Download from https://ffmpeg.org/download.html
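
A quick way to confirm FFmpeg is on your PATH before running DialogDetective:

ffmpeg -version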

AI CLI (install one)

Gemini CLI (recommended) or Claude Code

Whisper Models

Downloaded automatically on first run from HuggingFace.

Requirements

DialogDetective requires external tools to function:

  • FFmpeg - for audio extraction from video files
  • Gemini CLI or Claude Code - for AI-powered episode matching

See the Prerequisites tab above for installation instructions.

Basic Usage

# Dry run - preview what would happen
dialog_detective ./videos "Breaking Bad" -s 1

# Rename files in place
dialog_detective ./videos "Breaking Bad" --mode rename -s 1

# Copy to organized directory
dialog_detective ./videos "Breaking Bad" --mode copy -o ./organized -s 1

# List available Whisper models
dialog_detective --list-models

Documentation

DialogDetective takes the guesswork out of organizing TV series rips. Point it at a directory of video files, tell it the show name, and let AI do the detective work.

The process is simple: extract audio from each video using FFmpeg, transcribe the dialogue using Whisper, fetch episode metadata from TVMaze, then use an LLM to match what was said to the correct episode. Finally, rename or copy the files with proper episode information.
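
For intuition, here is roughly what that pipeline looks like done by hand. The specific commands, flags, and query below are illustrative stand-ins, not DialogDetective's internal calls:

# 1. Extract mono 16 kHz audio (the format Whisper expects)
ffmpeg -i TITLE_01.mkv -vn -ac 1 -ar 16000 audio.wav

# 2. Transcribe the dialogue (whisper.cpp's CLI shown purely as an example)
whisper-cli -m ggml-base.bin -f audio.wav -otxt

# 3. Fetch the episode list from TVMaze
curl "https://api.tvmaze.com/singlesearch/shows?q=Breaking%20Bad&embed=episodes"

# 4. Hand the transcript and episode list to an LLM, ask which episode it is, then rename the file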

CLI Usage

dialog_detective <VIDEO_DIR> <SHOW_NAME> [OPTIONS]

Options

Option                    Default    Description
<VIDEO_DIR>               Required   Directory to scan for video files
<SHOW_NAME>               Required   TV series name for metadata lookup
-s, --season <N>          All        Filter to specific season(s), repeatable
--model <NAME>            base       Whisper model (tiny/base/small/medium/large)
--model-path <PATH>       -          Custom Whisper model file path
-m, --matcher <BACKEND>   gemini     AI backend: gemini or claude
--mode <MODE>             dry-run    Operation: dry-run, rename, or copy
-o, --output-dir <DIR>    -          Output directory (required for copy mode)
--format <PATTERN>        See below  Custom filename template
--list-models             -          List available Whisper models
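
For example, several of these options combined in one invocation:

# Season 1 discs, a larger Whisper model, Claude as matcher, copied into a library folder
dialog_detective ./videos "Breaking Bad" -s 1 --model small --matcher claude \
  --mode copy -o ./organized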

Operation Modes

DialogDetective supports three operation modes, controlled by the --mode option:

Mode     Description
dry-run  Default. Shows what would happen without modifying any files. Always run this first to verify the matches are correct.
rename   Renames files in place with proper episode information.
copy     Copies files to a new location (requires --output-dir). Original files remain untouched.

# Preview changes (always do this first)
dialog_detective ./videos "Breaking Bad" -s 1

# Rename files in place
dialog_detective ./videos "Breaking Bad" -s 1 --mode rename

# Copy to organized directory
dialog_detective ./videos "Breaking Bad" -s 1 --mode copy -o ./organized

Season Filtering

Highly Recommended

Always use --season when you know which season your files belong to:

  • Dramatically improves matching accuracy
  • Reduces LLM context size (fewer episodes to choose from)
  • Saves API tokens
  • Faster processing

Since you're typically processing a single season at a time when ripping discs, specifying the correct season makes the tool much more effective.

# Process only season 1 files
dialog_detective ./videos "Breaking Bad" -s 1

# Process multiple seasons
dialog_detective ./videos "Breaking Bad" -s 1 -s 2

Warning

The season filter limits the matching scope. If you specify -s 1 and a video file is actually from season 2, it will likely be mismatched to a season 1 episode. Only use season filtering when you know all your video files belong to the specified season(s).
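
If you are not sure which seasons a batch contains, omit the filter entirely; the default is to consider every season of the show:

# No season filter: match against all seasons
dialog_detective ./videos "Breaking Bad"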

Filename Templates

Use --format to customize output filenames. The default template is:

{show} - S{season:02}E{episode:02} - {title}.{ext}
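
With that default, a Breaking Bad season 1 file might come out looking like this (the episode title is taken from TVMaze):

Breaking Bad - S01E07 - A No-Rough-Stuff-Type Deal.mkv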

Available Variables

Variable                  Description
{show}                    Series name
{season} / {season:02}    Season number (use :02 for zero-padding, e.g., "01")
{episode} / {episode:02}  Episode number (use :02 for zero-padding, e.g., "07")
{title}                   Episode title
{ext}                     Original file extension (without dot)

# Custom format example
dialog_detective ./videos "The Flash" -s 1 \
  --format "{show} S{season:02}E{episode:02} {title}.{ext}"

Whisper Models

DialogDetective uses Whisper for speech-to-text transcription. Models are automatically downloaded from HuggingFace on first use.

Model Selection

Choose a model based on your needs:

Model           Size    Speed    Accuracy  Notes
tiny            ~39MB   Fastest  Lower     Good for testing
base            ~142MB  Fast     Good      Default. Best balance of speed and accuracy
small           ~466MB  Medium   Better    Good for non-English content
medium          ~1.5GB  Slower   High      Recommended if base struggles
large-v3        ~2.9GB  Slowest  Highest   Best accuracy, requires more RAM
large-v3-turbo  ~809MB  Medium   High      Good compromise for large model quality

English-only variants (tiny.en, base.en, etc.) are slightly more accurate for English content. Quantized variants (-q5_0, -q5_1, -q8_0) are smaller but slightly less accurate.
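
The variants are selected with the same --model flag; for example, assuming the name matches what --list-models reports:

# English-only base model
dialog_detective ./videos "Show" -s 1 --model base.en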

# List all available models and see which are cached
dialog_detective --list-models

# Use a specific model
dialog_detective ./videos "Show" -s 1 --model large-v3-turbo

GPU Acceleration

DialogDetective uses whisper-rs for speech-to-text, which supports various GPU backends for faster transcription.

Pre-built Binaries

Platform  GPU Backend  Notes
macOS     Metal        Apple GPU acceleration enabled by default
Linux     CPU-only     Build from source for GPU support
Windows   CPU-only     Build from source for GPU support

Building with GPU Support

If you have the required GPU frameworks installed on Linux or Windows, you can build with GPU acceleration:

# NVIDIA CUDA (requires CUDA toolkit)
cargo build --release --features cuda

# Vulkan (requires Vulkan SDK)
cargo build --release --features vulkan

# AMD ROCm/hipBLAS (requires ROCm)
cargo build --release --features hipblas

See the whisper-rs documentation for detailed requirements for each GPU backend.

AI Backend

DialogDetective uses external CLI tools for LLM access. You must have one of the following installed and authenticated:

  • Gemini CLI (recommended, the default backend)
  • Claude Code

The CLI must be working independently before DialogDetective can use it. Test with gemini or claude in your terminal.

# Use Gemini (default)
dialog_detective ./videos "Show" -s 1

# Use Claude
dialog_detective ./videos "Show" -s 1 --matcher claude

The interface is abstracted to easily support direct API access in the future. Contributions welcome!

Cache & Storage

DialogDetective caches various data to avoid redundant processing and speed up repeated runs.

Cache Location

All cached data is stored in a platform-specific cache directory:

  • macOS: ~/Library/Caches/de.westhoffswelt.dialogdetective/
  • Linux: ~/.cache/dialogdetective/
  • Windows: %LOCALAPPDATA%\dialogdetective\

What Gets Cached

Data             Directory     TTL        Why Cached
Whisper Models   models/       Permanent  Models are large (39MB - 2.9GB) and don't change. Downloaded once from HuggingFace on first use.
Series Metadata  metadata/     24 hours   Episode lists from TVMaze rarely change. Caching reduces API calls and speeds up repeated runs on the same show.
Transcripts      transcripts/  24 hours   Whisper transcription is CPU/GPU intensive. Caching by video file hash means re-running on the same files skips transcription entirely.
Match Results    matching/     24 hours   LLM matching costs tokens and time. Results are cached by a composite key (video hash + show + seasons + matcher), so identical queries return instantly.

The 24-hour TTL balances freshness with efficiency. If you need to force a refresh (e.g., after TVMaze updates episode data), simply delete the relevant cache subdirectory.
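
For example, to force fresh TVMaze metadata and re-matching on Linux (the other platforms use the cache paths listed above):

rm -rf ~/.cache/dialogdetective/metadata/
rm -rf ~/.cache/dialogdetective/matching/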

Temporary Files

During processing, DialogDetective extracts audio to temporary WAV files in your system's temp directory (/tmp, /var/folders/..., or %TEMP%). These files are automatically cleaned up when processing completes or if the program is interrupted.

Managing Cache

To clear all cached data:

# macOS
rm -rf ~/Library/Caches/de.westhoffswelt.dialogdetective/

# Linux
rm -rf ~/.cache/dialogdetective/

To clear only models (to free disk space):

# macOS
rm -rf ~/Library/Caches/de.westhoffswelt.dialogdetective/models/

# Linux
rm -rf ~/.cache/dialogdetective/models/

Use dialog_detective --list-models to see which models are currently cached and their sizes.