DialogDetective

Stop guessing which episode is which. Let AI listen to the dialogue and figure it out for you.

AI-Powered Identification

Uses modern LLMs like Gemini or Claude to intelligently match dialogue to the correct episode. No manual guesswork required.

Dialogue Analysis

A unique approach: extracts audio, transcribes speech using Whisper models, then identifies episodes by what characters actually say.

Physical Media Made Easy

Finally organize those DVD and Blu-ray rips. Turn cryptic filenames like TITLE_01.mkv into properly named episode files.

Quick Start

cargo install dialog_detective

Available on crates.io

Download from GitHub Releases

Pre-built binaries for macOS, Linux, and Windows

macOS: Metal GPU acceleration • Linux/Windows: CPU-only (build from source for GPU support)

Build from Source

git clone https://github.com/jakobwesthoff/DialogDetective.git
cd DialogDetective
cargo build --release

FFmpeg

Required for audio extraction. Available from ffmpeg.org.

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Windows: Download from https://ffmpeg.org/download.html

AI CLI (install one)

Gemini CLI (recommended) or Claude Code

Whisper Models

Downloaded automatically on first run from HuggingFace.

Requirements

DialogDetective requires external tools to function:

  • FFmpeg - for audio extraction from video files
  • Gemini CLI or Claude Code - for AI-powered episode matching

Installation instructions for both are given in the FFmpeg and AI CLI sections above.

Basic Usage

# Dry run - preview what would happen
dialog_detective ./videos "Breaking Bad" -s 1

# Rename files in place
dialog_detective ./videos "Breaking Bad" --mode rename -s 1

# Copy to organized directory
dialog_detective ./videos "Breaking Bad" --mode copy -o ./organized -s 1

# List available Whisper models
dialog_detective --list-models

Documentation

DialogDetective takes the guesswork out of organizing TV series rips. Point it at a directory of video files, tell it the show name, and let AI do the detective work.

The process is simple: extract audio from each video using FFmpeg, transcribe the dialogue using Whisper, fetch episode metadata from TVMaze, then use an LLM to match what was said to the correct episode. Finally, rename or copy the files with proper episode information.
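
The first step, audio extraction, can be sketched as a small helper that builds an FFmpeg command line. This is a minimal illustration, not DialogDetective's actual invocation: the function names and the choice of mono 16 kHz 16-bit PCM output (the sample rate Whisper models work with) are assumptions.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video: Path, wav: Path) -> list[str]:
    """Build an FFmpeg command that extracts a mono 16 kHz WAV track.

    Hypothetical helper: the flags DialogDetective really passes are not
    documented here, so treat these as one reasonable choice.
    """
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it exists
        "-i", str(video),      # input video
        "-vn",                 # drop the video stream
        "-ac", "1",            # downmix to mono
        "-ar", "16000",        # resample to 16 kHz
        "-c:a", "pcm_s16le",   # 16-bit PCM, the usual WAV encoding
        str(wav),
    ]


def extract_audio(video: Path, wav: Path) -> None:
    """Run FFmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_ffmpeg_command(video, wav), check=True)
```

Calling extract_audio(Path("TITLE_01.mkv"), Path("title_01.wav")) would then produce a WAV file ready for transcription, provided ffmpeg is on the PATH.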

CLI Usage

dialog_detective <VIDEO_DIR> <SHOW_NAME> [OPTIONS]

Options

Option                    Default     Description
<VIDEO_DIR>               Required    Directory to scan for video files
<SHOW_NAME>               Required    TV series name for metadata lookup
-s, --season <N>          All         Filter to specific season(s), repeatable
--model <NAME>            base        Whisper model (tiny/base/small/medium/large)
--model-path <PATH>       -           Custom Whisper model file path
-m, --matcher <BACKEND>   gemini      AI backend: gemini or claude
--mode <MODE>             dry-run     Operation: dry-run, rename, or copy
-o, --output-dir <DIR>    -           Output directory (required for copy mode)
--format <PATTERN>        See below   Custom filename template
--list-models             -           List available Whisper models

Operation Modes

DialogDetective supports three operation modes, controlled by the --mode option:

Mode      Description
dry-run   Default. Shows what would happen without modifying any files. Always run this first to verify the matches are correct.
rename    Renames files in place with proper episode information.
copy      Copies files to a new location (requires --output-dir). Original files remain untouched.

# Preview changes (always do this first)
dialog_detective ./videos "Breaking Bad" -s 1

# Rename files in place
dialog_detective ./videos "Breaking Bad" -s 1 --mode rename

# Copy to organized directory
dialog_detective ./videos "Breaking Bad" -s 1 --mode copy -o ./organized
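
The difference between the three modes comes down to where a matched file ends up. The sketch below models that decision; plan_destination is a hypothetical helper, and placing copies directly inside the output directory (rather than in per-season subfolders, for example) is an assumption.

```python
from pathlib import Path
from typing import Optional


def plan_destination(source: Path, new_name: str, mode: str,
                     output_dir: Optional[Path] = None) -> Optional[Path]:
    """Return where a file would end up, or None for a dry run."""
    if mode == "dry-run":
        return None                         # report only, touch nothing
    if mode == "rename":
        return source.with_name(new_name)   # same directory, new name
    if mode == "copy":
        if output_dir is None:
            raise ValueError("copy mode requires an output directory")
        return output_dir / new_name        # the original stays in place
    raise ValueError(f"unknown mode: {mode}")
```

Raising an error when copy mode has no output directory mirrors the documented requirement that copy needs --output-dir.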

Season Filtering

Highly Recommended

Always use --season when you know which season your files belong to:

  • Dramatically improves matching accuracy
  • Reduces LLM context size (fewer episodes to choose from)
  • Saves API tokens
  • Faster processing

Since you're typically processing a single season at a time when ripping discs, specifying the correct season makes the tool much more effective.

# Process only season 1 files
dialog_detective ./videos "Breaking Bad" -s 1

# Process multiple seasons
dialog_detective ./videos "Breaking Bad" -s 1 -s 2

The season filter limits the matching scope. If you specify -s 1 and a video file is actually from season 2, it will likely be mismatched to a season 1 episode. Only use season filtering when you know all your video files belong to the specified season(s).
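
To see why the filter narrows the matching scope, consider a minimal sketch of the candidate list before and after filtering. The episode records and the filter_by_season helper are made up for illustration and do not reflect DialogDetective's internal data structures.

```python
def filter_by_season(episodes, seasons):
    """Keep only episodes whose season is in `seasons`; empty means all."""
    if not seasons:
        return list(episodes)
    wanted = set(seasons)
    return [ep for ep in episodes if ep["season"] in wanted]


# Made-up episode list for illustration only.
episodes = [
    {"season": 1, "episode": 1, "title": "First"},
    {"season": 1, "episode": 2, "title": "Second"},
    {"season": 2, "episode": 1, "title": "Third"},
]

candidates = filter_by_season(episodes, [1])
print(len(candidates))  # 2
```

With only season 1 candidates left, a file that really belongs to season 2 has no correct episode to match against, which is why a wrong filter leads to a mismatch instead of an error.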

Filename Templates

Use --format to customize output filenames. The default template is:

{show} - S{season:02}E{episode:02} - {title}.{ext}

Available Variables

Variable                   Description
{show}                     Series name
{season} / {season:02}     Season number (use :02 for zero-padding, e.g., "01")
{episode} / {episode:02}   Episode number (use :02 for zero-padding, e.g., "07")
{title}                    Episode title
{ext}                      Original file extension (without dot)

# Custom format example
dialog_detective ./videos "The Flash" -s 1 \
  --format "{show} S{season:02}E{episode:02} {title}.{ext}"
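
The placeholder syntax happens to coincide with Python format strings, so the effect of a template can be previewed with str.format. This only illustrates the expected output: DialogDetective is a Rust program with its own renderer, which may treat edge cases (such as characters that are invalid in filenames) differently. The render_filename helper and the episode title are made up.

```python
DEFAULT_TEMPLATE = "{show} - S{season:02}E{episode:02} - {title}.{ext}"


def render_filename(template: str, show: str, season: int, episode: int,
                    title: str, ext: str) -> str:
    """Fill a filename template; `:02` zero-pads numbers to two digits."""
    return template.format(show=show, season=season, episode=episode,
                           title=title, ext=ext)


print(render_filename(DEFAULT_TEMPLATE, "The Flash", 1, 7, "Example Title", "mkv"))
# The Flash - S01E07 - Example Title.mkv
```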

Whisper Models

DialogDetective uses Whisper for speech-to-text transcription. Models are automatically downloaded from HuggingFace on first use.

Model Selection

Choose a model based on your needs:

Model            Size     Speed     Accuracy   Notes
tiny             ~39MB    Fastest   Lower      Good for testing
base             ~142MB   Fast      Good       Default. Best balance of speed and accuracy
small            ~466MB   Medium    Better     Good for non-English content
medium           ~1.5GB   Slower    High       Recommended if base struggles
large-v3         ~2.9GB   Slowest   Highest    Best accuracy, requires more RAM
large-v3-turbo   ~809MB   Medium    High       Good compromise for large model quality

English-only variants (tiny.en, base.en, etc.) are slightly more accurate for English content. Quantized variants (-q5_0, -q5_1, -q8_0) are smaller but slightly less accurate.

# List all available models and see which are cached
dialog_detective --list-models

# Use a specific model
dialog_detective ./videos "Show" -s 1 --model large-v3-turbo

GPU Acceleration

DialogDetective uses whisper-rs for speech-to-text, which supports various GPU backends for faster transcription.

Pre-built Binaries

Platform   GPU Backend   Notes
macOS      Metal         Apple GPU acceleration enabled by default
Linux      CPU-only      Build from source for GPU support
Windows    CPU-only      Build from source for GPU support

Building with GPU Support

If you have the required GPU frameworks installed on Linux or Windows, you can build with GPU acceleration:

# NVIDIA CUDA (requires CUDA toolkit)
cargo build --release --features cuda

# Vulkan (requires Vulkan SDK)
cargo build --release --features vulkan

# AMD ROCm/hipBLAS (requires ROCm)
cargo build --release --features hipblas

See the whisper-rs documentation for detailed requirements for each GPU backend.

AI Backend

DialogDetective uses external CLI tools for LLM access. You must have one of the following installed and authenticated:

  • Gemini CLI (default)
  • Claude Code

The CLI must work on its own before DialogDetective can use it. To check, run gemini or claude in your terminal.

# Use Gemini (default)
dialog_detective ./videos "Show" -s 1

# Use Claude
dialog_detective ./videos "Show" -s 1 --matcher claude
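
Before picking a backend, it can help to confirm that the corresponding executable is on your PATH. The following sketch does that with Python's shutil.which; the available_matchers helper is hypothetical and not part of DialogDetective, and the executable names are the ones mentioned above.

```python
import shutil

# Executable names as mentioned in the text above.
MATCHER_BINARIES = {"gemini": "gemini", "claude": "claude"}


def available_matchers(which=shutil.which):
    """Return the matcher names whose CLI executable is found on PATH."""
    return [name for name, binary in MATCHER_BINARIES.items()
            if which(binary) is not None]


if __name__ == "__main__":
    found = available_matchers()
    print("Usable --matcher values:", ", ".join(found) or "none found")
```

Finding the executable does not prove that the CLI is authenticated, so running it once by hand remains the more reliable check.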

The backend interface is abstracted so that direct API access can be added in the future. Contributions welcome!

Cache & Storage

DialogDetective caches various data to avoid redundant processing and speed up repeated runs.

Cache Location

All cached data is stored in a platform-specific cache directory:

  • macOS: ~/Library/Caches/de.westhoffswelt.dialogdetective/
  • Linux: ~/.cache/dialogdetective/
  • Windows: %LOCALAPPDATA%\dialogdetective\
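
For scripts that need to inspect or clean the cache, the locations above can be resolved per platform roughly as follows. This is a sketch based only on the paths listed here; the cache_dir helper is hypothetical, and it ignores overrides such as XDG_CACHE_HOME that the real tool may or may not honor.

```python
import os
import sys
from pathlib import Path


def cache_dir(platform=sys.platform, home=None, env=None) -> Path:
    """Resolve the DialogDetective cache directory listed in the docs."""
    home = Path.home() if home is None else Path(home)
    env = os.environ if env is None else env
    if platform == "darwin":
        return home / "Library" / "Caches" / "de.westhoffswelt.dialogdetective"
    if platform.startswith("win"):
        return Path(env["LOCALAPPDATA"]) / "dialogdetective"
    # Everything else is treated like Linux.
    return home / ".cache" / "dialogdetective"


if __name__ == "__main__":
    print(cache_dir())
```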

What Gets Cached

Data              Directory      TTL         Why Cached
Whisper Models    models/        Permanent   Models are large (39MB - 2.9GB) and don't change. Downloaded once from HuggingFace on first use.
Series Metadata   metadata/      24 hours    Episode lists from TVMaze rarely change. Caching reduces API calls and speeds up repeated runs on the same show.
Transcripts       transcripts/   24 hours    Whisper transcription is CPU/GPU intensive. Caching by video file hash means re-running on the same files skips transcription entirely.
Match Results     matching/      24 hours    LLM matching costs tokens and time. Results are cached by a composite key (video hash + show + seasons + matcher), so identical queries return instantly.

The 24-hour TTL balances freshness with efficiency. If you need to force a refresh (e.g., after TVMaze updates episode data), simply delete the relevant cache subdirectory.
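
The composite key and the TTL check for match results can be pictured with a short sketch. The use of SHA-256, the field separator, and the function names are assumptions made for illustration; only the four key components and the 24-hour lifetime come from the description above.

```python
import hashlib
import time

TTL_SECONDS = 24 * 60 * 60  # the 24-hour TTL described above


def match_cache_key(video_hash: str, show: str, seasons, matcher: str) -> str:
    """Combine the four inputs into one stable key (layout is an assumption)."""
    # Sort the seasons so that "-s 1 -s 2" and "-s 2 -s 1" share a key.
    season_part = ",".join(str(s) for s in sorted(seasons))
    raw = "\x1f".join([video_hash, show, season_part, matcher])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def is_fresh(stored_at, now=None):
    """True while a cache entry is younger than the TTL."""
    now = time.time() if now is None else now
    return now - stored_at < TTL_SECONDS
```

Because the matcher is part of the key, switching from gemini to claude on the same files produces a different key and therefore triggers a fresh LLM query instead of reusing the cached result.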

Temporary Files

During processing, DialogDetective extracts audio to temporary WAV files in your system's temp directory (/tmp, /var/folders/..., or %TEMP%). These files are automatically cleaned up when processing completes or if the program is interrupted.

Managing Cache

To clear all cached data:

# macOS
rm -rf ~/Library/Caches/de.westhoffswelt.dialogdetective/

# Linux
rm -rf ~/.cache/dialogdetective/

To clear only models (to free disk space):

# macOS
rm -rf ~/Library/Caches/de.westhoffswelt.dialogdetective/models/

# Linux
rm -rf ~/.cache/dialogdetective/models/

Use dialog_detective --list-models to see which models are currently cached and their sizes.