# Documentation
DialogDetective takes the guesswork out of organizing TV series rips. Point it at a directory of video files,
tell it the show name, and let AI do the detective work.
The process is simple: extract audio from each video using FFmpeg,
transcribe the dialogue using Whisper,
fetch episode metadata from TVMaze, then use an LLM to match
what was said to the correct episode. Finally, rename or copy the files with proper episode information.
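For intuition, here is roughly what that pipeline looks like if you ran the steps by hand. This is an illustrative sketch using the standalone FFmpeg, Whisper, and curl tools, not DialogDetective's internal commands, and the exact flags and endpoint parameters shown are assumptions:

```sh
# 1. Extract mono 16 kHz audio from a rip (FFmpeg)
ffmpeg -i episode1.mkv -vn -ac 1 -ar 16000 audio.wav

# 2. Transcribe the dialogue (the reference openai-whisper CLI shown here)
whisper audio.wav --model base --output_format txt

# 3. Fetch the show's episode list from TVMaze
curl "https://api.tvmaze.com/singlesearch/shows?q=breaking+bad&embed=episodes"

# 4. An LLM then compares the transcript against the episode summaries,
#    and the file is renamed or copied based on the best match.
```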
## CLI Usage

```
dialog_detective <VIDEO_DIR> <SHOW_NAME> [OPTIONS]
```

### Options
| Option | Default | Description |
|---|---|---|
| `<VIDEO_DIR>` | Required | Directory to scan for video files |
| `<SHOW_NAME>` | Required | TV series name for metadata lookup |
| `-s, --season <N>` | All | Filter to specific season(s), repeatable |
| `--model <NAME>` | `base` | Whisper model (tiny/base/small/medium/large) |
| `--model-path <PATH>` | - | Custom Whisper model file path |
| `-m, --matcher <BACKEND>` | `gemini` | AI backend: `gemini` or `claude` |
| `--mode <MODE>` | `dry-run` | Operation: `dry-run`, `rename`, or `copy` |
| `-o, --output-dir <DIR>` | - | Output directory (required for `copy` mode) |
| `--format <PATTERN>` | See below | Custom filename template |
| `--list-models` | - | List available Whisper models |
## Operation Modes

DialogDetective supports three operation modes, controlled by the `--mode` option:

| Mode | Description |
|---|---|
| `dry-run` | Default. Shows what would happen without modifying any files. Always run this first to verify the matches are correct. |
| `rename` | Renames files in place with proper episode information. |
| `copy` | Copies files to a new location (requires `--output-dir`). Original files remain untouched. |
```sh
# Preview changes (always do this first)
dialog_detective ./videos "Breaking Bad" -s 1

# Rename files in place
dialog_detective ./videos "Breaking Bad" -s 1 --mode rename

# Copy to organized directory
dialog_detective ./videos "Breaking Bad" -s 1 --mode copy -o ./organized
```
## Season Filtering

**Highly recommended:** always use `--season` when you know which season your files belong to:
- Dramatically improves matching accuracy
- Reduces LLM context size (fewer episodes to choose from)
- Saves API tokens
- Faster processing
Since you're typically processing a single season at a time when ripping discs, specifying the correct season makes the tool much more effective.
```sh
# Process only season 1 files
dialog_detective ./videos "Breaking Bad" -s 1

# Process multiple seasons
dialog_detective ./videos "Breaking Bad" -s 1 -s 2
```
**Warning:** the season filter limits the matching scope. If you specify `-s 1` and a video file is actually from season 2, it will likely be mismatched to a season 1 episode. Only use season filtering when you know all your video files belong to the specified season(s).
## Filename Templates

Use `--format` to customize output filenames. The default template is:

```
{show} - S{season:02}E{episode:02} - {title}.{ext}
```
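For example, for an `.mkv` rip that matches the first episode of Breaking Bad ("Pilot"), the default template produces a filename like:

```
Breaking Bad - S01E01 - Pilot.mkv
```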
### Available Variables

| Variable | Description |
|---|---|
| `{show}` | Series name |
| `{season}` / `{season:02}` | Season number (use `:02` for zero-padding, e.g., "01") |
| `{episode}` / `{episode:02}` | Episode number (use `:02` for zero-padding, e.g., "07") |
| `{title}` | Episode title |
| `{ext}` | Original file extension (without dot) |
```sh
# Custom format example
dialog_detective ./videos "The Flash" -s 1 \
  --format "{show} S{season:02}E{episode:02} {title}.{ext}"
```
## Whisper Models
DialogDetective uses Whisper for speech-to-text transcription.
Models are automatically downloaded from HuggingFace on first use.
### Model Selection
Choose a model based on your needs:
| Model | Size | Speed | Accuracy | Notes |
|---|---|---|---|---|
| `tiny` | ~39MB | Fastest | Lower | Good for testing |
| `base` | ~142MB | Fast | Good | Default. Best balance of speed and accuracy |
| `small` | ~466MB | Medium | Better | Good for non-English content |
| `medium` | ~1.5GB | Slower | High | Recommended if `base` struggles |
| `large-v3` | ~2.9GB | Slowest | Highest | Best accuracy, requires more RAM |
| `large-v3-turbo` | ~809MB | Medium | High | Good compromise for large-model quality |
English-only variants (`tiny.en`, `base.en`, etc.) are slightly more accurate for English content.
Quantized variants (`-q5_0`, `-q5_1`, `-q8_0`) are smaller but slightly less accurate.
```sh
# List all available models and see which are cached
dialog_detective --list-models

# Use a specific model
dialog_detective ./videos "Show" -s 1 --model large-v3-turbo
```
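If you have already downloaded a model file yourself (for example, a quantized GGML model), you can point `--model-path` at it directly. The file name below is only an illustration; substitute the path to whatever model file you actually have:

```sh
# Sketch: use a locally stored quantized model file
# (the file name is an example, not something DialogDetective ships with)
dialog_detective ./videos "Show" -s 1 --model-path ./ggml-base.en-q5_0.bin
```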
## GPU Acceleration
DialogDetective uses whisper-rs for speech-to-text,
which supports various GPU backends for faster transcription.
### Pre-built Binaries

| Platform | GPU Backend | Notes |
|---|---|---|
| macOS | Metal | Apple GPU acceleration enabled by default |
| Linux | CPU-only | Build from source for GPU support |
| Windows | CPU-only | Build from source for GPU support |
### Building with GPU Support
If you have the required GPU frameworks installed on Linux or Windows, you can build with GPU acceleration:
```sh
# NVIDIA CUDA (requires CUDA toolkit)
cargo build --release --features cuda

# Vulkan (requires Vulkan SDK)
cargo build --release --features vulkan

# AMD ROCm/hipBLAS (requires ROCm)
cargo build --release --features hipblas
```
See the whisper-rs documentation
for detailed requirements for each GPU backend.
## AI Backend

DialogDetective uses external CLI tools for LLM access. You must have one of the following installed and authenticated:

- `gemini` (the default matcher)
- `claude` (select with `--matcher claude`)

The CLI must be working independently before DialogDetective can use it. Test by running `gemini` or `claude` in your terminal.
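A quick way to check is to ask the backend for a one-shot reply. The `-p` prompt flag shown here is an assumption about the respective CLI; consult its `--help` if your version differs:

```sh
# Sanity check: the matcher CLI should answer on its own before DialogDetective uses it
# (-p as a one-shot prompt flag is an assumption; check your CLI's --help)
gemini -p "Reply with OK"
claude -p "Reply with OK"
```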
```sh
# Use Gemini (default)
dialog_detective ./videos "Show" -s 1

# Use Claude
dialog_detective ./videos "Show" -s 1 --matcher claude
```
The interface is abstracted to easily support direct API access in the future. Contributions welcome!
## Cache & Storage
DialogDetective caches various data to avoid redundant processing and speed up repeated runs.
### Cache Location

All cached data is stored in a platform-specific cache directory:

- macOS: `~/Library/Caches/de.westhoffswelt.dialogdetective/`
- Linux: `~/.cache/dialogdetective/`
- Windows: `%LOCALAPPDATA%\dialogdetective\`
### What Gets Cached

| Data | Directory | TTL | Why Cached |
|---|---|---|---|
| Whisper Models | `models/` | Permanent | Models are large (39MB - 2.9GB) and don't change. Downloaded once from HuggingFace on first use. |
| Series Metadata | `metadata/` | 24 hours | Episode lists from TVMaze rarely change. Caching reduces API calls and speeds up repeated runs on the same show. |
| Transcripts | `transcripts/` | 24 hours | Whisper transcription is CPU/GPU intensive. Caching by video file hash means re-running on the same files skips transcription entirely. |
| Match Results | `matching/` | 24 hours | LLM matching costs tokens and time. Results are cached by a composite key (video hash + show + seasons + matcher), so identical queries return instantly. |
The 24-hour TTL balances freshness with efficiency. If you need to force a refresh (e.g., after TVMaze updates episode data), simply delete the relevant cache subdirectory.
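For example, to force fresh TVMaze metadata on the next run (Linux path shown; see Cache Location above for other platforms):

```sh
# Drop only the cached episode metadata; it is re-fetched on the next run
rm -rf ~/.cache/dialogdetective/metadata/
```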
### Temporary Files

During processing, DialogDetective extracts audio to temporary WAV files in your system's temp directory
(`/tmp`, `/var/folders/...`, or `%TEMP%`). These files are automatically
cleaned up when processing completes or if the program is interrupted.
### Managing Cache
To clear all cached data:
```sh
# macOS
rm -rf ~/Library/Caches/de.westhoffswelt.dialogdetective/

# Linux
rm -rf ~/.cache/dialogdetective/
```
To clear only models (to free disk space):
```sh
# macOS
rm -rf ~/Library/Caches/de.westhoffswelt.dialogdetective/models/

# Linux
rm -rf ~/.cache/dialogdetective/models/
```
Use `dialog_detective --list-models` to see which models are currently cached and their sizes.