Technology

MioSub's subtitle generation engine and editor are built from scratch and optimized for AI-powered subtitle workflows.

Subtitle Generation Engine

Automatic Terminology Extraction

Traditional machine translation renders proper nouns inconsistently: "東京" becomes "Tokyo" in one place and "Dongjing" in another.

MioSub's approach:

Intelligently extract proper nouns from audio (names, places, titles, etc.)
Verify standard translations via search engines
Generate a glossary for consistent terminology throughout the video
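
To illustrate why a glossary keeps terminology consistent, here is a minimal sketch of applying one to text. The function name `apply_glossary` and the dictionary layout are assumptions made for this example, not MioSub's actual code, and the extraction and search-based verification steps are not reproduced.

```python
import re


def apply_glossary(text: str, glossary: dict[str, str]) -> str:
    """Replace each glossary source term with its single agreed translation."""
    if not glossary:
        return text
    # Try longer terms first so "東京タワー" is matched before "東京".
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in terms))
    return pattern.sub(lambda match: glossary[match.group(0)], text)


# Hypothetical glossary entries for illustration only.
glossary = {"東京": "Tokyo", "東京タワー": "Tokyo Tower"}
print(apply_glossary("東京タワーは東京にある", glossary))
# prints: Tokyo TowerはTokyoにある
```

Because every occurrence goes through the same lookup, a term cannot end up with two different renderings in one video.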

Long-Context Translation

Traditional MT translates one sentence at a time, so it loses the surrounding context and often the intended meaning.

MioSub's approach:

Segment by semantics into 5-10 minute chunks
Preserve full context for translation, understanding speaker intent
Support scene presets (anime, film, news, tech) for optimized translation style
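
The chunking step can be sketched as a greedy pass over transcribed segments. This is an assumption-laden illustration: it uses pauses between segments as a stand-in for semantic boundaries, and the names `Segment` and `chunk_segments` and the 1-second pause threshold are hypothetical. Only the 5-10 minute range comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def chunk_segments(segments, min_len=300.0, max_len=600.0, pause=1.0):
    """Group segments into chunks of roughly min_len..max_len seconds."""
    chunks, current = [], []
    for seg in segments:
        if current:
            duration = current[-1].end - current[0].start
            gap = seg.start - current[-1].end
            # Hard limit: adding this segment would exceed the maximum length.
            too_long = seg.end - current[0].start > max_len
            # Soft limit: the chunk is long enough and there is a pause here.
            natural_break = duration >= min_len and gap >= pause
            if too_long or natural_break:
                chunks.append(current)
                current = []
        current.append(seg)
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then translated as a whole, so the model sees several minutes of dialogue instead of an isolated sentence.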

Post-Transcription Processing

Raw Whisper output often breaks sentences in the wrong places and has imprecise timestamps.

MioSub's approach:

Smart segmentation: Auto-split subtitles based on semantics and pauses
Timeline correction: Fix timing drift in Whisper output
Terminology replacement: Auto-apply glossary for consistent translations
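
As a rough illustration of pause-based splitting, the sketch below regroups word-level timestamps into subtitle lines. It assumes words arrive as `(text, start, end)` tuples; the function name and the thresholds (0.6 s pause, 42 characters per line) are hypothetical, and sentence-final punctuation stands in for the semantic analysis.

```python
def split_on_pauses(words, max_gap=0.6, max_chars=42):
    """Regroup (text, start, end) word tuples into subtitle lines."""
    lines, current = [], []
    for word in words:
        if current:
            gap = word[1] - current[-1][2]
            length = len(" ".join(w[0] for w in current)) + 1 + len(word[0])
            ends_sentence = current[-1][0].endswith((".", "?", "!"))
            # Start a new line on a pause, a sentence end, or an overlong line.
            if gap >= max_gap or ends_sentence or length > max_chars:
                lines.append(current)
                current = []
        current.append(word)
    if current:
        lines.append(current)
    # Each line spans from its first word's start to its last word's end.
    return [(" ".join(w[0] for w in line), line[0][1], line[-1][2]) for line in lines]


words = [
    ("Hello", 0.0, 0.4), ("there.", 0.4, 0.8),
    ("How", 0.9, 1.1), ("are", 1.1, 1.3), ("you", 1.3, 1.5),
    ("today", 2.5, 2.9),
]
for line in split_on_pauses(words):
    print(line)
```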

CTC Forced Alignment

High-precision timeline alignment based on CTC (Connectionist Temporal Classification) technology.

Millisecond-level character alignment
Built-in aligner in v3.0, works out of the box
Auto-downloads model on first use
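
For readers unfamiliar with CTC forced alignment, the following is a textbook-style Viterbi sketch in pure Python, not MioSub's built-in aligner. It assumes an acoustic model has already produced per-frame log-probabilities over a small vocabulary with index 0 as the blank; the function name `ctc_align` and the toy inputs are hypothetical.

```python
import math


def ctc_align(log_probs, tokens, blank=0):
    """Return (first_frame, last_frame) for each token via Viterbi over CTC states."""
    # Interleave blanks: [blank, t1, blank, t2, ..., blank].
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    T, S = len(log_probs), len(ext)
    dp = [[-math.inf] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    dp[0][0] = log_probs[0][blank]
    if S > 1:
        dp[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            candidates = [(dp[t - 1][s], s)]           # stay in the same state
            if s >= 1:
                candidates.append((dp[t - 1][s - 1], s - 1))  # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append((dp[t - 1][s - 2], s - 2))  # skip a blank
            best, prev = max(candidates)
            dp[t][s] = best + log_probs[t][ext[s]]
            back[t][s] = prev
    # A valid path ends on the last token or the trailing blank.
    s = S - 1 if S == 1 or dp[T - 1][S - 1] >= dp[T - 1][S - 2] else S - 2
    if dp[T - 1][s] == -math.inf:
        raise ValueError("too few frames to align all tokens")
    path = [s]
    for t in range(T - 1, 0, -1):
        s = back[t][s]
        path.append(s)
    path.reverse()
    spans = {}
    for t, state in enumerate(path):
        if state % 2 == 1:  # odd states are tokens, even states are blanks
            first, _ = spans.get(state // 2, (t, t))
            spans[state // 2] = (first, t)
    return [spans[i] for i in range(len(tokens))]


# Toy input: 5 frames over the vocabulary [blank, "a", "b"].
hi, lo = math.log(0.8), math.log(0.1)
log_probs = [[lo, hi, lo], [lo, hi, lo], [hi, lo, lo], [lo, lo, hi], [lo, lo, hi]]
print(ctc_align(log_probs, tokens=[1, 2]))  # [(0, 1), (3, 4)]
```

Multiplying the frame indices by the acoustic model's frame duration converts these spans to timestamps. That duration depends on the model and is not stated here.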

Speaker Identification

Automatically identify and label speakers in multi-person conversations.

LLM-based speaker inference
Customizable speaker names and colors
Merge adjacent subtitles from the same speaker
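
Merging adjacent subtitles from the same speaker can be sketched as a single pass over the cue list. The `Cue` type, the function name, and the 0.5-second gap threshold are assumptions made for illustration; the LLM-based speaker inference itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    speaker: str
    text: str


def merge_same_speaker(cues, max_gap=0.5):
    """Join consecutive cues when the speaker matches and the gap is small."""
    merged = []
    for cue in cues:
        last = merged[-1] if merged else None
        if last and last.speaker == cue.speaker and cue.start - last.end <= max_gap:
            # Extend the previous cue instead of starting a new one.
            merged[-1] = Cue(last.start, cue.end, last.speaker, f"{last.text} {cue.text}")
        else:
            merged.append(cue)
    return merged
```

The gap check keeps two lines by the same speaker separate when a long silence falls between them.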

Smart Concurrency Control

Concurrency is adjusted per model to maximize speed while staying under rate limits:

Model	Concurrency	Strategy
Gemini Flash	5	Speed priority
Gemini Pro	2	Avoid rate limits

Result: 30-minute video processed in ~8-10 minutes.
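
One common way to enforce per-model limits like those in the table is a semaphore around each request. The sketch below uses Python's `asyncio.Semaphore`; the model keys, the `run_limited` helper, and the fake worker are hypothetical, and only the limits 5 and 2 come from the table.

```python
import asyncio

# Limits taken from the table above; the key names are illustrative labels.
CONCURRENCY = {"gemini-flash": 5, "gemini-pro": 2}


async def run_limited(model, jobs, worker):
    """Run worker(job) for every job, with at most CONCURRENCY[model] in flight."""
    limit = asyncio.Semaphore(CONCURRENCY.get(model, 1))

    async def guarded(job):
        async with limit:
            return await worker(job)

    # gather() returns results in the same order as the jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_translate(chunk):
    # Stand-in for a real API call.
    await asyncio.sleep(0.01)
    return chunk.upper()


print(asyncio.run(run_limited("gemini-pro", ["a", "b", "c"], fake_translate)))
# prints: ['A', 'B', 'C']
```

Unknown models fall back to a limit of 1 in this sketch, which is a conservative choice made here rather than documented behavior.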

Fully Automated Pipeline

Paste a video link (YouTube/Bilibili), and the entire workflow runs automatically:

Auto Download — yt-dlp fetches the best quality video
Audio Extraction — Extract audio and perform VAD segmentation
Smart Transcription — Whisper speech-to-text
AI Translation & Polish — Gemini context-aware translation and refinement
Auto Hardcoding — FFmpeg burns bilingual subtitles (GPU acceleration supported)
Output — Ready-to-share MP4 with hardcoded subtitles
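
The download, extraction, and hardcoding steps map onto yt-dlp and FFmpeg command lines. The sketch below only builds argument lists and runs nothing. The exact flags MioSub passes are not documented here, so the format selector, the 16 kHz mono audio settings, and all file names are assumptions.

```python
def download_cmd(url: str, out_template: str = "video.%(ext)s") -> list[str]:
    """yt-dlp: pick the best video and audio streams, falling back to the best single file."""
    return ["yt-dlp", "-f", "bestvideo+bestaudio/best", "-o", out_template, url]


def extract_audio_cmd(video: str, audio: str = "audio.wav") -> list[str]:
    """FFmpeg: drop the video stream and write mono 16 kHz audio (assumed settings)."""
    return ["ffmpeg", "-y", "-i", video, "-vn", "-ac", "1", "-ar", "16000", audio]


def hardcode_cmd(video: str, subs: str = "bilingual.ass", output: str = "output.mp4") -> list[str]:
    """FFmpeg: burn subtitles into the picture with the subtitles filter, copying audio."""
    # Paths containing special characters need escaping inside the filter string.
    return ["ffmpeg", "-y", "-i", video, "-vf", f"subtitles={subs}", "-c:a", "copy", output]


for cmd in (
    download_cmd("https://example.com/watch?v=ID"),
    extract_audio_cmd("video.mp4"),
    hardcode_cmd("video.mp4"),
):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed. Transcription and translation sit between the second and third commands.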

Subtitle Editor

Real-Time Preview

Built-in assjs rendering engine for accurate font, color, and position rendering
WYSIWYG — edit and preview simultaneously
One-click toggle between source and translated text for quick review

Smart Caching

Efficient transcoding cache for smooth playback
Audio-only file support with adaptive player UI

Batch Operations

Batch Regenerate: Re-run the full pipeline on selected segments (transcription → polish → align → translate)
Polish Translation: Optimize translation quality for selected segments while maintaining context
Auto-save snapshots before operations, rollback anytime

Version Management

Auto-save edit history
Snapshot rollback support
Prevent accidental work loss
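
Snapshot-and-rollback behavior can be sketched with a bounded list of deep copies. The class name `SnapshotHistory`, its methods, and the default limit of 20 snapshots are hypothetical; MioSub's actual storage format is not described here.

```python
import copy


class SnapshotHistory:
    """Keep the most recent subtitle states so an operation can be undone."""

    def __init__(self, max_snapshots: int = 20):
        self._snapshots = []
        self._max = max(1, max_snapshots)

    def save(self, label, subtitles):
        # Deep-copy so later edits to the live list do not alter the snapshot.
        self._snapshots.append((label, copy.deepcopy(subtitles)))
        # Drop the oldest entries beyond the limit.
        del self._snapshots[:-self._max]

    def labels(self):
        return [label for label, _ in self._snapshots]

    def rollback(self, index: int = -1):
        """Return a copy of the chosen snapshot (the latest by default)."""
        _, state = self._snapshots[index]
        return copy.deepcopy(state)


history = SnapshotHistory()
subtitles = [{"text": "Hello"}]
history.save("before polish", subtitles)
subtitles[0]["text"] = "Hi there"    # a batch operation changes the text
subtitles = history.rollback()       # restore the saved state
print(subtitles)  # [{'text': 'Hello'}]
```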
