Reference-to-video, ranked first
Vidu is built around reference-to-video: hand it subjects, sets, costumes, props, and styles, and it composes them into one shot that stays consistent, up to 16 seconds long with the audio already synced. In MML ONE, nine tiers are available to your storyboard, episode after episode.
By ShengShu Technology
Five things Vidu does for serialized work — from ShengShu's own releases and leaderboard placements.
Combine up to 7 reference images — subjects, environments, costumes, props, styles — into one consistent shot. No.1 on SuperCLUE's first reference-to-video leaderboard, No.1 on Artificial Analysis at Q3 launch.
A single Q3 generation carries up to 16 seconds of video with synchronized native audio — dialogue, music, foley, ambience in layers.
Multi-shot composition, camera control, spatial continuity across shots, and a subject library that carries characters episode to episode.
Six classes of cinematic effects generated natively — particles, fluids, lighting, transitions, dynamic motion, camera moves.
Q2 introduced micro-expression acting and multiple-entity consistency, with a global API since October 2025 — the acting arrived before the spectacle.
Exactly what our catalog serves today — Q3 line first, then Q2.
Tier lineup as served in MML ONE on the day this page was written. The drama/ad-tuned rows are provider-channel configurations of Q3, not ShengShu-official tiers — the in-app catalog is the live truth.

Takes land in the storyboard as versioned assets. Recurring characters, anime and stylized work, and episode-to-episode continuity are what you route to Vidu.
Speech output covers English, Japanese, and Chinese only — narrower than the language lists of some competing families, per Vidu's own FAQ.
Reference video input is narrow: one tier (Q2 Pro), one to two clips of at most five seconds. Everything else works from image references.
The drama/ad-specialized configurations are China-channel offerings without vendor-published spec pages — their 16-second / 4K headline specs do not apply to the standard Q2/Q3 tiers (≤10 seconds, 1080p).
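If you script your own pre-flight checks, the limits above can be sketched as a small validator. This is a minimal sketch under stated assumptions: `ShotRequest`, `check_request`, and the `"q2-pro"` tier identifier are hypothetical names made up for illustration, not part of any Vidu or MML ONE API. Only the numbers (7 reference images, one to two reference clips of at most five seconds on Q2 Pro, speech in English, Japanese, and Chinese) come from this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Limits quoted on this page. Every name here is hypothetical,
# chosen for illustration; none comes from a Vidu or MML ONE API.
MAX_REFERENCE_IMAGES = 7
SPEECH_LANGUAGES = {"en", "ja", "zh"}   # English, Japanese, Chinese
VIDEO_REFERENCE_TIER = "q2-pro"         # assumed identifier for Q2 Pro
MAX_REFERENCE_CLIPS = 2
MAX_CLIP_SECONDS = 5.0


@dataclass
class ShotRequest:
    tier: str
    reference_images: int = 0
    reference_clip_seconds: List[float] = field(default_factory=list)
    speech_language: Optional[str] = None


def check_request(req: ShotRequest) -> List[str]:
    """Return the problems found; an empty list means the request fits the stated limits."""
    problems = []
    if req.reference_images > MAX_REFERENCE_IMAGES:
        problems.append(f"at most {MAX_REFERENCE_IMAGES} reference images")
    if req.reference_clip_seconds:
        if req.tier != VIDEO_REFERENCE_TIER:
            problems.append("reference video is accepted on Q2 Pro only")
        if len(req.reference_clip_seconds) > MAX_REFERENCE_CLIPS:
            problems.append(f"at most {MAX_REFERENCE_CLIPS} reference clips")
        if any(s > MAX_CLIP_SECONDS for s in req.reference_clip_seconds):
            problems.append(f"each reference clip must be {MAX_CLIP_SECONDS:g}s or shorter")
    if req.speech_language is not None and req.speech_language not in SPEECH_LANGUAGES:
        problems.append("speech output covers English, Japanese, and Chinese only")
    return problems
```

A request that returns an empty list only passes the checks written here; the in-app catalog remains the authority on what each tier actually accepts.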
Start with a premise, a screenplay, or a folder of references. We'll set up your provider keys and walk through the first scene with you.