Model family · Video

The take comes back talking.

Kling generates up to fifteen seconds of multi-shot cinema with the dialogue already in it — five languages, per-character voices, camera moves read straight from the prompt. In MML ONE it sits one click from your storyboard: nine tiers, routed shot by shot.

By Kling AI (Kuaishou)

What it's best at

Built for scenes that speak.

Five things Kling does that most video models don't — each from the vendor's own releases, not our imagination.

01

Dialogue, native

Speech in English, Chinese, Japanese, Korean, and Spanish — English accents and Chinese dialects included — and each character can speak a different language, in the order you direct.

02

Fifteen seconds, many shots

One generation carries shot-reverse-shot, cross-cutting, and voice-over — multi-shot storytelling understood straight from the prompt.

03

A character from one clip

3.0 Omni lifts a character's look and voice from a single reference video, then re-stages them scene to scene — per-shot duration, framing, angle, camera move.

04

Edit with a sentence

O1 generates and edits in one model: swap the subject, change the weather or the style on existing footage. No masks, no keyframes, up to seven mixed inputs.

05

Signage survives

On-screen text and logos stay legible through generation — storefronts, labels, and branded wardrobe keep their lettering. Product films care about this.

In MML ONE

Nine Kling tiers, ready to route.

Exactly what our catalog serves today — vendor names, newest first.

Kling 3.0
The flagship: up to 15-second takes, native audio, multi-shot storytelling from the prompt.

Kling 3.0 Omni
Reference-video character cloning (look and voice) with per-shot storyboard control.

Kling 3.0 Turbo
The fast tier (June 2026). First-frame input only, with no reference images or video.

Kling O1
The generate-and-edit model: 3–10 second clips, up to 7 mixed inputs, prompt-based edits on existing footage.

Kling 2.6
The first Kling with native audio (December 2025). First/last-frame mode stays silent.

Kling 2.5
Silent tier that topped the Artificial Analysis Video Arena text-to-video and image-to-video charts within a week of its 2025 release.

Kling 2.1
The 2.x workhorse line, up to 1080p, silent.

Kling 2.0
Earlier 2.x generation, up to 1080p, silent.

Kling 1.6
The earliest tier still in service.

Tier lineup as served in MML ONE on the day this page was written. Kling ships fast — the catalog inside the app is the live truth.

[Image: paper-cut collage of a film strip unspooling across a paper stage, a little projector beam lighting a stack of take cards]

The honest part

Know before you route.

The newest tiers are gated behind premium access first, and the fast tier trades inputs for speed: Kling 3.0 Turbo takes a first frame only, no references.

Audio arrived at 2.6. Everything below it is a silent model, and even 2.6 goes silent in first/last-frame mode.

Character-from-video work lives on O1 and 3.0 Omni only; every other tier works from image references and first/last frames.

Public Alpha

Bring one film into the graph.

Start with a premise, a screenplay, or a folder of references. We'll set up your provider keys and walk through the first scene with you.