Wan 2.6 incorporates video generation with reference characters and synchronized audio

16/12/2025

Wan 2.6 is a multimodal model that generates videos and images from text descriptions. The new version can reuse characters from reference videos and build multi-shot narratives with synchronized audio and video.

The Wan 2.6 model introduces multimodal generation capabilities that combine video, image and text. A headline feature is Starring, which places characters from reference videos into new scenes while keeping their appearance and voice consistent. The system analyzes up to 150 reference frames to preserve each character's appearance and voice timbre, and it supports up to three simultaneous references so that multiple characters can interact in the same scene.
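
The two limits above (150 frames per reference, three references at once) can be illustrated with a short Python sketch. The article does not say how Wan 2.6 selects frames from a reference video, so the uniform sampling below is an assumption, and the names sample_frame_indices and plan_references are hypothetical helpers, not part of any Wan API.

```python
from typing import List

MAX_REFERENCE_FRAMES = 150  # per-reference cap stated in the article
MAX_REFERENCES = 3          # simultaneous-reference cap stated in the article


def sample_frame_indices(total_frames: int,
                         limit: int = MAX_REFERENCE_FRAMES) -> List[int]:
    """Pick at most `limit` evenly spaced frame indices (assumed strategy)."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if total_frames <= limit:
        # Short clip: every frame fits under the cap.
        return list(range(total_frames))
    # Longer clip: spread `limit` indices evenly across the whole clip.
    return [i * total_frames // limit for i in range(limit)]


def plan_references(clip_lengths: List[int]) -> List[List[int]]:
    """Return sampled frame indices per reference clip, enforcing the cap of three."""
    if len(clip_lengths) > MAX_REFERENCES:
        raise ValueError(
            f"at most {MAX_REFERENCES} references are supported, "
            f"got {len(clip_lengths)}"
        )
    return [sample_frame_indices(n) for n in clip_lengths]
```

For example, plan_references([300, 90]) would keep every second frame of the first clip and all 90 frames of the second, while a list of four clips would be rejected.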

The multi-shot narrative function converts simple prompts into structured video sequences, keeping characters, settings and atmosphere consistent across shots. This makes it possible to develop more complex stories than single-shot generation allows.

For video generation, Wan 2.6 produces 15-second clips in 1080p resolution with native audio-video synchronization. The system generates multi-speaker dialogue with natural lip-sync, and its audio quality is described as comparable to professional studio output. The current version also improves instruction following, motion physics and aesthetic control over previous versions.
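
As a rough illustration of the clip specification, the following Python sketch checks a request against the figures quoted above. It is not based on any published Wan interface: ClipRequest and validate_clip_request are hypothetical names, treating 15 seconds as an upper bound is an assumption, and 1080p is taken to mean the standard 1920x1080 frame.

```python
from dataclasses import dataclass
from typing import List

MAX_CLIP_SECONDS = 15    # clip length quoted in the article, assumed to be a maximum
FULL_HD = (1920, 1080)   # standard 16:9 frame size for 1080p


@dataclass(frozen=True)
class ClipRequest:
    """Hypothetical request shape, used only for this illustration."""
    prompt: str
    duration_seconds: float
    width: int = FULL_HD[0]
    height: int = FULL_HD[1]


def validate_clip_request(req: ClipRequest) -> List[str]:
    """Return a list of human-readable problems; an empty list means the request fits."""
    problems: List[str] = []
    if not req.prompt.strip():
        problems.append("prompt is empty")
    if not 0 < req.duration_seconds <= MAX_CLIP_SECONDS:
        problems.append(f"duration must be in (0, {MAX_CLIP_SECONDS}] seconds")
    if (req.width, req.height) != FULL_HD:
        problems.append("resolution must be 1920x1080 (1080p)")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every mismatch at once.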

For image synthesis, the model offers control over lens and lighting parameters and can reference multiple images to keep a consistent aesthetic. The interleaved text-and-image generation function creates structured visual narratives that alternate between both formats, drawing on real-world knowledge and reasoning capabilities.

The model is designed for applications requiring visual and narrative coherence in multimedia content generation, from creating scenes with specific characters to producing sequences with complete narrative structure.

Key points

  • Wan 2.6 allows incorporating characters from reference videos into new scenes while maintaining their appearance and voice
  • The system analyzes up to 150 reference frames to preserve visual consistency
  • Supports up to three simultaneous references to create interactions between multiple characters
  • Generates 15-second videos in 1080p resolution with audio-video synchronization
  • Includes multi-speaker dialogues and natural lip-sync
  • Multi-shot narrative converts simple prompts into structured sequences
  • Offers control over lens and lighting parameters in image synthesis
  • Allows creating visual narratives that coherently interleave text and image
