Fast Multi-view Consistent 3D Editing with Video Priors


AAAI 2026

The Hong Kong Polytechnic University



Our proposed ViP3DE leverages video priors for fast multi-view consistent 3D editing, achieving high-quality results with a single forward pass.

Abstract

Text-driven 3D editing enables user-friendly editing of 3D objects or scenes with text instructions. Due to the lack of multi-view consistency priors, existing methods typically resort to employing 2D generation or editing models to process each view individually, followed by iterative 2D-3D-2D updating. However, these methods are not only time-consuming but also prone to yielding over-smoothed results, since the iterative process averages the different editing signals gathered from different views. In this paper, we propose, for the first time to the best of our knowledge, generative Video Prior based 3D Editing, ViP3DE in short, which repurposes the temporal consistency priors of pre-trained video generation models to achieve consistent 3D editing within a single forward pass. Our key insight is to condition the video generation model on a single edited view so that it directly generates the other edited views consistently for 3D updating, thereby bypassing the iterative editing paradigm. First, 3D updating requires edited views to be paired with specific camera poses. To this end, we propose motion-preserved noise blending, which enables the video model to generate edited views at predefined camera poses. In addition, we introduce geometrically aware denoising to further enhance multi-view consistency by integrating 3D geometric priors into the video model. Extensive experiments demonstrate that ViP3DE achieves high-quality 3D editing results within a single forward pass, significantly outperforming existing methods in both editing quality and speed.
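For intuition, below is a minimal Python sketch of the pipeline outlined in the abstract: edit one anchor view with a 2D editor, blend pose-aligned noise with fresh noise so the video model follows the predefined camera trajectory, generate the remaining edited views in a single forward pass conditioned on the anchor, and update the 3D representation. All interfaces here (editor_2d, video_model, scene, the blending weight) are hypothetical placeholders for illustration, not the paper's released code.

```python
# Illustrative sketch only; every class/method below is a hypothetical placeholder,
# not the actual ViP3DE API.
import torch

def motion_preserved_noise_blend(motion_noise, fresh_noise, alpha=0.7):
    """Blend pose-aligned 'motion' noise with fresh Gaussian noise so the video
    model keeps the predefined camera trajectory (illustrative form; alpha is
    an assumed hyperparameter)."""
    blended = alpha * motion_noise + (1.0 - alpha) * fresh_noise
    return blended / blended.std()  # keep roughly unit variance for the diffusion prior

def edit_3d_scene(scene, views, poses, instruction, editor_2d, video_model):
    # 1) Edit a single anchor view with an off-the-shelf 2D editing model.
    anchor = editor_2d(views[0], instruction)

    # 2) Build the initial noise from the original rendered views so the generated
    #    video follows the predefined camera poses (placeholder inversion call).
    motion_noise = video_model.invert(views)
    fresh_noise = torch.randn_like(motion_noise)
    init_noise = motion_preserved_noise_blend(motion_noise, fresh_noise)

    # 3) One forward pass of the video model, conditioned on the edited anchor,
    #    produces the remaining edited views consistently.
    edited_views = video_model.sample(init_noise, condition=anchor)

    # 4) Directly update the 3D representation with the edited multi-view set,
    #    bypassing iterative 2D-3D-2D updating.
    scene.update(edited_views, poses)
    return scene
```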


Results

Face Editing

Horn Object

Fangzhou Scene

Person Editing

Results Comparison

BibTeX

@article{chen2025fast,
  title={Fast Multi-view Consistent 3D Editing with Video Priors},
  author={Chen, Liyi and Li, Ruihuang and Zhang, Guowen and Wang, Pengfei and Zhang, Lei},
  journal={arXiv preprint arXiv:2511.23172},
  year={2025}
}

Acknowledgement

We sincerely thank the research community for their inspiring work on 3D editing and video generation models.