1st Workshop on
Generating Digital Twins from Images and Videos
International Conference on Computer Vision (ICCV 2025)
Oct 20th, 2025 - Room 319 A - Honolulu, Hawai'i
Time | Info
---|---
8:00 | Organizers: Introductory Remarks
8:30 | Keynote: Manolis Savva (SFU). Title: Towards realistic & interactive 3D simulation for embodied AI. Abstract: 3D simulators are increasingly being used to develop and evaluate embodied AI agents that perceive and act in realistic environments. Much of the prior work in this space has treated simulators as “black boxes” within which learning algorithms are to be deployed. However, the system characteristics of the simulation platforms themselves and the datasets that are used with these platforms both greatly impact the feasibility and the outcomes of experiments involving simulation. In this talk, I will describe recent projects that outline emerging challenges and opportunities in the development of 3D simulation for embodied AI, in particular focusing on controllable generation of articulated objects and interactive 3D scenes for creating digital twins of interior spaces.
9:00 | Keynote: Katerina Fragkiadaki (CMU). Title: Sim-to-Real Translation for Scalable Benchmarking in Robotics. Abstract: I will present our recent work on jointly segmenting and reconstructing 3D scenes from images using compositional diffusion models, and on leveraging state-of-the-art reality-to-simulation pipelines to create a general, scalable, and continually evolving benchmark for robot generalist policies. This benchmark incorporates human preference crowdsourcing to enable large-scale, adaptive evaluation of vision-language-action models in simulation, building on recent advances in real-to-sim translation and modern physics engines.
9:30 | Spotlight Talks (6 minutes each): "DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness" (Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi); "PICO: Reconstructing 3D People In Contact with Objects" (Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi, Arjun Lakshmipathy, Agniv Chatterjee, Michael J. Black, Dimitrios Tzionas); "DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction" (Rui Wang, Quentin Lohmeyer, Mirko Meboldt, Siyu Tang); "Robot Learning from Any Images" (Siheng Zhao, Jiageng Mao, Wei Chow, Zeyu Shangguan, Tianheng Shi, Rong Xue, Yuxi Zheng, Yijia Weng, Yang You, Daniel Seita, Leonidas Guibas, Sergey Zakharov, Vitor Guizilini, Yue Wang); "MonoFusion: Sparse-View 4D Reconstruction via Monocular Fusion" (Zihan Wang, Jeff Tan, Tarasha Khurana, Neehar Peri, Deva Ramanan)
10:00 | Coffee & Posters (ExHall II). Assigned boards: 164 to 182 (refer to poster badges in the Accepted Papers section below)
11:00 | Keynote: Marc Pollefeys (ETH Zurich). Title: Spatial AI. Abstract: In this talk we’ll discuss how to build rich 3D representations of the environment to assist people and robots in performing tasks. We’ll first discuss how to build visual 3D maps of environments and use them for visual (re)localization, spatial data access, and navigation. We’ll also discuss how to build rich 3D semantic representations that enable queries and interactions with the scene.
11:30 | Keynote: Jiajun Wu (Stanford). Title and abstract: TBA
12:00 | Lunch
13:30 | Keynote: Lei Li (University of Virginia). Title: Learning Interactable 3D Worlds. Abstract: I will present our recent work on modeling high-fidelity, interactable 3D virtual worlds. A central challenge we aim to address is to create 3D worlds that are not only visually realistic but also structurally grounded, enabling dynamic human interactions. Our research begins by constructing large-scale, even infinite 3D scenes with diffusion models that capture fine geometric details, providing a strong foundation for scene generation. To further enrich these 3D scenes, we generate high-fidelity 3D objects with image-based conditioning, and model part-level dynamics of objects to support functional understanding and interactions. We then analyze human behaviors within these scene contexts and synthesize realistic human motions interacting with 3D objects, thus bringing 3D scenes to life. Together, these efforts represent a step toward building faithful digital replicas of physical environments.
14:00 | Keynote: Matthias Nießner (Technical University of Munich). Title: Photo-realistic AI Avatars. Abstract: I will talk about our latest research on creating photo-realistic AI avatars. The main goal is to create virtual characters that are visually indistinguishable from photos and videos of real people. Further, we aim to control such avatars with multi-modal control signals such as animation rigs, text, or voice in order to replicate real-world conversations and leverage our avatars for 3D content creation. Ultimately, the goal is to witness the evolution of photos and videos into interactive, holographic 3D content that is indistinguishable from physical reality. To this end, we focus on the possibility of capturing and sharing 3D photos with friends and family or through social media platforms. Imagine the ability to comprehensively document historical events along with the participating people for future generations, or to generate content for upcoming applications in augmented and virtual reality.
14:30 | Keynote: Yanpei Cao (Chief Scientist & Co-founder, Tripo AI). Title: Beyond the Digital Statue: Generating Composable and Articulated 3D Assets. Abstract: TBA
15:00 | Coffee
15:30 | Keynote: Steve Xie (Founder & CEO, Lightwheel). Title: Closing the Sim2Real Gap with Physically Accurate SimReady Assets and Benchmarks. Abstract: TBA
16:00 | Panel discussion. Panelists: Manolis Savva, Katerina Fragkiadaki, Marc Pollefeys, Jiajun Wu, Angela Dai, Matthias Nießner, Yanpei Cao, Steve Xie
17:00 | Organizers: Closing Remarks
We will have two submission tracks: an Archival Track and a Non-Archival Track.
Archival Track: Papers should present original, unpublished work and undergo strict double-blind review. Manuscripts must be prepared with the ICCV 2025 Author Kit and be 4–8 pages long (references and appendices may extend beyond the 8-page limit). Accepted papers will appear in the official ICCV 2025 Workshop Proceedings, and authors will present in person.
We follow the same policy as the main ICCV conference for this track. By submitting to this track, the authors acknowledge that the work has not been previously published or accepted for publication in substantially similar form in any peer-reviewed venue, including journals, conferences, workshops, or archival forums. Furthermore, no publication substantially similar in content (defined as having 20 percent or more overlap) has been or will be registered or submitted to this or another conference, workshop, or journal during the review period. Violation of any of these conditions will lead to rejection and will be reported to the other venue to which the submission was sent. For full details, please refer to the official ICCV 2025 policy.
Submit to the Archival Track via: OpenReview
Non-Archival Track: This flexible track welcomes a wider range of contributions. All submissions should be formatted with the ICCV 2025 Author Kit. Submissions remain double-blind for review but will not appear in the proceedings; papers that have already been published at major conferences are therefore also welcome. Authors may also submit their work to future conferences or journals after acceptance to this workshop.
Submit to the Non-Archival Track via: OpenReview
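For quick reference, a minimal submission skeleton for either track might look like the sketch below. This is a sketch under assumptions, not the official template: the `review` option and the `\iccvPaperID` macro follow the conventions of earlier ICCV author kits, and the template shipped with the ICCV 2025 Author Kit is authoritative.

```latex
% Minimal skeleton, assuming the ICCV 2025 Author Kit follows the
% iccv.sty conventions of earlier kits; defer to the template in the
% kit itself if any option or macro differs.
\documentclass[10pt,twocolumn,letterpaper]{article}

\def\iccvPaperID{*****}   % placeholder; replace with your OpenReview paper ID
\usepackage[review]{iccv} % 'review' enables the anonymized, line-numbered mode

\usepackage{graphicx}
\usepackage{amsmath,amssymb}

\begin{document}

\title{Your Paper Title}
\author{Anonymous Submission} % double-blind: no real names in the review copy
\maketitle

\begin{abstract}
Abstract text. Keep the paper 4--8 pages; references and appendices
may extend beyond the 8-page limit.
\end{abstract}

\section{Introduction}
Body text.

\end{document}
```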