ICCV - Digital Twins

Schedule - Room 319 A - Oct 20th, 2025

Time		Info
8:00		Organizers: Introductory Remarks
8:30		Keynote: Manolis Savva SFU Title: Towards realistic & interactive 3D simulation for embodied AI [YouTube] Abstract 3D simulators are increasingly being used to develop and evaluate embodied AI agents that perceive and act in realistic environments. Much of the prior work in this space has treated simulators as "black boxes" within which learning algorithms are to be deployed. However, the system characteristics of the simulation platforms themselves and the datasets that are used with these platforms both greatly impact the feasibility and the outcomes of experiments involving simulation. In this talk, I will describe recent projects that outline emerging challenges and opportunities in the development of 3D simulation for embodied AI, in particular focusing on controllable generation of articulated objects and interactive 3D scenes for creating digital twins of interior spaces.
9:00		Keynote: Katerina Fragkiadaki CMU Title: Sim-to-Real Translation for Scalable Benchmarking in Robotics [YouTube] Abstract I will present our recent work on jointly segmenting and reconstructing 3D scenes from images using compositional diffusion models, and on leveraging state-of-the-art reality-to-simulation pipelines to create a general, scalable, and continually evolving benchmark for robot generalist policies. This benchmark incorporates human preference crowdsourcing to enable large-scale, adaptive evaluation of vision-language-action models in simulation, building on recent advances in real-to-sim translation and modern physics engines.
9:30		Spotlight Talks (6 minutes each): DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness (Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi); PICO: Reconstructing 3D People In Contact with Objects (Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi, Arjun Lakshmipathy, Agniv Chatterjee, Michael J. Black, Dimitrios Tzionas); DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction (Rui Wang, Quentin Lohmeyer, Mirko Meboldt, Siyu Tang); Robot Learning from Any Images (Siheng Zhao, Jiageng Mao, Wei Chow, Zeyu Shangguan, Tianheng Shi, Rong Xue, Yuxi Zheng, Yijia Weng, Yang You, Daniel Seita, Leonidas Guibas, Sergey Zakharov, Vitor Guizilini, Yue Wang); MonoFusion: Sparse-View 4D Reconstruction via Monocular Fusion (Zihan Wang, Jeff Tan, Tarasha Khurana, Neehar Peri, Deva Ramanan)
10:00		Coffee & Posters, ExHall II. Assigned boards: numbers 164 to 182 (refer to poster badges in the Accepted Papers section below)
11:00		Keynote: Marc Pollefeys ETH Zurich Title: Spatial AI [YouTube] Abstract In this talk we’ll discuss how to build rich 3D representations of the environment to assist people and robots to perform tasks. We’ll first discuss how to build visual 3D maps of environments and use those for visual (re)localization, spatial data access and navigation. We’ll also discuss how to build rich 3D semantic representations that enable queries and interactions with the scene.
11:30		Keynote: Jiajun Wu Stanford Title: Two Paradigms for Building Digital Twins [YouTube] Abstract Much of our visual world has its intrinsic, physical structure: scenes are made of objects; objects have their geometry, texture, material, and physical properties. With the rapid development of neural visual generative models, what role does such structural information play, or do we still need it at all? In this talk, I will discuss our recent efforts on scene and object understanding, reconstruction, and generation via distilling such physical intrinsics from pre-trained vision models, and show that we can now build models that infer object shape, texture, material, and physics, as well as scene context, all from a single image or video, and use them for controllable, action-conditioned 4D scene generation.
12:00		Lunch
13:30		Keynote: Lei Li University of Virginia Title: Learning Interactable 3D Worlds [YouTube] Abstract I will present our recent work on modeling high-fidelity, interactable 3D virtual worlds. A central challenge we aim to address is to create 3D worlds that are not only visually realistic but also structurally grounded, enabling dynamic human interactions. Our research begins by constructing large-scale, even infinite 3D scenes with diffusion models that capture fine geometric details, providing a strong foundation for scene generation. To further enrich these 3D scenes, we generate high-fidelity 3D objects with image-based conditioning, and model part-level dynamics of objects to support functional understanding and interactions. We then analyze human behaviors within these scene contexts and synthesize realistic human motions interacting with 3D objects, thus bringing 3D scenes to life. Together, these efforts represent a step toward building faithful digital replicas of physical environments.
14:00		Keynote: Matthias Nießner Technical University of Munich Title: A Digital Twin is all you need Abstract I will talk about our latest research on creating photo-realistic AI Avatars. Here, the main goal is to create virtual characters that are visually indistinguishable from photos and videos of real people. Further, we aim to control such avatars with multi-modal control signals such as animation rigs, text, or voice in order to replicate real-world conversations and leverage our avatars for 3D content creation. Ultimately, the goal is to witness the evolution of photos and videos into interactive, holographic 3D content that is indistinguishable from physical reality. To this end, we focus on the possibility of capturing and sharing 3D photos with friends, family, or through social media platforms. Imagine the ability to comprehensively document historical events along with the participating people for future generations, or to generate content for upcoming applications in augmented and virtual reality.
14:30		Keynote: Yanpei Cao Chief Scientist & Co-founder @ Tripo AI Title: Beyond the Digital Statue: Generating Composable and Articulated 3D Assets
15:00		Coffee
15:30		Keynote: Steve Xie Founder & CEO of Lightwheel Title: Closing the Sim2Real Gap with Physically Accurate SimReady Assets and Benchmarks
16:00		Discussion. Panelists: Manolis Savva, Marc Pollefeys, Lei Li, Matthias Nießner, Yanpei Cao
17:00		Organizers: Closing Remarks

Call for Papers

Targeted Topics

The gDT-IV Workshop at ICCV 2025 welcomes submissions on all aspects of generating digital twins from images and videos, especially (but not limited to):

Applications of digital twins in robotics, media content creation, AI-driven video synthesis, interior design, construction monitoring, and gaming.
Streaming and compression techniques for sharing large 3D scenes over the web.
Synchronization of the Digital Twin model with the video stream.
3D Gaussian Splatting (3DGS) and its extensions for dynamic scenes.
Lighting variations, shadows, material properties, and reflections in 3DGS reconstruction.
Modeling articulated object parts and their motion constraints from video (e.g., cabinets).
Inpainting and completion techniques for occluded regions in complex indoor scenes.
Retrieval-based and generative hybrid pipelines for 3D object and scene creation.
Learning object properties (mass, softness, friction) from videos.
Physics-informed 3D generation, blending physical simulation with visual generation.
Semantic 3D segmentation for scene understanding (object-level vs. fine-grained).

Submission Tracks

We will have two submission tracks:

Archival Track

Papers should present original, unpublished work and follow a strict double-blind review. Manuscripts must be prepared with the ICCV 2025 Author Kit and be 4–8 pages (references and appendices may extend beyond 8 pages). Accepted papers will appear in the official ICCV 2025 Workshop Proceedings, and authors will present in person.

We follow the same policy as the main ICCV conference for this track. By submitting to this track, the authors acknowledge that the work has not been previously published or accepted for publication in substantially similar form in any peer-reviewed venue, including a journal, conference, or workshop, or in any archival forum. Furthermore, no publication substantially similar in content (defined as having 20 percent or more overlap) has been or will be registered or submitted to this or another conference, workshop, or journal during the review period. Violation of any of these conditions will lead to rejection and will be reported to the other venue to which the submission was sent. For more details, please check the official policy at ICCV 2025.

Submit to the Archival Track via: OpenReview

Non-Archival Track

This flexible track welcomes a wider range of contributions:
- Extended abstracts & short papers — up to 4 pages, suitable for work in progress, negative results, or position papers. References and appendices may extend beyond 4 pages.
- Previously published work — including papers accepted at ICCV 2025 or other venues in the last year.
All submissions should be formatted with the ICCV 2025 Author Kit. Submissions remain double-blind for review but will not appear in proceedings. Thus, papers that have already been published at major conferences are also welcome. Authors may also submit their work to future conferences or journals after acceptance to this workshop.

Submit to the Non-Archival Track via: OpenReview

Important Dates

~~Archival Submission Deadline: 27 June 2025~~
~~Archival Author Notifications: 10 July 2025~~
~~Archival Camera-Ready: 13 August 2025~~

~~Non-Archival Submission Deadline: 19 August 2025~~
~~Non-Archival Author Notifications: 14 September 2025~~
~~Non-Archival Camera-Ready: 28 September 2025~~