1st Workshop on
Generating Digital Twins from Images and Videos
International Conference on Computer Vision (ICCV 2025)
Oct 20th, 2025 - Room 319 A - Honolulu, Hawai'i
| Time | Info | |
|---|---|---|
| 8:00 | Organizers: Introductory Remarks | |
| 8:30 |
|
Keynote: Manolis Savva (SFU)
Title: Towards realistic & interactive 3D simulation for embodied AI
YouTube: https://youtu.be/P40vtWfiebQ
Abstract:
3D simulators are increasingly being used to develop and evaluate embodied AI agents
that perceive and act in realistic environments. Much of the prior work in this space
has treated simulators as "black boxes" within which learning algorithms are to be
deployed. However, the system characteristics of the simulation platforms themselves and
the datasets that are used with these platforms both greatly impact the feasibility and
the outcomes of experiments involving simulation. In this talk, I will describe recent
projects that outline emerging challenges and opportunities in the development of 3D
simulation for embodied AI, in particular focusing on controllable generation of
articulated objects and interactive 3D scenes for creating digital twins of interior
spaces.
|
| 9:00 |
|
Keynote: Katerina Fragkiadaki (CMU)
Title: Sim-to-Real Translation for Scalable Benchmarking in Robotics
YouTube: https://youtu.be/vknqhQMEESc
Abstract:
I will present our recent work on jointly segmenting and reconstructing 3D scenes
from images using compositional diffusion models, and on leveraging state-of-the-art
reality-to-simulation pipelines to create a general, scalable, and continually
evolving benchmark for robot generalist policies. This benchmark incorporates human
preference crowdsourcing to enable large-scale, adaptive evaluation of
vision-language-action models in simulation, building on recent advances in
real-to-sim translation and modern physics engines.
|
| 9:30 |
Spotlight Talks (6 minutes each)
DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi
PICO: Reconstructing 3D People In Contact with Objects
Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi, Arjun Lakshmipathy, Agniv Chatterjee, Michael J. Black, Dimitrios Tzionas
DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction
Rui Wang, Quentin Lohmeyer, Mirko Meboldt, Siyu Tang
Robot Learning from Any Images
Siheng Zhao, Jiageng Mao, Wei Chow, Zeyu Shangguan, Tianheng Shi, Rong Xue, Yuxi Zheng, Yijia Weng, Yang You, Daniel Seita, Leonidas Guibas, Sergey Zakharov, Vitor Guizilini, Yue Wang
MonoFusion: Sparse-View 4D Reconstruction via Monocular Fusion
Zihan Wang, Jeff Tan, Tarasha Khurana, Neehar Peri, Deva Ramanan |
|
| 10:00 | Coffee & Posters, ExHall II. Assigned boards: numbers 164 to 182 (refer to poster badges in the Accepted Papers section below) |
|
| 11:00 |
|
Keynote: Marc Pollefeys (ETH Zurich)
Title: Spatial AI
YouTube: https://youtu.be/EKHfl0rqaQw
Abstract:
In this talk we’ll discuss how to build rich 3D representations of the environment to
assist people and robots to perform tasks. We’ll first discuss how to build visual 3D
maps of environments and use those for visual (re)localization, spatial data access and
navigation. We’ll also discuss how to build rich 3D semantic representations that enable
queries and interactions with the scene.
|
| 11:30 |
|
Keynote: Jiajun Wu (Stanford)
Title: Two Paradigms for Building Digital Twins
YouTube: https://youtu.be/AmZaoqXtZv4
Abstract:
Much of our visual world has its intrinsic, physical structure: scenes are made of
objects; objects have their geometry, texture, material, and physical properties. With
the rapid development of neural visual generative models, what role does such structural
information play, or do we still need it at all? In this talk, I will discuss our recent
efforts on scene and object understanding, reconstruction, and generation via distilling
such physical intrinsics from pre-trained vision models, and show that we can now build
models that infer object shape, texture, material, and physics, as well as scene
context, all from a single image or video, and use them for controllable,
action-conditioned 4D scene generation.
|
| 12:00 | Lunch | |
| 13:30 |
|
Keynote: Lei Li (University of Virginia)
Title: Learning Interactable 3D Worlds
YouTube: https://youtu.be/5_oiFCEl9Xc
Abstract:
I will present our recent work on modeling high-fidelity, interactable 3D virtual
worlds. A central challenge we aim to address is to create 3D worlds that are not only
visually realistic but also structurally grounded, enabling dynamic human interactions.
Our research begins by constructing large-scale, even infinite 3D scenes with diffusion
models that capture fine geometric details, providing a strong foundation for scene
generation. To further enrich these 3D scenes, we generate high-fidelity 3D objects with
image-based conditioning, and model part-level dynamics of objects to support functional
understanding and interactions. We then analyze human behaviors within these scene
contexts and synthesize realistic human motions interacting with 3D objects, thus
bringing 3D scenes to life. Together, these efforts represent a step toward building
faithful digital replicas of physical environments.
|
| 14:00 |
|
Keynote: Matthias Nießner (Technical University of Munich)
Title: A Digital Twin is all you need
YouTube: https://youtu.be/uBj_Rz71SX0
Abstract:
I will talk about our latest research on creating photo-realistic AI Avatars. Here, the
main goal is to create virtual characters that are visually indistinguishable from
photos and videos of real people. Further, we aim to control such avatars with
multi-modal control signals such as animation rigs, text, or voice in order to replicate
real-world conversations and leverage our avatars for 3D content creation. Ultimately,
the goal is to witness the evolution of photos and videos into interactive, holographic
3D content that is indistinguishable from the physical reality. To this end, we focus on
the possibility of capturing and sharing 3D photos with friends, family, or through
social media platforms. Imagine the ability to comprehensively document historical
events along with the participating people for future generations, or to generate
content for upcoming applications in augmented and virtual reality.
|
| 14:30 |
|
Keynote: Yanpei Cao (Chief Scientist & Co-founder @ Tripo AI)
Title: Beyond the Digital Statue: Generating Composable and Articulated 3D Assets
YouTube: https://youtu.be/w9YJuVYoxu4
|
| 15:00 | Coffee | |
| 15:30 |
|
Keynote: Steve Xie (Founder & CEO of Lightwheel)
Title: Closing the Sim2Real Gap with Physically Accurate SimReady Assets and Benchmarks
YouTube: https://youtu.be/GGKMgcQqqJg
|
| 16:00 | Discussion. Panelists: Manolis Savva, Marc Pollefeys, Lei Li, Matthias Nießner, Yanpei Cao |
|
| 17:00 | Organizers: Closing Remarks |
We will have two submission tracks:
Papers should present original, unpublished work and will undergo strict double-blind review. Manuscripts must be prepared with the ICCV 2025 Author Kit and be 4–8 pages (references and appendices may extend beyond 8 pages). Accepted papers will appear in the official ICCV 2025 Workshop Proceedings, and authors will present in person.
We follow the same policy as the main ICCV conference for this track. By submitting to this track, the authors acknowledge that the work has not been previously published or accepted for publication in substantially similar form in any peer-reviewed venue, including a journal, conference, or workshop, or in any archival forum. Furthermore, no publication substantially similar in content (defined as having 20 percent or more overlap) has been or will be registered or submitted to this or another conference, workshop, or journal during the review period. Violation of any of these conditions will lead to rejection and will be reported to the other venue to which the submission was sent. For more details, please check the official policy at ICCV 2025.
Submit to the Archival Track via: OpenReview
This flexible track welcomes a wider range of contributions:
All submissions should be formatted with the ICCV 2025 Author Kit. Submissions remain double-blind for review but will not appear in the proceedings. Thus, papers that have already been published at major conferences are also welcome. Authors may also submit their work to future conferences or journals after acceptance to this workshop.
Submit to the Non-Archival Track via: OpenReview