Recent advances in foundation models, such as GPT for language and CLIP/Segment Anything for images, have driven rapid progress in two-dimensional (2D) and textual data processing. However, an expanding range of real-world applications, including large-scale autonomous systems, robotics, extended reality (XR), and molecular modeling, requires deeper three-dimensional (3D) understanding. Moving beyond flat images to spatially grounded 3D representations is essential for addressing a broader set of physical and spatial challenges.
Many current 3D learning techniques (e.g., 3D reconstruction via Structure-from-Motion followed by dense stereo) still rely on sequential pipelines that are slow, error-prone, and difficult to scale to web-scale data. This workshop therefore aims to explore how these fragmented approaches can be replaced by a single differentiable framework that maps raw imagery directly to complete 3D outputs. Such an end-to-end design can be trained on large-scale unannotated data and can enable downstream tasks that require generalizable, real-time understanding of real-world geometry and semantics.
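To make the contrast concrete, the minimal sketch below shows the general shape of such a system in PyTorch: one differentiable network that maps raw images directly to per-pixel 3D pointmaps, loosely in the spirit of DUSt3R. Every name, layer, and loss here is an illustrative assumption, not a reference implementation of any particular model.

```python
# Illustrative sketch only: raw images in, dense 3D pointmaps out,
# with no intermediate SfM or stereo stages.
import torch
import torch.nn as nn

class PointmapRegressor(nn.Module):
    """Hypothetical model that regresses an (X, Y, Z) point per pixel."""
    def __init__(self, dim: int = 64):
        super().__init__()
        # Stand-in encoder; a real system would use a large pretrained backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(dim, 3, 1)  # 3 output channels = X, Y, Z

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (B, 3, H, W) -> pointmaps: (B, 3, H, W)
        return self.head(self.encoder(images))

model = PointmapRegressor()
images = torch.rand(2, 3, 128, 128)   # raw imagery in
pointmaps = model(images)             # dense 3D geometry out
loss = pointmaps.abs().mean()         # placeholder for a supervised 3D regression loss
loss.backward()                       # gradients flow through the whole pipeline
```

Because every stage is differentiable, the entire mapping can be trained directly at scale, which is precisely the property the sequential pipelines described above lack.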
More specifically, the End-to-End 3D Learning (E2E3D) workshop will examine how best to unify modeling, inference, and optimization into a single data-driven architecture. By enabling AI systems to perceive images, generate 3D content, reason about geometric and semantic relationships, and integrate multi-modal signals, we can advance content creation, machine perception, and autonomous control.
This workshop brings together researchers from computer vision, robotics, XR, autonomous driving, scientific imaging, and related fields to foster interdisciplinary discussions on next-generation 3D systems. By spotlighting recent breakthroughs and identifying key challenges, we aim to inspire innovative research and practical applications across these domains.
Jerome Revaud | Naver Labs Europe | Large geometric models: DUSt3R, MASt3R
Ruoshi Liu | Columbia University | Large generative models for the physical world
Iro Armeni | Stanford University | Creating and updating 3D representations of evolving scenes
Rakesh Ranjan | Meta Reality Labs | Large-scale 3D models for the Metaverse
Hongyang Li | The University of Hong Kong | Large-scale models for autonomous driving
Jiakai Zhang | ShanghaiTech University | Large-scale models for Cryo-EM
13:00 - 13:40 | Opening Remarks | |
13:40 - 14:15 | Invited Talk 1 | TBD |
14:15 - 14:50 | Invited Talk 2 | TBD |
14:50 - 15:20 | Oral Talks | TBD |
15:20 - 15:35 | Break | |
15:35 - 16:10 | Invited Talk 3 | TBD |
16:10 - 16:45 | Invited Talk 4 | TBD |
16:45 - 17:20 | Invited Talk 5 | TBD |
17:20 - 17:55 | Invited Talk 6 | TBD |
17:55 - 18:00 | Closing Remarks | |
We invite both short (up to 4 pages) and long (up to 8 pages) paper submissions; page limits exclude references and supplementary material. Short papers may present original work in progress or serve as technical reports describing implementations built on open-source frameworks. Authors can opt for archival or non-archival status; non-archival submissions may be concurrently under review elsewhere where external policies permit.
All accepted papers will be presented as posters, and three will be selected for oral presentation. A single best paper, chosen from among the long papers, will receive a cash prize from our sponsors.