End-to-End 3D Learning Workshop

Overview

Recent advances in foundation models—such as GPT for language and CLIP/Segment Anything for images—have driven rapid progress in two-dimensional (2D) and textual data processing. However, an expanding range of real-world applications, including large-scale autonomous systems, robotics, extended reality, and molecular modeling, requires deeper three-dimensional (3D) understanding. Moving beyond flat images to spatially grounded 3D representations is essential for addressing a broader set of physical and spatial challenges.

Many current 3D learning techniques (e.g., 3D reconstruction via Structure-from-Motion with dense stereo) still rely on sequential pipelines that are slow, prone to errors, and challenging to scale to massive, web-scale data. This workshop therefore aims to explore how these fragmented approaches can be replaced by a single, differentiable framework—one that processes raw imagery to directly produce complete 3D outputs. Such an end-to-end design can be trained on large-scale unannotated data and can enable downstream tasks that require a generalizable, real-time understanding of real-world geometry and semantics.

More specifically, the End-to-End 3D Learning (E2E3D) workshop will examine how best to unify modeling, inference, and optimization into a single data-driven architecture. By allowing AI systems to perceive images and generate 3D content, reason about geometric and semantic relationships, and integrate multi-modal signals, we can advance content creation, machine perception, and autonomous control.

Focus of the Workshop:

Technology Development: A central question in end-to-end 3D learning is how to replace traditional multi-stage 3D pipelines with a single, differentiable model. This includes:
- Designing unified network architectures that transform raw inputs (e.g., multi-view or single-view images) into final 3D outputs without extensive hand-engineered steps.
- Leveraging self-supervised learning for pretraining large-scale 3D foundation models on vast, unannotated datasets.
- Developing real-time inference techniques and efficient deployment methods for resource-limited platforms.
Data Challenges: Progress in end-to-end 3D learning also depends on effectively collecting diverse data sources and incorporating large-scale pretraining. Key topics include:
- Methods for incorporating massive, unannotated sources (e.g., multi-view imagery from benchmarks or videos from YouTube) into robust pretraining pipelines.
- Strategies to leverage existing 2D and multimodal image-text datasets to mitigate the shortage of high-quality 3D data.
- Techniques for building automatic or semi-automatic tools that facilitate reliable 3D annotation for supervised finetuning.
Real-World Impact: Another key objective is to create end-to-end 3D learning systems that have transformative applications. Discussions will include:
- Case studies in autonomous driving, where fast and accurate 3D perception enhances safety and decision-making.
- Real-time 3D scene understanding for AR/VR, robotics, and digital twins, demonstrating the advantages of integrated pipelines over traditional methods.
- Scalable 3D modeling in scientific imaging and other fields requiring precise spatial analysis.

This workshop brings together researchers from computer vision, robotics, extended reality (XR), autonomous driving, scientific imaging, and related fields to foster interdisciplinary discussions on next-generation 3D systems. By spotlighting recent breakthroughs and identifying key challenges, we aim to inspire innovative research and practical applications across these domains.

News

[June 30, 2025] The submission portal will remain open until July 1, 2025, allowing authors to optionally upload supplementary materials (e.g., videos or technical appendices). Please note that supplementary materials are entirely optional, and reviewers are not required to evaluate them.
[June 29, 2025] The submission deadline has been extended to June 30, 2025, 23:59 GMT due to OpenReview downtime on June 28.
One Best Paper Award and one Best Demo Award will be announced during the event, each with a $1000 cash prize. Gifts will be given to onsite attendees.
All submissions must follow the ICCV 2025 LaTeX template format. Please refer to the official ICCV 2025 Author Guidelines for detailed formatting instructions.

Invited Speakers

Jerome Revaud

Naver Labs Europe

Large geometric models: DUSt3R, MAST3R

Ruoshi Liu

Columbia University

Large generative models for the physical world

Iro Armeni

Stanford University

Creating and updating 3D of evolving scenes

Rakesh Ranjan

Meta Reality Labs

Large-scale 3D models for the Metaverse

Hongyang Li

The University of Hong Kong

Large-scale models for autonomous driving

Jiakai Zhang

ShanghaiTech University

Large-scale models for Cryo-EM

Schedule

13:00 - 13:05	Opening Remarks
13:05 - 13:40	Keynote Speaker: Jerome Revaud
13:40 - 14:15	Keynote Speaker: Ruoshi Liu
14:15 - 14:50	Keynote Speaker: Iro Armeni
14:50 - 16:10	Poster Session
16:10 - 16:45	Keynote Speaker: Rakesh Ranjan
16:45 - 17:20	Keynote Speaker: Hongyuan Li
17:20 - 17:55	Keynote Speaker: Jiakai Zhang
17:55 - 18:00	Closing Remarks

Call for Papers

We invite both short (up to 4 pages) and long (up to 8 pages) paper submissions, excluding references and supplementary materials. Short papers may introduce original but unfinished research or serve as technical reports that present implementations using open source frameworks. Authors can opt for archival or non-archival submissions; non-archival submissions may be concurrently under review elsewhere if external policies permit.

All accepted papers will be presented as posters, with three selected for oral presentations. A single best paper will be chosen from among the long papers, accompanied by a cash prize from our sponsors.

Submission Portal: OpenReview
Submission Start Date: May 22, 2025, 00:00 GMT
Submission Deadline: ~~June 29, 2025, 23:59 GMT~~ July 1, 2025, 23:59 GMT (extended due to OpenReview downtime on June 28)
Reviewer Assignment Complete: July 2, 2025
Review Period: July 2 - ~~July 9~~ July 11, 2025
Discussion and Final Decisions: ~~July 10~~ July 12, 2025
Notification to Authors: ~~July 11~~ July 13, 2025
Camera-ready Deadline: August 18, 2025