End-to-End 3D Learning

in conjunction with CVPR 2026, Denver, CO

June 3, 2026 · 13:00–18:00 · Room 501


Abstract

Overview: Large 2D vision and multimodal models have shown how to learn from both supervised and unannotated data and to transfer across tasks. These lessons point to a clear path for 3D spatial tasks. Yet many 3D-related systems still rely on long, brittle pipelines (e.g., COLMAP) that are hard to scale and slow to adapt. This workshop focuses on end-to-end 3D learning (E2E3D): a single trainable system that maps raw images or video to complete 3D representations and then supports downstream tasks that run in real time and scale with large datasets. Our goal is practical impact in robotics, extended reality, and scientific imaging. Topics include architectures that map from pixels to 3D without hand-tuned steps; cross-modal training; data engines that mine in-the-wild video at scale; tight integration with end-to-end planning and control; efficient deployment on edge devices and robots; and methods for scientific imaging, from protein structure to cellular microscopy. By unifying modeling, inference, and optimization in one data-driven approach, the E2E3D workshop aims to chart a clear path toward next-generation spatial intelligence systems.

Focus of the Workshop:

    Modeling and learning. E2E3D studies unified architectures that map pixels to 3D with minimal postprocessing; pretraining that embeds geometric priors such as scale, viewpoint, and occlusion; world models and vision-language-action models that use spatial memory to handle spatiotemporal dynamics; and differentiable rendering and physics that provide gradients for shape, appearance, and motion.

    Data and pretraining. E2E3D focuses on open, large-scale data sources, including long video, multi-view, and multi-sensor logs for robust 3D pretraining; cross-modal alignment that uses 2D and image-text corpora to ground language and action; auto-annotation with quality control and reproducible protocols; and data governance covering licensing, privacy, and safety for 3D assets.

    Systems, evaluation, and impact. E2E3D emphasizes real-time and edge deployment on robots and mobile devices; holistic metrics that report accuracy, latency, memory, and energy; robustness and safety for open-world generalization and failure analysis; and applications in autonomous driving, XR, industrial and scientific imaging, and mapping.

News

  • One Best Paper Award and one Best Demo Award will be announced during the event, each with a $500 cash prize. Gifts will be given to on-site attendees.
  • All submissions must follow the CVPR 2026 LaTeX template format. Please refer to the official CVPR 2026 Author Guidelines for detailed formatting instructions.

Invited Speakers

Georgios Pavlakos

UT Austin

End-to-end view synthesis and 3D humans

Jiajun Wu

Stanford University

Physical scene generation and understanding

Marco Pavone

Stanford / NVIDIA

End-to-End VLA

Paul-Edouard Sarlin

Google

Geometric learning and mapping

Luca Carlone

MIT

SLAM, robotic perception

Schedule

June 3, 2026 · 13:00–18:00 · Room 501

13:00–13:05 Opening Remarks
13:05–13:50 Keynote: Georgios Pavlakos
13:50–14:35 Keynote: Jiajun Wu
14:35–15:20 Keynote: Marco Pavone
15:20–15:35 Break
15:35–16:20 Keynote: Paul-Edouard Sarlin
16:20–17:05 Keynote: Luca Carlone
17:05–17:35 Poster Session
17:35–17:50 Awards
17:50–18:00 Closing Remarks

Call for Papers

We invite both short (up to 4 pages) and long (up to 8 pages) paper submissions; page limits exclude references and supplementary materials. Short papers may present original work in progress or serve as technical reports describing implementations built on open-source frameworks.

All submissions are non-archival. Dual submission is allowed where outside venue rules permit. Accepted papers will be hosted on OpenReview and/or the workshop website.

All accepted papers will be presented as posters.



Awards


Presentation Format

Organizers