PROGRAM

OVERVIEW

ACCV Program Overview

Main Conference

2024 December 10-12

Workshops and Tutorials

2024 December 8-9 (co-located with ACML on December 8)  

At A Glance Schedule

 

December 10

   
8:30 – 9:00 Opening Session
9:00 – 10:45 Oral Session 1
10:45 – 11:15 Coffee Break
11:15 – 12:15 Invited Talk: Michal Irani
12:15 – 13:30 Lunch
13:30 – 15:15 Oral Session 2
15:15 – 15:45 Coffee Break
15:45 – 17:45 Poster Session 1
   
 

December 11

9:00 – 10:45 Oral Session 3
10:45 – 11:15 Coffee Break
11:15 – 12:15 Invited Talk: Gérard Medioni
12:15 – 13:30 Lunch
13:30 – 15:15 Oral Session 4
15:15 – 15:45 Coffee Break
15:45 – 17:45 Poster Session 2
   
 

December 12

9:00 – 10:45 Oral Session 5
10:45 – 11:15 Coffee Break
11:15 – 12:15 Invited Talk: Deepak Pathak
12:15 – 13:30 Lunch
13:30 – 14:50 Oral Session 6
14:50 – 15:20 Coffee Break
15:20 – 17:20 Poster Session 3

 

PDF Link

KEYNOTE SPEAKERS

Keynote                    Tuesday, December 10th                   Time:  11:15 AM

Michal Irani, Professor, Department of Computer Science and Applied Mathematics at the Weizmann Institute of Science, Israel

Title:  Reading Minds & Machines 

Abstract:

  1. Can we reconstruct images that a person saw, directly from his/her fMRI brain recordings?
  2. Can we reconstruct the training data that a deep network was trained on, directly from the parameters of the network?

The answer to both of these intriguing questions is “Yes!”  

In this talk I will present some of our work in both of these domains. I will then show how combining the power of Brains & Machines can lead to significant breakthroughs in both domains, and potentially bridge the gap between Minds and Machines. Finally, I will show how combining the power of Multiple Brains (with NO shared data) may lead to new breakthrough discoveries in Brain-Science, and allow mapping of information between different brains.

Keynote                    Wednesday, December 11th                   Time:  11:15 AM

Gérard Medioni, Vice President and Distinguished Scientist, Prime Video & Studios

Title: Prime Video: A Differentiated Viewing Experience.

Abstract: 

We present an overview of the technology components powering the Prime Video customer experience.

Going beyond title-level information, we segment the video into shots and scenes, parse each scene to infer semantic content, and use it for a number of applications, such as content moderation, subtitles, dubbing, and audio descriptions.

We also augment the original content with artwork and video clips, and provide cast and music recognition in X-Ray, all of which feed into the recommendation presentation.

Finally, we present AI-powered innovative features in live broadcast of sports events.

Bio:

A ten-year Amazon veteran, Gérard is a member of the leadership team for the Amazon Prime Video & Studios group, which aims to be the global entertainment destination of choice for customers. Prior to joining Prime Video, Gérard was responsible for leading artificial intelligence (AI) and computer vision-based research efforts powering Amazon Just Walk Out technology, which provides a checkout-free shopping experience for customers, and the Amazon One palm recognition service, which combines cutting-edge biometrics, optical engineering, generative AI, and machine learning to deliver a new means of identification, entry, payment, and age verification.

Gérard is the recipient of several prestigious awards recognizing his contributions to both academia and industry. In 2022, he was inducted as a Fellow by the National Academy of Inventors, and in 2023 he was elected to the National Academy of Engineering. Gérard is also the recipient of the IEEE PAMI Mark Everingham Prize and the APSIPA Industrial Distinguished Leader award. He serves on the advisory boards of the IEEE Transactions on PAMI journal and the Image and Vision Computing journal. He is the Vice President of the Computer Vision Foundation and a Fellow of ACM, IAPR, IEEE, AAAI, and AAIA.

He is passionate about advancing the state of the art in computer vision and using the technology in customer-facing applications. He is the author of four books, more than 90 journal papers, and 280 conference articles, and the recipient of 121 patents. He is the editor, with Sven Dickinson, of the Computer Vision series of books for Springer. Gérard serves as co-chair of technical conferences (CVPR, ICPR, ACCV, WACV).

Gérard holds the title of Professor Emeritus of Computer Science at USC, where he served as the Chair of the -2007. Prior to joining Amazon in 2014, Gérard consulted with numerous companies and startups. He received his Diplôme d'Ingénieur from ENST, Paris in 1977, and an M.S. and Ph.D. from the University of Southern California in 1980 and 1983, respectively.

Keynote                    Thursday, December 12th                   Time:  11:15 AM

Deepak Pathak, Raj Reddy Assistant Professor in the School of Computer Science at Carnegie Mellon University and CEO/Co-founder of Skild AI

Title: “Building Intelligence from the Ground Up: A New Path to Embodied AI”

Abstract:

Building general-purpose embodied intelligence has been a core goal of AI since its inception 70 years ago. The field has evolved through multiple hypotheses, beginning with search as a solution, followed by knowledge-based systems, and, most recently, leveraging Large Language Models (LLMs) for robotics. Yet, the challenge of creating generalist embodied agents capable of performing thousands of tasks across diverse environments remains unsolved. Building such a generalist robotic agent presents a “chicken-and-egg” problem: to train agents for generalization, we need vast amounts of robotic/agent data from varied environments, but gathering such data is impractical without deploying robots/agents that already generalize.

In this talk, I propose a new framework inspired by children’s learning, which leverages alternative sources of “indirect” supervision to build robotic agents from the ground up. I will focus on three key questions: (1) How can agents continually generate supervision for themselves (curiosity)? (2) How can they bootstrap learning by observing humans (social learning)? and (3) How can they adapt learned skills in real time (adaptation)? I will demonstrate this framework’s potential for scaling robot learning through case studies, including robots discovering new tasks in complex kitchen environments, controlling dexterous robotic hands from monocular vision, enabling dynamic legged robots to walk on challenging, unseen trails using vision, and performing a wide array of manipulation tasks in natural settings.

Bio:

Deepak Pathak is the Raj Reddy Assistant Professor in the School of Computer Science at Carnegie Mellon University and CEO/Co-founder of Skild AI, which is developing an AI foundation model for robotics. He received his Ph.D. from UC Berkeley and his Bachelor’s from IIT Kanpur with a Gold Medal in Computer Science. His research spans computer vision, machine learning, and robotics with the goal of building general-purpose embodied intelligence. He is a recipient of the MIT TR 35 under 35 Award, the Okawa research award, the IIT Kanpur Young Alumnus award, Best Paper Awards at ICRA’24 and CoRL’22, and faculty awards from Google, Samsung, Sony, and GoodAI. Deepak’s research has been featured in popular press outlets, including The Economist, The Wall Street Journal, Forbes, Quanta Magazine, Washington Post, CNET, Wired, and MIT Technology Review, among others. He also co-founded VisageMap Inc., later acquired by FaceFirst Inc.