C6 Video Analysis

This module provides a comprehensive treatment of modern video analysis, progressing from foundational techniques to advanced end-to-end deep learning approaches. It begins with core problems in low- and mid-level video processing, including video segmentation, motion estimation (optical flow), and multi-object tracking. It then introduces sequence modeling techniques for video, including recurrent neural networks and Transformer-based architectures. These models provide the basis for capturing long-range temporal dependencies and enable general, scalable spatio-temporal representation learning in the remainder of the module.

The second part of the module focuses on higher-level video understanding tasks. Topics include action recognition, action detection and anticipation, anomaly detection, and video summarization. In addition, the module explores self-supervised and multimodal learning strategies that leverage large-scale video data without exhaustive annotation. Generative approaches for video synthesis are also covered, exposing students to emerging paradigms in video-based computer vision.

Module Projects: Road Traffic Monitoring; Recognition and Spotting of Soccer Actions (SoccerNet 2025 Challenge)

Throughout the module, students complete two applied projects that consolidate its central theoretical and practical components. The projects involve implementing and evaluating video analysis systems for realistic scenarios: first fast multi-object tracking (Project 1 – Road Traffic Monitoring), then end-to-end learning in a real-world soccer scenario (Project 2 – Recognition and Spotting of Soccer Actions).

M6 Schedule – Academic Year 2025-2026 – Student Guide <here>