M3 Machine Learning for Computer Vision

The goal of this module is to introduce machine learning techniques for solving computer vision problems. Machine learning deals with the automatic analysis of large-scale data, and nowadays it is the basis for visual pattern recognition and classification, where "patterns" encompass images of objects in the world, scenes and video sequences of human actions, to name a few.
This module presents the foundations and the most important techniques for the classification of visual patterns, focusing mainly on supervised methods. Once the evaluation setup has been addressed, the course focuses on defining image descriptors, both flat hand-crafted descriptors and deep learning-based descriptors. In parallel, the course progresses from classical machine learning methods to deep learning architectures, together with their tricks and tips: target function definition, learning strategies, and understanding of their inner inference and behaviour.

Module Project: Scene Classification

The aim of the project is to practise the concepts explained in the lectures and to gain insight into the details of using machine learning for image classification. It starts with a classical approach, using the Bag of Words model to represent hand-crafted features extracted from the images and a discriminative model to classify real scenes. The next steps dig into the deep learning paradigm, first moving from hand-crafted image descriptors to a set of features learnt with convolutional networks, and then using pre-existing architectures in an end-to-end setting. The goal is to understand the main procedures for tuning such a complex system. The final step towards mastering deep learning architectures is to create an efficient architecture from scratch, applying all the concepts learnt in the previous stages.

M3 Schedule – Academic Year 2021-2022 – Student Guide