Academic Project Proposals


Note: greyed entries are projects that have already been assigned.

Current year proposals
  3D fruit detection in LiDAR point clouds using graph or 3D neural networks
Detecting fruit and measuring its size is of great interest for estimating crop yield and planning harvest resources. Sensors such as LiDAR can register fruit trees into a 3D map of the environment. The main goal of this thesis will be to explore and design new deep learning architectures based on graph or 3D neural networks to detect fruit.
Extended abstract: Download PDF
Academic Supervisor: Javier Ruiz Hidalgo
Supervisor e-mail: j.ruiz@upc.edu
Institution: UPC
co-Supervisor: Jordi Gene Mola
co-Supervisor e-mail: jordi.genemola@udl.cat
Assigned Student Name: Pablo Vega Gallego
Student e-mail: Pablo.VegaG@autonoma.cat
Confidential: No
Date: 2023-10-31 17:08:06
  Quantum Machine Learning (QML): Image Encoding and Generative Models
Quantum Machine Learning (QML) takes advantage of the quantum computing paradigm and hardware. Excellent resources exist for implementing QML algorithms (PennyLane, Qiskit, Cirq, and others). This is a highly promising field with potential pathways to improvement over classical machine learning. Research will focus on generative models with QML, addressing image encoding on a real quantum computer.
Academic Supervisor: Fernando Vilarino
Supervisor e-mail: fernando@cvc.uab.es
Institution: UAB
co-Supervisor: Matias Bilkis
co-Supervisor e-mail: mbilkis@cvc.uab.es
Assigned Student Name: Miruna-Diana Jarda
Student e-mail: MirunaDiana.Jarda@autonoma.cat
Confidential: No
Date: 2023-11-03 10:54:21
  Quantum Machine Learning (QML): Image Encoding for Quantum Graph Neural Networks
Quantum Machine Learning (QML) takes advantage of the quantum computing paradigm and hardware. Excellent resources exist for implementing QML algorithms (PennyLane, Qiskit, Cirq, and others). This Master's thesis will carry out research on quantum representations of Graph Neural Networks (GNNs) using QML approaches, addressing image encoding on a real quantum computer.
Academic Supervisor: Fernando Vilarino
Supervisor e-mail: fernando@cvc.uab.es
Institution: UAB
co-Supervisor: Matias Bilkis
co-Supervisor e-mail: mbilkis@cvc.uab.es
Confidential: No
Date: 2023-11-03 11:14:46
  Extracting Document Structure from Text-Intensive Images using Large Multi-Modal Pretraining
The project uses multimodal models to convert images with text-rich, structured content into structured code (e.g., markup, DOCX). It aims to transform the editing of scientific documents by accurately translating complex diagrams, tables, and equations from images into editable text, leveraging pre-trained models such as LLaVA, CogVLM, or Fuyu-8B.
Extended abstract: Download PDF
Academic Supervisor: David Vazquez
Supervisor e-mail: aklaway@gmail.com
Institution: UAB
co-Supervisor: Sai Rajeswar
co-Supervisor e-mail: rajsai24@gmail.com
Confidential: No
Date: 2023-11-27 21:45:13
  Enhancing User-AI Interaction: A Study on Multimodal Models for Document-Based Conversations
This project explores the use of multimodal models for enhancing document-based dialogues. It focuses on how these models understand document context and generate coherent responses. This involves interpreting diverse document formats, extracting relevant information, and answering queries. The approach leverages pre-trained models such as LLaVA, CogVLM, or Fuyu-8B.
Extended abstract: Download PDF
Academic Supervisor: David Vazquez
Supervisor e-mail: aklaway@gmail.com
Institution: UAB
co-Supervisor: Sai Rajeswar
co-Supervisor e-mail: rajsai24@gmail.com
Confidential: No
Date: 2023-11-27 21:49:18
  2D Image, 3D LiDAR & 4D radar fusion for autonomous driving
Fusing range data (radar, LiDAR) with image data can help improve the detection, localization, and tracking of objects for autonomous driving. This project will explore early fusion strategies to increase the resolution of 4D imaging radar data and 3D LiDAR range data through registration and depth completion, taking into account image contours, 3D locations, and speeds.
Extended abstract: Download PDF
Academic Supervisor: Josep R. Casas
Supervisor e-mail: josep.ramon.casas@upc.edu
Institution: UPC
co-Supervisor: Santiago Royo
co-Supervisor e-mail: santiago.royo@upc.edu
Assigned Student Name: Adrià Subirana Pérez
Student e-mail: Adria.Subirana@autonoma.cat
pre-Assigned Student Name: Adria Subirana
Confidential: No
Date: 2024-02-05 13:29:04
  Advancing End-to-End Autonomous Driving
End-to-end autonomous driving is a promising paradigm. However, despite impressive results, several challenges remain to be addressed (e.g., memory, inductive bias). In this Master's dissertation, we will select one of these challenges and work on it to improve the CIL++ end-to-end driving model.
Academic Supervisor: Antonio M. Lopez
Supervisor e-mail: antonio@cvc.uab.cat
Institution: UAB
co-Supervisor: Gabriel Villalonga Pineda
Assigned Student Name: Ainoa Contreras Rodríguez
Student e-mail: Ainoa.Contreras@autonoma.cat
pre-Assigned Student Name: Ainoa Contreras Rodriguez
Confidential: No
Date: 2024-02-06 10:39:58
  Real-time Analysis of LASER Marked Codes in Industrial Production Lines
LASER marking systems can produce errors due to product movement or power loss. Real-time computer vision (CV) algorithms running on CPUs, GPUs, or FPGAs are proposed for quality inspection. The project aims to detect the marking area and the position of printed codes. A dataset of 1,600 images is provided, and a synthetic dataset will be generated from backgrounds extracted from the acquired images.
Academic Supervisor: David Castells-Rufas
Supervisor e-mail: david.castells@uab.cat
Institution: UAB
co-Supervisor: Dimosthenis Karatzas
co-Supervisor e-mail: dimos@cvc.uab.es
Confidential: No
Date: 2024-02-15 11:43:08
  Visual Prompts for Guiding Image Restoration Tasks
In this project, we plan to derive visual prompts to guide and refine image restoration tasks. Rather than relying on textual input, we focus on generating a comprehensive set of features that encapsulate important image context. These features may include depth maps for spatial understanding, color naming as a human perception feature, optical flow for motion, and other tailored cues.
Extended abstract: Download PDF
Academic Supervisor: Javier Vazquez-Corral
Supervisor e-mail: javier.vazquez@cvc.uab.cat
Institution: UAB
co-Supervisor: David Serrano-Lozano
co-Supervisor e-mail: dserrano@cvc.uab.cat
Assigned Student Name: Francisco Antonio Molina Bakhos
Student e-mail: FranciscoAntonio.Molina@autonoma.cat
Confidential: No
Date: 2024-02-22 11:54:43
  Self-supervised video decomposition for video editing
Video editing presents unique challenges compared to image editing, as it requires modeling frame-to-frame relationships and addressing temporal variations. Our goal is to learn, from a given input sequence, an implicit neural representation of the video that decomposes it into different layers (the background and individual foreground objects), allowing the user to edit them independently.
Extended abstract: Download PDF
Academic Supervisor: Javier Vazquez-Corral
Supervisor e-mail: javier.vazquez@cvc.uab.cat
Institution: UAB
co-Supervisor: Danna Xue
co-Supervisor e-mail: dxue@cvc.uab.cat
Confidential: No
Date: 2024-02-22 12:05:04
  Extracting visual content from YouTube trading videos
Analyzing the visual content and graphics of videos is crucial to understanding different styles and recommendations. The main objective is to extract and analyze key visual elements from a pre-selected sample of YouTube videos. These videos focus on providing investment and stock information and can vary in nature and content.
Extended abstract: Download PDF
Academic Supervisor: Ernest Valveny
Supervisor e-mail: ernest@cvc.uab.es
Institution: UAB
Assigned Student Name: Cristina Aguilera Gonzalez
Student e-mail: Cristina.AguileraG@autonoma.cat
Confidential: No
Date: 2024-03-04 23:32:08
  Multimodal information extraction for automatic inventory of books
This project aims to identify and recognize the books contained in images of personal bookshelves, as a tool to help create socio-cultural profiles of people. The project will rely on multimodal integration of the textual and visual information of books, using recent methods for image segmentation, text recognition, and multimodal learning.
Extended abstract: Download PDF
Academic Supervisor: Ernest Valveny
Supervisor e-mail: ernest@cvc.uab.es
Institution: UAB
Assigned Student Name: Beatrice Anamaria Peptenaru
Student e-mail: BeatriceAnamaria.Peptenaru@autonoma.cat
Confidential: No
Date: 2024-03-05 00:00:24
  Recognition of handwritten documents
The recognition of images of handwritten documents is difficult, especially when they use rare alphabets. One example is historical ciphers (e.g., secret messages in religious or diplomatic correspondence). This thesis will focus on designing a transcription model and validating its applicability on real ciphered document images. More info: https://de-crypt.org/
Academic Supervisor: Alicia Fornes
Supervisor e-mail: afornes@cvc.uab.es
Institution: UAB
co-Supervisor: Xavier Otazu
co-Supervisor e-mail: xotazu@cvc.uab.cat
Assigned Student Name: Goio García Moro
Student e-mail: Goio.Garcia@autonoma.cat
Confidential: No
Date: 2024-03-05 17:38:09
  Can We Read a Book without Opening it? A New Perspective towards Multi-Page DocumentVQA
Document Visual Question Answering (DocumentVQA) refers to the task of answering questions from document images. Most existing work on DocumentVQA considers only single-page documents. However, in real scenarios, documents are mostly composed of multiple pages that should be processed together. This proposal will focus on a new perspective on the Multi-Page DocumentVQA task. Please refer to the PDF.
Extended abstract: Download PDF
Academic Supervisor: Lei Kang
Supervisor e-mail: lkang@cvc.uab.cat
Institution: UAB
co-Supervisor: Dimosthenis Karatzas
co-Supervisor e-mail: dimos@cvc.uab.cat
Assigned Student Name: Iñaki Lacunza Castilla
Student e-mail: Inaki.Lacunza@autonoma.cat
Confidential: No
Date: 2024-03-07 10:35:44
  Reading Historical Maps: End-to-end Map Text Detection, Recognition, and Label Grouping
Localizing and recognizing text on historical maps presents several difficulties: widely spaced and overlapping words, complex text-like backgrounds, and strongly rotated or curved text. This project continues to build on robust reading models with additional features tailored to the map reading task and will develop a jointly-trained neural method for grouping words into label phrases.
Extended abstract: Download PDF
Academic Supervisor: Dimosthenis Karatzas
Supervisor e-mail: dimos@cvc.uab.es
Institution: UAB
co-Supervisor: Jerod Weinman
co-Supervisor e-mail: jerod@acm.org
Confidential: No
Date: 2024-03-07 16:44:50
  Deep Learning analysis of coronary arteries
Coronary artery disease may lead to a heart attack. Intra-operative imaging can assist physicians in identifying the optimal treatment. The objective of this project is to analyze intravascular images using deep learning techniques (classification and segmentation) to provide a quantitative diagnosis of the patient's status.
Academic Supervisor: Simone Balocco
Supervisor e-mail: simone.balocco@ub.edu
Institution: UB
Assigned Student Name: Georg Simn Herodes
Student e-mail: GeorgSimn.Herodes@autonoma.cat
Confidential: No
Date: 2024-03-10 21:16:32
  Explainable Document Visual Question Answering
Document visual question answering (DocVQA) is an important tool to perform high-level reasoning and interpret document images. Despite the advancements in DocVQA models, the absence of explanations for the provided answers remains a notable limitation. This project endeavors to bridge this gap by developing a novel DocVQA model that not only generates accurate responses to questions but also offers explanations for them.
Extended abstract: Download PDF
Academic Supervisor: Dimosthenis Karatzas
Supervisor e-mail: dimos@cvc.uab.es
Institution: UAB
co-Supervisor: Mohamed Ali Souibgui
co-Supervisor e-mail: msouibgui@cvc.uab.es
Confidential: No
Date: 2024-03-15 14:55:01
  Segmentation and classification of prostate MRI for prostate cancer diagnosis
The aim of this project is to develop a model for segmenting MRI images into relevant regions and predicting clinically significant prostate cancer from MRI. It will be carried out in collaboration with a project on MRI synthesis, in order to compare segmentation and prediction results when using augmented databases. This project is financed with a collaboration grant.
Extended abstract: Download PDF
Academic Supervisor: Veronica Vilaplana
Supervisor e-mail: veronica.vilaplana@upc.edu
Institution: UPC
co-Supervisor: Montse Pardas
co-Supervisor e-mail: montse.pardas@upc.edu
Assigned Student Name: Iker Garcia Fernandez
Student e-mail: Iker.GarciaF@autonoma.cat
Confidential: No
Date: 2024-03-16 10:20:14
  Synthesis of prostate MRI with generative adversarial networks
The project aims to synthesize realistic MRI images using StyleGAN, exploring latent space manipulation to control image features. It seeks to improve Prostate MRI synthesis by adjusting latent vectors for desired properties and combining features from different MRIs. This project is financed with a collaboration grant.
Extended abstract: Download PDF
Academic Supervisor: Montse Pardas
Supervisor e-mail: montse.pardas@upc.edu
Institution: UPC
co-Supervisor: Veronica Vilaplana
co-Supervisor e-mail: veronica.vilaplana@upc.edu
Assigned Student Name: Sigrid Vila Bagaria
Student e-mail: Sigrid.Vila@autonoma.cat
Confidential: No
Date: 2024-03-16 10:28:37
  Self-supervised multi-modal 2D and 3D image registration
In multi-modal image registration, the goal is to align images corresponding to different modalities, for example an RGB image with a thermal image, or a computed tomography scan with a magnetic resonance image. The goal of this thesis is to extend a recent method so that it can be applied to 2D and 3D medical imaging. The method is based on neural networks trained in a self-supervised way.
Extended abstract: Download PDF
Academic Supervisor: Pablo Arias Martinez
Supervisor e-mail: pablo.arias@upf.edu
Institution: UPF
co-Supervisor: Gemma Piella Fenoy
co-Supervisor e-mail: gemma.piella@upf.edu
Confidential: No
Date: 2024-03-21 16:16:21
  ZoomVQA - Fixation guided attention models for VQA
Through this project, we will explore mechanisms for sequentially attending to the visual input, taking inspiration from human attention allocation mechanisms (scanpaths and sequences of fixations). We will explore to what extent this can reduce memory requirements and apply the resulting model to a VQA task.
Academic Supervisor: Dimosthenis Karatzas
Supervisor e-mail: dimos@cvc.uab.es
Institution: UAB
co-Supervisor: Lei Kang
co-Supervisor e-mail: lkang@cvc.uab.es
Assigned Student Name: Jordi Morales Casas
Student e-mail: Jordi.MoralesC@autonoma.cat
pre-Assigned Student Name: Jordi Morales
Confidential: No
Date: 2024-03-22 11:08:56
  Learning Without Limits: Online Continual Learning for Never-Ending Data Streams
Online continual learning tackles the challenge of training a deep learning model on a sequential data stream. This is a crucial setting in real-world scenarios (e.g., robotics, autonomous driving) where new data arrives continuously and new patterns or categories emerge over time. This approach unlocks the potential for training models that can continually evolve and adapt to a changing environment.
Academic Supervisor: Bogdan Raducanu
Supervisor e-mail: bogdan@cvc.uab.es
Institution: UAB
co-Supervisor: Sandesh Kamath
co-Supervisor e-mail: skamath@cvc.uab.es
Confidential: No
Date: 2024-04-04 16:43:50
  Enhancing Generative Model Reliability with Self-Conditioning Techniques
Foundation models excel at generative tasks but struggle with bias and off-task behavior. This project will work on self-conditioning methods for pre-trained models, enabling concept-specific generation and bias mitigation without retraining or extra parameters, while enhancing reliability and control.
Academic Supervisor: Jordi Gonzalez
Supervisor e-mail: poal@cvc.uab.cat
Institution: UAB
co-Supervisor: Pau Rodriguez Lopez, Apple
Assigned Student Name: Pau Vallespí
Student e-mail: Pau.Vallespi2@autonoma.cat
pre-Assigned Student Name: Pau Vallespi
Confidential: No
Date: 2024-04-04 19:37:05
  deeP lEarning AlgoRithmS in the diagnosis of adenomas and early cOlorectal caNcer (PEARSON)
The goal of this work is to extract visual information from histopathological images and combine it with clinical risk factors in a multimodal approach. The aim is to find a new biomarker that stratifies the risk of lymph node involvement at the time of endoscopic resection, in order to decide the best treatment for early-stage colorectal cancer.
Extended abstract: Download PDF
Academic Supervisor: Debora Gil
Supervisor e-mail: debora@cvc.uab.es
Institution: UAB
Confidential: No
Date: 2024-04-09 11:56:47
  Fruit Tree management based on Radiance Fields
In this project, we propose to explore the use of radiance fields to assist in tasks related to fruit tree management: branch pruning assistance and fruit counting. Some computer vision systems based on analysing video sequences have been proposed for these tasks. In this work, we propose to first estimate a radiance field from a video sequence of a fruit tree, and then use it to solve the pruning and fruit counting tasks with higher performance than processing the video sequence directly.
Extended abstract: Download PDF
Academic Supervisor: Daniel Ponsa
Supervisor e-mail: Daniel.Ponsa@uab.cat
Institution: UAB
Assigned Student Name: Gunjan Paul Paul
Student e-mail: Gunjan.Paul@autonoma.cat
pre-Assigned Student Name: Gunjan Paul
Confidential: No
Date: 2024-04-12 10:29:34
  Enhancing Emotional Expressiveness in Facial Animation from Speech
The proposed research aims to improve facial animation techniques by enhancing the emotional expressiveness of digital avatars generated solely from speech. Using FLAME (Faces Learned with an Articulated Model and Expressions) as the foundational model, this project seeks to integrate advanced deep learning (DL) methods to disentangle pose and expression information, thereby achieving more nuanced and realistic animations.
Extended abstract: Download PDF
Academic Supervisor: Jordi Sanchez Riera
Supervisor e-mail: jsanchez@iri.upc.edu
Institution: UPC
Assigned Student Name: Luis González Gudiño
Student e-mail: Luis.GonzalezGu@autonoma.cat
pre-Assigned Student Name: Luis Gonzalez Gudino
Company: Institut de Robotica i Informatica Industrial
Confidential: No
Date: 2024-04-12 12:31:09
  Emotional Reasoning through Multi-modal alignment of Vision Transformers and LLMs
Given the vast taxonomies of emotions and their subjective nature, there is a strong need for a system capable of zero-shot learning through the creation of an emotional embedding space that would enable the identification and analysis of sentiments. This proposal aims to create a multimodal alignment of images and text, with the downstream task of emotion classification and reasoning, using state-of-the-art Vision Transformers and Large Language Models.
Academic Supervisor: Agata Lapedriza
Supervisor e-mail: alapedriza@uoc.edu
Institution: UOC
co-Supervisor: Cristina Bustos
co-Supervisor e-mail: mbustosro@uoc.edu
Assigned Student Name: Cristian Gutiérrez Gómez
Student e-mail: Cristian.GutierrezG@autonoma.cat
pre-Assigned Student Name: Cristian Gutierrez
Confidential: No
Date: 2024-04-17 18:22:56
  Where's Waldo? Link Discovery through visual Graph Construction from historical census documents
Link discovery is currently one of the key tasks in graph-based representations, and it usually starts from an existing graph. This proposal will instead focus on building the graph itself through prompted visual segmentation, in order to track individuals in time and space across different censuses of the county of Baix Llobregat (Barcelona) during the 19th and 20th centuries.
Academic Supervisor: Oriol Ramos Terrades
Supervisor e-mail: oriolrt@cvc.uab.cat
Institution: UAB
Assigned Student Name: Carlos Boned Riera
Student e-mail: Carlos.Boned@uab.cat
pre-Assigned Student Name: Carlos Boned Riera
Confidential: No
Date: 2024-04-19 12:48:14
  A study of automatic emotion recognition in children aged 3 to 5 years
In this project we will explore Computer Vision techniques for children's facial analysis. The objective of this Master Thesis proposal is to assess whether the combinations of Action Units that previous literature has found to be associated with certain emotional expressions are correlated with three- to five-year-old children’s expression of those emotions perceived by the observer.
Academic Supervisor: Agata Lapedriza
Supervisor e-mail: alapedriza@uoc.edu
Institution: UOC
co-Supervisor: Lucrezia Crescenzi
co-Supervisor e-mail: lcrescenzi@uoc.edu
Assigned Student Name: Diana Tat
Student e-mail: Diana.Tat@autonoma.cat
pre-Assigned Student Name: Diana Tat
Confidential: No
Date: 2024-04-29 23:15:54
  Enhancing Foul Detection in Soccer Matches Using Multi-View Video Analysis
In soccer, fair play depends on accurate foul detection. Our project introduces an AI system that automatically identifies fouls in match footage. By improving upon the existing baseline from SoccerNet-MVFoul, we enhance referee decision-making through multi-view analysis, aiming for accurate foul recognition and assessment of severity.
Academic Supervisor: Antonio Agudo
Supervisor e-mail: aagudo@iri.upc.edu
Institution: UPF
co-Supervisor: Marc Gutierrez
Assigned Student Name: Marc Pérez Sabater
Student e-mail: Marc.PerezSa@autonoma.cat
pre-Assigned Student Name: Marc Perez Sabater
Confidential: No
Date: 2024-05-24 09:25:09
  Cardiac segmentation with foundational models
Many foundation models for medical image analysis have been released and have proved useful in multiple tasks. However, their effectiveness on real-world medical imaging data has been little explored. A medical imaging challenge called CARE (http://www.zmic.org.cn/care_2024/) is being organised as part of the MICCAI 2024 conference, and one of its tracks is centred on foundation models for the heart. This project will focus on developing a pipeline to participate in that track.
Academic Supervisor: Oscar Camara
Supervisor e-mail: oscar.camara@upf.edu
Institution: UPF
pre-Assigned Student Name: Angel Herrero
Confidential: No
Date: 2024-05-30 15:47:40
  Enhancing Fine-Grained Action Recognition in Gymnastics through 3D Pose Estimation
This thesis explores enhancing fine-grained action recognition in gymnastics by integrating 3D pose estimation with the FineGYM dataset. Given the semantic similarities in gymnastic movements, we utilize FineGYM's detailed class-specific questions and 3D pose data to improve classification accuracy.
Academic Supervisor: Antonio Agudo
Supervisor e-mail: aagudo@iri.upc.edu
Institution: UPF
Assigned Student Name: Anna Domènech Olivé
Student e-mail: Anna.DomenechO@autonoma.cat
pre-Assigned Student Name: Anna Domenech
Confidential: No
Date: 2024-05-31 16:44:20
  ISP optimization for driving
This thesis aims to explore and implement an automated black-box ISP tuning optimization process using the CMA-ES algorithm. The proposed method will be validated using a pre-trained 2D multiclass object detection algorithm, with the objective of improving mAP and mAR. As very little work has been done in this area, no standard RAW image datasets with reliable ground truth are available. For that reason, a set of 100 RAW images will be annotated in this thesis.
Academic Supervisor: Javier Vazquez Corral
Supervisor e-mail: jvazquez@cvc.uab.cat
Institution: UAB
Confidential: No
Date: 2024-06-14 16:36:08

  2022-2023 proposals
  Diffusion Models for Replay in Continual Learning
Continual learning is a significant challenge in machine learning, involving learning from a continuous stream of data while retaining past knowledge. Replay-based methods have shown promise in mitigating catastrophic forgetting but suffer from computational costs and limited sample diversity. This project aims to overcome these limitations by leveraging diffusion models for continual learning.
Academic Supervisor: Ernest Valveny
Supervisor e-mail: ernest.valveny@cvc.uab.cat
Institution: UAB
co-Supervisor: Pau Rodriguez
co-Supervisor e-mail: pau.rodri1@gmail.com
Assigned Student Name: Sergi Masip Cabeza
Student e-mail: Sergi.Masip@autonoma.cat
pre-Assigned Student Name: Sergi Masip
Confidential: No
Date: 2023-05-30 16:22:43
  Visual vs Textual Features: Towards Instance Document Layout Segmentation.
Document layout segmentation (DLS) is the task of identifying the different layout elements, such as text, images, tables, and graphs, in a document image. One of the critical decisions in this task is the choice of features used to represent the layout elements. In this thesis, we compare the performance of visual, textual, and multimodal (visual + textual) features in DLS.
Extended abstract: Download PDF
Academic Supervisor: Josep Llados
Supervisor e-mail: josep@cvc.uab.cat
Institution: UAB
Assigned Student Name: Ayan Banerjee
Student e-mail: Ayan.Banerjee@autonoma.cat
pre-Assigned Student Name: Ayan Banerjee
Confidential: No
Date: 2023-05-26 12:01:59
  Self-supervised learning of multimodal representations in food recipes
This project aims at developing a self-supervised deep learning approach to learn video representations of food recipes jointly from either video frames and textual descriptions or from video frames and their corresponding audio signal. The learned representation will be evaluated on the downstream tasks of action prediction and action localization on different datasets related to cooking.
Academic Supervisor: Gloria Haro
Supervisor e-mail: gloria.haro@upf.edu
Institution: UPF
co-Supervisor: Coloma Ballester
co-Supervisor e-mail: Mariella Dimiccoli
pre-Assigned Student Name: Igor Ugarte
Confidential: No
Date: 2023-05-19 19:58:08
  Adaptive Control and Task Generalization in Delta Robot Manipulation
This research develops an AI-driven Delta robot system for manipulation tasks. It includes a physical robot, a simulator, control algorithms, and an AI method based on a large language model (LLM). The LLM decomposes user requests into atomic tasks that are executed with traditional control. Closed-loop control is achieved through a camera. The method generalizes to new tasks and objects, enhancing the robot's capabilities.
Academic Supervisor: David Vazquez Bermudez
Supervisor e-mail: david.vazquez@servicenow.com
Institution: UAB
co-Supervisor: Michal Drozdzal
Assigned Student Name: Jia Qiang Ye Zhu
Student e-mail: JiaQiang.Ye@autonoma.cat
pre-Assigned Student Name: Jiaqiang Ye Zhu
Confidential: No
Date: 2023-05-17 18:08:45
  Image retrieval with text modifiers
In this project, we will study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image.
Academic Supervisor: Lluis Gomez
Supervisor e-mail: lgomez@cvc.uab.cat
Institution: UAB
Assigned Student Name: Razvan-Florin Apatean
Student e-mail: RazvanFlorin.Apatean@autonoma.cat
Confidential: No
Date: 2023-05-16 13:31:49
  Prior-based implicit reconstruction of human avatars
In this project, we aim to explore the use of parametric priors to boost the performance of implicit representations in the task of building human avatars.
Academic Supervisor: Francesc Moreno-Noguer
Supervisor e-mail: fmoreno@iri.upc.edu
Institution: UPC
Assigned Student Name: Alvaro Francesc Budria Fernández
Student e-mail: AlvaroFrancesc.Budria@autonoma.cat
pre-Assigned Student Name: Alvaro Francesc Budria
Confidential: No
Date: 2023-04-24 10:35:40
  Large Language Models for Document Visual Question Answering
Document visual question answering is an important tool to perform high-level reasoning and interpret document images. Large language models are becoming popular in question answering tasks. In this project, we aim to incorporate large language models into a machine learning model that answers user questions and queries about a document image in a multimodal fashion.
Extended abstract: Download PDF
Academic Supervisor: Dimosthenis Karatzas
Supervisor e-mail: dimos@cvc.uab.es
Institution: UAB
co-Supervisor: Mohamed Ali Souibgui
co-Supervisor e-mail: msouibgui@cvc.uab.es
Assigned Student Name: Anna Oliveras Tous
Student e-mail: Anna.OliverasT@autonoma.cat
Confidential: No
Date: 2023-04-17 18:23:42
  Exploring Rejection Strategies for Zero-Shot Image Classification
This project aims to explore state-of-the-art rejection strategies in the zero-shot setting for popular models such as CLIP, which have shown impressive performance in zero-shot image classification. By examining the effectiveness of various rejection strategies, we hope to improve the robustness and accuracy of these models.
Extended abstract: Download PDF
Academic Supervisor: Lluis Gomez
Supervisor e-mail: lgomez@cvc.uab.cat
Institution: UAB
Assigned Student Name: Hicham El Muhandiz Aarab
Student e-mail: Hicham.ElMuhandiz@autonoma.cat
Confidential: No
Date: 2023-04-17 15:44:57
  Automated Building Damage Assessment using Satellite Imagery
The project aims to automate the process of assessing building damage after a natural disaster using state-of-the-art computer vision algorithms. The xView2 Challenge dataset, consisting of high-resolution satellite imagery, will be used to develop and compare models for building damage assessment.
Extended abstract: Download PDF
Academic Supervisor: Lluis Gomez
Supervisor e-mail: lgomez@cvc.uab.cat
Institution: UAB
co-Supervisor: Ali Furkan Biten
co-Supervisor e-mail: abiten@cvc.uab.es
Confidential: No
Date: 2023-04-17 15:26:14
  Automated Animal Detection and Classification in Camera Traps using Computer Vision
This project aims to automate the categorization of species in camera trap data using computer vision. It will provide researchers with a comparative study of how well state-of-the-art models generalize to unseen locations, lighting conditions, and occlusions.
Extended abstract: Download PDF
Academic Supervisor: Lluis Gomez
Supervisor e-mail: lgomez@cvc.uab.cat
Institution: UAB
co-Supervisor: Ali Furkan Biten
co-Supervisor e-mail: abiten@cvc.uab.es
Confidential: No
Date: 2023-04-17 14:38:03
  Improving Flood Detection on SAR images using State-of-the-Art Computer Vision Algorithms
This project consists of applying state-of-the-art computer vision algorithms for flood detection, comparing their performance, and proposing novel ideas to improve them. The project is based on the NASA Interagency Implementation and Advanced Concepts Team's flood event detection contest, which involves using supervised learning to identify flood pixels in Synthetic Aperture Radar (SAR) images.
Extended abstract: Download PDF
Academic Supervisor: Lluis Gomez
Supervisor e-mail: lgomez@cvc.uab.cat
Institution: UAB
co-Supervisor: Ali Furkan Biten
co-Supervisor e-mail: abiten@cvc.uab.es
Confidential: No
Date: 2023-04-17 14:22:56
  Exploring pretraining tasks for multimodal methods in DocVQA
This project consists of exploring different pretraining tasks on existing DocVQA models. It is organized into two milestones. First, the student will select an existing model that has been pretrained with textual tasks and implement visual pretraining tasks. Then, he/she will extend the analysis of the results with additional pretraining tasks on both modalities.
Extended abstract: Download PDF
Academic Supervisor: Dimosthenis Karatzas
Supervisor e-mail: dimos@cvc.uab.cat
Institution: UAB
co-Supervisor: Ruben P. Tito
co-Supervisor e-mail: rperez@cvc.uab.cat
Confidential: No
Date: 2023-04-13 20:31:20
  Smoke Evolution Measurement: Estimation Of 3D Shape And Volume Of Fire Plumes From Multiple Views
This project aims to measure wildfire plume dimensions and geometry in 3D space and time by means of computer vision techniques. Compared with fuel and fire monitoring, smoke remote sensing is significantly less developed, mainly due to the highly dynamic nature of smoke and its highly variable optical properties. A significant impact on the testing of plume models is expected for fire management operations.
Extended abstract: Download PDF
Academic Supervisor: Josep R. Casas
Supervisor e-mail: josep.ramon.casas@upc.edu
Institution: UPC
co-Supervisor: Montse Pardas
co-Supervisor e-mail: montse.pardas@upc.edu
Assigned Student Name: Júlia Ariadna Blanco Arnaus
Student e-mail: JuliaAriadna.Blanco@autonoma.cat
pre-Assigned Student Name: Julia Ariadna Blanco i Arnaus
Confidential: No
Date: 2023-04-12 10:25:40
  Apples and Oranges: Topology Alignment for OCR-Free Topic Modeling in the Visual Domain
In the context of document understanding, many advances have been made in information retrieval. So far, most topic model approaches in document analysis rely on OCR extraction in order to perform the computation in the textual domain. In this work, we propose a mechanism to perform this retrieval directly in the visual domain.
Academic Supervisor: Josep Llados
Supervisor e-mail: josep@cvc.uab.cat
Institution: UAB
co-Supervisor: Oriol Ramos
co-Supervisor e-mail: oriolrt@cvc.uab.cat
Assigned Student Name: Adrià Molina Rodríguez
Student e-mail: Adria.Molina@uab.cat
pre-Assigned Student Name: Adria Molina Rodriguez
Confidential: No
Date: 2023-03-16 14:58:59
  A study of automatic emotion recognition in children aged 3 to 5 years.
In this project we will explore Computer Vision techniques for children's facial analysis. The objective of this Master Thesis proposal is to assess whether the combinations of Action Units that previous literature has associated with certain emotional expressions are correlated with three- to five-year-old children’s expression of those emotions, as perceived by an observer.
Academic Supervisor: Agata Lapedriza
Supervisor e-mail: alapedriza@uoc.edu
Institution: UOC
co-Supervisor: Lucrezia Crescenzi
co-Supervisor e-mail: lcrescenzi@uoc.edu
Confidential: No
Date: 2023-03-15 22:26:07
  3D fruit detection in LiDAR point clouds using graph or 3D neural networks
The detection of fruits is of great interest to predict harvest resources in advance. The main goal of this project will be to explore and design new deep learning architectures based on Graph or 3D Neural Networks to detect fruits in the 3D representation. There is a possibility to partially fund this work (see attached PDF).
Extended abstract: Download PDF
Academic Supervisor: Javier Ruiz Hidalgo
Supervisor e-mail: j.ruiz@upc.edu
Institution: UPC
co-Supervisor: Jordi Gene Mola
co-Supervisor e-mail: jordi.genemola@udl.cat
Assigned Student Name: Berkay Arpaci
Student e-mail: Berkay.Arpaci@autonoma.cat
Confidential: No
Date: 2023-03-15 13:48:06
  Neural 3D detail-aware shape from RGB images under general lighting
We aim to propose a new neural method to capture 3D objects along with their illumination properties from pictures. To this end, we explore a universal approach that works in an uncalibrated, unified and unsupervised manner under general lighting, is fully interpretable, and does not assume any prior knowledge of the shape geometry to constrain the solution.
Academic Supervisor: Antonio Agudo
Supervisor e-mail: aagudo@iri.upc.edu
Institution: UPF
Confidential: No
Date: 2023-03-15 09:53:07
  Low-level image features’ contribution to aesthetic valuation.
Humans perceive and rate real world objects and images with ease. However, this task is very challenging for computers, mostly because of the large semantic content of current training datasets. We propose training a model on semantically-deprived images to understand the contribution to aesthetics of low-level features such as colourfulness, symmetry, image technical quality, etc.
Academic Supervisor: C. Alejandro Parraga
Supervisor e-mail: Alejandro.Parraga@cvc.uab.cat
Institution: UAB
Assigned Student Name: Marcos Muñoz González
Student e-mail: Marcos.MunozG@autonoma.cat
pre-Assigned Student Name: Marcos Munoz Gonzalez
Confidential: No
Date: 2023-03-09 15:14:03
  Large-scale crinoid counting from ROV videos
The exhaustive monitoring of marine species is crucial to estimate the effects of protection measures and policies. This TFM proposes to reduce manual annotation needs and, specifically, to provide DL tools to estimate crinoid populations from large databases recorded by ROVs in the deep sea. We will explore recent CNN architectures and attention mechanisms to provide reliable counting.
Extended abstract: Download PDF
Academic Supervisor: David Masip Rodo
Supervisor e-mail: dmasipr@uoc.edu
Institution: UOC
Confidential: No
Date: 2023-02-24 11:57:23
  Weakly Supervised Learning Segmentation applied to Wind Turbines Images: Loss exploration
The goal of the project is to develop an image segmentation algorithm for wind turbine blade imagery obtained during drone inspections. The project will aim to improve current networks by designing customized loss functions that incorporate our prior knowledge of blade images.
Academic Supervisor: Antonio Agudo
Supervisor e-mail: Raül Perez
Institution: UPF
Confidential: No
Date: 2023-02-20 22:14:43
  Exploring continual learning capabilities of large language models
Large language models (LLMs) have shown incredible performance in language-related tasks. However, their potential extends to other endeavors as well. For instance, it has been shown that they can store and simulate other neural networks inside their hidden layers. In this project you will explore the capabilities of frozen LLMs (such as GPT-3) for continual learning on images.
Academic Supervisor: Alex Gomez-Villa
Supervisor e-mail: agomezvi@cvc.uab.cat
Institution: UAB
co-Supervisor: Joost Van De Weijer
co-Supervisor e-mail: joost@cvc.uab.es
Confidential: No
Date: 2023-02-13 18:13:40
  Autonomous Driving powered by Deep Learning
CVC performs pioneering research worldwide on different topics at the intersection of deep learning (DL), simulation, and autonomous driving (AD). Well-known works include the CARLA simulator (carla.org) and AD by imitation learning (real-world AD example at www.youtube.com/watch?v=pzmQ-TmaGi0). This proposal offers a TFM on DL for AD, where the specific topic will be decided with the selected student.
Academic Supervisor: Antonio M. Lopez
Supervisor e-mail: antonio@cvc.uab.es
Institution: UAB
Assigned Student Name: Abel García Romera
Student e-mail: Abel.GarciaR@autonoma.cat
pre-Assigned Student Name: Abel Garcia
Confidential: No
Date: 2023-02-11 14:01:18
  Golf Swing analysis
Currently, the tools used to analyze the quality of the golf swing are based on very expensive multi-camera systems, available only at a few very large golf clubs. In this project, we want to study the feasibility of using Deep Learning models to perform this analysis on videos captured with a single low-cost camera, enabling its use as a tool for coordination and motor learning at schools.
Academic Supervisor: Xavier Baro
Supervisor e-mail: xbaro@uoc.edu
Institution: UOC
Confidential: No
Date: 2023-02-08 16:44:55
  Transcription and Decryption of handwritten ciphered document images
In contrast to text documents, there are few methods for recognizing manuscripts with uncommon alphabets, such as cipher documents (secret messages in diplomatic letters, secret societies...). This work will focus on designing a transcription and decryption model based on Deep Learning architectures and validating its applicability on real cipher images. More info: https://de-crypt.org/
Academic Supervisor: Alicia Fornes
Supervisor e-mail: afornes@cvc.uab.es
Institution: UAB
co-Supervisor: Mohamed Ali Souibgui
co-Supervisor e-mail: msouibgui@cvc.uab.cat
Confidential: No
Date: 2023-02-08 16:29:40
  Automatic analysis of sport events in video sequences
Develop a robust deep learning architecture to track different players in a sport event, in particular, basketball. The main goal of this project is to study how to improve the handling of occlusion/disappearance situations using global information (such as the number on the back or face information). There is a possibility to finance the TFM.
Academic Supervisor: Javier Ruiz Hidalgo
Supervisor e-mail: j.ruiz@upc.edu
Institution: UPC
co-Supervisor: Josep Ramon Morros
co-Supervisor e-mail: ramon.morros@upc.edu
Confidential: No
Date: 2023-02-02 18:31:59
  Biologically inspired algorithms for energy efficient machine learning
The astonishing improvements of ML since the emergence of deep learning (DL) rest on enormous computational resources, with unsustainable environmental cost. As the brain is an extremely efficient learning machine, using bio-inspired algorithms is a promising avenue for tackling these issues. This project explores this idea by comparing the energy use of bio-inspired and standard ML algorithms.
Extended abstract: Download PDF
Academic Supervisor: Olivier Penacchio
Supervisor e-mail: oliver.penacchio@uab.cat
Institution: UAB
co-Supervisor: Xavier Otazu
co-Supervisor e-mail: xotazu@cvc.uab.cat
Confidential: No
Date: 2023-02-02 11:46:13
  Bio-inspired networks for AI and ML
A long-standing aim for AI and ML is to mimic the behavior of the human brain. Hence, there is increasing interest in implementing biologically plausible mechanisms from neuroscience to improve state-of-the-art neural networks (NN). In this project, spiking NNs will be used to classify small datasets (e.g. MNIST) using biologically plausible learning mechanisms such as spike timing-dependent plasticity.
Extended abstract: Download PDF
Academic Supervisor: Xavier Otazu
Supervisor e-mail: xotazu@cvc.uab.cat
Institution: UAB
co-Supervisor: Olivier Penacchio
co-Supervisor e-mail: olivier.penacchio@uab.cat
Confidential: No
Date: 2023-02-02 11:38:23
  Quantum Machine Learning for Computer Vision
Quantum Machine Learning (QML) refers to the application of quantum algorithms to machine learning problems. Recently available tools (IBM Qiskit, Google Cirq, PennyLane) provide excellent starting points for research on QML algorithms. This project tackles classical CV problems (detection, classification, ...) with QML to provide an introductory approach to the topic and assess its comparative performance.
Academic Supervisor: Fernando Vilarino
Supervisor e-mail: fernando@cvc.uab.es
Institution: UAB
Confidential: No
Date: 2023-01-31 11:33:54

  2021-2022 proposals
  4D Neural Models from uncalibrated videos
4D shape reconstruction normally relies on sophisticated pre-defined models or exhaustive capture systems. Unfortunately, these solutions do not generalize to the wide variety of challenging objects found in nature. In this project, we will present an uncalibrated, differentiable and self-supervised algorithm to recover highly detailed 4D reconstructions from still video.
Academic Supervisor: Antonio Agudo
Supervisor e-mail: aagudo@iri.upc.edu
Institution: UPF
Assigned Student Name: Sergio Montoya de Paco
Student e-mail: Sergio.MontoyaDePaco@autonoma.cat
Confidential: No
Date: 2022-06-07 15:52:01
  Text conditional image generation
In this project, we are interested in generating images that match a given text. In particular, we will work with the FashionGen dataset, which contains fashion outfits with their text descriptions.
Academic Supervisor: David Vazquez Bermudez
Supervisor e-mail: david.vazquez@servicenow.com
Assigned Student Name: Sergi García Sarroca
Student e-mail: Sergi.GarciaSa@autonoma.cat
pre-Assigned Student Name: Sergi Garcia Sarroca
Confidential: No
Date: 2022-05-31 13:41:19
  2-phase crossmodal search combining Dual Encoders and Visual-Language Models
Academic Supervisor: Lluis Gomez
Supervisor e-mail: lgomez@cvc.uab.es
Institution: UAB
co-Supervisor: Dimosthenis Karatzas
co-Supervisor e-mail: dimos@cvc.uab.es
Assigned Student Name: Joan Fontanals Martinez
Student e-mail: Joan.Fontanals@autonoma.cat
pre-Assigned Student Name: Joan Fontanals
Confidential: No
Date: 2022-05-24 22:42:39
  Methods to minimize human interaction in labeling images for flora and fauna visual identification
The research is part of the UPC participation in the XPrize Rainforest challenge. The goal is to identify the flora and fauna in a given rainforest area within a limited amount of time. Successful image classification systems are data-hungry, and our challenge is the lack of labeled images. We will explore methods to solve our classification problem with minimal human interaction in the labeling task.
Academic Supervisor: Ferran Marques
Supervisor e-mail: ferran.marques@upc.edu
Institution: UPC
co-Supervisor: Antonio Torralba
co-Supervisor e-mail: torralba@csail.mit.edu
Assigned Student Name: Laia Albors Zumel
Student e-mail: Laia.Albors@autonoma.cat
pre-Assigned Student Name: Laia Albors Zumel
Confidential: No
Date: 2022-05-23 10:57:08
  Multimodal Data Representations for the Analysis of Social Media
Deep Learning for multimodal data representation is a fundamental technique to integrate data of different types into a common space. With this project we intend to explore multimodal representations for the automatic detection of hate speech and fake news in social media posts.
Extended abstract: Download PDF
Academic Supervisor: Ernest Valveny
Supervisor e-mail: Ernest.Valveny@uab.cat
Institution: UAB
co-Supervisor: Dimosthenis Karatzas
co-Supervisor e-mail: dimos@cvc.uab.es
Confidential: No
Date: 2022-05-17 14:36:12
  Synthesis of Virtual Avatar Animations from Sign Language Videos
Avatar synthesis is one of the most important and challenging tasks in sign language synthesis. In this project I will provide a system to generate avatar sign animations from sign language videos. Additionally, a novel automatic system to generate a dataset will be proposed. Finally, an evaluation of different approaches to generate realistic sign animations will also be presented.
Extended abstract: Download PDF
Academic Supervisor: Coloma Ballester
Supervisor e-mail: coloma.ballester@upf.edu
Institution: UPF
co-Supervisor: Gloria Haro
co-Supervisor e-mail: gloria.haro@upf.edu
Assigned Student Name: Víctor Ubieto Nogales
Student e-mail: Victor.Ubieto@autonoma.cat
Confidential: No
Date: 2022-05-15 21:30:58
  Thermal event forecasting from video analysis in Wendelstein 7-X
Wendelstein 7-X is a stellarator fusion prototype. UPC collaborates with IPP-MPI on a new operation phase starting in 2022. Data is available for research to develop image processing and deep learning tools for the detection, tracking and classification of thermal events on Plasma Facing Components, and for the estimation of their evolution.
Extended abstract: Download PDF
Academic Supervisor: Josep R. Casas
Supervisor e-mail: josep.ramon.casas@upc.edu
Institution: UPC
co-Supervisor: Philippe Salembier, Aleix Puig-Sitjes
co-Supervisor e-mail: philippe.salembier@upc.edu
Confidential: No
Date: 2022-05-10 19:56:36
  Self-supervised learning of multimodal representations in food recipes
This project aims at developing a self-supervised deep learning approach to learn video representations of food recipes jointly from either video frames and textual descriptions or from video frames and their corresponding audio signal. The learned representation will be evaluated on the downstream tasks of action prediction and action localization on different datasets related to cooking.
Academic Supervisor: Coloma Ballester
Supervisor e-mail: coloma.ballester@upf.edu
Institution: UPF
co-Supervisor: Mariella Dimiccoli; Gloria Haro
co-Supervisor e-mail: mdimiccoli@iri.upc.edu; gloria.haro@upf.edu
Assigned Student Name: Igor Ugarte Molinet
Student e-mail: Igor.Ugarte@autonoma.cat
Confidential: No
Date: 2022-05-06 12:03:11
  Classifying and processing the information from the electricity bills of companies in Spain
In this project, we intend to extract information from the electricity bills of some Spanish companies, such as euros or kilowatts spent. For this, we need a machine learning algorithm that performs optical character recognition and classification, so that it is able to identify and classify the information contained in the bills. It should also handle the clients’ private information.
Extended abstract: Download PDF
Academic Supervisor: Oriol Ramos
Supervisor e-mail: oriolrt@cvc.uab.cat
co-Supervisor: Dr. Melanie Revilla, Patricia Iglesias
co-Supervisor e-mail: melanie.revilla@upf.edu; patricia.iglesias@upf.edu
Assigned Student Name: Yu Pang
Student e-mail: Yu.Pang@autonoma.cat
Confidential: No
Date: 2022-04-04 09:41:50
  Audio-visual speech and singing voice separation 
Source separation is the automatic estimation of the individual isolated sources that make up an audio mixture. The goal of this project is to separate a human voice in a mixture by using both the audio and video modalities. Leveraging visual and motion information from the target person's face is particularly useful when several voices are present in the mixture.
Extended abstract: Download PDF
Academic Supervisor: Gloria Haro
Supervisor e-mail: gloria.haro@upf.edu
Institution: UPF
Assigned Student Name: Eudald Ballescà Casas
Student e-mail: Eudald.Ballesca@autonoma.cat
Confidential: No
Date: 2022-03-25 16:17:35
  Multiple and Diverse Image Colorization
Image colorization is a problem with multiple possible solutions. The aim of this project is to determine the best approach (e.g., transformers, capsule networks, ...) to tackle this one-to-many problem, yielding plausible colorization results that are both spatially and semantically coherent.
Extended abstract: Download PDF
Academic Supervisor: Coloma Ballester
Supervisor e-mail: coloma.ballester@upf.edu
Institution: UPF
co-Supervisor: Lara Raad Cisa, Patricia Vitoria
co-Supervisor e-mail: lara.raadcisa@esiee.fr; patricia.vitoria@upf.edu
Confidential: No
Date: 2022-03-18 14:39:44
  Assisting people in successfully sorting their waste by using images
We aim to help people successfully sort their waste by using their smartphone camera. To do that, we need a mobile-friendly machine learning algorithm that performs image classification of the items to be recycled and outputs the type of waste based on the main categories used in Spain: paper/cardboard, glass, plastic, green (organic waste), or garbage.
Extended abstract: Download PDF
Academic Supervisor: Dr. Melanie Revilla
Supervisor e-mail: melanie.revilla@upf.edu
Institution: UPF
co-Supervisor: Carlos Ochoa, Patricia Iglesias
co-Supervisor e-mail: carlos.ochoa@upf.edu, patricia.iglesias@upf.edu
Confidential: No
Date: 2022-03-14 12:31:25
  Knowledge-base for TextCaps and CTC
In this project, we will explore Cross-modal Retrieval, where the task is to provide a matching caption for a given image or vice versa. The main idea is to create an external knowledge base or graph that captures relations between objects, scenes and scene text. The learned representation can then be used to assess which modalities yield improved retrieval performance.
Extended abstract: Download PDF
Academic Supervisor: Andres Mafla; Ali Furkan Biten
Supervisor e-mail: amafla@cvc.uab.es; abiten@cvc.uab.es
Institution: UAB
co-Supervisor: Dimosthenis Karatzas; Lluis Gomez
co-Supervisor e-mail: lgomez@cvc.uab.es; dimos@cvc.uab.es
Confidential: No
Date: 2022-03-10 17:28:41
  Image retrieval with text modifiers
In this project, we will study the task of image retrieval, where the input query is specified in the form of an image plus some text that describes desired modifications to the input image.
Extended abstract: Download PDF
Academic Supervisor: Ali Furkan Biten; Andres Mafla
Supervisor e-mail: abiten@cvc.uab.es; amafla@cvc.uab.es
Institution: UAB
co-Supervisor: Dimosthenis Karatzas; Lluis Gomez
co-Supervisor e-mail: lgomez@cvc.uab.es; dimos@cvc.uab.es
Confidential: No
Date: 2022-03-10 17:27:10
  Hierarchical Model for Cross Modal Retrieval
In this project, we will explore cross-modal retrieval (CMR), where the task is to retrieve an image given its captions or to retrieve a caption given an image. The main idea of the project is to create CMR models that can perform retrieval across different datasets by creating hierarchical embeddings. We will test the hypothesis that learning concepts hierarchically will result in bet
Extended abstract: Download PDF
Academic Supervisor: Ali Furkan Biten; Dimosthenis Karatzas
Supervisor e-mail: abiten@cvc.uab.es; dimos@cvc.uab.es
Institution: UAB
co-Supervisor: Andres Mafla; Lluis Gomez
co-Supervisor e-mail: amafla@cvc.uab.es; lgomez@cvc.uab.es
Confidential: No
Date: 2022-03-10 17:25:19
  Understanding the dynamics of human interactions
Understanding social interactions in detail means understanding a large collection of social signals and social dynamics. In this project we will work on developing explainable deep learning models to approach fine-grained classification tasks related to understanding the dynamics of dyadic interactions.
Extended abstract: Download PDF
Academic Supervisor: Agata Lapedriza
Supervisor e-mail: alapedriza@uoc.edu
Institution: UOC
Assigned Student Name: José Manuel López Camuñas
Student e-mail: JoseManuel.LopezCam@autonoma.cat
pre-Assigned Student Name: Jose Manuel Lopez Camuñas
Confidential: No
Date: 2022-02-28 20:35:31
  Image Sentiment Analysis using commonsense knowledge
Image Sentiment Analysis is the problem of recognizing the emotions that images evoke in humans. In this project we will explore the use of external commonsense knowledge to improve the accuracy and generalization capabilities of deep learning models for image sentiment analysis. To that end, we will create bimodal neural networks that incorporate semantic reasoning.
Extended abstract: Download PDF
Academic Supervisor: Agata Lapedriza
Supervisor e-mail: alapedriza@uoc.edu
Institution: UOC
Assigned Student Name: Guang Jun Du
Student e-mail: GuangJun.Du@autonoma.cat
pre-Assigned Student Name: Guang Jun Du
Confidential: No
Date: 2022-02-28 20:29:31
  Automatic detection of fiber-cement roofs in aerial images
Some types of fiber-cement roofs can contain highly toxic materials. Although these types of roofs are currently banned, many existing roofs still contain these toxic materials. Unfortunately, the location of these roofs is often unknown. In this project we will work on the development of computer vision systems for the automatic detection of fiber-cement roofs in aerial images.
Extended abstract: Download PDF
Academic Supervisor: Agata Lapedriza
Supervisor e-mail: alapedriza@uoc.edu
Institution: UOC
co-Supervisor: Javier Borge
Assigned Student Name: Kevin Martín Fernández
Student e-mail: Kevin.MartinF@autonoma.cat
pre-Assigned Student Name: Kevin Martin Fernandez
Confidential: No
Date: 2022-02-28 20:25:36
  Automated count of fish species from submarine videos.
The research is a joint collaboration with the CSIC Marine Science Institute. The institute has recorded thousands of hours of video in the Mediterranean Sea and needs to automatically count the number of times that specific strategic fish species appear in the footage, in order to estimate the population before and after ecological interventions. Drop me an email for more information (dmasipr@uoc.edu).
Academic Supervisor: David Masip
Supervisor e-mail: dmasipr@uoc.edu
Institution: UOC
Assigned Student Name: David Serrano Lozano
Student e-mail: David.SerranoL@autonoma.cat
Confidential: No
Date: 2022-02-24 13:07:07
  On generating plausible RAW data
Data augmentation for training neural networks has a drawback: it is currently performed using conversions that are not realistic, which hinders the results. In this project, we will address this problem by generating plausible augmentations. Given a dataset, we will generate a plausible version of it in RAW format, from which we will be able to generate different realistic augmentations.
Extended abstract: Download PDF
Academic Supervisor: Javier Vázquez Corral
Supervisor e-mail: javier.vazquez@cvc.uab.cat
Institution: UAB
Confidential: No
Date: 2022-02-22 16:15:04
  Color manipulation for photographic enhancement
Creating models that, given an unprocessed image, output an image that mimics the result of professional photographers is currently a hot topic. However, current models are cumbersome and differ from the few modifications that photographers are allowed to perform. We will tackle this problem by developing a model that can be expressed as a cascade of standard photographic image enhancements.
Extended abstract: Download PDF
Academic Supervisor: Javier Vázquez Corral
Supervisor e-mail: javier.vazquez@cvc.uab.cat
Institution: UAB
Assigned Student Name: Marcos V Conde Osorio
Student e-mail: MarcosV.Conde@autonoma.cat
Confidential: No
Date: 2022-02-22 15:59:29
  Can biological solutions help computers perceive symmetry?
Symmetry is an important visual cue for a wide range of biological organisms regardless of size and cognitive ability. The perception of symmetry is important for object processing, as it facilitates target recognition and identification. Although easy for humans, it is very challenging for computers; it has even been proposed as a robust "CAPTCHA" by Funk & Liu (CVPR 2016).
Extended abstract: Download PDF
Academic Supervisor: C. Alejandro Parraga
Supervisor e-mail: Alejandro.Parraga@cvc.uab.es
Institution: UAB
Confidential: No
Date: 2022-02-18 11:32:03
  Biologically-inspired tone mapping
The processing of high dynamic range images into lower dynamic range representations is something that the human visual system easily does all the time. However, this task (i.e. Tone Mapping or TM) is very difficult for machines. Our aim is to apply well-known primate visual mechanisms to tone-mapping algorithms to improve their performance.
Extended abstract: Download PDF
Academic Supervisor: C. Alejandro Parraga
Supervisor e-mail: Alejandro.Parraga@cvc.uab.es
Institution: UAB
Confidential: No
Date: 2022-02-18 11:27:28
  White balance in the presence of mixed colour illuminations
White balance (WB) algorithms compensate for the colours of illuminants. For example, tungsten lights introduce a yellowish cast. Unfortunately, many scenes exhibit a combination of illuminants (e.g., artificially lit indoor scenes plus light from a window). In these cases, WB is a challenging task. Our aim is to extend the capabilities of existing WB algorithms to mixed coloured illuminants.
Extended abstract: Download PDF
Academic Supervisor: C. Alejandro Parraga
Supervisor e-mail: Alejandro.Parraga@cvc.uab.es
Institution: UAB
Confidential: No
Date: 2022-02-18 11:22:43
  Sketch2Code
Designing structured apps such as websites is a labor-intensive process for front-end developers and designers. Recent advances in CV and NLP with Transformers have made it possible to generate HTML from simple sketches [1]. In this project, we propose developing new sketch2code techniques for more structured formats such as forms. [1] https://sketch2code.azurewebsites.net/
Academic Supervisor: Pau Rodríguez
Supervisor e-mail: pau.rodriguez@servicenow.com
Institution: UAB
co-Supervisor: David Vazquez
Assigned Student Name: Juan Antonio Rodríguez García
Student e-mail: JuanAntonio.RodriguezG@autonoma.cat
Confidential: No
Date: 2022-02-14 19:33:00
  Handwritten Music Recognition
Despite the rise of deep learning, the recognition of old handwritten scores is far from solved, because labeled training data is barely available and handwriting styles are highly variable. Thus, this work will focus on proposing deep learning methodologies for historical handwritten scores, taking into account the particularities of graphical music notation.
Academic Supervisor: Alicia Fornes
Supervisor e-mail: afornes@cvc.uab.es
Institution: UAB
Assigned Student Name: Pau Torras Coloma
Student e-mail: Pau.Torras@autonoma.cat
pre-Assigned Student Name: Pau Torras
Confidential: No
Date: 2022-02-10 13:23:59
  Fruit tracking using RGB-D data
The goal of this project is to capture a video sequence of fruit trees using a camera that travels along the row of trees. Most of the fruits that are occluded in one frame will be visible in another. To avoid counting each fruit multiple times, each fruit must be given a unique ID, which can be achieved using object tracking. RGB-D data will be used to improve the performance.
Extended abstract: Download PDF
Academic Supervisor: Josep Ramon Morros
Supervisor e-mail: ramon.morros@upc.edu
Institution: UPC
co-Supervisor: Jordi Gené Mola
co-Supervisor e-mail: jordi.genemola@udl.cat
Assigned Student Name: Francesc Net Barnés
Student e-mail: Francesc.Net@autonoma.cat
Confidential: No
Date: 2022-02-08 11:39:31
  ERP decoding to classify self-made and externally generated errors
In the present project, we want to explore the application of ERPs elicited in response to self-made errors and externally generated errors, and to investigate the accuracy of decoders. We will use different ERP descriptors and classifiers, on a single-trial basis, to decode internal and external causes of errors.
Extended abstract: Download PDF
Academic Supervisor: Xim Cerdá Company
Supervisor e-mail: xcerda@cvc.uab.cat
Institution: UAB
co-Supervisor: Alba Gómez Andrés
co-Supervisor e-mail: agomezandres@gmail.com
Confidential: No
Date: 2022-02-04 08:54:30
  3D fruit detection and size estimation using graph neural networks
The detection and measurement of fruit size is of great interest to estimate the crop and predict harvest resources. Nowadays, different sensors are able to register fruit trees into a 3D map of the environment. The main goal of this thesis will be to explore and design new deep learning architectures based on Graph Neural Networks to detect fruits and estimate their size.
Extended abstract: Download PDF
Academic Supervisor: Javier Ruiz Hidalgo
Supervisor e-mail: j.ruiz@upc.edu
Institution: UPC
co-Supervisor: Jordi Gené Mola
co-Supervisor e-mail: jordi.genemola@udl.cat
Assigned Student Name: Ignacio Galve Ceamanos
Student e-mail: Ignacio.Galve@autonoma.cat
Confidential: No
Date: 2022-02-02 16:03:32