The following projects are suitable for students on Computer Science courses undertaking their undergraduate or postgraduate dissertations at the Department of Computer Science, Loughborough University under my supervision.
Deep Feature Selection: Feature selection using deep neural networks has not been well studied, despite its importance for understanding data. This project involves the development (or an in-depth empirical comparison) of algorithms for removing irrelevant features from large unimodal and multi-modal datasets.
Deep learning for noisy unbalanced data: Training deep learning models on unbalanced data, which is commonly generated in real-time scenarios, is a challenging task. The difficulty increases further when the data spans multiple modalities. This project involves: 1) the development (or thorough empirical comparison) of algorithms for classifying unbalanced data obtained from smart environments, for tasks such as human activity or gesture recognition; and 2) an investigation of how well these algorithms perform when classifying different activities.
Continual/Lifelong Deep Learning: Training deep neural networks to learn a highly accurate mapping from inputs (such as image data, sensor data or text) to outputs (e.g. labels, also known as classes) requires large amounts of labelled data. Even once trained, these models have a limited ability to generalise to conditions that differ from those used during training. This topic concerns the development (or thorough empirical comparison) of continual/lifelong learning algorithms that learn continuously and adaptively, autonomously and incrementally acquiring complex skills and knowledge. Projects include developing methods for recognising new behaviours in smart environments (e.g. cities, homes and healthcare settings), and for continuous object recognition.
Multi-modal information retrieval: This project involves the development of algorithms (embedded in an on-line tool) for indexing and retrieving documents which contain multi-modal data (such as images and text).
Tweet classification: This project involves the development of machine learning methods for automatically clustering semantically similar tweets. An example project would be a tool which, given a query and other search criteria, finds relevant tweets and groups them by semantic similarity and topic. For this project, clustering, topic modelling and natural language processing algorithms will be implemented.
Cross-lingual information retrieval: This project involves the development (or an in-depth empirical comparison) of deep learning methods for retrieving information written in a language which is different from the language of the user’s query. For example, retrieving content written in English given a query which has been written in another language, such as Greek or French.
Source-code similarity detection using unsupervised machine learning: This project involves the development of machine learning models for analysing and clustering the files in large source-code repositories. An example project would be a tool which can: 1) index source-code fragments and files found in large source-code repositories; and 2) cluster similar source-code fragments using deep learning methods.
Cross-language source code retrieval: This project involves the development (or an in-depth empirical comparison) of methods for clustering, searching and retrieving source-code files and fragments written in different programming languages. For example, given a user query containing code written in Java, can the system retrieve similar source-code fragments written in C#?