Available Bachelor and Master thesis opportunities, as well as internships.
The list is not exhaustive — feel free to get in touch if a topic sparks your interest.
Erasmus+ opportunities are available for selected thesis and internship projects. Active international collaborations are in place with Télécom Paris (FR), The University of Manchester (UK), University of Aberdeen (UK), Imperial College London (UK), University of Malta (MT), and Maynooth University (IE).
This project explores how Runtime Verification can be combined with nudging techniques to improve both software reliability and user behaviour. Runtime Verification monitors a system during execution, while nudging provides gentle suggestions to help users make better decisions. The goal is to design a framework that not only detects incorrect behaviour but also guides users towards safer and more effective actions.
In runtime verification, monitors check system behaviour against Linear Temporal Logic (LTL) properties, but limited resources often mean that only part of the system can be observed. Instead of relying on manually defined metrics to decide what to observe, this work applies machine learning to automatically learn the best selection strategy from execution traces. The goal is to improve the efficiency and accuracy of runtime verification in resource-constrained environments.
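As a minimal illustration (a Python sketch, not part of the project), consider an observation budget that only lets the monitor see a few events per trace: a scoring function, which in the thesis would be learned from past traces, decides which events are worth observing before a simple request/response property is checked. The scorer below is a hypothetical placeholder.

```python
# Sketch: budget-limited monitoring of "every request is eventually followed
# by a matching response". A (here hypothetical) learned scorer decides which
# events are worth observing under the budget.

def learned_score(event):
    # Placeholder for a model trained on past execution traces;
    # here it simply prefers request/response events over noise.
    return 1.0 if event["type"] in ("request", "response") else 0.1

def monitor(trace, budget):
    # Observe only the 'budget' highest-scoring events, in time order.
    observed = sorted(trace, key=learned_score, reverse=True)[:budget]
    observed.sort(key=lambda e: e["time"])
    pending = set()
    for e in observed:
        if e["type"] == "request":
            pending.add(e["id"])
        elif e["type"] == "response":
            pending.discard(e["id"])
    return "possible violation" if pending else "no violation observed"

trace = [
    {"time": 0, "type": "request", "id": 1},
    {"time": 1, "type": "noise", "id": 0},
    {"time": 2, "type": "response", "id": 1},
]
print(monitor(trace, budget=2))   # the noise event is left unobserved
```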
Genetic algorithms are used to automatically learn formal properties of systems by exploring large and complex search spaces. This work studies how these algorithms can discover and represent meaningful properties, as well as their strengths and limitations. The results help understand how evolutionary techniques can support formal methods in areas such as artificial intelligence, optimisation, and robotics.
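A toy sketch of the idea, assuming properties are represented as simple numeric bounds over observed values (the thesis would use richer property templates): a small genetic loop mutates candidate bounds and keeps those that hold on all example traces while staying as tight as possible.

```python
import random

# Toy sketch: evolve a bound b so that the candidate property
# "the monitored value is always below b" holds on all example traces
# while staying as tight (informative) as possible.

traces = [[3, 7, 5], [2, 8, 6], [4, 9, 1]]          # example execution traces

def fitness(bound):
    holds = all(value < bound for trace in traces for value in trace)
    # Reward properties that hold; prefer tighter (smaller) bounds.
    return (100 - bound) if holds else -1.0

def evolve(pop_size=20, generations=50):
    population = [random.uniform(0, 100) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]                   # selection
        children = [p + random.gauss(0, 2) for p in parents]    # mutation
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(f"learned property: value is always below {best:.2f}")
```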
Large Language Models such as ChatGPT are powerful but raise concerns about reliability and correctness. Formal verification techniques are explored as a way to define precise specifications and check the behaviour of LLMs against them. The aim is to develop and evaluate methods and tools that improve the safety and trustworthiness of language models in real-world applications.
The Runtime Monitoring Language (RML) is a flexible formalism for defining monitoring rules, but it currently lacks precise support for time-based reasoning. The idea is to extend RML with metric time intervals, similar to those used in Metric Temporal Logic, enabling rules with explicit timing constraints. This makes RML more suitable for applications such as event scheduling and time-critical monitoring systems. More information on RML is available at RMLatDIBRIS.
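The sketch below shows, in plain Python rather than RML syntax, the kind of metric-time rule the extension is meant to express: every request must be answered within a fixed deadline. The event encoding and deadline are illustrative.

```python
# Sketch of a metric-time rule: "every request must be followed by a
# response within 5 time units". (Plain Python, not actual RML syntax.)

DEADLINE = 5.0

def check_timed_response(events):
    pending = {}                      # request id -> timestamp
    violations = []
    for t, kind, rid in events:       # events are (timestamp, type, id)
        if kind == "request":
            pending[rid] = t
        elif kind == "response" and rid in pending:
            if t - pending.pop(rid) > DEADLINE:
                violations.append(rid)
    violations.extend(pending)        # requests never answered at all
    return violations

events = [(0.0, "request", 1), (4.2, "response", 1),
          (6.0, "request", 2), (13.5, "response", 2)]
print(check_timed_response(events))   # -> [2]
```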
As AI systems become more autonomous and are used in critical domains such as healthcare and autonomous driving, ensuring their reliability is essential. Runtime Verification is used to monitor AI systems during execution and detect unexpected or unsafe behaviour. The goal is to review and compare existing runtime verification techniques for AI, analysing how well they work in practice and where their limitations are. This helps identify open challenges and future research directions.
Wearable devices such as glucose monitors and smartwatches are increasingly used to support diabetes management. The idea is to analyse real data from wearable devices and study typical faults and anomalies that may occur during operation. A runtime monitoring framework is then designed and implemented to detect incorrect behaviour in real time. The approach is evaluated in terms of reliability and patient safety improvements.
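A minimal sketch of such a monitor over a stream of timestamped glucose readings; the thresholds and fault conditions are illustrative placeholders, not clinical guidance or the project's actual specification.

```python
# Minimal sketch of a stream monitor for wearable glucose readings.
# Thresholds and fault conditions are illustrative only.

LOW, HIGH = 70, 180        # mg/dL, example target range
MAX_GAP = 10 * 60          # seconds without data before flagging a sensor fault

def monitor_stream(readings):
    alerts = []
    last_ts = None
    for ts, value in readings:                     # (unix time, mg/dL)
        if last_ts is not None and ts - last_ts > MAX_GAP:
            alerts.append((ts, "sensor gap / possible device fault"))
        if value < LOW:
            alerts.append((ts, f"hypoglycaemia risk: {value} mg/dL"))
        elif value > HIGH:
            alerts.append((ts, f"hyperglycaemia risk: {value} mg/dL"))
        last_ts = ts
    return alerts

readings = [(0, 95), (300, 62), (2400, 140)]
for ts, msg in monitor_stream(readings):
    print(ts, msg)
```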
Multi-agent systems provide a natural way to model autonomous and cooperative behaviour in distributed environments. This project applies these concepts to educational robotics by using programmable Sphero robots. Multiple robots are designed to communicate and coordinate, showing behaviours such as collaboration and swarm intelligence. Visual and intuitive programming tools allow users to experiment hands-on with core multi-agent concepts in an engaging way.
This thesis explores the use of the VEsNA Toolkit to build intelligent and interactive virtual environments in Godot by combining multi-agent systems with natural language interaction. Agents embodied in virtual worlds can communicate with users through language to manipulate objects, control behaviours, and modify the environment in real time. The work involves designing interactive scenarios, implementing agent behaviours, and evaluating how natural and intuitive the interaction feels. More information on the toolkit is available at VEsNA Toolkit.
VEsNA enables users to interact with and manipulate virtual environments through natural language and cognitive agents. This work focuses on extending the framework to support more advanced reasoning, simulation, and training capabilities. The system is enhanced to allow dynamic scenario simulations, rule-based compliance checks, and interactive training with AI-driven virtual agents. These extensions make VEsNA more suitable for simulation-based learning and decision-support applications.
Runtime Verification is used to monitor the behaviour of software systems while they are running. The idea behind this work is to integrate runtime verification into SARL-based multi-agent systems in order to check agent interactions and protocols at runtime. A monitoring framework is designed to detect violations of correctness and safety properties and provide immediate feedback. The approach is evaluated on a multi-agent case study to assess its impact on reliability and performance.
Mesa is a Python-based framework for agent-based modelling that is widely used for simulation and analysis. This work addresses the integration of Runtime Verification into Mesa by developing a dedicated library for monitoring simulations at runtime. The library allows developers to specify formal requirements and constraints that agent-based simulations must satisfy during execution. Its effectiveness is validated by applying it to existing Mesa simulation examples.
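A small sketch of what a runtime check around a Mesa simulation could look like, assuming the classic (pre-3.0) Mesa API with Agent, Model, and RandomActivation; the model and the monitored property (no agent ever has negative wealth) are illustrative placeholders.

```python
# Sketch of a runtime check wrapped around a Mesa model's step loop.
# Assumes the classic (pre-3.0) Mesa API: Agent, Model, RandomActivation.

from mesa import Agent, Model
from mesa.time import RandomActivation

class WalletAgent(Agent):
    def __init__(self, unique_id, model):
        super().__init__(unique_id, model)
        self.wealth = 1

    def step(self):
        self.wealth += self.random.choice([-1, 1])   # toy dynamics

class WalletModel(Model):
    def __init__(self, n):
        super().__init__()
        self.schedule = RandomActivation(self)
        for i in range(n):
            self.schedule.add(WalletAgent(i, self))

    def step(self):
        self.schedule.step()

def no_negative_wealth(model):
    # Example property the simulation must satisfy at every step.
    return all(a.wealth >= 0 for a in model.schedule.agents)

model = WalletModel(5)
for t in range(20):
    model.step()
    if not no_negative_wealth(model):
        print(f"property violated at step {t}")
        break
```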
StarCraft is a highly challenging environment for artificial intelligence due to its complexity, partial observability, and long-term planning requirements. The aim is to apply reinforcement learning techniques to develop adaptive agents capable of making strategic decisions in real-time strategy games. Different approaches, including deep reinforcement learning and multi-agent methods, are explored and evaluated using existing StarCraft II AI environments. The resulting agents are compared against baseline models to assess their performance and generalisation abilities.
This thesis applies Runtime Verification to the training phase of neural networks to assess whether learning is proceeding correctly. Monitors are inserted between neural layers to observe internal changes and check formally defined properties related to convergence, gradients, and training stability. This allows early detection of problematic learning dynamics such as overfitting or vanishing gradients. The approach improves interpretability and trustworthiness in neural network training.
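A PyTorch-based sketch of the basic mechanism, with illustrative thresholds: a check runs between the backward pass and the optimiser step and flags vanishing or exploding gradients. The thesis would formalise such properties rather than hard-code them.

```python
import torch
import torch.nn as nn

# Sketch: a training-time monitor that inspects gradient norms after each
# backward pass (thresholds are illustrative).

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

def check_gradients(net, low=1e-6, high=1e3):
    for name, param in net.named_parameters():
        if param.grad is None:
            continue
        norm = param.grad.norm().item()
        if norm < low:
            print(f"[monitor] vanishing gradient in {name}: {norm:.2e}")
        elif norm > high:
            print(f"[monitor] exploding gradient in {name}: {norm:.2e}")

for step in range(100):
    x, y = torch.randn(16, 10), torch.randn(16, 1)
    optimiser.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    check_gradients(model)        # runtime check between backward and update
    optimiser.step()
```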
Large Language Models (LLMs) show strong language generation skills, but their reasoning abilities are still not fully understood. This work explores the strengths and limitations of LLM reasoning by analysing their performance on tasks such as logical puzzles, mathematical problems, and multi-step reasoning. Different benchmarks and evaluation criteria are used to assess consistency, coherence, and alignment with formal reasoning principles. The results help identify current gaps and possible directions for more trustworthy and explainable AI reasoning.
Neuro-symbolic models such as Logical Neural Networks (LNNs) combine neural learning with symbolic reasoning, but they can still produce inconsistent or incorrect inferences. The goal is to use Runtime Verification to monitor reasoning steps at runtime and detect violations of formal correctness rules. These violations are then used as feedback to guide and improve the learning process. The result is a more accurate and trustworthy neuro-symbolic AI system.
This thesis introduces monitor-neurons, specialised components embedded in neural networks that perform Runtime Verification over temporal and semantic properties. These monitors check formal constraints during learning and influence training through feedback mechanisms such as loss functions or gating signals. Different strategies for placing monitor-neurons inside the network are explored, both manual and automatic. The approach is evaluated on sequential learning tasks to improve interpretability, correctness, and semantic awareness of neural models.
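One possible feedback mechanism, shown here as a hedged sketch rather than the thesis design: a differentiable "monitor" term added to the loss penalises outputs that violate a semantic constraint (non-negativity, in this toy example).

```python
import torch
import torch.nn as nn

# Sketch of monitor feedback via the loss: a differentiable penalty term
# measures how strongly outputs violate a semantic constraint. The
# constraint (outputs must be non-negative) is illustrative.

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

def monitor_penalty(outputs):
    # How far outputs fall below zero, averaged over the batch.
    return torch.relu(-outputs).mean()

for step in range(200):
    x = torch.randn(32, 4)
    y = x.sum(dim=1, keepdim=True).abs()               # toy non-negative targets
    pred = model(x)
    task_loss = nn.functional.mse_loss(pred, y)
    loss = task_loss + 10.0 * monitor_penalty(pred)    # monitor feedback signal
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
```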
Traditional BDI (Belief-Desire-Intention) agents rely on predefined plans, which may fail in highly dynamic or unpredictable environments. This work investigates how neural models can be integrated into the BDI architecture to generate plans on the fly when no suitable symbolic plan is available. A hybrid approach is designed where learning-based components propose alternative action sequences after plan failure. The solution is evaluated in dynamic domains to assess improvements in robustness, adaptability, and goal achievement.
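A control-flow sketch of the hybrid deliberation step, independent of any concrete BDI platform: the symbolic plan library is tried first, and a learned proposer (a hypothetical placeholder here) is consulted only when no plan applies.

```python
# Control-flow sketch of the hybrid idea: try the symbolic plan library first,
# fall back to a learned proposer when no plan applies.
# `propose_actions` stands in for a trained sequence model.

PLAN_LIBRARY = {
    ("reach", "charging_station"): ["move_north", "move_north", "dock"],
}

def propose_actions(goal, state):
    # Hypothetical placeholder for a learned model suggesting an action sequence.
    return ["explore", "move_towards(" + goal[1] + ")", "retry_dock"]

def deliberate(goal, state):
    plan = PLAN_LIBRARY.get(goal)
    if plan is not None:
        return plan, "symbolic"
    return propose_actions(goal, state), "learned fallback"

plan, source = deliberate(("reach", "exit"), state={"position": (0, 0)})
print(source, plan)
```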
BDI agents usually manage intentions using fixed rules, which can be fragile in dynamic and uncertain environments. The aim is to integrate learning-based components into the BDI reasoning loop to support more flexible intention selection and progression. Neural models are trained to suggest when intentions should be dropped, maintained, or reprioritised based on context and past experience. The approach is evaluated in simulated scenarios requiring complex and adaptive intention management.
This thesis explores whether and how the concept of agency can be applied to generative models such as large language models. While classical agent architectures like BDI rely on goals and intentions, LLMs operate through probabilistic text generation without explicit deliberation. The work analyses philosophical and AI perspectives on agency and compares agent-based and generative systems. The outcome is a conceptual framework or taxonomy that clarifies differences, limitations, and possible hybrid views of agency in AI.
This thesis investigates contemporary agentic AI frameworks that build autonomous or semi-autonomous agents on top of large language models. The work focuses on experimentally comparing different frameworks in terms of their architectural principles, coordination mechanisms, memory handling, and tool integration. Through a set of controlled experiments and real-world case studies—such as task automation, collaborative agents, and interactive systems—the thesis evaluates strengths, limitations, and trade-offs of each approach. The outcome is an empirical comparison and a set of practical guidelines for selecting and designing agent-based systems for specific application domains.
Building on established expertise in multi-agent systems, verification, and engineered autonomy, this thesis investigates how modern agent AI frameworks (including those driven by large language models) relate to, extend, or challenge classical formal approaches. The work involves a comparative evaluation of agent frameworks in terms of correctness, coordination, and reliability, and the design of case studies that integrate formal verification techniques with emergent capabilities of LLM-supported agents. Core examples may include verifying agent interactions in safety-critical scenarios and enabling runtime monitoring of autonomous agents deployed in real or simulated environments. The thesis aims to contribute both a principled methodology for evaluating agent frameworks and a set of validated case studies demonstrating their strengths and limits.
This thesis addresses the lack of systematic evaluation methodologies for agentic AI systems, particularly those combining large language models with autonomous decision-making and multi-agent coordination. Drawing on principles from multi-agent systems and engineered autonomy, the work investigates how classical evaluation criteria—such as correctness, robustness, coordination quality, and failure modes—can be adapted to modern agent-based AI frameworks. The thesis proposes a structured evaluation methodology and applies it to a set of representative case studies, highlighting where current agentic systems succeed, fail, or violate implicit assumptions of autonomy and rationality. The outcome is a set of evaluation guidelines and reusable benchmarks aimed at improving the scientific rigour of agentic AI research.
The Digital Product Passport (DPP) provides detailed information about a product’s lifecycle to support sustainability and circular economy goals. This project focuses on developing a chatbot based on Large Language Models to make DPP data easier to access and understand. Users can interact with the product through natural language, as if the product itself were responding. The chatbot is integrated with DPP databases to retrieve real-time information and provide personalised answers.
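A retrieval sketch of the chatbot pipeline: look up the product's DPP record, compose a first-person prompt, and delegate the answer to a language model. The DPP schema and the ask_llm call are hypothetical placeholders, not a specific API.

```python
# Retrieval sketch: fetch a product's DPP record and build the prompt an
# LLM would answer from. The DPP schema and `ask_llm` are placeholders.

DPP_DB = {
    "bike-042": {
        "materials": {"aluminium": "78%", "rubber": "12%", "plastics": "10%"},
        "repairability_score": 8.5,
        "recycled_content": "35%",
    }
}

def ask_llm(prompt):
    # Placeholder for a call to whichever LLM API the project adopts.
    return f"(LLM answer grounded in: {prompt[:60]}...)"

def answer(product_id, question):
    record = DPP_DB.get(product_id)
    if record is None:
        return "No passport found for this product."
    prompt = (
        "You are the product, speaking in first person. "
        f"Your Digital Product Passport data: {record}. "
        f"Answer the user's question: {question}"
    )
    return ask_llm(prompt)

print(answer("bike-042", "What are you made of, and can you be repaired?"))
```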
Traditional BDI agents support goal-directed reasoning but lack natural language interaction, while Large Language Models excel at communication without true intentionality. This work explores a hybrid approach that integrates BDI agents with LLMs to combine structured decision-making and fluent language interaction. The system builds on frameworks such as ChatBDI, which use Knowledge Query and Manipulation Language (KQML) to enable structured communication between agents and language models. The approach is evaluated through case studies in multi-agent environments.
This project builds on an existing implementation of the Overcooked cooperative game, where a human player collaborates with an AI teammate. The goal is to extend a Jason-based BDI agent with Large Language Models to enable strategic dialogue with the human partner. Using the ChatBDI tool, the agent engages in natural language discussions about strategy, explains its behaviour, and adapts its plans based on the outcome of these interactions. The work involves understanding the game architecture, state representation, and the Python–Java communication bridge, as well as integrating and using LLM APIs. The result is an end-to-end pipeline demonstrating how BDI agents and LLMs can be combined to improve collaboration, transparency, and teamwork in human–AI systems.
Robots used in military and rescue missions may carry sensitive evidence that must be protected while in the robot’s custody. This project explores the use of the RV-TEE architecture, which combines Runtime Verification with secure hardware to protect cryptographic operations and sensitive data. RV-TEE uses a hardware security module to isolate keys and a runtime monitor that strictly controls how and when secure functions can be accessed. The approach is studied in the ROS ecosystem, using secure hardware such as SECube.
Runtime monitors are critical security components and are often prime targets for attackers. This work focuses on extending RVsec, a security stack that protects runtime verification monitors deployed in secure environments. In particular, the project develops and evaluates a monitor code attestation mechanism that continuously checks whether the monitoring code has been tampered with. The solution is assessed by measuring overhead and simulating attacks to verify its effectiveness in preserving trustworthy monitoring.
Runtime monitors are critical security components and must be protected against tampering and disabling attacks. This project aims to evaluate the applicability of RVsec, a security stack for protecting runtime verification monitors, beyond its initial validation. While RVsec has already been tested in a secure chat application, it is applied here to a substantially different case study, potentially in the robotics domain. The focus is on assessing how well RVsec transfers to new environments and identifying its strengths and limitations across domains.
Runtime verification monitors can themselves become targets of cyberattacks and therefore require strong security guarantees. This project focuses on extending RVsec by identifying and formalising security- and performance-related runtime properties that indicate ongoing attacks, such as suspicious commands, abnormal traffic, or privilege escalation attempts. These properties are implemented as generic runtime monitors that observe both application behaviour and host-level signals. A case study is used to validate the approach and measure the overhead introduced by the additional monitoring.
This thesis focuses on extending and applying ROSMonitoring, a framework for monitoring and verifying robotic systems. New features are developed to strengthen its monitoring and verification capabilities. The enhanced framework is then applied to different robotic domains to evaluate its effectiveness and versatility. More information on the framework is available at ROSMonitoring.
Runtime verification is essential for ensuring the safety and reliability of robotic systems during execution. This project aims to integrate two existing tools, LARVA (from the University of Malta) and ROSMonitoring, into a unified framework for verifying ROS-based robotic applications. The integration combines LARVA’s monitoring of Java programs with ROSMonitoring’s verification of ROS communications. The resulting framework is evaluated on robotic case studies to assess its effectiveness in detecting runtime violations.
This thesis aims to enhance an agent-based parking system presented in De Soricellis et al. (PAAMS 2025) by integrating reinforcement learning into vehicle agents. Instead of relying on fixed heuristics, agents learn parking strategies from past experience, balancing distance, availability, and competition in dynamic urban settings. The solution is implemented and evaluated in traffic simulations, comparing learned behaviours with static approaches. The study analyses performance gains, reward design, and the impact of learning on urban mobility.
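As a toy illustration of the learning component (not the design of the PAAMS 2025 system), a tabular Q-learning agent can learn which parking area to target by trading off walking distance against the risk of finding the area full; the states, rewards, and occupancy figures below are made up.

```python
import random
from collections import defaultdict

# Toy tabular Q-learning for choosing a parking area; occupancy probabilities,
# rewards, and the single-step episode structure are all illustrative.

AREAS = ["A", "B", "C"]
OCCUPANCY = {"A": 0.9, "B": 0.6, "C": 0.3}     # probability the area is full
WALK_COST = {"A": 1, "B": 3, "C": 6}           # walking-distance penalty

Q = defaultdict(float)
alpha, epsilon = 0.1, 0.2

def reward(area):
    parked = random.random() > OCCUPANCY[area]
    return (10 - WALK_COST[area]) if parked else -5

for episode in range(5000):
    state = "searching"
    if random.random() < epsilon:
        action = random.choice(AREAS)                       # explore
    else:
        action = max(AREAS, key=lambda a: Q[(state, a)])    # exploit
    r = reward(action)
    # One-step episode, so the learning target is just the reward.
    Q[(state, action)] += alpha * (r - Q[(state, action)])

print({a: round(Q[("searching", a)], 2) for a in AREAS})
```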
This work explores fairness and environmental sustainability in an infrastructure-free agent-based parking system presented in De Soricellis et al. (PAAMS 2025). New metrics are introduced to evaluate fairness in parking allocation, such as walking distance, waiting time, and success rate across agents. The approach also estimates CO₂ emissions by comparing different parking strategies in traffic simulations. Experiments in SUMO analyse how agent behaviours and adoption levels affect equity and environmental impact.
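A sketch of two candidate metrics, chosen here purely for illustration: Jain's fairness index over per-agent walking distances, and a rough CO₂ estimate from the extra distance driven while searching for parking (the emission factor is not calibrated).

```python
# Two candidate metrics: Jain's fairness index over per-agent walking
# distances, and a simple CO2 estimate from cruising-for-parking distance.
# The emission factor is illustrative, not a calibrated value.

def jain_fairness(values):
    # 1.0 = perfectly equal allocation, 1/n = maximally unequal.
    n = len(values)
    return sum(values) ** 2 / (n * sum(v ** 2 for v in values))

def co2_grams(cruising_km, grams_per_km=120):
    return cruising_km * grams_per_km

walking_distances = [120, 150, 130, 600]      # metres walked by each agent
cruising = [0.4, 0.6, 0.5, 2.1]               # km spent searching for parking

print("fairness:", round(jain_fairness(walking_distances), 3))
print("CO2 (g):", sum(co2_grams(km) for km in cruising))
```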
The Operational Design Domain (ODD) defines the conditions under which an autonomous vehicle can safely operate. The goal is to use formal methods to automatically determine the minimal set of sensors, actuators, and system upgrades required for a vehicle to comply with a given ODD. Instead of manual analysis, a formal synthesis framework generates recommendations directly from ODD specifications. This approach reduces development effort and improves the reliability of autonomous vehicle design.
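One possible formal encoding, sketched with the z3-solver Python bindings under the assumption that each ODD requirement can be expressed as a Boolean constraint over candidate sensors: an optimiser then picks the cheapest sensor set that satisfies all requirements. The requirements and costs are illustrative.

```python
from z3 import Bool, Optimize, Or, If, Sum, is_true, sat

# Each ODD requirement becomes a Boolean constraint over candidate sensors;
# the optimiser picks the cheapest set that satisfies all of them.
# Requirements and costs are illustrative. Requires the z3-solver package.

sensors = {"lidar": 5, "radar": 2, "camera": 1, "hd_map": 2}   # name -> cost
pick = {name: Bool(name) for name in sensors}

opt = Optimize()
opt.add(Or(pick["lidar"], pick["radar"]))     # night driving needs lidar or radar
opt.add(pick["camera"])                       # lane keeping needs a camera
opt.add(Or(pick["hd_map"], pick["lidar"]))    # urban operation needs a map or lidar

opt.minimize(Sum([If(pick[n], cost, 0) for n, cost in sensors.items()]))

if opt.check() == sat:
    model = opt.model()
    chosen = [n for n in sensors if is_true(model.evaluate(pick[n]))]
    print("minimal sensor set:", chosen)      # e.g. ['radar', 'camera', 'hd_map']
```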
This thesis introduces a formal verification framework to check whether an autonomous vehicle’s existing hardware and software configuration complies with a given Operational Design Domain. Instead of relying on manual inspection, the framework models both vehicle capabilities and ODD requirements to automatically verify compliance. It identifies mismatches and missing components that could compromise safe operation. The approach improves reliability and supports safer deployment of autonomous vehicles.
If you are interested, please send me an email including: