CA | ES | EN
Counterexample Guided Program Repair Using Large Language Models and MaxSAT-based Fault Localization

Debugging and repair are costly in both software development and programming education. This talk presents two approaches that combine formal methods with large language models (LLMs) to improve accuracy and scalability. First, we introduce CFaults, a MaxSAT-based fault localization tool for C programs that ensures consistent, subset-minimal diagnoses across multiple failing tests. It outperforms existing tools like BugAssist and SNIPER in speed and precision. Second, we apply a hybrid method to automated repair of C code, using MaxSAT, CFaults, to localize bugs and LLMs to replace the faulty code via a counterexample-guided loop. This approach produces more targeted and effective repairs than existing symbolic or LLM-only methods, improving outcomes across over 1,400 incorrect student submissions. Together, these methods show how combining formal reasoning with generative models enables more efficient and precise debugging and repair.

Pedro Orvalho is a Postdoctoral Research Associate in the Department of Computer Science at the University of Oxford, where he works with Professor Marta Kwiatkowska. He holds a PhD in Computer Science from Instituto Superior Técnico (IST), Universidade de Lisboa (UL), conducted in collaboration with the Czech Technical University (CTU) in Prague. His doctoral research was hosted across INESC-ID, IST, and CIIRC at CTU, under the supervision of Vasco Manquinho and Mikoláš Janota. Pedro was previously a Junior Researcher in the Automated Reasoning and Software Reliability group at INESC-ID. He completed both his BSc and MSc in Information Systems and Computer Engineering at IST-UL, with research experience at Carnegie Mellon University (CMU), USA, and as a Research Intern at OutSystems. Pedro’s research lies at the intersection of Automated Reasoning (AR) and Machine Learning (ML), with a current focus on the robustness of Large Language Models (LLMs) for code understanding. His work aims to improve the accuracy and reliability of LLMs in software engineering tasks by combining automated reasoning with machine learning techniques.