Using machine learning, LLMs, and retrieval-augmented generation to automate how software requirements are elicited, analysed, validated, and translated into verifiable system behaviour. The primary focus of the research programme since 2013.
Theme Overview
Requirements are the bridge between human intent and software behaviour. When that bridge is built poorly, systems fail — sometimes catastrophically. This theme asks: how can AI help engineers build that bridge more reliably, at scale, in domains where mistakes are costly?
Related Publications
Full title: Automated Question Answering for Improved Understanding of Compliance Requirements. One of the earliest applications of transformer-based multi-document QA to regulatory compliance, evaluated on financial and healthcare specifications — predating widespread LLM adoption. 80+ citations.
Full title: AI-Based Question Answering Assistance for Analyzing Natural-Language Requirements. Scalable NLP pipeline validated across 1,250+ industrial requirements from eight domains. Open dataset and replication package released. 100+ citations.
Full title: Automated Demarcation of Requirements in Textual Specifications: A Machine Learning Approach. Foundational ML approach for identifying requirements in natural-language documents, evaluated on 22 real specifications across eight domains. A frequently cited benchmark in the field. 200+ citations.
One of the first published applications of retrieval-augmented generation (RAG) to requirements engineering in a live industrial setting, developed with automotive industry partners. Demonstrates how RAG bridges the gap between natural-language requirements and executable test scenarios. First author.
Systematic approach to deriving executable test cases from natural-language requirements using LLMs, evaluated in industrial contexts. Led by PhD student Fanyu Wang and published in ACM Transactions on Software Engineering and Methodology.
Pipeline for extracting structured requirements from unstructured user reviews at scale, applied to app store data across multiple domains.
Emerging Direction
As AI systems move from answering queries to taking autonomous actions — browsing the web, writing and executing code, managing files, calling APIs — the problem of specification becomes dramatically harder. Agentic systems operate across longer horizons, with less human oversight at each step. When they fail, the failure often traces back not to the model, but to an underspecified goal, a missing constraint, or a boundary condition nobody thought to define.
This emerging research direction asks: what does it mean to specify requirements for an agent rather than a function? How do we define, verify, and monitor bounded autonomy? What guardrails are needed when an LLM-powered agent makes decisions that cascade across systems? The research programme is building on its foundations in requirements engineering (RE) and responsible AI to address these questions, in collaboration with industry partners operating in high-stakes domains.