February 20, 2025
Software shapes the modern world, yet software development remains challenging and resource-intensive. It involves a complex life cycle—design, implementation, testing, and maintenance. While AI assistants have the potential to support software development, their adoption in practice has been limited due to accuracy and reliability concerns. My research focuses on integrating software domain knowledge into AI models to build more accurate and practical AI assistants for software development.
In this talk, I will introduce KNOD, LeDex, and Nova. KNOD is an efficient tool for automated program repair. It uses a novel tree encoder-decoder architecture to learn source code syntax, distills code semantics during model training, and leverages syntactic and semantic knowledge to improve inference. LeDex is a framework for building large language model (LLM) assistants that can not only write code but also review and debug it, which aligns better with developers’ workflows. Nova is the first series of LLMs that can understand assembly code and translate it to source code. It employs hierarchical attention to better capture assembly code semantics and uses contrastive learning objectives to learn diverse compiler optimizations. Together, these works demonstrate the power of incorporating software domain knowledge into AI approaches, paving the way for more effective AI-assisted software development.
About Nan Jiang
Nan Jiang is a final-year Ph.D. candidate in Computer Science at Purdue University, advised by Professor Lin Tan. His research focuses on developing AI-driven tools to support various stages of software development, including front-end implementation in the “design” stage, source code generation in the “implementation” stage, automated program repair in the “testing” stage, and security fixing and reverse engineering in the “maintenance” stage. Working at the intersection of artificial intelligence and software engineering, he has published in top-tier conferences across both fields, including ICSE, FSE, ISSTA, NeurIPS, AAAI, and ICLR.