November 6, 2025
Reward shaping in multi-agent reinforcement learning (MARL) for complex tasks remains a significant challenge. Existing approaches often fail to find optimal solutions or cannot handle such tasks efficiently. We propose HYPRL, a specification-guided reinforcement learning framework that learns control policies with respect to hyperproperties expressed in HyperLTL. Hyperproperties constitute a powerful formalism for specifying objectives and constraints over sets of execution traces across agents. To learn policies that maximize the satisfaction of a HyperLTL formula, we apply Skolemization to manage quantifier alternations and define quantitative robustness functions to shape rewards over execution traces of a Markov decision process with unknown transitions. A suitable RL algorithm is then used to learn policies that collectively maximize the expected reward and, consequently, increase the probability of satisfying the formula. We evaluate HYPRL on a diverse set of benchmarks, including safety-aware planning, Deep Sea Treasure, and the Post Correspondence Problem. We also compare with specification-driven baselines to demonstrate the effectiveness and efficiency of HYPRL.
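To give a flavor of the robustness-based reward shaping the abstract describes, here is a minimal sketch (not the HYPRL implementation). It computes quantitative robustness values for the temporal operators "always" (G) and "eventually" (F) over finite traces, and shapes a reward for a hyperproperty-style constraint that relates a *pair* of traces. The signal values, the margin threshold `1.0`, and the function names are illustrative assumptions.

```python
# Illustrative sketch of quantitative robustness for reward shaping.
# NOT the HYPRL implementation; signals and thresholds are assumed.

def rob_always(margins):
    # G phi: satisfied along the whole trace, so robustness is the
    # worst-case (minimum) margin observed at any step.
    return min(margins)

def rob_eventually(margins):
    # F phi: satisfied at some step, so robustness is the
    # best-case (maximum) margin observed at any step.
    return max(margins)

def shaped_reward(trace_a, trace_b):
    # Hyperproperty-style example over a pair of execution traces:
    # "at every step, the two traces' observable outputs stay within
    # a margin of 1.0 of each other" -- a quantitative analogue of a
    # two-trace equality constraint (as in observational determinism).
    margins = [1.0 - abs(a - b) for a, b in zip(trace_a, trace_b)]
    return rob_always(margins)

# Two execution traces of an observable signal (assumed data).
t1 = [0.0, 0.2, 0.4]
t2 = [0.0, 0.3, 0.4]
print(shaped_reward(t1, t2))  # positive value => the pair satisfies the margin
```

A positive robustness value indicates satisfaction and a negative one indicates violation, so an RL agent that maximizes this shaped reward is pushed toward trace sets that satisfy the specification.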
About Tzu-Han Hsu
Tzu-Han Hsu is a sixth-year Ph.D. candidate in Computer Science and Engineering at Michigan State University, advised by Dr. Borzoo Bonakdarpour. Her research develops automated and formal verification techniques to ensure the reliability and trustworthiness of software systems. She focuses on hyperproperties, a specification framework for reasoning about relationships among multiple program executions, which provides a powerful foundation for modeling and verifying security, privacy, and consistency requirements in concurrent and distributed systems. Her work has appeared in leading venues such as NeurIPS, CAV, TACAS, and CSF, and she received the Best Research Award at the MSU Graduate School Research Symposium in 2024. Currently, she explores the use of hyperproperties in program enforcement, automated repair, reinforcement learning, and program synthesis.