Revolutionizing Software Security and Reliability with Grammar Inference
Decision procedures have revolutionized software verification and security. Nowadays, it's difficult to find a software security paper not using a decision procedure at a premiere security conference. However, there are many other formal reasoning techniques that could improve reasoning about software, so what is next?
Key Ideas
- My observation is that most existing test generation techniques (e.g., DART, concolic,...) lack global understanding of program's structure. Grammar inference is capable of recovering (i.e., learning) such high-level info and encode it compactly as a suitable automaton.
- Besides recovering the program structure, the approach can be used to reverse-engineer program's communication with the environment.
Published Work
My first paper on this topic appeared at CCS'2010. In that paper, my coauthors and I improved
on the state-of-the-art automatic finite-state-machine inference, and
applied it to automatic inference of proprietary botnet protocols. We
managed not only to infer the protocol, but we also (1) discovered flaws
enabling us to milk the spam templates, meaning that we could update
spam filters even before the first spam hits the network, (2) proved
existence of background communication channels even though we could
neither access nor eavesdrop those channels, and (3) identified differences
between the botnet SMTP server and the standard SMTP implementations
(e.g., Postfix), which means that we could detect active bots on a live
network. Our work even got some press coverage in an Information Week's
article.
The follow-up paper focuses on a new grammar inference technique that works in a lock-step with concolic execution so as to explore the program's state space more effectively. We found a number of new vulnerabilities and showed a significant improvement in the coverage and exploration depth.