Microsoft on Tuesday announced an autonomous artificial intelligence (AI) agent that can analyze and classify software without assistance, in a bid to advance malware detection.
The large language model (LLM)-powered autonomous malware classification system, currently a prototype, has been codenamed Project Ire by the tech giant.
The system “automates what is considered the gold standard in malware classification: fully reverse engineering a software file without any clues about its origin or purpose,” Microsoft said. “It uses decompilers and other tools, reviews their output, and determines whether the software is malicious or benign.”
Project Ire, per the Windows maker, is an effort to enable malware classification at scale, accelerate threat response, and reduce the manual effort analysts must spend examining samples to determine whether they are malicious or benign.

Specifically, it uses specialized tools to reverse engineer software, conducting analysis at various levels, ranging from low-level binary analysis to control flow reconstruction and high-level interpretation of code behavior.
“Its tool-use API enables the system to update its understanding of a file using a wide range of reverse engineering tools, including Microsoft memory analysis sandboxes based on Project Freta, custom and open-source tools, documentation search, and multiple decompilers,” Microsoft said.
Project Freta is a Microsoft Research initiative that enables “discovery sweeps for undetected malware,” such as rootkits and advanced malware, in memory snapshots of live Linux systems during memory audits.
The evaluation is a multi-step process:
- Automated reverse engineering tools identify the file type, its structure, and potential areas of interest
- The system reconstructs the software’s control flow graph using frameworks like angr and Ghidra
- The LLM invokes specialized tools through an API to identify and summarize key functions
- The system calls a validator tool to verify its findings against evidence used to reach the verdict and classify the artifact
The summarization leaves a detailed “chain of evidence” log that records how the system arrived at its conclusion, allowing security teams to review the findings and refine the process in case of a misclassification.
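To make the flow concrete, the stages described above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not Microsoft's implementation: every name in it (`identify_file_type`, `EvidenceLog`, `Claim`, `validate`, `classify`, `toy_summarizer`) is hypothetical, the control flow recovery and LLM summarization stages are reduced to a caller-supplied function, and the "validator" simply checks that each claim cites an entry that actually exists in the evidence log.

```python
from dataclasses import dataclass, field

# Hypothetical names throughout; nothing here is taken from Project Ire.
MAGIC = {b"MZ": "PE", b"\x7fELF": "ELF"}


def identify_file_type(data: bytes) -> str:
    """Triage: guess the container format from the leading magic bytes."""
    for magic, name in MAGIC.items():
        if data.startswith(magic):
            return name
    return "unknown"


@dataclass
class EvidenceLog:
    """Append-only record of what each tool reported."""
    entries: list = field(default_factory=list)

    def record(self, tool: str, finding: str) -> int:
        self.entries.append({"tool": tool, "finding": finding})
        return len(self.entries) - 1  # id that later claims can cite


@dataclass
class Claim:
    text: str
    evidence_ids: list
    suspicious: bool


def validate(claims, log):
    """Keep only claims that cite at least one entry present in the log."""
    return [
        c for c in claims
        if c.evidence_ids
        and all(0 <= i < len(log.entries) for i in c.evidence_ids)
    ]


def classify(data: bytes, summarize):
    """Run triage, summarization, and validation; return (verdict, log).

    `summarize` stands in for control flow recovery and LLM function
    summaries. It receives (data, log) and returns a list of Claim objects.
    """
    log = EvidenceLog()
    log.record("triage", f"file type: {identify_file_type(data)}")
    claims = summarize(data, log)
    supported = validate(claims, log)
    verdict = "malicious" if any(c.suspicious for c in supported) else "benign"
    log.record(
        "validator",
        f"{len(supported)}/{len(claims)} claims supported; verdict: {verdict}",
    )
    return verdict, log


def toy_summarizer(data, log):
    """Stand-in analysis: one supported claim, one citing missing evidence."""
    eid = log.record("strings", "references CreateRemoteThread")
    return [
        Claim("injects code into another process", [eid], True),
        Claim("disables security software", [99], True),  # unsupported
    ]


if __name__ == "__main__":
    verdict, log = classify(b"MZ\x90\x00", toy_summarizer)
    print(verdict)
    for entry in log.entries:
        print(entry)
```

The design point the sketch tries to capture is that the verdict is derived only from claims that survive validation, and that the log retains every intermediate finding so a reviewer can trace a misclassification back to the step that caused it.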
In tests conducted by the Project Ire team on a dataset of publicly accessible Windows drivers, the classifier correctly classified 90% of all files and incorrectly flagged only 2% of benign files as threats. In a second evaluation of nearly 4,000 “hard-target” files, it rightly classified nearly 9 out of 10 malicious files as malicious, with a false positive rate of only 4%.
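These percentages are measured against different denominators, which is easy to miss: a share "of all files" is an accuracy figure, a share "of benign files" flagged as threats is a false positive rate, and a share of malicious files caught is a recall figure. The short sketch below shows how each would be computed from a confusion matrix. The function name `detection_rates` and the counts in the usage example are invented for illustration and are not Microsoft's data.

```python
def detection_rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common classifier metrics from confusion-matrix counts.

    tp: malicious files flagged as malicious
    fp: benign files flagged as malicious
    tn: benign files left unflagged
    fn: malicious files left unflagged
    """
    def ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 rather than raising when a category is empty.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),  # share of all files
        "false_positive_rate": ratio(fp, fp + tn),      # share of benign files
        "recall": ratio(tp, tp + fn),                   # share of malicious files
        "precision": ratio(tp, tp + fp),                # share of flagged files
    }


if __name__ == "__main__":
    # Made-up counts: 100 malicious and 100 benign samples.
    for name, value in detection_rates(tp=80, fp=2, tn=98, fn=20).items():
        print(f"{name}: {value:.3f}")
```

Because the denominators differ, a low false positive rate says nothing on its own about how much malware is missed, so the figures are best read together.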

“Based on these early successes, the Project Ire prototype will be leveraged inside Microsoft’s Defender organization as Binary Analyzer for threat detection and software classification,” Microsoft said.
“Our goal is to scale the system’s speed and accuracy so that it can correctly classify files from any source, even on first encounter. Ultimately, our vision is to detect novel malware directly in memory, at scale.”
The development comes as Microsoft said it awarded a record $17 million in bounties to 344 security researchers from 59 countries through its vulnerability reporting program in 2024.
A total of 1,469 eligible vulnerability reports were submitted between July 2024 and June 2025, with the highest individual bounty reaching $200,000. Last year, the company paid $16.6 million in bounty awards to 343 security researchers from 55 countries.