Copyright © All Rights Reserved. World of Software.

Meta Launches AutoPatchBench to Evaluate LLM Agents on Security Fixes

Last updated: 2025/05/07 at 3:35 PM
News Room Published 7 May 2025

AutoPatchBench is a standardized benchmark designed to help researchers and developers evaluate and compare how effectively LLM agents can automatically patch security vulnerabilities in C/C++ native code.

AutoPatchBench comprises a collection of tests aimed at evaluating the ability of LLMs to autonomously generate security patches for vulnerabilities identified using fuzz testing.

This benchmark aims to facilitate a comprehensive understanding of the capabilities and limitations of various AI-driven approaches to repairing fuzzing-found bugs. By offering a consistent set of evaluation criteria, AutoPatchBench fosters transparency and reproducibility in research.

Compared to general-purpose benchmarks for evaluating software engineering agents like SWE-Bench and SWE-Bench Verified, AutoPatchBench focuses on the specific challenges posed by bugs uncovered through fuzzing techniques, which often involve security vulnerabilities.

AutoPatchBench is based on a subset of ARVO, a dataset of over 5,000 real-world C/C++ vulnerabilities discovered by Google’s OSS-Fuzz across more than 250 projects. Each vulnerability in ARVO is paired with a triggering input and the canonical patch the developer wrote to fix the issue.

We retained 136 samples for AutoPatchBench that fulfill the necessary conditions for both patch generation and verification. From this refined set, we created a down-sampled subset of 113 AutoPatchBench-Lite samples to provide a focused benchmark for testing AI patch generation tools. These subsets preserve the diversity and complexity of real-world vulnerabilities, including 11 distinct crash types, offering a solid foundation for advancing AI-driven security solutions.

Fuzz testing is a technique used to uncover security exploits and vulnerabilities by reaching edge cases that are difficult for human testers to encounter. As noted by the creators of OpenSSF’s Fuzz Introspector, fuzz testing is a promising approach, but its challenge lies in writing effective fuzzers that provide good coverage.

Additionally, once a crash is uncovered via fuzzing, resolving it is no trivial task: it requires a thorough analysis of the crash stack trace to identify the root cause, followed by patching the code and verifying the effectiveness of the fix. This is where AI systems may offer assistance, as demonstrated by Google in its tech report on AI-powered patching and more recently with its GITS-Eval benchmark.
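
The analyze, patch, and verify cycle described above can be sketched as a simple retry loop. This is a minimal illustration in Python, not AutoPatchBench's or Google's implementation: `repair_loop`, `Attempt`, and the three callables (`propose_patch`, `build`, `reproduces_crash`) are hypothetical names, and the stand-ins at the bottom merely simulate a model, a build system, and a run of the fuzz target on the crashing input.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    """Record of one patch attempt, fed back to the patch generator."""
    patch: str
    builds: bool
    crashes: bool


def repair_loop(
    propose_patch: Callable[[str, list], str],
    build: Callable[[str], bool],
    reproduces_crash: Callable[[str], bool],
    stack_trace: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first patch that builds and no longer crashes, or None."""
    history: list = []
    for _ in range(max_attempts):
        patch = propose_patch(stack_trace, history)
        builds = build(patch)
        # A patch that does not build cannot be run, so treat it as failing.
        crashes = reproduces_crash(patch) if builds else True
        history.append(Attempt(patch, builds, crashes))
        if builds and not crashes:
            return patch
    return None


# Hypothetical stand-ins: each string represents a candidate patch.
candidates = iter(["syntax error", "no-op", "add bounds check"])
result = repair_loop(
    propose_patch=lambda trace, history: next(candidates),
    build=lambda patch: patch != "syntax error",
    reproduces_crash=lambda patch: patch != "add bounds check",
    stack_trace="(crash stack trace text)",
)
print(result)
```

A patch that survives this loop has only been shown to build and to stop the original crash; it says nothing yet about whether the program still behaves as intended.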

One key aspect of patch verification is ensuring the patched program maintains its intended behavior, which goes well beyond checking the program builds and does not crash when fed with the input that originally triggered the crash. To address this concern, AutoPatchBench applies a specific technique to evaluate whether the generated patch produces a program state identical to the ground truth program after the patched function returns.
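
The idea of comparing program states can be illustrated with a toy differential check. The sketch below is not AutoPatchBench's actual technique, whose details the article does not give; it assumes, purely for illustration, that the "program state" is a plain dictionary, and `reference_fix`, `candidate_patch`, and `find_state_mismatches` are hypothetical names. Both versions run on the same inputs, and the state each leaves behind after returning is compared.

```python
import copy


def reference_fix(state, data):
    """Hypothetical ground-truth fix: truncate the copy to the buffer size."""
    n = min(len(data), len(state["buf"]))
    state["buf"][:n] = data[:n]
    state["length"] = n


def candidate_patch(state, data):
    """Hypothetical generated patch: avoid the overflow by rejecting big inputs."""
    if len(data) > len(state["buf"]):
        return  # no crash, but the state is left untouched
    state["buf"][:len(data)] = data
    state["length"] = len(data)


def find_state_mismatches(patched_fn, reference_fn, initial_state, inputs):
    """Return the inputs on which the two versions leave different state behind."""
    mismatches = []
    for data in inputs:
        patched_state = copy.deepcopy(initial_state)
        reference_state = copy.deepcopy(initial_state)
        patched_fn(patched_state, data)
        reference_fn(reference_state, data)
        if patched_state != reference_state:
            mismatches.append(data)
    return mismatches


initial = {"buf": [0] * 4, "length": 0}
inputs = [[1, 2], [1, 2, 3, 4], [1, 2, 3, 4, 5, 6]]
diverging = find_state_mismatches(candidate_patch, reference_fix, initial, inputs)
print(diverging)  # prints [[1, 2, 3, 4, 5, 6]]
```

In this toy case the candidate patch no longer overflows the buffer, yet it diverges from the reference on the oversized input, which is the kind of difference a crash-only check would miss.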

Along with AutoPatchBench, which includes the full set of 136 samples from ARVO, Meta also released AutoPatchBench-Lite, a smaller subset of only 113 samples where the root cause of the crash is confined to a single function. This makes it better suited for tools in early development or those focused on simpler crash scenarios.

AutoPatchBench is part of CyberSecEval 4, an extensive benchmark suite for assessing the vulnerabilities and defensive capabilities of LLMs. Meta open-sourced its reference implementation so the community can use it in open-source projects that employ fuzzing or to build better patching models.
