OpenAI at QCon AI NYC: Fine Tuning the Enterprise

News Room · Published 17 December 2025 · Last updated 2:39 PM

At QCon AI NYC 2025, Will Hang from OpenAI presented an overview of Agent RFT, a reinforcement fine-tuning approach intended to improve the performance of tool-using agents.

Hang described a pragmatic improvement path that starts with prompt and task optimization before changing model weights. Examples included simplifying requirements, adding guardrails to prevent tool misuse, improving tool descriptions, and improving tool outputs so the agent can make better downstream decisions. He argued that these measures are often high leverage but can plateau on tasks that require consistent multi-step reasoning across tool interactions.
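
As a concrete illustration of the kind of tool-description improvement Hang described, the sketch below contrasts a vague tool definition with a tighter one using the Chat Completions function-calling schema. The search_invoices tool, its fields, and its wording are hypothetical examples, not taken from the talk.

```python
# Hypothetical example of tightening a tool description so the agent can
# decide when (and when not) to call it. The "search_invoices" tool and its
# parameters are invented for illustration.

vague_tool = {
    "type": "function",
    "function": {
        "name": "search_invoices",
        "description": "Search invoices.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

improved_tool = {
    "type": "function",
    "function": {
        "name": "search_invoices",
        "description": (
            "Full-text search over customer invoices. Use only when the user "
            "asks about a specific invoice, amount, or billing period; do not "
            "call it for general account questions. Returns at most 10 matches."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Keywords such as an invoice number, customer name, or date range.",
                },
                "limit": {
                    "type": "integer",
                    "description": "Maximum number of results to return (1-10).",
                    "minimum": 1,
                    "maximum": 10,
                },
            },
            "required": ["query"],
        },
    },
}
```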

He positioned fine-tuning options as a spectrum. Supervised fine-tuning was described as effective when there is a predictable mapping from input to output and the goal is to imitate a consistent style or structure. Preference optimization was described as a method for shifting outputs toward preferred responses using paired comparisons, and OpenAI’s Direct Preference Optimization guide describes it as fine-tuning by comparing model outputs and notes it is currently limited to text inputs and outputs. Reinforcement fine-tuning was emphasized as a better fit for tasks where the model needs to discover strategies over longer trajectories rather than reproduce a single demonstrated completion pattern.
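
To make the spectrum concrete, a single training record looks quite different across these methods. The sketch below shows roughly what one supervised fine-tuning record and one preference-optimization record might look like in OpenAI's JSONL upload formats; the field names follow the public fine-tuning guides as best understood here, so treat the shapes as approximate rather than authoritative, and the billing example itself is invented.

```python
import json

# Supervised fine-tuning: one record is a demonstrated conversation to imitate.
sft_record = {
    "messages": [
        {"role": "system", "content": "You answer billing questions concisely."},
        {"role": "user", "content": "When was invoice 1042 issued?"},
        {"role": "assistant", "content": "Invoice 1042 was issued on 3 March 2025."},
    ]
}

# Preference optimization (DPO): one record pairs an input with a preferred and
# a non-preferred completion; training shifts outputs toward the preferred one.
dpo_record = {
    "input": {
        "messages": [{"role": "user", "content": "When was invoice 1042 issued?"}]
    },
    "preferred_output": [
        {"role": "assistant", "content": "Invoice 1042 was issued on 3 March 2025."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "Sometime in 2025, I think."}
    ],
}

# Both formats are uploaded as JSONL, one record per line.
print(json.dumps(sft_record))
print(json.dumps(dpo_record))
```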

Beware of reward hacking! Resolve any edge cases in your grader. Continuous rewards work better than binary rewards. – Will Hang, OpenAI
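
That advice maps directly onto how a grader is written. The hypothetical grader below contrasts a binary exact-match reward with a continuous token-overlap reward: the continuous version hands out partial credit, so near-misses still produce a learning signal instead of collapsing to zero.

```python
def binary_reward(predicted: str, expected: str) -> float:
    """1.0 only on an exact match; near-misses produce no signal at all."""
    return 1.0 if predicted.strip() == expected.strip() else 0.0


def continuous_reward(predicted: str, expected: str) -> float:
    """Partial credit via token-overlap F1, so near-misses still score above 0."""
    pred_tokens = predicted.lower().split()
    gold_tokens = expected.lower().split()
    if not pred_tokens or not gold_tokens:
        return 0.0
    common = sum(min(pred_tokens.count(t), gold_tokens.count(t)) for t in set(pred_tokens))
    if common == 0:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# A near-miss gets nothing from the binary grader but meaningful credit here.
print(binary_reward("The capital is Paris", "Paris"))      # 0.0
print(continuous_reward("The capital is Paris", "Paris"))  # 0.4
```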

Agent RFT was presented as reinforcement fine-tuning adapted to tool-using agents, where the model explores different strategies during training rollouts and receives a learning signal from a grader. OpenAI’s documentation describes the loop as sampling candidate responses, scoring them with a grader you define, and updating the model based on those scores. Hang emphasized credit assignment across the full trajectory so earlier decisions, including tool selection and tool-call structure, can be reinforced or discouraged based on downstream outcomes. He described an agent as a system that can interact with the outside world through tools, not only respond to a user prompt.
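
At a very high level, that loop can be sketched as below. Every name here is a placeholder standing in for machinery that runs inside OpenAI's training service, so this is a conceptual sketch rather than anything callable against the API.

```python
def run_agent_rft(policy, tasks, sample_rollout, grader, policy_update, num_steps=10):
    """Conceptual sketch of the rollout -> grade -> update loop; the caller
    supplies every component, since the real loop is service-side."""
    for _ in range(num_steps):
        graded = []
        for task in tasks:
            # A rollout interleaves reasoning tokens, tool calls, and tool
            # outputs in a single trajectory that ends in a final answer.
            trajectory = sample_rollout(policy, task)
            # The grader scores the full trajectory, so earlier decisions
            # (tool choice, call structure) share credit for the outcome.
            reward = grader(trajectory, task)
            graded.append((trajectory, reward))
        # The update reinforces high-reward trajectories and discourages
        # low-reward ones before the next round of sampling.
        policy = policy_update(policy, graded)
    return policy
```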

Hang described tool examples including terminals for coding agents, internal business systems for customer support, and document search or retrieval endpoints. He emphasized that tool outputs flow back into the same context window, so tool calls, tool outputs, reasoning tokens, and the final response form a single multi-step trajectory. He also said that graders become a core artifact in the workflow. The session described multiple grading styles, including simple matchers, model-based judges, code-based graders, endpoint graders, and combinations of graders to jointly optimize accuracy and latency.
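
Those grading styles can also be composed. The sketch below pairs a simple string matcher with a model-based judge, here implemented as a Chat Completions call that scores equivalence, and averages the two. The judge prompt, model choice, and equal weighting are illustrative assumptions, not the configuration described in the session.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def string_match_grader(answer: str, expected: str) -> float:
    """Simple matcher: full credit only if the reference appears verbatim."""
    return 1.0 if expected in answer else 0.0


def model_judge_grader(answer: str, expected: str) -> float:
    """Model-based judge: asks a small model whether the candidate answer is
    equivalent to the reference. Prompt and model are illustrative choices."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "user",
                "content": (
                    f"Reference answer: {expected}\n"
                    f"Candidate answer: {answer}\n"
                    "Reply with only 1 if the candidate is equivalent to the "
                    "reference, otherwise reply with only 0."
                ),
            }
        ],
    )
    verdict = response.choices[0].message.content.strip()
    return 1.0 if verdict.startswith("1") else 0.0


def combined_grader(answer: str, expected: str) -> float:
    """Equal-weight blend of the two grading styles."""
    return 0.5 * string_match_grader(answer, expected) + 0.5 * model_judge_grader(answer, expected)
```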

The session also focused on operational properties that are not captured by answer accuracy alone. Hang described using Agent RFT to reduce unnecessary tool calls, enforce tool-call budgets, and reduce the long tail of very long trajectories that can create unpredictable latency and degraded user experience. Slides referenced training traces where reasoning tokens and tool calls decreased over training, consistent with the idea that the agent can learn to use fewer steps to reach similar or better task outcomes.
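
A reward shaped for those operational goals might, for instance, dock credit for every tool call over a budget, which pushes training away from the long, slow trajectories Hang mentioned. The budget and penalty shape below are hypothetical choices, not values from the talk.

```python
def budgeted_reward(correct: bool, num_tool_calls: int, budget: int = 6) -> float:
    """Correctness dominates, but each tool call over the budget costs 0.1,
    steering the agent toward shorter trajectories."""
    base = 1.0 if correct else 0.0
    overage_penalty = 0.1 * max(0, num_tool_calls - budget)
    return max(0.0, base - overage_penalty)


# Same correct answer, but the slower trajectory earns less reward.
print(budgeted_reward(correct=True, num_tool_calls=4))   # 1.0
print(budgeted_reward(correct=True, num_tool_calls=10))  # 0.6
```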

Wenjie Zi then presented the latter part of the session, covering use cases and platform setup details, including a finance-oriented example in which a model must locate relevant content across a large document corpus under a constrained tool-call budget. In that setup, the agent uses search, listing, and file-reading tools exposed behind endpoints, and a grader then scores the final answer. She highlighted using a model-based grader even for numeric answers to reduce false negatives caused by superficial formatting differences, units, or small variations.
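
The numeric-answer point is easy to see with a small example: an exact-string matcher would reject "$1,204.00" against a reference of "1204 USD" even though the values agree. The sketch below shows a simple normalization-based check of the kind such a grader needs to get right; the parsing rules are assumptions, and the approach described in the talk was to lean on a model-based judge rather than hand-written rules.

```python
import re


def normalize_number(text: str) -> float | None:
    """Strip currency symbols, separators, and units before comparing.
    These rules are illustrative; a model-based judge is more robust."""
    cleaned = re.sub(r"[^0-9.\-]", "", text.replace(",", ""))
    try:
        return float(cleaned)
    except ValueError:
        return None


def numeric_grader(answer: str, expected: str) -> float:
    a, b = normalize_number(answer), normalize_number(expected)
    if a is None or b is None:
        return 0.0
    return 1.0 if abs(a - b) < 1e-6 else 0.0


# An exact-string matcher scores this 0.0; after normalization it passes.
print(numeric_grader("$1,204.00", "1204 USD"))  # 1.0
```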

Zi also described broader examples across agentic coding and other domains, focusing on environments with many tools, isolated execution contexts, and reward designs that balance correctness with process and efficiency. Reported outcomes emphasized improved planning, reduced long trajectory tails, and in some cases a shift toward parallel tool calls to reduce sequential turns.

Developers who want to learn more can review OpenAI’s reinforcement fine-tuning and model optimization documentation, and watch infoq.com in the coming weeks, when video of the presentation will become available.
