World of Software
Copyright © All Rights Reserved. World of Software.

AI Is Learning to Do the Jobs of Doctors, Lawyers, and Consultants

News Room
Last updated: 2026/01/06 at 3:47 PM
News Room Published 6 January 2026

The tasks resemble those that lawyers, doctors, financial analysts, and management consultants solve for a living. One asks for a diagnosis of a six-year-old patient based on nine pieces of multimedia evidence; another asks for legal advice on a musician’s estate; a third calls for a valuation of part of a healthcare technology company.

Mercor, which claims to supply “expert data” to every top AI company, says that it spent more than $500,000 to develop 200 tasks that test whether AIs “can perform knowledge work with high economic value” across law, medicine, finance, and management consulting. The resulting AI Productivity Index (APEX), published Wednesday, lists among its co-authors a former global managing director of McKinsey, a former dean of Harvard Business School, and a Harvard Law School professor, who advised on the design and scope of the tasks in their respective domains, according to Mercor. APEX is “focused on going very deep,” says Brendan Foody, the company’s 22-year-old CEO. “How do we get very comprehensive about what it means to be a consultant or a banker or a doctor or lawyer?”

To create the tasks, Mercor contracted white-collar professionals whose former employers include top banks (Goldman Sachs, JPMorgan), consulting firms (McKinsey, Boston Consulting Group), law firms (Latham & Watkins), and hospitals (Mount Sinai). They average 7.25 years of professional experience, and their pay at Mercor is competitive with that of their previous, highly prestigious employers. Mercor’s website advertises an average rate of $81 per hour, rising to more than $200 per hour (equivalent to an annual salary of about $400,000) for “Senior Domain Experts,” a role that requires at least four years’ professional experience.
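
The annual figure follows from simple arithmetic. Here is a minimal sketch, assuming a full-time year of 2,000 hours (40 hours a week for 50 weeks). That hours figure is an assumption for illustration, not a number from Mercor, and the function name is hypothetical.

```python
# Assumed full-time year: 40 hours/week for 50 weeks (not a Mercor figure).
HOURS_PER_YEAR = 40 * 50


def annualized(hourly_rate: float, hours: int = HOURS_PER_YEAR) -> float:
    """Convert an hourly rate to a yearly total under the assumed hours."""
    return hourly_rate * hours


print(annualized(200))  # 400000
print(annualized(81))   # 162000
```

Under the same assumption, the advertised $81 average works out to roughly $162,000 a year.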

“It’s hard to imagine a better hourly job from a pay perspective,” says Matt Seck, a former investment banking analyst at Bank of America, who is contracted by Mercor to write finance tasks similar to those included in the paper.

Benchmarks have long been used to assess AI capability, but directly quantifying AI models’ ability to do economically useful work represents a “paradigm shift,” says Osvald Nitski, one of the paper’s authors. On Mercor’s benchmark, “getting 100% would mean that you’d basically have an analyst or an associate in a box that you could go and send tasks to, and then they deliver it to the requirements of a partner, or an MD, or whoever would be grading the work of that person,” says Nitski.

The models aren’t there yet, but they are improving fast. OpenAI’s GPT-4o, released in May 2024, scored 35.9% on the benchmark. GPT-5, released just over a year later, achieved 64.2%—the top score on the benchmark. Getting 64.2% on the benchmark doesn’t mean that GPT-5 is delivering 64.2% of the value of a human worker—work that doesn’t hit 100% “might be effectively useless,” write the paper authors. GPT-5 only got full marks in two out of the 200 tasks—one in law and one in investment banking—which “primarily involve basic reasoning, simple calculations, and a lot of basic information searching,” according to Mercor.

Even if a model hits 100% on Mercor’s benchmark, it would probably make a poor substitute for human professionals. The tasks in Mercor’s benchmark focus on “well scoped deliverables,” such as making diagnoses or building financial models, rather than more open-ended tasks which might admit multiple right answers. This requires that the task descriptions include numerous assumptions needed to ensure that the desired output is well specified. The AIs’ outputs are entirely text-based, meaning that the benchmark doesn’t test AIs’ ability to use a computer, the way that a human worker would. (Mercor says that future versions of APEX will address these limitations.) And drafting the lengthy prompts needed for models to complete the tasks “would be more tedious than just doing it yourself,” says Seck.

Still, there are signs that AI models are becoming competitive with humans. Another benchmark, published Thursday, Sept. 25, by OpenAI, showed that expert human evaluators preferred an AI’s work to human work 47.6% of the time on 220 tasks including designing a sales brochure for a property and assessing images of a skin lesion. OpenAI also found that the performance of its models has increased substantially in a short space of time, more than doubling in their “win rate” against humans between June 2024 and Sept. 2025.

As model capability has grown, so has the complexity of the tasks that they’re being tested on and the human skill needed to create sufficiently challenging tasks. Earlier tests measured relatively abstract capabilities on reasoning puzzles and exam-style questions. Benchmarks before the 2022 release of ChatGPT often sourced data from crowdworker services, which paid workers a few dollars an hour. By 2023, Ph.D. students were being asked to create challenging multiple-choice questions in biology, physics and chemistry. In September, xAI reportedly laid off 500 of its “generalist” data workers as part of an “expansion and prioritization” of the company’s “specialist” data workers. To be sure, low-paid data workers still contribute to the development of AI models, but the upper bound of skill and compensation needed to develop AI benchmarks is increasing rapidly.

Directly measuring the utility of AI models on economically valuable tasks is “very hard to pull off,” says Nitski. The success criteria in domains such as finance and consulting are harder to define than, for example, in software engineering. Even with the perfect criteria in hand, marking an AI’s output on a large scale is harder than in software engineering, where automated tests can check whether a piece of code runs correctly. This explains, in part, why tests aiming to measure the real-world utility of AI models have existed for software engineering since at least 2023, but have lagged in other white-collar domains. However, as AIs have improved, they have helped solve the problem of grading complex tasks. The success criteria for Mercor’s tasks are written by human experts, but the marking is done by AIs, which Mercor says agreed with human graders 89% of the time, helping to scale the evaluations.

Developing benchmarks isn’t just about knowing how good models are. In AI, as in business, “what gets measured gets done”—good tests often precipitate AI progress on those tests. “It’s ultimately the same data type for both evaluation and training,” says Foody. Evaluating performance in games such as Go is straightforward; AI was beating Go masters by 2016. In 2023, benchmarks began evaluating AIs on real-world tasks in software engineering. Two years later, the labor statistics for junior programmers look dubious.

“AI got its Ph.D.,” says Foody. “Now it’s starting to enter the job market.”
