Can a Powerful AI Model Be Built on a Budget? | HackerNoon

News Room · Published 26 February 2025

A few weeks ago, DeepSeek's announcement of its highly capable R1 model, which combines strong performance with low resource costs, electrified the tech community and rattled the US stock market. R1 is part of a growing trend: AI models trained using a technique called distillation. Essentially, distillation trains a smaller, faster model by letting it learn from a bigger, smarter one, so the smaller model keeps most of the intelligence while running far more efficiently. That technique, however, is not our focus here.

OpenAI and similar companies are trying to protect their intellectual property by limiting how their models can be used to train competitors. Countermeasures include banning accounts and IP addresses, lowering request limits, and contractually prohibiting the use of model outputs to build rival systems.

Can a powerful model be built on a budget?

A recent experiment by researchers from Stanford and the University of Washington demonstrated that it is indeed possible.

TL;DR: Researchers built a new model, s1, on top of Alibaba's Qwen2.5, distilled reasoning data from Google's Gemini 2.0 Flash Thinking (free within rate limits), and fine-tuned for 26 minutes on 16 NVIDIA H100 GPUs for under $50 in compute. The result, the paper says, beats OpenAI's o1-preview by up to 27% on competition math questions.

The s1 model demonstrates how AI systems can be trained efficiently through strategic data curation, supervised fine-tuning (SFT), and budget forcing. Rather than depending on costly, large-scale datasets, the researchers built s1K, a compact, high-quality dataset of 1,000 carefully curated questions paired with reasoning traces and answers distilled from the Gemini Thinking Experimental model, capturing complex problem-solving patterns without manual annotation. By fine-tuning Alibaba's Qwen2.5-32B-Instruct on this dataset, they produced a highly capable model at a fraction of the usual cost. A single record in such a dataset might look like the sketch below.
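
For a concrete sense of the data format, here is a minimal sketch of what one s1K-style record could look like. The field names and the sample problem are illustrative assumptions, not the dataset's actual schema:

```python
# A hypothetical s1K-style record: one curated question, the teacher model's
# reasoning trace, and the final answer. Field names are placeholders.
example = {
    "question": "How many positive divisors does 360 have?",
    "thinking": "360 = 2^3 * 3^2 * 5, so the divisor count is "
                "(3+1)(2+1)(1+1) = 24.",
    "answer": "24",
}
```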

The core of their training method was supervised fine-tuning, where the model learned directly from Gemini's reasoning traces rather than through a conventional teacher-student distillation pipeline. The 26-minute fine-tuning run on 16 NVIDIA H100 GPUs cost less than $50, showing that fine-tuning a strong open-weight model on well-curated data can yield significant performance gains. The sketch below illustrates what such a run could look like.
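
To make the step concrete, here is a minimal sketch of such a supervised fine-tuning run using the Hugging Face transformers library. This is an assumption-level illustration, not the paper's actual training script: the inline one-record dataset, the prompt format, and the hyperparameters are placeholders (the real run trains on all 1,000 s1K records across 16 H100s):

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "Qwen/Qwen2.5-32B-Instruct"  # base model named in the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="bfloat16")

# Stand-in for the 1,000 curated s1K records (question, trace, answer).
records = [{"question": "How many positive divisors does 360 have?",
            "thinking": "360 = 2^3 * 3^2 * 5, so (3+1)(2+1)(1+1) = 24.",
            "answer": "24"}]

def to_text(ex):
    # Fold question, reasoning trace, and answer into one training string;
    # a real run would use the model's chat template instead.
    return {"text": f"Question: {ex['question']}\n"
                    f"Thinking: {ex['thinking']}\n"
                    f"Answer: {ex['answer']}"}

ds = (Dataset.from_list(records)
      .map(to_text)
      .map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=4096)))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="s1-sft", num_train_epochs=5,
                           per_device_train_batch_size=1, bf16=True),
    train_dataset=ds,
    # Standard causal-LM collator: the inputs themselves serve as labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The striking design choice is how little machinery is involved: plain next-token supervised fine-tuning on a tiny, carefully chosen dataset, with no reinforcement learning stage.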

To optimize inference efficiency, the researchers applied budget forcing, a technique that regulates how long the model spends reasoning. If a response exceeded a length budget, an end-of-thinking token signaled the model to stop and deliver an answer. Conversely, appending the word "Wait" prompted the model to extend its reasoning, often producing more accurate answers. This simple yet powerful adjustment boosted the model's accuracy on the American Invitational Mathematics Examination (AIME) 2024 from 50% to 57%. The decoding loop below sketches the idea.
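
Here is a hedged illustration of how budget forcing could be implemented at decode time; it is not the paper's code. The small stand-in model, the "Final Answer:" marker, and the token budgets are all assumptions (s1 uses Qwen2.5-32B-Instruct with its own thinking delimiters):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B-Instruct"  # small stand-in so the sketch runs locally
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

END_THINK = "Final Answer:"  # hypothetical end-of-thinking marker
WAIT = "Wait"                # the extension cue reported in the paper

def budget_forced_generate(question, max_think_tokens=512, extensions=1):
    text = f"Question: {question}\nThinking: "
    for round_ in range(extensions + 1):
        ids = tok(text, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=max_think_tokens,
                             do_sample=False, pad_token_id=tok.eos_token_id)
        text = tok.decode(out[0], skip_special_tokens=True)
        if round_ < extensions and END_THINK in text:
            # The model tried to stop early: strip the marker and append
            # "Wait" so it is forced to keep reasoning.
            text = text.split(END_THINK)[0] + WAIT + " "
        else:
            break
    if END_THINK not in text:
        # Thinking budget exhausted: append the marker to force an answer.
        text += "\n" + END_THINK
        ids = tok(text, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=32, do_sample=False,
                             pad_token_id=tok.eos_token_id)
        text = tok.decode(out[0], skip_special_tokens=True)
    return text

print(budget_forced_generate("What is 12 * 13?"))
```

Both levers, cutting thinking short with the end marker or extending it with "Wait", give test-time control over how much compute each question receives, which is where the 50% to 57% AIME gain comes from.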

The s1-32B model surpassed OpenAI's o1-preview by up to 27% on competitive math benchmarks, demonstrating that small, well-trained models can compete with those built on vast computational resources. This research challenges the notion that state-of-the-art AI requires billion-dollar training pipelines; instead, it points to a future where strategic dataset design, fine-tuning, and inference optimization democratize AI model training.

If someone wanted to run this process on their own hardware, a single H100 GPU costs around $30,000, so 16 of them come to $480,000. That's a roughly half-million-dollar investment versus the billions spent by major AI players, for nearly the same results. In practice, renting the GPUs for the 26-minute run is what kept the actual compute bill under $50.

New LLM-based products are just around the corner

If AI can be trained this efficiently, what's stopping individuals or small teams from building their own models? With the right expertise and a few hundred dollars, crafting a custom AI could soon be as accessible as, say, getting a dog. 🐶

Open-weight models like Mistral, Qwen, and Llama are closing the gap with proprietary ones like GPT, chipping away at Big Tech's dominance. Distillation lets teams train high-quality models through API access instead of building from scratch, at a fraction of the cost. As a bonus, it reduces dependency on any single provider.

If this trend continues, AI may follow the cloud computing playbook: big companies still dominate the infrastructure, while smaller players gain power by optimising and customising models for specific needs, cost efficiency, and control.

The barriers to AI development are falling. What happens when anyone can train a high-performing AI assistant for the price of a laptop?
