Apple Shares Details on Upcoming AI Foundation Models for iOS 26

News Room · Published 28 July 2025 (last updated 5:10 PM)

In a recent technical report, Apple has provided more details on the performance and characteristics of the new Apple Intelligence foundation models that will ship with iOS 26, announced at WWDC 2025.

Apple's foundation models include a 3B-parameter version optimized to run on Apple Silicon-powered devices, as well as a larger model designed to run on Apple's Private Cloud Compute platform. Apple emphasizes that both models were trained using responsible web crawling, licensed corpora, and synthetic data. Further training stages included supervised fine-tuning and reinforcement learning.

According to Apple, the 3B-parameter model is designed for efficiency, low latency, and minimal resource usage. The larger model, by contrast, aims to deliver high accuracy and scalability. Apple notes that, given its reduced size, the on-device model isn't intended to serve as a general world-knowledge chatbot, but it can support advanced capabilities such as text extraction, summarization, image understanding, and reasoning with just a few lines of code.

On the architecture side, the 3B-parameter model uses KV-cache sharing, a technique that reduces time-to-first-token, and is compressed using 2-bit quantization-aware training. The model is divided into two blocks, and sharing the key-value caches between them reduces memory usage by 37.5%, says Apple. Quantization-aware training is a technique that recovers quality by simulating the effect of 2-bit quantization at training time:

Unlike the conventional quantization scheme which derives the scale from weights W, we introduce a learnable scaling factor f that adaptively fine-tunes the quantization range for each weight tensor.
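To make this concrete, here is a minimal sketch of 2-bit "fake quantization" with a per-tensor scale, as used in quantization-aware training. This is an illustration of the general technique only, not Apple's implementation; the function name, the symmetric integer grid {-2, -1, 0, 1}, and the scale sweep are all assumptions. In real QAT the scale `f` would be a learnable parameter updated by gradient descent (via a straight-through estimator), rather than found by the brute-force sweep shown here.

```python
import numpy as np

def fake_quant_2bit(w, f):
    """Simulate 2-bit quantization of weight tensor w with scale f.

    Each weight is snapped to one of the 4 representable levels
    {-2, -1, 0, 1} * f. In QAT the backward pass uses a straight-through
    estimator so gradients can still flow to both w and f.
    """
    q = np.clip(np.round(w / f), -2, 1)  # 2-bit signed integer grid
    return q * f

# Toy example: quantize a weight tensor and measure reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=1000)

# Sweep candidate scales; a learnable f would be optimized by SGD instead.
scales = np.linspace(0.01, 0.2, 50)
errors = [np.mean((w - fake_quant_2bit(w, f)) ** 2) for f in scales]
best_f = scales[int(np.argmin(errors))]
print(f"best scale: {best_f:.3f}, MSE: {min(errors):.6f}")
```

The learnable scale is what distinguishes this scheme from conventional post-training quantization, where the scale is derived directly from the weight statistics.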

For the server-side model, Apple used a novel Parallel-Track Mixture-of-Experts (PT-MoE) transformer that combines track parallelism, sparse computation, and interleaved global-local attention. The model comprises multiple transformer tracks that process tokens independently, each with its own set of MoE layers. Apple says that combining parallel token processing with the MoE approach reduces synchronization overhead and allows the model to scale more efficiently.
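The two ingredients can be sketched independently of Apple's architecture. Below is a toy top-k MoE layer plus two "tracks" that process disjoint token groups with no cross-talk; every dimension, the top-k routing rule, and the softmax gating are generic MoE conventions and assumptions here, not details of PT-MoE. Because each token is routed independently, the two tracks can run in parallel and only synchronize when their outputs are concatenated.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 16, 4, 2

# Each expert is a small feed-forward weight matrix; the router scores
# every expert for every token.
experts = [rng.normal(scale=0.1, size=(D, D)) for _ in range(N_EXPERTS)]
router = rng.normal(scale=0.1, size=(D, N_EXPERTS))

def moe_layer(x):
    """Sparse MoE: each token is processed by only its top-k experts."""
    scores = x @ router                       # (tokens, experts)
    out = np.zeros_like(x)
    for i, token in enumerate(x):
        top = np.argsort(scores[i])[-TOP_K:]  # indices of the k best experts
        gate = np.exp(scores[i][top])
        gate /= gate.sum()                    # softmax over selected experts only
        for g, e in zip(gate, top):
            out[i] += g * (token @ experts[e])
    return out

# Two parallel tracks process disjoint token groups independently (no
# synchronization until their outputs are concatenated), mimicking the
# track-parallel idea.
tokens = rng.normal(size=(8, D))
track_a, track_b = tokens[:4], tokens[4:]
y = np.concatenate([moe_layer(track_a), moe_layer(track_b)])
print(y.shape)  # (8, 16)
```

Note that processing the tracks separately yields exactly the same result as processing all tokens at once, which is why the split costs nothing in accuracy while removing synchronization points.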

To evaluate its foundation models, Apple researchers relied on human graders to assess each model’s ability to produce a native-sounding response. The results show that the on-device model performs well against Qwen-2.5-3B across all supported languages, and remains competitive with larger models like Qwen-3-4B and Gemma-3-4B in English. The larger server-side model performs favorably against Llama-4-Scout, but falls short compared to much larger models such as Qwen-3-235B and GPT-4o.
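Head-to-head human evaluations like these are typically summarized as win/tie/loss rates over pairwise grader verdicts. The snippet below shows that bookkeeping on entirely made-up verdicts; it illustrates the general scoring method, not Apple's actual grading pipeline or data.

```python
from collections import Counter

# Hypothetical grader verdicts for one model pair (A = evaluated model).
verdicts = ["A", "A", "tie", "B", "A", "tie", "A", "B", "A", "A"]

counts = Counter(verdicts)
n = len(verdicts)
win, tie, loss = counts["A"] / n, counts["tie"] / n, counts["B"] / n
print(f"win {win:.0%} / tie {tie:.0%} / loss {loss:.0%}")
# prints: win 60% / tie 20% / loss 20%
```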

For image understanding, Apple followed the same approach by asking humans to evaluate image-question pairs, including text-rich images like infographics:

We found that Apple's on-device model performs favorably against the larger InternVL and Qwen and competitively against Gemma, and our server model outperforms Qwen-2.5-VL, at less than half the inference FLOPS, but is behind Llama-4-Scout and GPT-4o.

As a final note, Apple researchers emphasize their approach to Responsible AI, which includes enforcing a baseline of safety and guardrails to mitigate harmful model inputs and outputs. These safeguards were also evaluated through a combination of human assessment and auto-grading. Apple has also published educational resources to help developers apply Responsible AI principles.

Apple's AI foundation models require Xcode 26 and iOS 26 and are currently available as beta software.
