Magika 1.0: Smarter, Faster File Detection with Rust and AI

News Room
Last updated: 2025/12/12 at 4:45 PM
News Room Published 12 December 2025

Google has just released version 1.0 of Magika, a substantial rewrite of its open-source file type detection system. The new version leverages AI to support a broader range of file types and is built in Rust for maximum speed and security.

Magika 1.0 brings the number of supported file types to over 200, up from 100 in the previous Python version.

Google highlights that many of the newly added file types are specialized text-based formats that were previously difficult to detect. These include Dockerfiles, TOML, HCL, Bazel files, and many more. Magika 1.0 can also distinguish between source code files written in Swift, Kotlin, TypeScript, Dart, WebAssembly, and Zig. Additionally, it supports file types commonly used in data science, such as Jupyter Notebooks, NumPy arrays, PyTorch models, ONNX files, and others.

In addition to supporting a wider range of file types, Magika 1.0 offers greater granularity, distinguishing similar formats that were previously grouped together, such as TypeScript and JavaScript, C++ and C, and TSV and CSV.

To enable the tool to detect this wide diversity of formats, Google engineers created a large dataset of file format samples to train a specialized AI model. The sheer volume of data represented a challenge in itself:

Our training dataset grew to over 3TB when uncompressed, which required an efficient processing pipeline. To handle this, we leveraged our recently released SedPack dataset library. This tool allows us to stream and decompress this large dataset directly to memory during training, bypassing potential I/O bottlenecks and making the process feasible.

At the same time, several formats, including recent, legacy, and specialized ones, were significantly underrepresented in the dataset. Google addressed this by using Gemini to “create a high-quality, synthetic training set by translating existing code and other structured files from one format to another”.

Google says that Magika achieves ~99% average precision and recall, outperforming existing approaches, especially on textual content types.
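
To make the metric concrete, here is a minimal sketch of how precision and recall can be averaged across content types. It assumes a macro average, where each type counts equally; the article does not say which averaging Google used. The labels and predictions are invented for illustration and are not Magika results.

```python
from collections import Counter


def macro_precision_recall(true_labels, predicted_labels):
    """Return (precision, recall), each averaged over the labels in true_labels."""
    true_count = Counter(true_labels)
    predicted_count = Counter(predicted_labels)
    # Count correct predictions per label.
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)

    labels = sorted(true_count)
    # Precision: of the files predicted as a label, how many really were that label.
    precisions = [
        correct[label] / predicted_count[label] if predicted_count[label] else 0.0
        for label in labels
    ]
    # Recall: of the files that really were a label, how many were predicted as it.
    recalls = [correct[label] / true_count[label] for label in labels]
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


# Made-up example: one CSV file is mistaken for TSV.
truth = ["csv", "csv", "tsv", "tsv", "toml", "toml"]
preds = ["csv", "tsv", "tsv", "tsv", "toml", "toml"]
precision, recall = macro_precision_recall(truth, preds)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy case a single confusion between two similar text formats lowers both averages, which is why gains on textual content types matter for the overall figure.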

Another major advantage of Magika 1.0 is its completely rewritten core, which uses Rust to maximize performance and enhance memory safety. The new Rust-based engine is at the heart of Magika’s command line tool, which can scan hundreds of files per second on a single core:

Magika is able to identify hundreds of files per second on a single core and easily scale to thousands per second on modern multi-core CPUs thanks to the use of the high-performance ONNX Runtime for model inference and Tokio for asynchronous parallel processing.

Based on Google’s benchmarks, this approach makes it possible to process nearly 1,000 files per second on a MacBook Pro (M4). As Reddit user robertknight2 explains, in this workflow:

Rust is used for extracting feature vectors from files using a small subset of the content and driving the scanning process via a tokio-based loop. The ML inference which predicts the file type based on extracted features is however done in C++ by ONNX Runtime (via the ort crate).
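
The idea of building a feature vector from a small subset of the content can be sketched in a few lines of Python. This is an illustration only, not Magika's actual feature extraction: the block size, the padding token, and the choice of reading the head and tail of the file are all assumptions made for the example.

```python
import os
import tempfile

BLOCK_SIZE = 512  # hypothetical value, not taken from Magika
PADDING = 256     # token outside the 0-255 byte range, marks "no byte here"


def extract_features(path, block_size=BLOCK_SIZE):
    """Build a fixed-length vector from the first and last bytes of a file.

    At most 2 * block_size bytes are read, whatever the file size.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(block_size)
        f.seek(max(0, size - block_size))
        tail = f.read(block_size)
    # Pad short files so every vector has the same length.
    head_vec = list(head) + [PADDING] * (block_size - len(head))
    tail_vec = [PADDING] * (block_size - len(tail)) + list(tail)
    return head_vec + tail_vec


# A 12-byte file and a 1 MB file produce vectors of identical length.
with tempfile.TemporaryDirectory() as tmp:
    small = os.path.join(tmp, "small.bin")
    large = os.path.join(tmp, "large.bin")
    with open(small, "wb") as f:
        f.write(b"FROM alpine\n")
    with open(large, "wb") as f:
        f.write(b"\x00" * 1_000_000)
    small_features = extract_features(small)
    large_features = extract_features(large)

print(len(small_features), len(large_features))
```

Because the model only ever sees a vector of fixed length, the work done per file does not grow with the size of the file.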

The tool incurs a one-time performance cost when it first loads the model, but afterwards it takes around 5ms per file, with inference time nearly constant regardless of file size.
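
As a rough back-of-the-envelope check, the per-file latency can be converted into throughput. The sketch below assumes the 5ms figure applies to a single core and that throughput scales linearly with the number of parallel workers, which real hardware only approximates; the function name is made up for the example.

```python
def files_per_second(latency_ms, workers=1):
    """Ideal throughput for a given per-file latency and worker count."""
    return workers * 1000.0 / latency_ms


single_core = files_per_second(5.0)            # one file every 5 ms
five_workers = files_per_second(5.0, workers=5)

print(single_core, five_workers)
```

Under these assumptions, 5ms per file corresponds to 200 files per second on one core, in line with the "hundreds of files per second" claim, and about five parallel workers would be enough to approach the roughly 1,000 files per second reported for the MacBook Pro.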

Although some have viewed the adoption of Rust negatively, X user Caleb Maclennan noted that “the security implications of heuristic guessing how to handle inputs makes Rust a good pick”. User Mazzarito added:

When file extensions are missing, or when they cannot be trusted such as during file uploads[,] this type of program is actually quite valuable. File types are simply conventions, but there is no standard way to determine [a file's] type other than trying to read it [with] its decoder.

You can install Magika’s command line tool by executing:


curl -LsSf https://securityresearch.google/magika/install.sh | sh

or by installing the Python package, which includes the CLI tool, with pipx install magika.
