Magika 1.0: Smarter, Faster File Detection with Rust and AI

Last updated: 2025/12/12 at 4:45 PM
News Room Published 12 December 2025

Google has just released version 1.0 of Magika, a substantial rewrite of its open-source file type detection system. The new version leverages AI to support a broader range of file types and is built in Rust for maximum speed and security.

Magika 1.0 brings the number of supported file types to over 200, up from 100 in the previous Python version.

Google highlights that many of the newly added file types are specialized text-based formats that were previously difficult to detect. These include Dockerfiles, TOML, HCL, Bazel files, and many more. Magika 1.0 can also distinguish between source code files written in Swift, Kotlin, TypeScript, Dart, WebAssembly, and Zig. Additionally, it supports file types commonly used in data science, such as Jupyter Notebooks, NumPy arrays, PyTorch models, ONNX files, and others.

In addition to supporting a wider range of file types, Magika 1.0 offers greater granularity, distinguishing similar formats that were previously grouped together, such as TypeScript and JavaScript, C++ and C, and TSV and CSV.

To enable the tool to detect this wide diversity of formats, Google engineers created a large dataset of file format samples to train a specialized AI model. The sheer volume of data represented a challenge in itself:

Our training dataset grew to over 3TB when uncompressed, which required an efficient processing pipeline. To handle this, we leveraged our recently released SedPack dataset library. This tool allows us to stream and decompress this large dataset directly to memory during training, bypassing potential I/O bottlenecks and making the process feasible.

At the same time, several formats, including recent, legacy, and specialized ones, were significantly underrepresented. Google addressed this by using Gemini to “create a high-quality, synthetic training set by translating existing code and other structured files from one format to another”.

Google says that Magika achieves ~99% average precision and recall, outperforming existing approaches, especially on textual content types.
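
For readers unfamiliar with these metrics, the following Python sketch shows one common way to compute precision and recall per content type and then average them across types (macro-averaging). Whether Google averages its figures this way is an assumption, the function name is illustrative, and the labels below are made-up toy data, not Magika results.

```python
from collections import Counter


def macro_precision_recall(y_true, y_pred):
    """Average per-label precision and recall over all labels seen."""
    true_count = Counter(y_true)   # how often each label is the real type
    pred_count = Counter(y_pred)   # how often each label was predicted
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    labels = sorted(set(y_true) | set(y_pred))
    precisions = [hits[l] / pred_count[l] if pred_count[l] else 0.0 for l in labels]
    recalls = [hits[l] / true_count[l] if true_count[l] else 0.0 for l in labels]
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


# Toy data (hypothetical): one CSV file is mistaken for TSV.
y_true = ["csv", "csv", "tsv", "tsv"]
y_pred = ["csv", "tsv", "tsv", "tsv"]
precision, recall = macro_precision_recall(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy case the single confusion between two similar formats lowers both averages, which is why fine-grained distinctions such as TSV versus CSV are hard to get right.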

Another major advantage of Magika 1.0 is its completely rewritten core, which uses Rust to maximize performance and enhance memory safety. The new Rust-based engine is at the heart of Magika’s command line tool, which is able to scan hundreds of files per second on a single core:

Magika is able to identify hundreds of files per second on a single core and easily scale to thousands per second on modern multi-core CPUs thanks to the use of the high-performance ONNX Runtime for model inference and Tokio for asynchronous parallel processing.

Based on Google’s benchmarks, this approach makes it possible to process nearly 1,000 files per second on a MacBook Pro (M4). As Reddit user robertknight2 explains, in this workflow:

Rust is used for extracting feature vectors from files using a small subset of the content and driving the scanning process via a tokio-based loop. The ML inference which predicts the file type based on extracted features is however done in C++ by ONNX Runtime (via the ort crate).
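
To make the idea of features built from "a small subset of the content" concrete, here is a minimal Python sketch. It is not Magika's actual Rust code. It assumes, purely for illustration, that the features are a fixed number of bytes taken from the start and the end of the file; the sizes `HEAD_SIZE` and `TAIL_SIZE`, the `PADDING` value, and the function name `extract_features` are all hypothetical.

```python
import os
import tempfile

HEAD_SIZE = 512  # hypothetical: bytes sampled from the start of the file
TAIL_SIZE = 512  # hypothetical: bytes sampled from the end of the file
PADDING = 256    # hypothetical marker for "no byte here" (outside 0..255)


def extract_features(path: str) -> list[int]:
    """Return a fixed-length vector built from the file's head and tail."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(HEAD_SIZE)
        # Clamp the offset so the tail window never starts before the file does.
        f.seek(max(size - TAIL_SIZE, 0))
        tail = f.read(TAIL_SIZE)
    # Pad short files so the vector length never depends on the file size.
    head_vec = list(head) + [PADDING] * (HEAD_SIZE - len(head))
    tail_vec = [PADDING] * (TAIL_SIZE - len(tail)) + list(tail)
    return head_vec + tail_vec


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "wb") as f:
            f.write(b"FROM ubuntu:24.04\n")
        print(len(extract_features(path)))  # always HEAD_SIZE + TAIL_SIZE
```

Under these assumptions only a bounded number of bytes is ever read, so the input handed to the model has the same length for a tiny script and a multi-gigabyte archive.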

The tool incurs a one-time performance cost when initially loading the model, but afterwards it takes around 5ms per file, with nearly constant inference time regardless of file size.
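
A quick back-of-the-envelope check relates this per-file latency to the throughput figures quoted above. The sketch assumes a steady-state cost of exactly 5 ms per file and perfectly linear scaling across cores, neither of which is guaranteed in practice; the function name is illustrative.

```python
import math


def files_per_second(ms_per_file: float, cores: int = 1) -> float:
    """Idealized throughput: each core handles one file at a time."""
    return cores * 1000.0 / ms_per_file


single_core = files_per_second(5.0)
# Cores needed to reach roughly 1,000 files per second under ideal scaling.
cores_needed = math.ceil(1000 / single_core)

print(single_core)   # 200.0
print(cores_needed)  # 5
```

At 5 ms per file, one core handles about 200 files per second, which fits the "hundreds of files per second" claim, and roughly five cores' worth of parallel work would be needed to approach 1,000 files per second.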

Although some have viewed the adoption of Rust negatively, X user Caleb Maclennan noted that “the security implications of heuristic guessing how to handle inputs makes Rust a good pick”. User Mazzarito added:

When file extensions are missing, or when they cannot be trusted such as during file uploads this type of program is actually quite valuable. File types are simply conventions- but there is no standard way to determine its type other than trying to read it [with] its decoder.
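
The upload scenario can be sketched in Python as follows. This is an illustration only: `toy_detect` is a hypothetical stand-in that checks three well-known leading signatures (PNG, PDF, ZIP) and is not how Magika works. In a real service the `detect` argument would be replaced by a content-based detector such as Magika.

```python
from pathlib import PurePath
from typing import Callable, Optional

# A few well-known leading signatures; a toy stand-in, NOT Magika's model.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
}


def toy_detect(content: bytes) -> Optional[str]:
    """Return a label if the content starts with a known signature."""
    for magic, label in SIGNATURES.items():
        if content.startswith(magic):
            return label
    return None


def extension_matches_content(
    filename: str,
    content: bytes,
    detect: Callable[[bytes], Optional[str]] = toy_detect,
) -> bool:
    """True only if the detector recognizes the content and agrees with the extension."""
    claimed = PurePath(filename).suffix.lower().lstrip(".")
    detected = detect(content)
    return detected is not None and detected == claimed


print(extension_matches_content("report.pdf", b"%PDF-1.7\n"))  # True
print(extension_matches_content("photo.png", b"%PDF-1.7\n"))   # False
```

The second call shows the case the comment describes: the extension claims an image, but the content says otherwise, so the upload should not be trusted based on its name alone.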

You can install Magika’s command line tool by executing:


curl -LsSf https://securityresearch.google/magika/install.sh | sh

or by installing the Python package, which includes the CLI tool, with pipx install magika.
