TornadoVM 2.0 Brings Automatic GPU Acceleration and LLM support to Java

News Room — Published 17 December 2025 (last updated 2025/12/17 at 1:24 AM)
The TornadoVM project recently reached version 2.0, a major milestone for the open-source project that aims to provide a heterogeneous hardware runtime for Java. This release is likely to be of particular interest to teams developing LLM solutions on the JVM.

The project automatically accelerates Java programs on multi-core CPUs, GPUs, and FPGAs. It does not replace existing JVMs; instead, it adds the ability to offload Java code to these backends, manages memory transfers between Java and the hardware accelerators, and runs the compute kernels. These capabilities are key building blocks of modern cloud and ML workloads.

InfoQ has previously covered the project in 2020 and 2022.

TornadoVM compiles Java bytecode at runtime (by acting as a JIT compiler) to one of three backends: OpenCL C, NVIDIA CUDA PTX, and SPIR-V binary. Developers can choose which backends to install and run depending on their specific systems.

Note that not every kind of Java computation is amenable to being offloaded via TornadoVM. Workloads whose for-loops have no dependencies between iterations are very good candidates, however, as their iterations can run in parallel.

In particular, matrix-based applications such as machine learning and deep learning are good candidates. Other good examples of this pattern are physics simulations (e.g., N-body particle computation), financial applications such as Black-Scholes, and a range of applications in computer vision, computational photography, natural language processing, and signal processing.
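As a plain-Java illustration of this distinction (no TornadoVM types involved; the class and method names are made up for the example), the first loop below has fully independent iterations and is a good offload candidate, while the second carries a dependency from one iteration to the next and is not, at least as written:

```java
// Plain-Java illustration of iteration independence (no TornadoVM involved).
public class LoopShapes {
    // Parallel-friendly: each iteration reads and writes only its own slot,
    // so all iterations could execute simultaneously on a GPU.
    static void scale(float[] in, float[] out, float factor) {
        for (int i = 0; i < in.length; i++) {
            out[i] = in[i] * factor;
        }
    }

    // Not parallel-friendly as written: iteration i reads the value produced
    // by iteration i - 1, forcing sequential execution.
    static void prefixSum(float[] a) {
        for (int i = 1; i < a.length; i++) {
            a[i] = a[i] + a[i - 1];
        }
    }
}
```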

TornadoVM offers two complementary ways to express parallelism: the Loop Parallel API, which uses Java annotations such as @Parallel and @Reduce to parallelize loops, and the Kernel API, which uses a KernelContext for explicit GPU-style programming (with concepts such as thread IDs, local memory, barriers available), and which is similar to CUDA/OpenCL/SYCL.

The Loop Parallel API can be as simple as adding an annotation to the loop variable:


// Element-wise vector multiply; @Parallel marks the loop for parallel execution
public static void vectorMul(FloatArray a, FloatArray b, FloatArray result) {
    for (@Parallel int i = 0; i < result.getSize(); i++) {
        result.set(i, a.get(i) * b.get(i));
    }
}

In either style, the work is then submitted for execution by explicitly building a TaskGraph as a Java object, like this:


var taskGraph = new TaskGraph("multiply")
      .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, b)   // copy inputs on the first run only
      .task("vectorMul", Example::vectorMul, a, b, result)        // the method shown above
      .transferToHost(DataTransferMode.EVERY_EXECUTION, result);  // copy the result back on every run

// Take an immutable snapshot of the graph and execute it on the selected device
var snapshot = taskGraph.snapshot();
new TornadoExecutionPlan(snapshot).execute();
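For comparison, a Kernel API method receives a KernelContext and addresses work by thread ID, much as in CUDA or OpenCL. The sketch below imitates that style with a minimal stand-in class so it runs without TornadoVM installed; the real KernelContext comes from TornadoVM's API and additionally exposes local memory and barriers:

```java
// Minimal stand-in for TornadoVM's KernelContext so this sketch runs standalone.
// On a real device, globalIdx would be the GPU thread's global ID.
class FakeKernelContext {
    int globalIdx;
}

public class KernelStyle {
    // GPU-style kernel: one multiplication per "thread"; no loop inside the kernel
    static void vectorMulKernel(FakeKernelContext ctx, float[] a, float[] b, float[] result) {
        int i = ctx.globalIdx;
        result[i] = a[i] * b[i];
    }

    // Sequentially emulate launching one thread per output element
    static void launch(float[] a, float[] b, float[] result) {
        FakeKernelContext ctx = new FakeKernelContext();
        for (int i = 0; i < result.length; i++) {
            ctx.globalIdx = i;
            vectorMulKernel(ctx, a, b, result);
        }
    }
}
```

The key design difference from the Loop Parallel API is that the kernel body describes the work of a single thread, and the runtime (here, the emulating loop) decides how many threads to launch.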

The team also ships a complete LLM inference library built with TornadoVM: GPULlama3.java provides LLM inference on GPUs in pure Java, with no external dependencies.

The just-shipped release v0.3.0 of GPULlama3.java brings significant performance and usability improvements.

  • ~30% performance boost on NVIDIA GPUs (tokens/sec)
  • Optimized FP16 and Q8 kernel generation
  • Easier setup thanks to the new TornadoVM SDKs, with no complex GPU configuration required
  • Runs on the NVIDIA PTX and OpenCL backends, with early Apple Silicon support
  • Enhanced Quarkus support
  • Integration with LangChain4j

GPULlama3.java currently supports several FP16 (16-bit floating point) and 8-bit quantized models, in the single-digit billions of parameters range:

  • Llama 3.2 (1B) – FP16
  • Llama 3.2 (3B) – FP16
  • Llama 3 (8B) – FP16
  • Mistral (7B) – FP16
  • Qwen3 (0.6B) – FP16
  • Qwen3 (1.7B) – FP16
  • Qwen3 (4B) – FP16
  • Qwen3 (8B) – FP16
  • Phi-3-mini-4k – FP16
  • Qwen2.5 (0.5B)
  • Qwen2.5 (1.5B)
  • DeepSeek-R1-Distill-Qwen (1.5B)

Depending on the selected model, a different execution plan will be built, corresponding to the relevant model architecture.

The project is led by the Beehive Lab, part of the Advanced Processor Technologies Group at the University of Manchester, which specializes in hardware/software co-design.

The team has also developed TornadoInsight, a plugin for IntelliJ IDEA that enhances the developer experience when working with TornadoVM.

Future work on the roadmap includes making TornadoVM available via SDKMAN! and migrating the codebase's JNI components to the newer Foreign Function & Memory (FFM) API.
