
Running Ray at Scale on AKS

News Room
Published 12 March 2026; last updated 2026/03/12 at 5:04 AM

The Azure Kubernetes Service (AKS) team at Microsoft has shared guidance for running Anyscale’s managed Ray service at scale. The guidance focuses on three recurring problems: limited GPU capacity, scattered ML storage, and expiring credentials.

The post builds on an earlier overview of open-source KubeRay on AKS and turns to Anyscale’s enhanced runtime, previously known as RayTurbo. The runtime adds smart autoscaling, improved monitoring, and fault-tolerant training on top of the open-source Ray framework.

Ray is a Python-native distributed compute framework designed to scale AI and ML workloads from a single laptop to clusters spanning thousands of nodes. Anyscale’s managed platform adds production-oriented features on top of Ray, and the new guidance reflects a partnership between Microsoft and Anyscale to deepen its Azure integration.

GPU scarcity is one of the most significant operational challenges in large-scale ML. High-demand accelerators such as NVIDIA GPUs frequently run into quota and availability limits in individual Azure regions, which can delay cluster provisioning and job scheduling.

Microsoft’s proposed solution is a multi-cluster, multi-region setup. Distributing Ray clusters across separate AKS instances in different Azure regions lets teams aggregate GPU quota beyond any single region’s limits, automatically reroute workloads during outages or capacity shortfalls, and extend the compute pool to on-premises systems or other cloud providers using Azure Arc with AKS.

The Anyscale console shows all registered clusters in a single view, and Anyscale Workspaces schedules workloads onto available capacity, either manually or automatically. Adding a region means writing a cloud_resource.yaml manifest and applying it with the Anyscale CLI; this configuration-first approach keeps multi-region expansion manageable.
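To make the multi-region idea concrete, here is a minimal Python sketch of capacity-aware placement. It assumes a hypothetical in-memory view of each registered cluster’s free GPUs and health status. It is not Anyscale’s scheduler or API: `RegionCluster`, `pick_cluster`, `total_quota`, and the cluster names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RegionCluster:
    """One AKS cluster registered with the control plane (hypothetical model)."""
    name: str        # illustrative name, e.g. "aks-eastus2"
    free_gpus: int   # GPUs currently unallocated in this cluster
    healthy: bool    # False during a regional outage or capacity incident


def pick_cluster(clusters: Sequence[RegionCluster], gpus_needed: int) -> Optional[RegionCluster]:
    """Return the healthy cluster with the most free GPUs that can fit the job.

    Unhealthy clusters are skipped, so work is rerouted when a region degrades.
    Returns None if no single cluster has enough capacity.
    """
    candidates = [c for c in clusters if c.healthy and c.free_gpus >= gpus_needed]
    if not candidates:
        return None
    # Prefer the cluster with the most headroom to reduce future contention.
    return max(candidates, key=lambda c: c.free_gpus)


def total_quota(clusters: Sequence[RegionCluster]) -> int:
    """Aggregate free GPUs across all healthy regions, beyond any one region's limit."""
    return sum(c.free_gpus for c in clusters if c.healthy)


if __name__ == "__main__":
    fleet = [
        RegionCluster("aks-eastus2", free_gpus=4, healthy=True),
        RegionCluster("aks-westeurope", free_gpus=16, healthy=False),  # simulated outage
        RegionCluster("aks-swedencentral", free_gpus=8, healthy=True),
    ]
    choice = pick_cluster(fleet, 6)
    print(choice.name if choice else "no capacity")  # prints "aks-swedencentral"
    print(total_quota(fleet))  # prints 12
```

Picking the cluster with the most headroom is only one possible policy. A production scheduler might also weigh cost, data locality, or network latency.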

A common pain point in ML operations is moving training data, model checkpoints, and artifacts between pipeline stages, for example from pre-training to fine-tuning and then to inference. The guidance addresses this with Azure BlobFuse2, which mounts Azure Blob Storage into Ray worker pods as a POSIX-compatible filesystem.

From Ray’s perspective, the mount point is just a local directory. Tasks and actors read datasets and write checkpoints with standard file I/O, and BlobFuse2 persists the data to Azure Blob Storage, where it is available across pods and node pools. Local caching helps prevent GPU stalls during large training runs. Because data is decoupled from compute, Ray clusters can scale up and down without losing data.
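Because the BlobFuse2 mount behaves like a local directory, checkpointing code needs nothing Azure-specific. The sketch below assumes a hypothetical mount path (`/mnt/blob/checkpoints`, overridable through a hypothetical `CHECKPOINT_ROOT` environment variable) and uses JSON only for illustration. Real training code would typically write framework-native checkpoint files instead.

```python
import json
import os
from pathlib import Path
from typing import Optional

# Hypothetical mount point: in a real deployment this would be the mountPath of
# the BlobFuse2-backed PersistentVolumeClaim inside the Ray worker pod.
CHECKPOINT_ROOT = Path(os.environ.get("CHECKPOINT_ROOT", "/mnt/blob/checkpoints"))


def save_checkpoint(run_id: str, step: int, state: dict, root: Path = CHECKPOINT_ROOT) -> Path:
    """Write a checkpoint with ordinary file I/O; the mount handles persistence."""
    run_dir = root / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / f"step-{step:08d}.json"  # zero-padded so names sort by step
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"step": step, "state": state}, f)
    return path


def load_latest(run_id: str, root: Path = CHECKPOINT_ROOT) -> Optional[dict]:
    """Return the newest checkpoint for run_id, or None if none exist yet.

    The files live on shared storage rather than on the node, so a worker that
    starts after a scale-down/scale-up cycle can resume from the same data.
    """
    run_dir = root / run_id
    if not run_dir.is_dir():
        return None
    files = sorted(run_dir.glob("step-*.json"))
    if not files:
        return None
    with open(files[-1], encoding="utf-8") as f:
        return json.load(f)
```

In a Ray task or actor, these functions would be called as usual; the only deployment-specific detail is the directory they point at.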

Setup takes three steps. First, enable the Blob CSI driver when creating the cluster. Next, define a StorageClass that authenticates with workload identity. Finally, create a PersistentVolumeClaim with ReadWriteMany access so that Ray workers on different nodes can read and write the shared data concurrently. Ray code stays portable, while the infrastructure layer gains the durability and scalability of Azure-native storage.
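The storage setup can be sketched as Kubernetes manifests. The example below builds them as Python dictionaries and prints them as JSON, which `kubectl apply -f` accepts. It assumes the Blob CSI driver is already enabled on the cluster (for example with `az aks create --enable-blob-driver`). The resource names, size, and mount path are hypothetical. The StorageClass’s workload-identity parameters are intentionally omitted because their exact keys should come from the Azure Blob CSI driver documentation.

```python
import json

# Illustrative names only (hypothetical); adjust to your environment.
STORAGE_CLASS_NAME = "blob-fuse2-wi"
PVC_NAME = "ray-shared-data"


def storage_class(name: str = STORAGE_CLASS_NAME) -> dict:
    """StorageClass backed by the Azure Blob CSI driver, mounting via BlobFuse2.

    Workload-identity auth parameters are NOT included here (assumption: take
    them from the Blob CSI driver docs before using this for real).
    """
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "blob.csi.azure.com",
        "parameters": {"protocol": "fuse2"},
        "reclaimPolicy": "Retain",  # keep the volume and its data if the PVC is deleted
    }


def shared_pvc(size: str = "1Ti", name: str = PVC_NAME, sc: str = STORAGE_CLASS_NAME) -> dict:
    """PVC that many Ray workers on different nodes can mount at the same time."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],  # shared read/write across nodes
            "storageClassName": sc,
            "resources": {"requests": {"storage": size}},
        },
    }


def worker_volume_snippet(mount_path: str = "/mnt/blob", pvc: str = PVC_NAME) -> dict:
    """Fragments for a Ray worker: `volumes` goes in the pod spec,
    `volumeMounts` in the worker container spec."""
    return {
        "volumes": [{"name": "shared-data", "persistentVolumeClaim": {"claimName": pvc}}],
        "volumeMounts": [{"name": "shared-data", "mountPath": mount_path}],
    }


if __name__ == "__main__":
    items = [storage_class(), shared_pvc()]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Generating manifests from code is only a convenience here. The same objects can be written directly as YAML.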

The third topic is authentication reliability. Previously, the Anyscale integration with Azure relied on CLI tokens or API keys that expired every 30 days. They had to be rotated manually, which risked service disruption.

The new method uses Microsoft Entra service principals and AKS workload identity, which issue short-lived tokens automatically. The Anyscale Kubernetes Operator pod runs with a user-assigned managed identity, which requests an access token for the Anyscale service principal from Entra ID. Azure handles token refresh transparently, so no long-lived credentials are stored in the cluster and no manual rotation is required.

The authors note that this matters most in multi-cluster environments, where managing credentials by hand across many clusters multiplies the operational burden. The workload identity model also provides fine-grained RBAC for Azure resource access and, as a byproduct, full audit trails through Azure Activity Logs.
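The refresh behavior that Azure handles transparently can be illustrated with a small, generic Python sketch of a short-lived token cache. Here, `fetch` is a hypothetical stand-in for the actual token request to Entra ID. In practice, Azure Identity client libraries such as `WorkloadIdentityCredential` in the Python `azure-identity` package take care of this, so the code only illustrates the idea: keep a short-lived token in memory and replace it shortly before it expires.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AccessToken:
    value: str
    expires_at: float  # Unix timestamp in seconds


class RefreshingTokenCache:
    """Hands out a cached short-lived token and refreshes it shortly before expiry.

    `fetch` is a hypothetical callable standing in for the real Entra ID token
    request. Only the current short-lived token is held, and only in memory.
    """

    def __init__(self, fetch: Callable[[], AccessToken],
                 skew_seconds: float = 300.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch
        self._skew = skew_seconds  # refresh this many seconds before expiry
        self._clock = clock        # injectable so the logic can be tested
        self._token: Optional[AccessToken] = None

    def get(self) -> str:
        now = self._clock()
        # Fetch a new token if none is cached or the current one is about to expire.
        if self._token is None or self._token.expires_at - self._skew <= now:
            self._token = self._fetch()
        return self._token.value
```

The five-minute default skew is an arbitrary choice for the sketch, not a value from the guidance.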

The Anyscale on AKS integration is currently in private preview. Teams that want access can contact their Microsoft account team or file a request on the AKS GitHub repository, including details about their Ray workloads and target regions. The Azure-Samples/aks-anyscale repository on GitHub provides example setups and workloads, including fine-tuning with DeepSpeed and LLaMA-Factory as well as LLM inference endpoints.

Microsoft is not alone in this bet. AWS announced its Anyscale partnership at Ray Summit 2024, connecting EKS clusters to the RayTurbo runtime and emphasizing hardware flexibility by combining NVIDIA GPUs with AWS’s Trainium and Inferentia accelerators. SageMaker HyperPod is also now a deployment target for long-running training jobs that need node-level resilience.

Google Cloud leads in open-source contributions. The GKE team worked with Anyscale engineers to upstream label-based scheduling into Ray v2.49, created a ray.util.tpu layer to reduce resource fragmentation in multi-chip TPU setups, and added Dynamic Resource Allocation for the new GB200-backed instances.

All three hyperscalers have chosen the same managed Ray operator, and each has added its own infrastructure around it. This suggests the industry is converging on Kubernetes plus Ray for AI workloads, and that competition now centers less on the runtime and more on which cloud best streamlines the surrounding infrastructure.
