The Agent Sandbox is an open-source Kubernetes controller that provides a declarative API for managing a single, stateful pod with stable identity and persistent storage. It is particularly well suited for creating isolated environments to execute untrusted, LLM-generated code, as well as for running other stateful workloads.
Agent Sandbox provides a secure, isolated environment for executing untrusted code, such as code generated by large language models (LLMs). Running this type of code directly in a cluster poses security risks, because it could potentially access or interfere with other applications or gain access to the underlying cluster node itself; running it in ephemeral, sandboxed environments mitigates those risks.
The Agent Sandbox achieves isolation using gVisor, which creates a secure barrier between the application and the cluster node's OS, and it can also leverage other sandboxing technologies such as Kata Containers.
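In Kubernetes, sandboxed runtimes such as gVisor are typically enabled through a RuntimeClass that individual pods opt into. A minimal sketch, assuming a node whose container runtime is already configured with gVisor's runsc handler:

```yaml
# RuntimeClass routing pods to the gVisor runtime; the handler name
# must match the runtime handler configured on the node (e.g., in containerd).
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
```

Pods then select this runtime by setting spec.runtimeClassName: gvisor; Kata Containers integrates the same way through its own handler.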
The Sandbox custom resource definition (CRD) provides stable identity, storage that persists across restarts, and lifecycle-management features such as creation, scheduled deletion, pausing, and resuming. It also supports automatically resuming a sandbox on network reconnection, sharing memory across sandboxes, and a rich API that lets developers control sandboxes from applications or agents.
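As a rough illustration of the declarative model, a Sandbox object might look like the sketch below. The agents.x-k8s.io/v1alpha1 API group and the podTemplate field are assumptions about the project's schema, not confirmed field names, so check the Agent Sandbox reference documentation before relying on them:

```yaml
# Hypothetical Sandbox manifest; API group, version, and field names
# are assumptions for illustration only.
apiVersion: agents.x-k8s.io/v1alpha1
kind: Sandbox
metadata:
  name: python-runner
spec:
  podTemplate:
    spec:
      runtimeClassName: gvisor           # isolate via the gVisor RuntimeClass
      containers:
        - name: runner
          image: python:3.12-slim
          command: ["sleep", "infinity"] # keep the session alive for exec-style use
```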
In addition to the Sandbox API, the Agent Sandbox provides a templating mechanism that simplifies defining large numbers of similar sandboxes (SandboxTemplate) and instantiating them (SandboxClaim), as well as a pool of pre-warmed sandbox pods to reduce the time required to start a new sandbox.
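Conceptually, a SandboxTemplate captures a reusable specification and a SandboxClaim instantiates a sandbox from it, much as a PersistentVolumeClaim references a StorageClass. The sketch below is hypothetical: the templateRef field and other names are illustrative assumptions rather than the released API:

```yaml
# Hypothetical SandboxTemplate/SandboxClaim pair; field names are
# assumptions for illustration only.
apiVersion: agents.x-k8s.io/v1alpha1
kind: SandboxTemplate
metadata:
  name: python-sandbox
spec:
  podTemplate:
    spec:
      runtimeClassName: gvisor
      containers:
        - name: runner
          image: python:3.12-slim
---
apiVersion: agents.x-k8s.io/v1alpha1
kind: SandboxClaim
metadata:
  name: agent-session-1
spec:
  templateRef:
    name: python-sandbox   # instantiate a Sandbox from the template above
```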
Besides isolating AI agents, the Agent Sandbox is well suited for hosting single-instance applications such as build agents and small databases that require a stable identity, as well as for running persistent, single-container sessions for tools like Jupyter Notebooks.
OWASP identified Agent Tool Interaction Manipulation as one of its top 10 threats for AI agents:
Agent Tool Interaction manipulation vulnerabilities occur when AI agents interact with tools which may include critical infrastructure, IoT devices, or sensitive operational systems. This vulnerability class is particularly dangerous as it can lead to tools being manipulated in unintended ways.
According to OWASP, the primary measure to prevent this type of exploit is implementing system isolation, along with access segregation, permission management, command validation, and other safeguards.
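Independently of Agent Sandbox, several of these safeguards map onto standard Kubernetes primitives. The following sketch shows one possible combination for a sandboxed pod (the names and labels are illustrative): a hardened security context for permission management, no service-account token for access segregation, and a default-deny NetworkPolicy for network isolation:

```yaml
# Pod hardening for untrusted workloads; names and labels are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-sandbox
  labels:
    role: sandbox
spec:
  runtimeClassName: gvisor              # system isolation: user-space kernel
  automountServiceAccountToken: false   # access segregation: no Kubernetes API credentials
  containers:
    - name: runner
      image: python:3.12-slim
      command: ["sleep", "infinity"]
      securityContext:                  # permission management
        runAsNonRoot: true
        runAsUser: 65534
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
---
# Default-deny: no ingress or egress for pods labeled as sandboxes.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: sandbox-deny-all
spec:
  podSelector:
    matchLabels:
      role: sandbox
  policyTypes: ["Ingress", "Egress"]
```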
Security engineer Yassine Bargach writes on HackerNoon that every AI agent needs a sandbox, citing recent incidents and vulnerability disclosures that demonstrate how vulnerabilities in AI agents can lead to remote code execution (RCE). Examples include the Langflow RCE discovered by Horizon3, a vulnerability in Cursor allowing RCE through auto-execution, a database wipe-out affecting Replit, and others. He also emphasizes that sandboxing may be the best approach to mitigate risks from malicious prompt engineering:
Most of the work that is done to counter these attacks is focused on guardrails, classifiers, and scanners. Supposedly, this should resolve most of the issues. However, the question is: Is it better to spend time looking at each user input to see if it is malicious, or to be able to run anything in a secure environment that doesn’t affect the end-user?
Developers interested in sandboxing their AI agents can also consider alternatives to the Agent Sandbox, including container-use and Lightning AI’s litsandbox.
