OpenAI's Agents SDK Gets Sandbox Execution: Why This Matters for Enterprise AI
OpenAI has released significant updates to its Agents SDK that make it practical to deploy autonomous AI agents in enterprise environments. The new capabilities include native sandbox execution, which isolates agent operations in controlled computing environments, and an improved orchestration layer called a "long-horizon harness" that helps agents reliably complete complex, multi-step tasks over extended periods. These features address a critical gap between experimental agent prototypes and production-ready systems.
What Are Sandboxes and Why Do AI Agents Need Them?
Sandboxes are isolated computing environments where agents can safely access only the files, tools, and dependencies they need for a specific task. This containment approach prevents agents from accidentally or maliciously accessing sensitive data, modifying files they shouldn't touch, or executing unintended commands. In the past, autonomous agents in demo scenarios have occasionally "gone rogue," accessing resources beyond their intended scope and spiraling into uncontrolled behavior.
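The containment idea can be illustrated with a minimal path-allowlist check. This is a simplified sketch, not the SDK's actual mechanism, and the function name is hypothetical; real sandboxes enforce isolation at the container and filesystem level rather than in application code:

```python
from pathlib import Path

def is_within_workspace(requested: str, workspace: str) -> bool:
    """Return True only if `requested` resolves to a path inside `workspace`.

    Resolving before comparing defeats `../` traversal tricks, which is
    the kind of out-of-scope access a sandbox is meant to block.
    """
    root = Path(workspace).resolve()
    target = Path(requested).resolve()
    return target == root or root in target.parents

# A sandboxed agent would only be granted paths that pass this check.
print(is_within_workspace("/tmp/agent-ws/notes.txt", "/tmp/agent-ws"))      # True
print(is_within_workspace("/tmp/agent-ws/../etc/passwd", "/tmp/agent-ws"))  # False
```

The second call fails because the `../` segment resolves to a path outside the workspace root, which is exactly the class of accidental access the sandbox boundary prevents.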
The updated Agents SDK now supports sandbox execution natively, meaning developers no longer need to build custom isolation layers themselves. The SDK works with multiple sandbox providers, including Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel. Developers can also bring their own sandbox infrastructure if they prefer.
A key innovation is the "Manifest" abstraction, which lets developers describe the agent's workspace in a portable way. This means developers can mount local files, define output directories, and pull data from cloud storage services like AWS S3, Google Cloud Storage, Azure Blob Storage, and Cloudflare R2. The same configuration works whether an agent is running locally during development or in production on a cloud provider.
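A portable workspace description of this kind might look roughly like the following. The class and field names here are hypothetical illustrations of the concept, not the SDK's real Manifest API, and the bucket path is a placeholder:

```python
from dataclasses import dataclass, field

# Hypothetical shapes for illustration -- the SDK's real Manifest API may differ.
@dataclass
class Mount:
    source: str        # local path or cloud URI (e.g. an S3 bucket prefix)
    target: str        # path the agent sees inside the sandbox
    read_only: bool = True

@dataclass
class WorkspaceManifest:
    mounts: list[Mount] = field(default_factory=list)
    output_dir: str = "/workspace/out"

# The same manifest describes the workspace whether the agent runs
# locally during development or on a hosted sandbox provider.
manifest = WorkspaceManifest(
    mounts=[
        Mount(source="./docs", target="/workspace/docs"),
        Mount(source="s3://example-bucket/reports", target="/workspace/reports"),
    ],
)
print(len(manifest.mounts))  # 2
```

The value of the abstraction is that the execution backend, not the manifest, decides how each mount is realized, so the same declaration travels from a laptop to a cloud sandbox unchanged.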
How Do You Build Long-Running AI Agents With the New SDK?
- Configure Memory and State: The updated harness includes configurable memory that persists across multiple steps, allowing agents to retain context and continue work even if a sandbox container fails or expires.
- Use Standardized Tool Integrations: The SDK now includes built-in support for common agentic patterns, including tool use via MCP (Model Context Protocol), progressive disclosure via skills, custom instructions via AGENTS.md files, code execution using shell commands, and file editing using patch tools.
- Enable Durable Execution: The SDK supports snapshotting and rehydration, meaning if a sandbox environment fails, the agent's state can be restored in a fresh container and execution can resume from the last checkpoint without losing progress.
- Separate Harness From Compute: By keeping the orchestration layer separate from the execution environment, developers can scale agents more efficiently, route different tasks to isolated environments, and parallelize work across multiple containers.
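The snapshot-and-rehydrate pattern described above can be sketched as a checkpointed step loop. This is a simplified illustration under the assumption that checkpoints are serialized and persisted externally; the function names and checkpoint format are not the SDK's API:

```python
import json

def run_steps(steps, checkpoint=None):
    """Execute `steps` in order, snapshotting state after each one so a
    fresh container can rehydrate and resume from the last completed step."""
    state = json.loads(checkpoint) if checkpoint else {"done": [], "next": 0}
    for i in range(state["next"], len(steps)):
        state["done"].append(steps[i]())   # run one unit of agent work
        state["next"] = i + 1
        checkpoint = json.dumps(state)     # durable snapshot (persisted externally in practice)
    return state["done"], checkpoint

# Simulate a container failure after two steps, then rehydrate and resume.
steps = [lambda: "plan", lambda: "fetch", lambda: "summarize"]
_, snap = run_steps(steps[:2])                # first container completes 2 steps, then dies
done, _ = run_steps(steps, checkpoint=snap)   # fresh container resumes at step 3
print(done)  # ['plan', 'fetch', 'summarize']
```

The key property is that the second call repeats no completed work: the checkpoint carries both accumulated results and the resume index, which is what lets a harness survive sandbox expiry without losing progress.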
Previously, developers had to build substantial custom infrastructure to create agents capable of reliably continuing from where they left off, maintaining persistent state across multiple steps, and coordinating work on complex tasks. The new harness abstracts away much of this infrastructure work, allowing developers to focus on the domain-specific logic that makes their agents useful.
What Real-World Problems Does This Solve?
One concrete example comes from Oscar Health, a health insurance company. Rachael Burns, Staff Engineer and AI Tech Lead at Oscar Health, explained how the updated SDK enabled a critical workflow: "The updated Agents SDK made it production-viable for us to automate a critical clinical records workflow that previous approaches couldn't handle reliably enough. For us, the difference was not just extracting the right metadata, but correctly understanding the boundaries of each encounter in long, complex records. As a result, we can more quickly understand what's happening for each patient in a given visit, helping members with their care needs and improving their experience with us."
This use case highlights why the new features matter: healthcare workflows involve long sequences of steps, require reliable state management across multiple document reviews, and demand security controls to protect sensitive patient information. The sandbox execution and long-horizon harness directly address these requirements.
How Does This Compare to Other Agent Development Approaches?
The AI agent development landscape includes three main approaches, each with tradeoffs. Model-agnostic frameworks like LangChain or LlamaIndex offer flexibility and work with multiple AI models, but they don't fully leverage the capabilities of frontier models like GPT-5. Model-provider SDKs, built specifically for a single company's models, can be closer to the model's native capabilities but often lack visibility into the orchestration layer. Managed agent APIs simplify deployment but constrain where agents run and how they access sensitive data.
OpenAI's updated Agents SDK attempts to bridge these gaps by providing a model-native harness that's specifically optimized for OpenAI models while remaining flexible enough to adapt to different deployment scenarios. The SDK is designed to keep agents closer to the model's natural operating pattern, which improves reliability and performance on complex tasks, particularly for long-running work that coordinates across diverse tools and systems.
What Features Are Coming Next?
OpenAI has announced several features currently in development. "Subagents" will allow a primary agent to delegate work to additional specialized agents, mirroring how effective teams collaborate. "Code mode" will enable agents to write and execute code as part of their workflow, moving closer to the autonomous developer experience that OpenAI has been building toward. Both features are planned for Python and TypeScript support.
The company is also expanding sandbox provider support and working to bring additional integrations so developers can plug the SDK into tools and systems they already use. The new harness and sandbox capabilities are currently available in Python, with TypeScript support planned for a future release.
Who Can Use This and What Does It Cost?
The updated Agents SDK is generally available to all OpenAI API customers. Pricing uses standard API rates based on tokens and tool use, meaning there's no separate charge for the SDK itself; developers pay only for the API calls their agents make.
Notably, the SDK works with over 100 non-OpenAI large language models (LLMs) via the Chat Completions API, making it useful even for teams that prefer to use models from other providers. An LLM is an AI model trained on vast amounts of text data to understand and generate human language.
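Because the Chat Completions format is a de facto standard across providers, targeting a non-OpenAI model mostly means changing the base URL and model name in the request. A minimal sketch of such a payload follows; the endpoint and model name are placeholders, and no request is actually sent:

```python
import json

# Placeholder values -- substitute a real provider endpoint and model name.
BASE_URL = "https://example-provider.invalid/v1"
MODEL = "some-open-model"

def build_chat_request(user_message: str) -> dict:
    """Assemble a Chat Completions-style payload, the wire format most
    OpenAI-compatible providers accept."""
    return {
        "url": f"{BASE_URL}/chat/completions",
        "body": {
            "model": MODEL,
            "messages": [{"role": "user", "content": user_message}],
        },
    }

req = build_chat_request("Summarize this document.")
print(json.dumps(req["body"], indent=2))
```

Swapping providers then touches only `BASE_URL` and `MODEL`, while the message structure the agent harness produces stays the same.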
The evolution of the Agents SDK from an experimental prototype called Swarm into a production-grade framework reflects a broader shift in how AI agent development is maturing. As more companies move from proof-of-concept agents to systems handling real business workflows, the infrastructure supporting those agents has become increasingly important. The focus on sandboxing, durability, and standardized tool integration suggests that the industry is moving toward treating agent development as a serious engineering discipline rather than a novelty application.