OpenAI has released an updated Agents SDK designed to give developers standardized infrastructure for building agents that can inspect files, run commands, edit code, and work on long-horizon tasks within controlled sandbox environments.
The new version introduces a model-native harness that lets agents work across files and tools on a computer, paired with native sandbox execution for running that work safely. The harness now includes configurable memory, sandbox-aware orchestration, and Codex-like filesystem tools, making agents more capable when working with documents and files.
The SDK supports tool use via MCP, progressive disclosure via skills, custom instructions via AGENTS.md, code execution using the shell tool, and file edits using the apply patch tool. OpenAI states the harness will continue to incorporate new agentic patterns and primitives over time, allowing developers to focus less on core infrastructure updates and more on domain-specific logic.
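The article notes that custom instructions are supplied via an AGENTS.md file in the agent's workspace. As a rough illustration of what such a file can contain (the specific project layout, commands, and conventions below are assumptions for this sketch, not from OpenAI's release), it might look like:

```markdown
# AGENTS.md

## Project overview
A Python service; source lives in `src/`, tests in `tests/`.

## Conventions
- Run the test suite before proposing a patch.
- Keep edits minimal; do not reformat unrelated files.

## Outputs
Write any generated reports to `out/`.
```

The harness reads instructions like these alongside its built-in tools, so project-specific guidance travels with the workspace rather than being hard-coded into the agent.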
For sandbox environments, developers can bring their own sandbox implementation or use built-in support for Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel. The SDK introduces a Manifest abstraction for describing the agent's workspace, enabling developers to mount local files, define output directories, and bring in data from storage providers including AWS S3, Google Cloud Storage, Azure Blob Storage, and Cloudflare R2. This provides a consistent way to shape the agent's environment from local prototype to production deployment.
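The SDK's actual Manifest interface is not reproduced here; as a hypothetical sketch of the abstraction's shape described above — mounting local files, declaring output directories, and bringing in data from remote storage — a workspace description might look like the following (all class names, fields, paths, and the bucket URI are assumptions for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class Mount:
    # Where the data comes from: a local path or a storage URI (e.g. s3://...)
    source: str
    # Where it appears inside the agent's sandboxed filesystem
    target: str
    read_only: bool = True

@dataclass
class Manifest:
    # Files and data the agent can see inside the sandbox
    mounts: list[Mount] = field(default_factory=list)
    # Directories whose contents are collected after the run
    output_dirs: list[str] = field(default_factory=list)

# Describe a workspace: mount local source code as writable, pull a
# read-only dataset from object storage, and collect anything the
# agent writes under /workspace/out.
manifest = Manifest(
    mounts=[
        Mount(source="./src", target="/workspace/src", read_only=False),
        Mount(source="s3://example-bucket/data", target="/workspace/data"),
    ],
    output_dirs=["/workspace/out"],
)
```

The same description can point at a local prototype directory during development and at a cloud bucket in production, which is the consistency the abstraction is meant to provide.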
According to OpenAI, the harness helps developers unlock more of a frontier model's capability by aligning execution with the way those models perform best. This keeps agents closer to the model's natural operating pattern, improving reliability and performance on complex tasks, particularly when work is long-running or coordinated across a diverse set of tools and systems.
Oscar Health, one of the customers who tested the new SDK, reports that the updated Agents SDK made it production-viable to automate a critical clinical records workflow that previous approaches could not handle reliably. Rachael Burns, Staff Engineer and AI Tech Lead at Oscar Health, stated the difference was not just extracting the right metadata but correctly understanding the boundaries of each encounter in long, complex records. The company notes this lets it more quickly understand what is happening for each patient in a given visit, supporting members' care needs and improving their experience.
Other customers who tested the SDK include Actively, LexisNexis, FurtherAI, Thomson Reuters, Zoom, and Tomoro AI.