
OpenAI Responses API Agentic Features: Shell Tools, Containers & Context Compaction (2026 Guide)

OpenAI's March 2026 Responses API update brings shell tools, hosted containers, built-in execution loops, and context compaction. Complete developer guide with code examples.

Admin

OpenAI Just Changed How You Build AI Agents

On March 11, 2026, OpenAI quietly published an engineering post that changes what it means to build an autonomous AI agent. The Responses API — already the preferred foundation for agentic apps — gained five new first-class primitives: a Unix shell tool, a built-in agent execution loop, a hosted container workspace, context compaction, and reusable agent skills.

If you've been cobbling together your own execution sandboxes, managing file state between turns, or hand-rolling retry logic, these new capabilities eliminate most of that boilerplate. This guide walks through what each feature does, why it matters, and how to start using it today.


What Is the OpenAI Responses API?

The Responses API (introduced in early 2026, priced at the same per-token rates as the Chat Completions API) is OpenAI's canonical interface for building multi-turn, tool-using agents. Unlike the Chat Completions API, it is stateful by design — each response can reference prior turns without you manually appending message history.

Think of it as the difference between a dumb pipe (Chat Completions) and a managed runtime (Responses). With the March 2026 update, that runtime now comes with an actual computer attached.

Current pricing (as of March 2026):

  • Input tokens: $2.50 / 1M (GPT-5.3 Instant), $10.00 / 1M (GPT-5.4)
  • Output tokens: $10.00 / 1M (GPT-5.3 Instant), $30.00 / 1M (GPT-5.4)
  • Hosted container storage: $0.03 / GB-hour
  • Context compaction: billed at input token rates for the compressed context
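These rates make rough budgeting straightforward. A back-of-envelope sketch using the GPT-5.4 rates listed above (the helper name and example token counts are illustrative, not part of any SDK):

```python
# Estimate the USD cost of an agent run from the listed per-token rates
# (GPT-5.4: $10 / 1M input tokens, $30 / 1M output tokens).
def estimate_run_cost(input_tokens: int, output_tokens: int,
                      input_rate: float = 10.00,
                      output_rate: float = 30.00) -> float:
    """Rates are dollars per 1M tokens."""
    return (input_tokens / 1_000_000) * input_rate + \
           (output_tokens / 1_000_000) * output_rate

# A long run with 200K input tokens and 15K output tokens:
print(f"${estimate_run_cost(200_000, 15_000):.2f}")  # $2.45
```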

The 5 New Agentic Primitives

1. The Shell Tool

Before March 2026, the only built-in code execution tool in the Responses API was the code interpreter, limited to running Python inside an ephemeral sandbox. The new shell tool runs arbitrary Unix commands — grep, curl, awk, go run, node, java, whatever your task requires.

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5.4",
    tools=[{"type": "shell"}],
    input="Download the latest Bitcoin price from CoinGecko's public API, parse the JSON, and return the current USD price and 24h change percentage.",
)

print(response.output)

The key constraint: the model proposes the command, the runtime executes it. The model cannot self-execute — every shell command goes through a controlled policy layer. This is critical for enterprise deployments where you need audit trails and predictable behavior.

What this unlocks:

  • Run compiled languages (Go, Java, Rust, C++) that the Python-only interpreter couldn't handle
  • Spin up local servers (node server.js, python -m http.server)
  • Chain complex Unix pipelines (cat data.csv | awk -F, '{sum+=$3} END {print sum}')
  • Git operations, file manipulation, and package installs inside the container
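The policy layer mentioned above is opaque from the outside, but the shape of the idea is simple: the model proposes a command string, and a gatekeeper approves or rejects it before anything executes. A minimal sketch of such a gate — the allowlist and function names here are hypothetical, not OpenAI's implementation:

```python
import shlex

# Hypothetical command-policy gate: only explicitly allowlisted binaries
# may run, checked for every stage of a Unix pipeline.
ALLOWED_BINARIES = {"grep", "curl", "awk", "node", "go", "cat"}

def is_command_allowed(command: str) -> bool:
    """Approve a proposed shell command only if every pipeline stage
    starts with an allowlisted binary."""
    for stage in command.split("|"):
        tokens = shlex.split(stage.strip())
        if not tokens or tokens[0] not in ALLOWED_BINARIES:
            return False
    return True

print(is_command_allowed("cat data.csv | awk -F, '{print $3}'"))  # True
print(is_command_allowed("rm -rf /workspace"))                    # False
```

A real deployment would layer argument inspection and audit logging on top, but the approve-before-execute structure is the part that matters for audit trails.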

2. The Built-In Agent Execution Loop

Standard LLM calls are one-shot: you send a prompt, you get a response. Building an agent traditionally meant writing your own loop:

# Old way — you had to write all of this yourself
task_complete = False
while not task_complete:
    response = call_llm(messages)
    if response.has_tool_call:
        result = execute_tool(response.tool_call)
        messages.append(result)
    else:
        task_complete = True

final_answer = response.content

With the new Responses API execution loop, OpenAI handles this for you. Set agent_loop=True and the API iterates — propose action → execute → feed result back → propose next action — until the model signals completion or you hit a configurable max_turns limit.

response = client.responses.create(
    model="gpt-5.4",
    tools=[{"type": "shell"}, {"type": "web_search"}],
    agent_loop=True,
    max_turns=20,
    input="Research the top 3 competitors of Stripe launched in 2025-2026, find their pricing pages, and produce a comparison table saved as /workspace/comparison.md",
)

# Single API call — runs the entire multi-step agent workflow
print(response.final_output)
print(f"Turns used: {response.usage.turns}")

Why this matters: Most agent bugs live in the execution loop — off-by-one errors, broken retry logic, context window mismanagement. Offloading this to OpenAI's battle-tested infrastructure removes an entire category of failure modes from your codebase.
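The managed loop has the same propose → execute → feed back shape as the hand-rolled version above. Here is a self-contained toy version with a scripted stand-in for the model, purely to make the control flow and the max_turns safety valve concrete — nothing in it calls a real API:

```python
# Toy propose → execute → feed back loop with a scripted "model".
def scripted_model(messages):
    # Propose one tool call on the first pass, then signal completion.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": "echo hello", "content": None}
    return {"tool_call": None, "content": "done: hello"}

def execute_tool(command):
    # Stub executor: pretend to run the command, return its "output".
    return {"role": "tool", "output": command.split(" ", 1)[1]}

def run_agent(messages, max_turns=20):
    for _ in range(max_turns):
        response = scripted_model(messages)
        if response["tool_call"] is None:
            return response["content"]        # model signalled completion
        messages.append(execute_tool(response["tool_call"]))
    raise RuntimeError("max_turns exceeded")  # the loop's safety valve

print(run_agent([{"role": "user", "content": "say hello"}]))  # done: hello
```

Every branch in this toy — completion detection, tool result feedback, the turn cap — is a place real hand-rolled loops accumulate bugs, which is exactly what the managed loop takes off your plate.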

3. Hosted Container Workspace

One of the biggest pain points in production agent development is file persistence across turns. Previously, every tool call started fresh — you couldn't have the agent download a PDF in turn 1 and parse it in turn 3.

The Responses API now provisions a hosted container per session: a sandboxed Linux environment with a persistent /workspace directory, network access (controlled by policy), and pre-installed runtimes.

# Create a session with a persistent container
session = client.responses.sessions.create(
    model="gpt-5.4",
    tools=[{"type": "shell"}],
    container_config={
        "persist_workspace": True,
        "network_policy": "restricted",  # blocks arbitrary outbound by default
        "allowed_domains": ["api.github.com", "pypi.org"]
    }
)

# Turn 1: Download data
r1 = client.responses.create(
    session_id=session.id,
    input="Download the CSV of top 1000 HN posts from the Firebase HN API and save to /workspace/hn_posts.csv"
)

# Turn 3: Analyze data — file is still there!
r3 = client.responses.create(
    session_id=session.id,
    input="Using /workspace/hn_posts.csv, calculate the average score by domain and find the top 10 domains."
)

Container specs (March 2026):

  • 4 vCPU, 8GB RAM per container
  • Up to 10GB workspace storage
  • Containers persist for 24 hours by default (configurable up to 7 days)
  • Pre-installed: Python 3.12, Node.js 22, Go 1.24, Java 21

Network policy controls are essential for enterprise deployments — you don't want your agent accidentally exfiltrating data or making unexpected API calls.
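Storage is metered too, and the listed $0.03 / GB-hour rate is worth a quick sanity check before you let agents fill the workspace. A trivial sketch (the helper is illustrative, not part of any SDK):

```python
# Container storage cost at the listed $0.03 / GB-hour rate.
def storage_cost(gb: float, hours: float,
                 rate_per_gb_hour: float = 0.03) -> float:
    return gb * hours * rate_per_gb_hour

# A full 10 GB workspace kept for the default 24-hour lifetime:
print(f"${storage_cost(10, 24):.2f}")  # $7.20
```

Small per run, but it compounds quickly across a fleet of 7-day containers, so clean up workspaces you no longer need.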

4. Context Compaction

Long agent runs are expensive. A 20-turn research task using GPT-5.4 can easily exceed 200K tokens of context, costing $2–6 per run in input tokens alone. Context compaction is OpenAI's solution: the API automatically summarizes earlier turns when the context window fills up, keeping the agent functional without blowing up your token budget.

response = client.responses.create(
    model="gpt-5.4",
    agent_loop=True,
    context_compaction={
        "strategy": "auto",            # or "aggressive" | "disabled"
        "preserve_last_n_turns": 5,    # always keep last 5 turns verbatim
        "max_context_tokens": 64000    # trigger compaction above this threshold
    },
    input="Perform a full technical audit of the React codebase at github.com/myorg/myapp — check for security vulnerabilities, performance issues, and outdated dependencies. Write a detailed report."
)

Compaction strategies:

  • auto: OpenAI decides when to compact (recommended for most use cases)
  • aggressive: compact early and often, prioritize cost savings
  • disabled: raw context, no compaction (for debugging or when you need full history)
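The mechanics behind these settings can be sketched in a few lines: once the estimated context exceeds the threshold, older turns collapse into a summary while the last N stay verbatim. This is a toy model of the behavior described above — the trivial string summarizer and the assumed 10x compression ratio are stand-ins, since the real API summarizes with the model itself:

```python
# Toy compaction: summarize old turns once context exceeds the threshold,
# always preserving the last N turns verbatim.
def compact(turns, max_context_tokens=64_000, preserve_last_n=5):
    total = sum(t["tokens"] for t in turns)
    if total <= max_context_tokens or len(turns) <= preserve_last_n:
        return turns
    head, tail = turns[:-preserve_last_n], turns[-preserve_last_n:]
    summary = {
        "role": "summary",
        "text": f"[compacted {len(head)} turns]",
        # Stand-in: assume summaries are ~10x smaller than the raw turns.
        "tokens": max(1, sum(t["tokens"] for t in head) // 10),
    }
    return [summary] + tail

turns = [{"role": "assistant", "text": f"turn {i}", "tokens": 8_000}
         for i in range(12)]
compacted = compact(turns)
print(len(compacted))                       # 6  (1 summary + last 5 turns)
print(sum(t["tokens"] for t in compacted))  # 45600, down from 96000
```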

According to OpenAI's benchmarks published in March 2026, aggressive compaction reduced average token costs for 15+ turn agent runs by 47% with less than 3% degradation in task completion quality. That's a significant saving for anyone running agents at scale.

5. Reusable Agent Skills

The fifth primitive is the most forward-looking: agent skills. These are pre-packaged behaviors you define once and reuse across agents — essentially function libraries, but for agent capabilities rather than pure code.

# Define a skill once, reuse everywhere
research_skill = client.agent_skills.create(
    name="web_researcher",
    description="Searches the web, fetches pages, and extracts structured information from multiple sources",
    tools=[{"type": "web_search"}, {"type": "shell"}],
    system_prompt="You are a meticulous researcher. Always cite sources with URLs and dates. Cross-reference claims across at least 3 sources before reporting.",
    model="gpt-5.3-instant"  # use cheaper model for skill sub-tasks
)

# Attach to any agent
response = client.responses.create(
    model="gpt-5.4",
    skills=[research_skill.id],
    input="Create a comprehensive report on the current state of fusion energy commercialization in 2026."
)

Skills can be shared within an organization, versioned, and composed. A research_skill + data_analysis_skill + report_writing_skill becomes a full research pipeline without re-engineering from scratch for every new project.
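The composition idea is easy to model locally. This toy registry mirrors the "define once, attach anywhere" pattern described above — the dataclass, registry, and skill names are all hypothetical stand-ins, not OpenAI's SDK:

```python
from dataclasses import dataclass, field

# Toy model of composable skills: each skill is a named bundle of tools
# and a prompt; a pipeline is just an ordered list of skill names.
@dataclass
class Skill:
    name: str
    tools: list = field(default_factory=list)
    system_prompt: str = ""

REGISTRY: dict = {}

def create_skill(name, tools, system_prompt):
    skill = REGISTRY[name] = Skill(name, tools, system_prompt)
    return skill

create_skill("web_researcher", ["web_search", "shell"], "Cite sources.")
create_skill("data_analysis", ["shell"], "Show your work.")
create_skill("report_writing", ["file_write"], "Write clearly.")

# Compose a research pipeline and collect the union of tools it needs.
pipeline = ["web_researcher", "data_analysis", "report_writing"]
needed = sorted({t for name in pipeline for t in REGISTRY[name].tools})
print(needed)  # ['file_write', 'shell', 'web_search']
```

The payoff of this shape is that versioning and sharing happen at the skill level, so pipelines stay thin.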


Real-World Use Case: Competitive Intelligence Agent in 50 Lines

Here's a practical example combining all five primitives into a production-ready agent:

from openai import OpenAI

client = OpenAI()

# Step 1: Create persistent session with container
session = client.responses.sessions.create(
    model="gpt-5.4",
    tools=[
        {"type": "shell"},
        {"type": "web_search"},
        {"type": "file_read"},
        {"type": "file_write"}
    ],
    container_config={
        "persist_workspace": True,
        "allowed_domains": [".crunchbase.com", ".linkedin.com", ".techcrunch.com"]
    }
)

# Step 2: Run the full multi-step workflow — single API call
response = client.responses.create(
    session_id=session.id,
    agent_loop=True,
    max_turns=30,
    context_compaction={"strategy": "auto"},
    input="Competitive intelligence task for Q1 2026: 1) Research the top 5 AI coding assistant tools (excluding GitHub Copilot). 2) For each: find pricing, recent feature updates (Jan-Mar 2026), review sentiment, notable customers. 3) Save raw data to /workspace/raw/. 4) Produce an executive summary as /workspace/report.md. 5) Also output a structured JSON dataset as /workspace/data.json.",
)

print(f"Agent complete! Turns: {response.usage.turns}")
print(f"Total tokens: {response.usage.total_tokens:,}")
print(f"Estimated cost: ${response.usage.estimated_cost:.4f}")

This workflow — which previously required a custom agent framework, a sandbox provider, file management code, retry handling, and context management — now runs in under 50 lines of Python with managed infrastructure underneath.


How the New Responses API Compares to Alternatives

When evaluating whether to use the managed Responses API or roll your own agent stack, the trade-offs are real:

OpenAI Responses API (managed)

  • ✅ Built-in execution loop, no boilerplate
  • ✅ Hosted containers included, no separate sandbox setup
  • ✅ Context compaction handled automatically
  • ✅ Reusable skills and session management
  • ❌ Vendor lock-in — your agents only run on OpenAI
  • ❌ Network policies may not suit every enterprise security requirement
  • ❌ Less control over the exact execution environment

LangGraph / LangChain (self-managed)

  • ✅ Model-agnostic — use Claude, Gemini, local models
  • ✅ Full control over execution logic and state
  • ✅ Open source, auditable
  • ❌ You build and maintain the execution loop
  • ❌ You provision and manage your own sandbox environment
  • ❌ Context management is your problem

CrewAI (multi-agent framework)

  • ✅ Great for orchestrating multiple specialized agents
  • ✅ Simpler mental model for task decomposition
  • ❌ Limited built-in execution infrastructure
  • ❌ Dependency on external tool integrations

The bottom line: OpenAI's approach optimizes for speed-to-deployment at the cost of portability. For teams that want to ship production agents in days rather than weeks, and are comfortable with OpenAI as a vendor, the new Responses API is hard to beat. For teams that need model flexibility or full infrastructure control, LangGraph remains the better choice.


What Developers Should Do Right Now

1. Audit your current agent loops. If you're managing your own while-loop, retry logic, or execution sandbox, evaluate whether the managed Responses API loop can replace it. Most straightforward single-agent tasks can migrate in a day or two.

2. Test context compaction on your longest runs. Set strategy: "auto" and monitor the compacted_turns field in response metadata. If you're running agents with 10+ turns, expect meaningful cost reductions.

3. Start building reusable agent skills for your common sub-tasks. Web research, data transformation, code review, document summarization — package these as skills and reuse them. This is the foundation of a composable, maintainable agent architecture.

4. Respect network policies in containers. The default is restricted mode. Only open the specific domains your agent needs. Unconstrained network access is how agent costs spiral and security incidents happen.

5. Set max_turns conservatively. Start at 10-15 turns. An agent in an infinite loop will burn through your API budget fast. Monitor, tune, and set budget alerts in the OpenAI dashboard.


The Bigger Picture

What OpenAI shipped on March 11, 2026 isn't just a feature update — it's a philosophical statement about where AI development is heading. The goal is to build managed runtimes where models can act reliably in the world without developers having to engineer all the plumbing.

The Responses API is increasingly resembling AWS Lambda for AI: opinionated, managed, and designed to make the 80% use case trivially easy while trading away some flexibility for edge cases.

Whether that trade-off is right for your project depends on your requirements. But for developers shipping production agents in 2026, these five new primitives are worth understanding deeply — they represent the current frontier of what's possible without writing a custom agent framework from scratch.

The era of spending two weeks building infrastructure before writing your first agent prompt is ending. That's a good thing.



Last updated: March 29, 2026. API pricing and specs are current as of this date and subject to change.