Sandboxing

An LLM-optimized bundle of this entire section is available at section.md. This single file contains all pages in this section, optimized for AI coding agent context.

A sandbox is an isolated, secure environment where code can run without affecting the host system. Sandboxes restrict what the executing code can do — limiting filesystem access, blocking network calls, and preventing arbitrary system operations — so that even malicious or buggy code cannot cause harm.

The exact restrictions depend on the sandboxing approach: some sandboxes eliminate dangerous operations entirely, while others provide full capabilities within an isolated, disposable container.

Why sandboxing matters for AI

LLM-generated code is inherently untrusted. The model may produce code that is correct and useful, but it can also produce code that is dangerous — and it does so without intent or awareness.

Risk	Example
Data destruction	`DELETE FROM orders WHERE 1=1` — wipes an entire table
Credential exfiltration	Reads environment variables and sends API keys to an external endpoint
Infinite loops	`while True: pass` — consumes CPU indefinitely
Resource abuse	Spawns thousands of threads or allocates unbounded memory
Filesystem damage	`rm -rf /` or overwrites critical configuration files
Network abuse	Makes unauthorized API calls, sends spam, or joins a botnet

Running LLM-generated code without a sandbox means trusting the model to never make these mistakes. Sandboxing eliminates this trust requirement by making dangerous operations structurally impossible.

Types of sandboxes

There are three broad approaches to sandboxing LLM-generated code, each with different tradeoffs:

Type	How it works	Tradeoffs	Examples
One-shot execution	Code runs to completion in a disposable container, then the container is discarded. Stdout, stderr, and outputs are captured.	Simple, no state reuse. Good for single-turn tasks.	Container tasks, serverless functions
Interactive sessions	A persistent VM or container where you send commands incrementally and observe results between steps. Sessions last for the lifetime of the VM.	Flexible and multi-turn, but heavier to provision and manage.	E2B, Daytona, fly.io
Programmatic tool calling	The LLM generates orchestration code that calls a predefined set of tools. The orchestration code runs in a sandbox while the tools run in full containers.	Durable, observable, and secure. Tools are known ahead of time.	Flyte workflow sandboxing

	Workflow sandbox	Code sandbox
Runtime	Monty (Rust-based Python interpreter)	Ephemeral Docker container
Startup	Microseconds	Seconds (image build + container spin-up)
Capabilities	Pure Python control flow only — no imports, no I/O, no network	Full Python environment — any package, any library, full I/O
Use case	LLM-generated orchestration logic that calls registered tools	Arbitrary computation — data processing, test execution, ETL, shell pipelines
State	Runs within a worker container process	Stateless — fresh container per invocation
Security model	Dangerous operations are structurally impossible	Isolated container

Use the workflow sandbox when you need to run untrusted control flow (loops, conditionals, routing) that dispatches work to known tasks. It starts in microseconds and provides the strongest isolation guarantees.
Use the code sandbox when you need full Python capabilities — third-party packages, file I/O, shell commands, or any computation that goes beyond pure control flow.

Learn more

Workflow sandboxing — How the Monty-based sandboxed orchestrator works, with examples
Programmatic tool calling for agents — The concept behind programmatic tool calling and how to build agents that use it
Code sandboxing — Running arbitrary code and commands in ephemeral containers with flyte.sandbox.create()

Sandboxing

Why sandboxing matters for AI

Types of sandboxes

What Flyte offers

Workflow sandbox (Monty)

Code sandbox (container)

When to use which

Learn more