AgentLang Specification
A declarative language for defining AI agents with reproducible builds, versioned dependencies, and encrypted prompts.
Architecture
Three layers, clear separation:
Spec agent.yaml
│
▼
Runtime LangGraph, CrewAI, Harness
│
▼
Host       Rush, Claude Desktop, Docker, OS

| Layer | Responsibility | Examples |
|---|---|---|
| Spec | What the agent is (tools, prompts, permissions) | AgentLang YAML |
| Runtime | How the agent loop executes | LangGraph, CrewAI, Harness |
| Host | Where the agent runs, what capabilities it grants | Rush, Claude Desktop, Docker, OS |
Same agent spec, different runtimes, different hosts. Like Kubernetes YAML running on EKS, GKE, or bare metal - the spec doesn't care where it lands.
Why AgentLang?
Today, agents are defined in code. Code that drifts. Dependencies that break. Prompts committed in plaintext. No way to reproduce yesterday's agent.
AgentLang treats agents like software artifacts. You define them declaratively. You build them into signed, versioned containers. You ship them knowing exactly what's inside.
# The problem
$ pip install my-agent
# Which version of web_search? Which model? What prompts?
# Nobody knows. Every install is different.
# The solution
$ agentlang build ./my-agent
# Locked dependencies, hashed files, signed container.
# Same agent, every time.

Agent Definition
An agent is defined in agent.yaml. This is the source of truth.
name: research-assistant
version: 1.0.0
title: Research Assistant
subtitle: McKinsey-grade research
description: |
Deep research across academic papers, market reports, and web sources.
Synthesizes findings into actionable reports.
developer: acme
category: Business
models:
claude-sonnet-4:
provider: anthropic
options:
temperature: 0.7
max_tokens: 8192
tools:
- name: web_search
version: ^1.2.0
- name: web_fetch
- name: artifacts
- name: sqlite
config:
tables:
citations:
columns:
id: { type: integer, primary_key: true }
url: { type: text, required: true }
title: { type: text }
relevance: { type: real }
- name: delegate
config:
max_depth: 3
# Co-located subagents (discovered automatically)
# ./specialists/academic-researcher.yaml
# ./specialists/market-analyst.yaml

Prompt Format (PromptLang)
Prompts use a sectioned YAML format. Each section has a name and content. This compiles to markdown or XML tags depending on the target runtime.
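The compilation step can be sketched as follows. This is a minimal illustration, not the reference compiler; it assumes each section is a dict with `name` and `content` keys, as in the example below.

```python
# Sketch: compile PromptLang sections to markdown headings or XML tags.
# Illustrative only - the real compiler is runtime-specific.

def compile_sections(sections, target="markdown"):
    """Render [{name, content}, ...] sections for a target runtime."""
    parts = []
    for s in sections:
        body = s["content"].strip()
        if target == "markdown":
            parts.append(f"## {s['name']}\n{body}")
        elif target == "xml":
            parts.append(f"<{s['name']}>\n{body}\n</{s['name']}>")
        else:
            raise ValueError(f"unknown target: {target}")
    return "\n\n".join(parts)
```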
# prompt.yaml
version: 2
role: user
sections:
- name: mindset
content: |
You are genuinely trying to discover something that doesn't exist yet.
When you derive something, ask: "Have I seen this before?"
If yes, that's a signal to try something else.
- name: who_you_are
content: |
You're the person who checks their own work obsessively.
Not because you're told to - because you genuinely want to know.
- name: your_tools
content: |
web_search - Find current information across the web
web_fetch - Read detailed content from specific URLs
artifacts - Save reports and findings for the user
sqlite - Query and store structured research data
- name: workflow
content: |
1. Check user_memory for existing context
2. Search multiple sources (academic, news, industry)
3. Cross-reference findings
4. Store citations in sqlite
5. Synthesize into artifacts

Compiles to markdown:
## mindset
You are genuinely trying to discover something...
## who_you_are
You're the person who checks their own work...
## your_tools
web_search - Find current information...

Or XML tags:
<mindset>
You are genuinely trying to discover something...
</mindset>
<who_you_are>
You're the person who checks their own work...
</who_you_are>

Tool Specification
Tools are declared with version constraints. Runtimes resolve and lock versions.
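The `^` and `~` constraint forms follow npm-style semver semantics. A rough sketch of how a resolver might test them (illustrative only; a real resolver would use a semver library, and the npm special case for `0.x` majors is omitted here):

```python
# Sketch: npm-style ^ and ~ constraint matching for tool versions.
# Illustrative only; omits prerelease tags and 0.x caret semantics.

def parse(v):
    return tuple(int(x) for x in v.split("."))

def satisfies(version, constraint):
    """^X.Y.Z allows same-major updates; ~X.Y.Z allows patch updates only."""
    op, base = constraint[0], parse(constraint[1:])
    v = parse(version)
    if v < base:
        return False
    if op == "^":                      # ^1.2.0 -> >=1.2.0 <2.0.0
        return v[0] == base[0]
    if op == "~":                      # ~1.0.0 -> >=1.0.0 <1.1.0
        return v[:2] == base[:2]
    raise ValueError(f"unknown constraint: {constraint}")
```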
Tool Declaration
tools:
- name: web_search
version: ^1.2.0 # Semver constraint
- name: web_fetch
version: ~1.0.0 # Patch updates only
- name: artifacts # Latest compatible
- name: storage # Abstract storage interface
config:
schema: # Tool-specific config
tables:
records:
columns:
id: { type: integer, primary_key: true }
data: { type: text, required: true }

Tool Schemas (MCP Extension)
MCP defines input schemas for tools. AgentLang extends this with optional output schemas, enabling runtimes to validate responses, route to UI components, and provide reliable error handling.
# Tool definition with input + output schemas
tools:
- name: web_search
version: ^1.2.0
input_schema: # Standard MCP
type: object
properties:
query:
type: string
description: "Search query"
max_results:
type: integer
default: 10
required: [query]
output_schema: # AgentLang extension
type: object
properties:
results:
type: array
items:
type: object
properties:
title: { type: string }
url: { type: string }
snippet: { type: string }
total_count:
type: integer

Output schemas enable:
| Feature | Benefit |
|---|---|
| Response validation | Catch malformed tool outputs before LLM sees them |
| Auto UI routing | Map outputs to UI components without LLM intervention |
| Type-safe chaining | Verify tool A's output matches tool B's input |
| Error classification | Distinguish validation errors from execution errors |
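Response validation is the simplest of these to picture: the runtime checks a tool result against the declared output_schema before the LLM ever sees it. A hand-rolled sketch for the web_search schema above (illustrative; a real runtime would use a JSON Schema validator):

```python
# Sketch: minimal structural check of a tool result against the
# web_search output_schema above. Hand-rolled for illustration only.

def validate_search_output(payload):
    """Return a list of problems (empty list = valid)."""
    errors = []
    if not isinstance(payload, dict):
        return ["payload is not an object"]
    results = payload.get("results", [])
    if not isinstance(results, list):
        errors.append("results is not an array")
    else:
        for i, item in enumerate(results):
            for key in ("title", "url", "snippet"):
                if key in item and not isinstance(item[key], str):
                    errors.append(f"results[{i}].{key} is not a string")
    total = payload.get("total_count")
    if total is not None and not isinstance(total, int):
        errors.append("total_count is not an integer")
    return errors
```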
MCP Servers
External tools via Model Context Protocol:
mcp_servers:
Stripe:
command: npx
args: ["-y", "@stripe/mcp", "--tools=all"]
env:
STRIPE_SECRET_KEY: "{{.env.STRIPE_SECRET_KEY}}"
tools: # Whitelist (optional)
- list_subscriptions
- list_customers
- list_invoices

HTTP Tools (OAuth APIs)
http_tools:
twitter_post_tweet:
description: "Post a tweet to Twitter/X"
endpoint: "/api/v1/twitter/tweets"
method: POST
auth: bearer
twitter_token: true # Requires OAuth
params:
text:
type: string
description: "Tweet text (max 280 chars)"
required: true
reply_to:
type: string
required: false
body_template: '{
"text": "{{text}}"
{{if reply_to}}, "reply_to": "{{reply_to}}"{{end}}
}'

Bash Tools (Scoped Commands)
bash_tools:
convert_video:
description: "Convert video to different format"
command: ffmpeg
help_command: "ffmpeg -h"
labels:
running: "Converting video"
finished: "Conversion complete"
dependencies:
- name: ffmpeg
check: "ffmpeg -version"
install:
darwin: "brew install ffmpeg"
linux: "apt-get install ffmpeg"

Permissions
Agents declare capabilities they request. Hosts decide what to grant. This separation is critical - an agent built for a sandboxed web runtime shouldn't assume it has the same access as one running on a desktop app with system privileges.
Capability Declaration
Agents declare required and optional capabilities:
permissions:
required:
- network # Internet access
- storage # Persist data between sessions
optional:
- microphone # Audio input
- camera # Video input
- screen_capture # Screenshot/recording
- filesystem # Read/write local files
- notifications # System notifications
- background # Run when user isn't active
- location # GPS/IP location
# Capability-specific config
microphone:
mode: on_demand # user_triggered | on_demand | continuous
reason: "Voice commands and meeting transcription"
camera:
mode: on_demand
reason: "Security monitoring when requested"
background:
mode: scheduled # scheduled | continuous
reason: "Hourly inbox check"

Permission Modes
Different capabilities require different trust levels:
| Mode | When Granted | Example |
|---|---|---|
| user_triggered | Only when user explicitly clicks/taps | Upload photo button |
| on_demand | Agent can request, host may prompt | "Check my camera feed" |
| continuous | Always active while agent runs | Meeting transcription |
| scheduled | Runs at specified intervals | Hourly security check |
Host Responsibility
The host (runtime environment) decides how to handle permission requests:
# Host capability matrix (not in agent.yaml - runtime config)
#
# Claude Desktop -> sandboxed, no background, user_triggered only
# Replit Agent -> filesystem + network, no sensors
# Desktop App -> full system access, all modes
# Mobile App -> requires OS permission prompts
# Browser Extension -> limited to active tab

When an agent requests a capability the host doesn't support:
- Required capability missing - agent fails to start with a clear error
- Optional capability missing - agent runs in degraded mode
- Mode downgrade - host may grant user_triggered when on_demand is requested
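These three rules can be sketched as a host-side resolution function. Everything here is a hypothetical helper, not part of any runtime; `scheduled` mode is omitted for brevity.

```python
# Sketch: host-side resolution of an agent's permission request.
# Hypothetical helper illustrating the three rules above.

MODE_ORDER = ["user_triggered", "on_demand", "continuous"]  # least -> most trust

def resolve_permissions(agent, host):
    """agent: {"required": [...], "optional": [...], "modes": {cap: mode}}
    host:  {"capabilities": {cap: max_granted_mode}}
    Returns (granted, degraded); raises if a required capability is missing."""
    granted, degraded = {}, []
    for cap in agent["required"]:
        if cap not in host["capabilities"]:
            raise RuntimeError(f"required capability missing: {cap}")
    for cap in agent["required"] + agent["optional"]:
        if cap not in host["capabilities"]:
            degraded.append(cap)          # optional + missing -> degraded mode
            continue
        wanted = agent.get("modes", {}).get(cap, "user_triggered")
        ceiling = host["capabilities"][cap]
        # Mode downgrade: grant the host's ceiling if lower than requested
        if MODE_ORDER.index(wanted) > MODE_ORDER.index(ceiling):
            granted[cap] = ceiling
        else:
            granted[cap] = wanted
    return granted, degraded
```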
Autonomous Operations
For agents that operate without active user supervision:
# User asks: "Check my camera every hour while I'm away
# and send me a photo of what things look like"
permissions:
required:
- camera
- background
- notifications
camera:
mode: scheduled
reason: "Periodic security snapshots"
schedule: "0 * * * *" # Cron: every hour
background:
mode: scheduled
wake_triggers:
- schedule: "0 * * * *" # Matches camera schedule
max_runtime: 60 # Seconds per wake
notifications:
mode: on_demand
channels:
- push # Mobile push
- email # Fallback

The host must support background execution and scheduled wake. Because the agent declares these as required capabilities, it fails with a clear error on unsupported hosts rather than silently not working.
Capability Categories
| Category | Capabilities | Risk Level |
|---|---|---|
| Network | network, websocket, p2p | Low |
| Storage | storage, filesystem, keychain | Medium |
| Sensors | microphone, camera, screen_capture, location | High |
| System | background, notifications, clipboard, shell | High |
| Identity | oauth, wallet, signing | Critical |
Context Management
Agents declare how conversation history and context should be managed. Runtimes implement the actual storage - the spec defines the interface.
context:
# Sliding window for conversation history
window:
max_tokens: 100000
strategy: sliding # sliding | summarize | truncate
# Compaction when context fills up
compaction:
strategy: summarize # summarize | drop_oldest | checkpoint
trigger: 0.8 # Compact at 80% capacity
preserve:
- tool_results # Never drop tool outputs
- user_messages # Keep recent user turns
# Injected context (available as template vars)
inject:
- time # Current timestamp
- location # User location (if permitted)
- user_profile # From runtime user store

Compaction strategies:
| Strategy | Behavior | Use Case |
|---|---|---|
| summarize | LLM summarizes older context | Long research sessions |
| drop_oldest | Remove oldest messages | Stateless assistants |
| checkpoint | Save full context, start fresh | Multi-phase workflows |
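The trigger math is simple: compact once estimated usage crosses trigger × max_tokens, preserving the declared message classes. A sketch (hypothetical runtime helper; token counts would come from the tokenizer):

```python
# Sketch: compaction trigger check plus the preserve rules above.
# Hypothetical runtime helper; message types are assumed labels.

def should_compact(used_tokens, max_tokens=100_000, trigger=0.8):
    """True once usage reaches the configured fraction of the window."""
    return used_tokens >= trigger * max_tokens

def split_for_compaction(messages):
    """Partition messages into (preserve, compactable) per the
    preserve list: tool results and user messages are never dropped."""
    preserve, compactable = [], []
    for m in messages:
        if m["type"] in ("tool_result", "user_message"):
            preserve.append(m)
        else:
            compactable.append(m)
    return preserve, compactable
```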
Multi-Agent Orchestration
AgentLang supports two orchestration modes:
Dynamic Orchestration
The LLM decides which specialists to invoke at runtime based on the task. Subagents are discovered from co-located YAML files.
# Orchestrator (agent.yaml)
tools:
- name: delegate
config:
mode: dynamic # LLM picks specialists
discovery: directory # Find co-located *.yaml files
max_depth: 3
# Directory structure - subagents discovered automatically
research-agent/
agent.yaml # Orchestrator
academic-researcher.yaml # Discovered as specialist
market-analyst.yaml # Discovered as specialist
report-writer.yaml # Discovered as specialist
# Subagent definition (academic-researcher.yaml)
name: academic-researcher
description: |
Specialist in research papers, academic publications,
and scholarly work.
# ^ Description used for capability-based routing
tools:
- name: web_search
- name: artifacts

Static Orchestration
Fixed agent sequence defined upfront. Useful for deterministic pipelines.
# Static crew (agent.yaml)
delegate:
mode: static
process: sequential # sequential | parallel | hierarchical
agents:
- path: ./researcher.yaml
task: "Research the topic thoroughly"
- path: ./writer.yaml
task: "Write a report based on research"
depends_on: [researcher] # Waits for researcher to complete

Session Sharing
Subagents share the session context and artifacts with the orchestrator. A child agent runs in the same session — same artifacts directory, same message history (distinguished by agent_id).
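Directory-based discovery from the dynamic mode above can be sketched as follows (hypothetical helper; a real runtime would parse each file and read its description for capability-based routing):

```python
# Sketch: discover co-located specialist YAML files for dynamic
# orchestration. Hypothetical; real runtimes parse the full specs.
from pathlib import Path

def discover_specialists(agent_dir):
    """Return specialist *.yaml paths next to agent.yaml, excluding it."""
    root = Path(agent_dir)
    return sorted(p for p in root.glob("*.yaml") if p.name != "agent.yaml")
```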
Protocols
Agents declare which communication protocols they support.
Protocol Declaration
protocols:
- name: a2a # Agent-to-agent messaging
version: ^1.0.0
- name: mcp # Tool interface
version: ^2024.11

Versioning and Deprecation
Protocols evolve. Like Kubernetes API versions:
protocols:
- name: a2a
version: ^2.0.0 # Stable
# When v1 deprecated:
# - Runtime warns: "a2a v1 deprecated, migrate to v2"
# - Conversion layer translates v1 <-> v2
# - Eventually v1 removed, old agents fail validation

Custom Protocols
Like Kubernetes CRDs, anyone can define custom protocols:
protocols:
- name: custom/acme-sync # Namespaced custom protocol
version: ^1.0.0
# Custom protocol definition (published separately or inline)
x-protocols:
acme-sync:
version: 1.0.0
schema:
message_format: json
transport: websocket
auth: bearer

Execution Constraints
Agents declare resource limits and budgets. Runtimes enforce them. This prevents runaway costs, infinite loops, and unbounded delegation chains.
Budget
Cost-based enforcement with tolerance. Runtimes track actual token usage and convert to dollars using declared pricing.
constraints:
budget:
max_cost_usd: 2.00 # Hard cap in dollars
tolerance: 0.20 # Allow 20% overage before cutoff
input_1m_price: 3.00 # $/1M input tokens (model-specific)
output_1m_price: 15.00 # $/1M output tokens
# When budget runs low, runtime injects a warning
# into the agent's context so it can wrap up gracefully

Execution Limits
constraints:
limits:
max_turns: 50 # Conversation turn cap (0 = unlimited)
max_delegation_depth: 3 # How deep agent chains can go
timeout: 300 # Max execution time in seconds
sandbox:
disk_quota: 1073741824 # 1GB max disk usage
max_exec_time: 60 # Max seconds per tool invocation
allowed_paths: # Filesystem restrictions
- "./output"
- "./data"
allowed_urls: # Network restrictions
- "https://api.example.com"
- "https://*.googleapis.com"

Child Budget Allocation
When an agent delegates to a child, it allocates a portion of its own budget. Unused budget is returned to the parent on completion.
# Parent has $2.00 budget
# Delegates to researcher with $0.50 allocation
# Researcher uses $0.30, returns $0.20
# Parent continues with $1.70 remaining

| Constraint | Scope | Default |
|---|---|---|
| max_cost_usd | Per execution | Unlimited |
| max_turns | Per execution | Unlimited |
| max_delegation_depth | Agent chain | 3 |
| timeout | Per execution | None |
| disk_quota | Per agent | Unlimited |
| max_exec_time | Per tool call | Unlimited |
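The allocate-and-return flow from the child budget example can be sketched as a small accounting class (hypothetical helper; real runtimes would derive spend from actual token usage and the declared per-million-token prices):

```python
# Sketch: parent/child budget accounting matching the
# $2.00 -> $0.50 -> $0.30 example above. Hypothetical helper.

class Budget:
    def __init__(self, max_cost_usd, tolerance=0.0):
        self.limit = max_cost_usd * (1 + tolerance)  # hard cap incl. overage
        self.spent = 0.0

    def remaining(self):
        return self.limit - self.spent

    def charge(self, usd):
        if self.spent + usd > self.limit:
            raise RuntimeError("budget exhausted")
        self.spent += usd

    def delegate(self, allocation_usd):
        """Carve a child budget out of the parent's remaining funds."""
        self.charge(allocation_usd)
        return Budget(allocation_usd)

    def reclaim(self, child):
        """Return the child's unused budget to the parent on completion."""
        self.spent -= child.remaining()
```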
Generative UI
Agents produce rich, interactive interfaces — not just text. Components are declared in agent.yaml, which generates typed render_* tools the agent calls at runtime. Each component has a schema for its props and declares the actions users can take on it.
AgentLang defines the declaration format — what components exist and how actions route. Any UI library can implement the rendering: Vercel AI SDK, CopilotKit, Streamlit, custom React, or native mobile. The spec is the contract between the agent definition and whatever renders it.
Dual-Path Action Routing
The key innovation: user actions on components route through two distinct paths. Deterministic operations (archive, star, delete) execute instantly via tool calls. Reasoning operations (reply, summarize, analyze) route to the LLM. The agent definition declares which path each action takes.
ui_components:
email_card:
# Sync actions: execute immediately, no LLM involved
sync_actions:
archive:
tool: gmail_modify_labels
params:
message_id: "{{messageId}}"
remove_labels: "INBOX"
star:
tool: gmail_modify_labels
params:
message_id: "{{messageId}}"
add_labels: "STARRED"
# Agent actions: route to LLM for reasoning
agent_actions:
reply:
params:
messageId: "{{messageId}}"
threadId: "{{threadId}}"
summarize:
params:
threadId: "{{threadId}}"
video_player: {}
metric_card: {}

At runtime, the agent calls typed render tools:
render_email_card({
messageId: "msg_123",
from: { name: "Alice", email: "alice@example.com" },
subject: "Project update",
snippet: "Latest progress report...",
timestamp: "2025-01-29T10:30:00Z"
})

| Action Type | Declaration | Behavior | Latency |
|---|---|---|---|
| Sync | Has tool: field | Executes tool directly, no LLM round-trip | <50ms |
| Agent | No tool: field | Sends context to LLM for reasoning | 500ms-2s |
This separation is what makes agent interfaces feel responsive. An inbox agent handling 100 daily actions costs ~$0.90/day with sync routing vs ~$30/day if every click goes through the LLM.
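The routing decision itself is mechanical: it keys entirely on whether the action declares a `tool:` field. A sketch of the dispatcher (hypothetical; `execute_tool` and `send_to_llm` stand in for runtime internals):

```python
# Sketch: dual-path action dispatch. An action with a `tool:` field
# executes directly; otherwise the context routes to the LLM.
# `execute_tool` / `send_to_llm` are hypothetical runtime hooks.

def render_params(params, ctx):
    """Fill {{var}} placeholders from the component's props."""
    out = {}
    for key, template in params.items():
        for var, value in ctx.items():
            template = template.replace("{{" + var + "}}", str(value))
        out[key] = template
    return out

def dispatch(action_spec, ctx, execute_tool, send_to_llm):
    params = render_params(action_spec.get("params", {}), ctx)
    if "tool" in action_spec:             # sync path: no LLM round-trip
        return execute_tool(action_spec["tool"], params)
    return send_to_llm(ctx | params)      # agent path: reasoning required
```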
Tool UI Rendering
HTTP tools can declare UI that renders automatically based on execution status:
http_tools:
generate_headshot:
endpoint: /api/v1/proxy
method: POST
params:
input_image: { type: string, required: true }
ui_component:
completed:
component: headshot_gallery
props:
images: "{{json.messages[0].images}}"
cost: "{{json.usage.cost_usd}}"
actions:
sync_actions:
download:
tool: system://download_file
params: { url: "{{selectedImage}}" }
agent_actions:
- regenerate
failed:
component: error_card
props:
message: "{{error}}"

Skills
Reusable bundles that constrain which tools an agent can use and provide guided workflows.
# agents/my-agent/skills/test-helper/SKILL.md
---
name: test-helper
description: Skill for testing capabilities
allowed-tools:
- sqlite
- delegate
---
# Test Helper Skill
When testing a tool:
1. Call with minimal valid input
2. Validate response structure
3. Test edge cases
4. Document results

# agent.yaml
skills:
- name: test-helper # Restricts to: sqlite, delegate only

Build Output
Lockfile
# agent.lock
version: "1"
resolved_at: "2025-01-29T10:30:00Z"
tools:
web_search:
version: 1.2.0
hash: sha256:e5f6a7b8...
web_fetch:
version: 1.0.0
hash: sha256:c9d0e1f2...
agents:
academic-researcher:
version: 1.0.0
hash: sha256:f7e8d9c0...
models:
claude-sonnet-4:
context_window: 200000
supports_tools: true
files:
agent.yaml: sha256:1a2b3c4d...
prompt.yaml: sha256:5e6f7a8b...

Container
The .agent container is a signed, encrypted archive:
research-assistant@1.0.0.agent
├── manifest.json # Metadata + file list
├── agent.yaml # Definition (plaintext)
├── prompt.yaml.enc # Encrypted (AES-256-GCM)
├── agent.lock # Locked dependencies
├── signature.sig # Ed25519 signature
└── context/ # Bundled resources
└── templates/

UX Hints
Optional ux.yaml for runtime UI improvements:
integrations:
- name: Gmail
website: https://gmail.com
- name: Stripe
website: https://stripe.com
suggestions:
- "Help me catch up on emails"
- "What's my MRR this month?"
tools:
gmail_list_messages:
labels:
running: "Loading inbox"
finished: "Loaded inbox"
suggestions:
- "Show my unread emails"
mcp_tools:
playwright-web:
web_navigate:
labels:
running: "Navigating"
finished: "Navigation complete"

CLI Reference
# Build agent (resolves deps, generates lockfile, signs)
agentlang build ./my-agent
# Build with quality check (LLM reviews prompts)
agentlang build ./my-agent --quality
# Validate without building
agentlang validate ./my-agent
# Inspect container contents
agentlang inspect my-agent@1.0.0.agent
# Publish to registry
agentlang publish ./my-agent
# Run locally (requires compatible runtime)
agentlang run ./my-agent --prompt "Research X"

Events
Standard event format for observability, debugging, and session replay. Runtimes emit these events; any compatible viewer can consume them.
Event Schema
events:
- type: agent.think
timestamp: 2025-01-29T10:00:00Z
agent_id: researcher
content: "I should search for recent papers..."
- type: tool.call
timestamp: 2025-01-29T10:00:01Z
agent_id: researcher
tool: web_search
input: { query: "transformer architectures 2025" }
call_id: abc123
- type: tool.result
timestamp: 2025-01-29T10:00:03Z
call_id: abc123
output: { results: [...] }
duration_ms: 2000
- type: agent.delegate
timestamp: 2025-01-29T10:00:05Z
from: orchestrator
to: academic-researcher
task: "Analyze these findings"
- type: agent.message
timestamp: 2025-01-29T10:00:10Z
agent_id: researcher
role: assistant
content: "Based on my analysis..."

Event Types
| Type | When Emitted | Key Fields |
|---|---|---|
| agent.think | LLM reasoning (if exposed) | agent_id, content |
| tool.call | Tool invocation starts | agent_id, tool, input, call_id |
| tool.result | Tool returns | call_id, output, duration_ms, error? |
| agent.delegate | Orchestrator hands off to specialist | from, to, task |
| agent.message | Message added to conversation | agent_id, role, content |
| context.compact | Context window compacted | strategy, tokens_before, tokens_after |
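An adapter's emit path can be sketched with a minimal event record (hypothetical helper; field names follow the schema above, and timestamps are UTC ISO-8601):

```python
# Sketch: minimal event record matching the schema above.
# Hypothetical adapter-side helper, not part of any runtime.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Event:
    type: str                       # e.g. "tool.call", "agent.message"
    agent_id: Optional[str] = None
    data: dict = field(default_factory=dict)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def emit(sink, event):
    """Append a serialized event to any sink with .append()."""
    sink.append(asdict(event))
```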
Runtime Adapters
Each runtime translates its internal events to this format:
LangGraph state transitions ──┐
CrewAI task events ──┼──▶ AgentLang Events ──▶ Replay UI
AutoGen messages ──┤
Harness events ──┘

Same replay viewer works regardless of which runtime executed the agent.
Runtime Requirements
AgentLang defines agents. Runtimes execute them. A compatible runtime must:
- Parse agent.yaml and prompt.yaml formats
- Verify container signatures
- Decrypt prompts with provided key
- Resolve tools from lockfile versions
- Implement the standard tool interfaces
- Handle multi-agent delegation
- Emit AgentLang events for observability
- Render UI components (if supported)
Related Specifications
AgentLang builds on and complements existing standards:
| Specification | Used For | Link |
|---|---|---|
| Agent Format (Snap) | Declarative agent definitions, execution policies, governance | agentformat.org |
| Model Context Protocol (MCP) | Tool interfaces, server connections | spec.modelcontextprotocol.io |
| JSON Schema | Tool input/output validation | json-schema.org |
| Agent-to-Agent Protocol (A2A) | Multi-agent communication | google.github.io/A2A |
| AGENTS.md | Project-level agent instructions | github.com/agentsmd |
| OpenTelemetry | Event format inspiration | opentelemetry.io |
| Semver | Tool and agent versioning | semver.org |