AgentLang Specification
A declarative language for defining AI agents with reproducible builds, versioned dependencies, and encrypted prompts.
Architecture
Three layers, clear separation:
Spec agent.yaml
│
▼
Runtime LangGraph, CrewAI, Harness
│
▼
Host       Rush, Claude Desktop, Docker, OS

| Layer | Responsibility | Examples |
|---|---|---|
| Spec | What the agent is (tools, prompts, permissions) | AgentLang YAML |
| Runtime | How the agent loop executes | LangGraph, CrewAI, Harness |
| Host | Where the agent runs, what capabilities it grants | Rush, Claude Desktop, Docker, OS |
Same agent spec, different runtimes, different hosts. Like Kubernetes YAML running on EKS, GKE, or bare metal - the spec doesn't care where it lands.
Why AgentLang?
Today, agents are defined in code. Code that drifts. Dependencies that break. Prompts committed in plaintext. No way to reproduce yesterday's agent.
AgentLang treats agents like software artifacts. You define them declaratively. You build them into signed, versioned containers. You ship them knowing exactly what's inside.
# The problem
$ pip install my-agent
# Which version of web_search? Which model? What prompts?
# Nobody knows. Every install is different.
# The solution
$ agentlang build ./my-agent
# Locked dependencies, hashed files, signed container.
# Same agent, every time.

Agent Definition
An agent is defined in agent.yaml. This is the source of truth.
name: research-assistant
version: 1.0.0
title: Research Assistant
subtitle: McKinsey-grade research
description: |
Deep research across academic papers, market reports, and web sources.
Synthesizes findings into actionable reports.
developer: acme
category: Business
models:
claude-sonnet-4:
provider: anthropic
options:
temperature: 0.7
max_tokens: 8192
tools:
- name: web_search
version: ^1.2.0
- name: web_fetch
- name: artifacts
- name: sqlite
config:
tables:
citations:
columns:
id: { type: integer, primary_key: true }
url: { type: text, required: true }
title: { type: text }
relevance: { type: real }
- name: delegate
config:
max_depth: 3
# Co-located subagents (discovered automatically)
# ./specialists/academic-researcher.yaml
# ./specialists/market-analyst.yaml

Prompt Format (PromptLang)
Prompts use a sectioned YAML format. Each section has a name and content. This compiles to markdown or XML tags depending on the target runtime.
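The compilation step can be sketched as follows. This is a minimal illustration, not the reference compiler; it assumes each section is a dict with `name` and `content` keys, as in the example below.

```python
# Sketch: compile PromptLang sections to markdown headings or XML tags.
# Illustrative only - the real compiler is runtime-specific.

def compile_sections(sections, target="markdown"):
    """Render [{name, content}, ...] sections for a target runtime."""
    parts = []
    for s in sections:
        body = s["content"].strip()
        if target == "markdown":
            parts.append(f"## {s['name']}\n{body}")
        elif target == "xml":
            parts.append(f"<{s['name']}>\n{body}\n</{s['name']}>")
        else:
            raise ValueError(f"unknown target: {target}")
    return "\n\n".join(parts)
```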
# prompt.yaml
version: 2
role: user
sections:
- name: mindset
content: |
You are genuinely trying to discover something that doesn't exist yet.
When you derive something, ask: "Have I seen this before?"
If yes, that's a signal to try something else.
- name: who_you_are
content: |
You're the person who checks their own work obsessively.
Not because you're told to - because you genuinely want to know.
- name: your_tools
content: |
web_search - Find current information across the web
web_fetch - Read detailed content from specific URLs
artifacts - Save reports and findings for the user
sqlite - Query and store structured research data
- name: workflow
content: |
1. Check user_memory for existing context
2. Search multiple sources (academic, news, industry)
3. Cross-reference findings
4. Store citations in sqlite
5. Synthesize into artifacts

Compiles to markdown:
## mindset
You are genuinely trying to discover something...
## who_you_are
You're the person who checks their own work...
## your_tools
web_search - Find current information...

Or XML tags:
<mindset>
You are genuinely trying to discover something...
</mindset>
<who_you_are>
You're the person who checks their own work...
</who_you_are>

Tool Specification
Tools are declared with version constraints. Runtimes resolve and lock versions.
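The `^` and `~` constraint forms follow npm-style semver semantics. A rough sketch of how a resolver might test them (illustrative only; a real resolver would use a semver library, and the npm special case for `0.x` majors is omitted here):

```python
# Sketch: npm-style ^ and ~ constraint matching for tool versions.
# Illustrative only; omits prerelease tags and 0.x caret semantics.

def parse(v):
    return tuple(int(x) for x in v.split("."))

def satisfies(version, constraint):
    """^X.Y.Z allows same-major updates; ~X.Y.Z allows patch updates only."""
    op, base = constraint[0], parse(constraint[1:])
    v = parse(version)
    if v < base:
        return False
    if op == "^":                      # ^1.2.0 -> >=1.2.0 <2.0.0
        return v[0] == base[0]
    if op == "~":                      # ~1.0.0 -> >=1.0.0 <1.1.0
        return v[:2] == base[:2]
    raise ValueError(f"unknown constraint: {constraint}")
```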
Tool Declaration
tools:
- name: web_search
version: ^1.2.0 # Semver constraint
- name: web_fetch
version: ~1.0.0 # Patch updates only
- name: artifacts # Latest compatible
- name: storage # Abstract storage interface
config:
schema: # Tool-specific config
tables:
records:
columns:
id: { type: integer, primary_key: true }
data: { type: text, required: true }

Tool Schemas (MCP Extension)
MCP defines input schemas for tools. AgentLang extends this with optional output schemas, enabling runtimes to validate responses, route to UI components, and provide reliable error handling.
# Tool definition with input + output schemas
tools:
- name: web_search
version: ^1.2.0
input_schema: # Standard MCP
type: object
properties:
query:
type: string
description: "Search query"
max_results:
type: integer
default: 10
required: [query]
output_schema: # AgentLang extension
type: object
properties:
results:
type: array
items:
type: object
properties:
title: { type: string }
url: { type: string }
snippet: { type: string }
total_count:
type: integer

Output schemas enable:
| Feature | Benefit |
|---|---|
| Response validation | Catch malformed tool outputs before LLM sees them |
| Auto UI routing | Map outputs to UI components without LLM intervention |
| Type-safe chaining | Verify tool A's output matches tool B's input |
| Error classification | Distinguish validation errors from execution errors |
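Response validation is the simplest of these to picture: the runtime checks a tool result against the declared output_schema before the LLM ever sees it. A hand-rolled sketch for the web_search schema above (illustrative; a real runtime would use a JSON Schema validator):

```python
# Sketch: minimal structural check of a tool result against the
# web_search output_schema above. Hand-rolled for illustration only.

def validate_search_output(payload):
    """Return a list of problems (empty list = valid)."""
    errors = []
    if not isinstance(payload, dict):
        return ["payload is not an object"]
    results = payload.get("results", [])
    if not isinstance(results, list):
        errors.append("results is not an array")
    else:
        for i, item in enumerate(results):
            for key in ("title", "url", "snippet"):
                if key in item and not isinstance(item[key], str):
                    errors.append(f"results[{i}].{key} is not a string")
    total = payload.get("total_count")
    if total is not None and not isinstance(total, int):
        errors.append("total_count is not an integer")
    return errors
```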
MCP Servers
External tools via Model Context Protocol:
mcp_servers:
Stripe:
command: npx
args: ["-y", "@stripe/mcp", "--tools=all"]
env:
STRIPE_SECRET_KEY: "{{.env.STRIPE_SECRET_KEY}}"
tools: # Whitelist (optional)
- list_subscriptions
- list_customers
- list_invoices

HTTP Tools (OAuth APIs)
http_tools:
twitter_post_tweet:
description: "Post a tweet to Twitter/X"
endpoint: "/api/v1/twitter/tweets"
method: POST
auth: bearer
twitter_token: true # Requires OAuth
params:
text:
type: string
description: "Tweet text (max 280 chars)"
required: true
reply_to:
type: string
required: false
body_template: '{
"text": "{{text}}"
{{if reply_to}}, "reply_to": "{{reply_to}}"{{end}}
}'

Bash Tools (Scoped Commands)
bash_tools:
convert_video:
description: "Convert video to different format"
command: ffmpeg
help_command: "ffmpeg -h"
labels:
running: "Converting video"
finished: "Conversion complete"
dependencies:
- name: ffmpeg
check: "ffmpeg -version"
install:
darwin: "brew install ffmpeg"
linux: "apt-get install ffmpeg"

Permissions
Agents declare capabilities they request. Hosts decide what to grant. This separation is critical - an agent built for a sandboxed web runtime shouldn't assume it has the same access as one running on a desktop app with system privileges.
Capability Declaration
Agents declare required and optional capabilities:
permissions:
required:
- network # Internet access
- storage # Persist data between sessions
optional:
- microphone # Audio input
- camera # Video input
- screen_capture # Screenshot/recording
- filesystem # Read/write local files
- notifications # System notifications
- background # Run when user isn't active
- location # GPS/IP location
# Capability-specific config
microphone:
mode: on_demand # user_triggered | on_demand | continuous
reason: "Voice commands and meeting transcription"
camera:
mode: on_demand
reason: "Security monitoring when requested"
background:
mode: scheduled # scheduled | continuous
reason: "Hourly inbox check"

Permission Modes
Different capabilities require different trust levels:
| Mode | When Granted | Example |
|---|---|---|
| user_triggered | Only when user explicitly clicks/taps | Upload photo button |
| on_demand | Agent can request, host may prompt | "Check my camera feed" |
| continuous | Always active while agent runs | Meeting transcription |
| scheduled | Runs at specified intervals | Hourly security check |
Host Responsibility
The host (runtime environment) decides how to handle permission requests:
# Host capability matrix (not in agent.yaml - runtime config)
#
# Claude Desktop -> sandboxed, no background, user_triggered only
# Replit Agent -> filesystem + network, no sensors
# Desktop App -> full system access, all modes
# Mobile App -> requires OS permission prompts
# Browser Extension -> limited to active tab

When an agent requests a capability the host doesn't support:
- Required capability missing - agent fails to start with a clear error
- Optional capability missing - agent runs in degraded mode
- Mode downgrade - host may grant user_triggered when on_demand is requested
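These three rules can be sketched as a host-side resolution function. Everything here is a hypothetical helper, not part of any runtime; `scheduled` mode is omitted for brevity.

```python
# Sketch: host-side resolution of an agent's permission request.
# Hypothetical helper illustrating the three rules above.

MODE_ORDER = ["user_triggered", "on_demand", "continuous"]  # least -> most trust

def resolve_permissions(agent, host):
    """agent: {"required": [...], "optional": [...], "modes": {cap: mode}}
    host:  {"capabilities": {cap: max_granted_mode}}
    Returns (granted, degraded); raises if a required capability is missing."""
    granted, degraded = {}, []
    for cap in agent["required"]:
        if cap not in host["capabilities"]:
            raise RuntimeError(f"required capability missing: {cap}")
    for cap in agent["required"] + agent["optional"]:
        if cap not in host["capabilities"]:
            degraded.append(cap)          # optional + missing -> degraded mode
            continue
        wanted = agent.get("modes", {}).get(cap, "user_triggered")
        ceiling = host["capabilities"][cap]
        # Mode downgrade: grant the host's ceiling if lower than requested
        if MODE_ORDER.index(wanted) > MODE_ORDER.index(ceiling):
            granted[cap] = ceiling
        else:
            granted[cap] = wanted
    return granted, degraded
```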
Autonomous Operations
For agents that operate without active user supervision:
# User asks: "Check my camera every hour while I'm away
# and send me a photo of what things look like"
permissions:
required:
- camera
- background
- notifications
camera:
mode: scheduled
reason: "Periodic security snapshots"
schedule: "0 * * * *" # Cron: every hour
background:
mode: scheduled
wake_triggers:
- schedule: "0 * * * *" # Matches camera schedule
max_runtime: 60 # Seconds per wake
notifications:
mode: on_demand
channels:
- push # Mobile push
- email # Fallback

The host must support background execution and scheduled wake. Because the agent declares these as required capabilities, it fails with a clear error on unsupported hosts rather than silently not working.
Capability Categories
| Category | Capabilities | Risk Level |
|---|---|---|
| Network | network, websocket, p2p | Low |
| Storage | storage, filesystem, keychain | Medium |
| Sensors | microphone, camera, screen_capture, location | High |
| System | background, notifications, clipboard, shell | High |
| Identity | oauth, wallet, signing | Critical |
Context Management
Agents declare how conversation history and context should be managed. Runtimes implement the actual storage - the spec defines the interface.
context:
# Sliding window for conversation history
window:
max_tokens: 100000
strategy: sliding # sliding | summarize | truncate
# Compaction when context fills up
compaction:
strategy: summarize # summarize | drop_oldest | checkpoint
trigger: 0.8 # Compact at 80% capacity
preserve:
- tool_results # Never drop tool outputs
- user_messages # Keep recent user turns
# Injected context (available as template vars)
inject:
- time # Current timestamp
- location # User location (if permitted)
- user_profile # From runtime user store

Compaction strategies:
| Strategy | Behavior | Use Case |
|---|---|---|
| summarize | LLM summarizes older context | Long research sessions |
| drop_oldest | Remove oldest messages | Stateless assistants |
| checkpoint | Save full context, start fresh | Multi-phase workflows |
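The trigger math is simple: compact once estimated usage crosses trigger × max_tokens, preserving the declared message classes. A sketch (hypothetical runtime helper; token counts would come from the tokenizer):

```python
# Sketch: compaction trigger check plus the preserve rules above.
# Hypothetical runtime helper; message types are assumed labels.

def should_compact(used_tokens, max_tokens=100_000, trigger=0.8):
    """True once usage reaches the configured fraction of the window."""
    return used_tokens >= trigger * max_tokens

def split_for_compaction(messages):
    """Partition messages into (preserve, compactable) per the
    preserve list: tool results and user messages are never dropped."""
    preserve, compactable = [], []
    for m in messages:
        if m["type"] in ("tool_result", "user_message"):
            preserve.append(m)
        else:
            compactable.append(m)
    return preserve, compactable
```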
Multi-Agent Orchestration
AgentLang supports two orchestration modes:
Dynamic Orchestration
The LLM decides which specialists to invoke at runtime based on the task. Subagents are discovered from co-located YAML files.
# Orchestrator (agent.yaml)
tools:
- name: delegate
config:
mode: dynamic # LLM picks specialists
discovery: directory # Find co-located *.yaml files
max_depth: 3
# Directory structure - subagents discovered automatically
research-agent/
agent.yaml # Orchestrator
academic-researcher.yaml # Discovered as specialist
market-analyst.yaml # Discovered as specialist
report-writer.yaml # Discovered as specialist
# Subagent definition (academic-researcher.yaml)
name: academic-researcher
description: |
Specialist in research papers, academic publications,
and scholarly work.
# ^ Description used for capability-based routing
tools:
- name: web_search
- name: artifacts

Static Orchestration
Fixed agent sequence defined upfront. Useful for deterministic pipelines.
# Static crew (agent.yaml)
delegate:
mode: static
process: sequential # sequential | parallel | hierarchical
agents:
- path: ./researcher.yaml
task: "Research the topic thoroughly"
- path: ./writer.yaml
task: "Write a report based on research"
depends_on: [researcher] # Waits for researcher to complete

Session Sharing
Subagents share the session context and artifacts with the orchestrator. A child agent runs in the same session — same artifacts directory, same message history (distinguished by agent_id).
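Directory-based discovery from the dynamic mode above can be sketched as follows (hypothetical helper; a real runtime would parse each file and read its description for capability-based routing):

```python
# Sketch: discover co-located specialist YAML files for dynamic
# orchestration. Hypothetical; real runtimes parse the full specs.
from pathlib import Path

def discover_specialists(agent_dir):
    """Return specialist *.yaml paths next to agent.yaml, excluding it."""
    root = Path(agent_dir)
    return sorted(p for p in root.glob("*.yaml") if p.name != "agent.yaml")
```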
Protocols
Agents declare which communication protocols they support.
Protocol Declaration
protocols:
- name: a2a # Agent-to-agent messaging
version: ^1.0.0
- name: mcp # Tool interface
version: ^2024.11

Versioning and Deprecation
Protocols evolve. Like Kubernetes API versions:
protocols:
- name: a2a
version: ^2.0.0 # Stable
# When v1 deprecated:
# - Runtime warns: "a2a v1 deprecated, migrate to v2"
# - Conversion layer translates v1 <-> v2
# - Eventually v1 removed, old agents fail validation

Custom Protocols
Like Kubernetes CRDs, anyone can define custom protocols:
protocols:
- name: custom/acme-sync # Namespaced custom protocol
version: ^1.0.0
# Custom protocol definition (published separately or inline)
x-protocols:
acme-sync:
version: 1.0.0
schema:
message_format: json
transport: websocket
auth: bearer

Execution Constraints
Agents declare resource limits and budgets. Runtimes enforce them. This prevents runaway costs, infinite loops, and unbounded delegation chains.
Budget
Cost-based enforcement with tolerance. Runtimes track actual token usage and convert to dollars using declared pricing.
constraints:
budget:
max_cost_usd: 2.00 # Hard cap in dollars
tolerance: 0.20 # Allow 20% overage before cutoff
input_1m_price: 3.00 # $/1M input tokens (model-specific)
output_1m_price: 15.00 # $/1M output tokens
# When budget runs low, runtime injects a warning
# into the agent's context so it can wrap up gracefully

Execution Limits
constraints:
limits:
max_turns: 50 # Conversation turn cap (0 = unlimited)
max_delegation_depth: 3 # How deep agent chains can go
timeout: 300 # Max execution time in seconds
sandbox:
disk_quota: 1073741824 # 1GB max disk usage
max_exec_time: 60 # Max seconds per tool invocation
allowed_paths: # Filesystem restrictions
- "./output"
- "./data"
allowed_urls: # Network restrictions
- "https://api.example.com"
- "https://*.googleapis.com"

Child Budget Allocation
When an agent delegates to a child, it allocates a portion of its own budget. Unused budget is returned to the parent on completion.
# Parent has $2.00 budget
# Delegates to researcher with $0.50 allocation
# Researcher uses $0.30, returns $0.20
# Parent continues with $1.70 remaining

| Constraint | Scope | Default |
|---|---|---|
| max_cost_usd | Per execution | Unlimited |
| max_turns | Per execution | Unlimited |
| max_delegation_depth | Agent chain | 3 |
| timeout | Per execution | None |
| disk_quota | Per agent | Unlimited |
| max_exec_time | Per tool call | Unlimited |
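The allocate-and-return flow from the child budget example can be sketched as a small accounting class (hypothetical helper; real runtimes would derive spend from actual token usage and the declared per-million-token prices):

```python
# Sketch: parent/child budget accounting matching the
# $2.00 -> $0.50 -> $0.30 example above. Hypothetical helper.

class Budget:
    def __init__(self, max_cost_usd, tolerance=0.0):
        self.limit = max_cost_usd * (1 + tolerance)  # hard cap incl. overage
        self.spent = 0.0

    def remaining(self):
        return self.limit - self.spent

    def charge(self, usd):
        if self.spent + usd > self.limit:
            raise RuntimeError("budget exhausted")
        self.spent += usd

    def delegate(self, allocation_usd):
        """Carve a child budget out of the parent's remaining funds."""
        self.charge(allocation_usd)
        return Budget(allocation_usd)

    def reclaim(self, child):
        """Return the child's unused budget to the parent on completion."""
        self.spent -= child.remaining()
```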
Generative UI
Agents produce rich, interactive interfaces — not just text. Components are declared in agent.yaml, which generates typed render_* tools the agent calls at runtime. Each component has a schema for its props and declares the actions users can take on it.
AgentLang defines the declaration format — what components exist and how actions route. Any UI library can implement the rendering: Vercel AI SDK, CopilotKit, Streamlit, custom React, or native mobile. The spec is the contract between the agent definition and whatever renders it.
Dual-Path Action Routing
The key innovation: user actions on components route through two distinct paths. Deterministic operations (archive, star, delete) execute instantly via tool calls. Reasoning operations (reply, summarize, analyze) route to the LLM. The agent definition declares which path each action takes.
ui_components:
email_card:
# Sync actions: execute immediately, no LLM involved
sync_actions:
archive:
tool: gmail_modify_labels
params:
message_id: "{{messageId}}"
remove_labels: "INBOX"
star:
tool: gmail_modify_labels
params:
message_id: "{{messageId}}"
add_labels: "STARRED"
# Agent actions: route to LLM for reasoning
agent_actions:
reply:
params:
messageId: "{{messageId}}"
threadId: "{{threadId}}"
summarize:
params:
threadId: "{{threadId}}"
video_player: {}
metric_card: {}

At runtime, the agent calls typed render tools:
render_email_card({
messageId: "msg_123",
from: { name: "Alice", email: "alice@example.com" },
subject: "Project update",
snippet: "Latest progress report...",
timestamp: "2025-01-29T10:30:00Z"
})

| Action Type | Declaration | Behavior | Latency |
|---|---|---|---|
| Sync | Has tool: field | Executes tool directly, no LLM round-trip | <50ms |
| Agent | No tool: field | Sends context to LLM for reasoning | 500ms-2s |
This separation is what makes agent interfaces feel responsive. An inbox agent handling 100 daily actions costs ~$0.90/day with sync routing vs ~$30/day if every click goes through the LLM.
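The routing decision itself is mechanical: it keys entirely on whether the action declares a `tool:` field. A sketch of the dispatcher (hypothetical; `execute_tool` and `send_to_llm` stand in for runtime internals):

```python
# Sketch: dual-path action dispatch. An action with a `tool:` field
# executes directly; otherwise the context routes to the LLM.
# `execute_tool` / `send_to_llm` are hypothetical runtime hooks.

def render_params(params, ctx):
    """Fill {{var}} placeholders from the component's props."""
    out = {}
    for key, template in params.items():
        for var, value in ctx.items():
            template = template.replace("{{" + var + "}}", str(value))
        out[key] = template
    return out

def dispatch(action_spec, ctx, execute_tool, send_to_llm):
    params = render_params(action_spec.get("params", {}), ctx)
    if "tool" in action_spec:             # sync path: no LLM round-trip
        return execute_tool(action_spec["tool"], params)
    return send_to_llm(ctx | params)      # agent path: reasoning required
```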
Tool UI Rendering
HTTP tools can declare UI that renders automatically based on execution status:
http_tools:
generate_headshot:
endpoint: /api/v1/proxy
method: POST
params:
input_image: { type: string, required: true }
ui_component:
completed:
component: headshot_gallery
props:
images: "{{json.messages[0].images}}"
cost: "{{json.usage.cost_usd}}"
actions:
sync_actions:
download:
tool: system://download_file
params: { url: "{{selectedImage}}" }
agent_actions:
- regenerate
failed:
component: error_card
props:
message: "{{error}}"

Skills
Reusable bundles that constrain which tools an agent can use and provide guided workflows.
# agents/my-agent/skills/test-helper/SKILL.md
---
name: test-helper
description: Skill for testing capabilities
allowed-tools:
- sqlite
- delegate
---
# Test Helper Skill
When testing a tool:
1. Call with minimal valid input
2. Validate response structure
3. Test edge cases
4. Document results

# agent.yaml
skills:
- name: test-helper # Restricts to: sqlite, delegate only

Build Output
Lockfile
# agent.lock
version: "1"
resolved_at: "2025-01-29T10:30:00Z"
tools:
web_search:
version: 1.2.0
hash: sha256:e5f6a7b8...
web_fetch:
version: 1.0.0
hash: sha256:c9d0e1f2...
agents:
academic-researcher:
version: 1.0.0
hash: sha256:f7e8d9c0...
models:
claude-sonnet-4:
context_window: 200000
supports_tools: true
files:
agent.yaml: sha256:1a2b3c4d...
prompt.yaml: sha256:5e6f7a8b...

Container
The .agent container is a signed, encrypted archive:
research-assistant@1.0.0.agent
├── manifest.json # Metadata + file list
├── agent.yaml # Definition (plaintext)
├── prompt.yaml.enc # Encrypted (AES-256-GCM)
├── agent.lock # Locked dependencies
├── signature.sig # Ed25519 signature
└── context/ # Bundled resources
└── templates/

UX Hints
Optional ux.yaml for runtime UI improvements:
integrations:
- name: Gmail
website: https://gmail.com
- name: Stripe
website: https://stripe.com
suggestions:
- "Help me catch up on emails"
- "What's my MRR this month?"
tools:
gmail_list_messages:
labels:
running: "Loading inbox"
finished: "Loaded inbox"
suggestions:
- "Show my unread emails"
mcp_tools:
playwright-web:
web_navigate:
labels:
running: "Navigating"
finished: "Navigation complete"

CLI Reference
# Build agent (resolves deps, generates lockfile, signs)
agentlang build ./my-agent
# Build with quality check (LLM reviews prompts)
agentlang build ./my-agent --quality
# Validate without building
agentlang validate ./my-agent
# Inspect container contents
agentlang inspect my-agent@1.0.0.agent
# Publish to registry
agentlang publish ./my-agent
# Run locally (requires compatible runtime)
agentlang run ./my-agent --prompt "Research X"

Events
Standard event format for observability, debugging, and session replay. Runtimes emit these events; any compatible viewer can consume them.
Event Schema
events:
- type: agent.think
timestamp: 2025-01-29T10:00:00Z
agent_id: researcher
content: "I should search for recent papers..."
- type: tool.call
timestamp: 2025-01-29T10:00:01Z
agent_id: researcher
tool: web_search
input: { query: "transformer architectures 2025" }
call_id: abc123
- type: tool.result
timestamp: 2025-01-29T10:00:03Z
call_id: abc123
output: { results: [...] }
duration_ms: 2000
- type: agent.delegate
timestamp: 2025-01-29T10:00:05Z
from: orchestrator
to: academic-researcher
task: "Analyze these findings"
- type: agent.message
timestamp: 2025-01-29T10:00:10Z
agent_id: researcher
role: assistant
content: "Based on my analysis..."

Event Types
| Type | When Emitted | Key Fields |
|---|---|---|
| agent.think | LLM reasoning (if exposed) | agent_id, content |
| tool.call | Tool invocation starts | agent_id, tool, input, call_id |
| tool.result | Tool returns | call_id, output, duration_ms, error? |
| agent.delegate | Orchestrator hands off to specialist | from, to, task |
| agent.message | Message added to conversation | agent_id, role, content |
| context.compact | Context window compacted | strategy, tokens_before, tokens_after |
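An adapter's emit path can be sketched with a minimal event record (hypothetical helper; field names follow the schema above, and timestamps are UTC ISO-8601):

```python
# Sketch: minimal event record matching the schema above.
# Hypothetical adapter-side helper, not part of any runtime.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Event:
    type: str                       # e.g. "tool.call", "agent.message"
    agent_id: Optional[str] = None
    data: dict = field(default_factory=dict)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def emit(sink, event):
    """Append a serialized event to any sink with .append()."""
    sink.append(asdict(event))
```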
Runtime Adapters
Each runtime translates its internal events to this format:
LangGraph state transitions ──┐
CrewAI task events ──┼──▶ AgentLang Events ──▶ Replay UI
AutoGen messages ──┤
Harness events ──┘

Same replay viewer works regardless of which runtime executed the agent.
Runtime Requirements
AgentLang defines agents. Runtimes execute them. A compatible runtime must:
- Parse agent.yaml and prompt.yaml formats
- Verify container signatures
- Decrypt prompts with provided key
- Resolve tools from lockfile versions
- Implement the standard tool interfaces
- Handle multi-agent delegation
- Emit AgentLang events for observability
- Render UI components (if supported)
Related Specifications
AgentLang builds on and complements existing standards:
| Specification | Used For | Link |
|---|---|---|
| Agent Format (Snap) | Declarative agent definitions, execution policies, governance | agentformat.org |
| Model Context Protocol (MCP) | Tool interfaces, server connections | spec.modelcontextprotocol.io |
| JSON Schema | Tool input/output validation | json-schema.org |
| Agent-to-Agent Protocol (A2A) | Multi-agent communication | google.github.io/A2A |
| AGENTS.md | Project-level agent instructions | github.com/agentsmd |
| OpenTelemetry | Event format inspiration | opentelemetry.io |
| Semver | Tool and agent versioning | semver.org |