feat(workflow): implement workflow graph system with node types and agent vertex

- Added `mod.rs` for the workflow graph system, outlining the structure and usage.
- Introduced `node.rs` defining various node types (Agent, Tool, Router, etc.) and their configurations.
- Created `vertices/agent.rs` implementing the `AgentVertex` for LLM-based processing with tool calling capabilities.
- Added `vertices/mod.rs` to organize vertex implementations.
- Implemented serialization and deserialization for node configurations using Serde.
- Included tests for node configurations and agent vertex functionality.
Author: HyunjunJeon
Date: 2026-01-02 12:34:02 +09:00
Parent: 755c7f1321
Commit: d8f2eb68d4
22 changed files with 4790 additions and 8345 deletions

.gitignore (4 lines changed)

@@ -18,10 +18,6 @@ wheels/
*_api/
# AI
AGENTS.md
CLAUDE.md
GEMINI.md
QWEN.md
.serena/
# Others

AGENTS.md (new file, 43 lines)

@@ -0,0 +1,43 @@
# Repository Guidelines
## Project Structure & Module Organization
- `research_agent/` contains the core Python agents, prompts, tools, and subagent utilities.
- `skills/` holds project-level skills as `SKILL.md` files (YAML frontmatter + instructions).
- `research_workspace/` is the agents' working filesystem for generated outputs; keep it clean or example-only.
- `deep-agents-ui/` is the Next.js/React UI with source under `deep-agents-ui/src/`.
- `deepagents_sourcecode/` vendors upstream library sources for reference and comparison.
- `rust-research-agent/` is a standalone Rust tutorial agent with its own build/test flow.
- `langgraph.json` defines the LangGraph deployment entrypoint for the research agent.
## Build, Test, and Development Commands
Use the UI commands from `deep-agents-ui/` when working on the frontend:
```bash
cd deep-agents-ui && yarn install # install deps
cd deep-agents-ui && yarn dev # run local UI
cd deep-agents-ui && yarn build # production build
cd deep-agents-ui && yarn lint # eslint checks
cd deep-agents-ui && yarn format # prettier format
```
Python tooling is configured in `pyproject.toml` (ruff + mypy):
```bash
uv run ruff format .
uv run ruff check .
uv run mypy .
```
## Coding Style & Naming Conventions
- Python: follow ruff defaults and Google-style docstrings (see `pyproject.toml`); prefer `snake_case` modules and functions.
- TypeScript/React: keep `PascalCase` for components, `camelCase` for hooks/utilities; rely on ESLint + Prettier (Tailwind plugin).
- Skill definitions: keep one skill per directory with a `SKILL.md` entrypoint and clear, task-focused naming.
## Testing Guidelines
- There are no repository-wide tests for `research_agent/` yet; add `pytest` tests when introducing new logic.
- Subprojects have their own suites: see `deepagents_sourcecode/libs/*/Makefile` and `rust-research-agent/README.md` for `make test` or `cargo test`.
## Commit & Pull Request Guidelines
- Git history uses short, descriptive messages in English or Korean with no enforced prefix; keep summaries concise and imperative.
- For PRs, include: a brief summary, testing notes (or “not run”), linked issues, and UI screenshots for frontend changes.
## Configuration & Secrets
- Copy `env.example` to `.env` for API keys; never commit secrets.
- UI-only keys can be set via `NEXT_PUBLIC_LANGSMITH_API_KEY` in `deep-agents-ui/`.

CLAUDE.md (new file, 288 lines)

@@ -0,0 +1,288 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
A multi-agent research system demonstrating **FileSystem-based Context Engineering** using LangChain's DeepAgents framework. The system includes:
- **Python DeepAgents**: LangChain-based multi-agent orchestration with web research capabilities
- **Rust `rig-deepagents`**: A port/reimagining using the Rig framework with Pregel-inspired graph execution
The system enables agents to conduct web research, delegate tasks to sub-agents, and generate comprehensive reports with persistent filesystem state.
## Development Commands
### Python Backend
```bash
# Install dependencies (uses uv package manager)
uv sync
# Start LangGraph development server (API on localhost:2024)
langgraph dev
# Linting and formatting
ruff check research_agent/
ruff format research_agent/
# Type checking
mypy research_agent/
```
### Frontend UI (deep-agents-ui/)
```bash
cd deep-agents-ui
yarn install
yarn dev # Dev server on localhost:3000
yarn build # Production build
yarn lint # ESLint
yarn format # Prettier
```
### Interactive Notebook Development
```bash
# Open Jupyter for interactive agent testing
jupyter notebook DeepAgent_research.ipynb
```
The `research_agent/utils.py` module provides Rich-formatted display helpers for notebooks:
- `format_messages(messages)` - Renders messages with colored panels (Human=blue, AI=green, Tool=yellow)
- `show_prompt(text, title)` - Displays prompts with XML/header syntax highlighting
### Rust `rig-deepagents` Crate
```bash
cd rust-research-agent/crates/rig-deepagents
# Run all tests (159 tests)
cargo test
# Run tests for a specific module
cargo test pregel:: # Pregel runtime tests
cargo test workflow:: # Workflow node tests
cargo test middleware:: # Middleware tests
# Linting (strict, treats warnings as errors)
cargo clippy -- -D warnings
# Build with optional features
cargo build --features checkpointer-sqlite
cargo build --features checkpointer-redis
cargo build --features checkpointer-postgres
```
### Running the Full Stack
1. Start backend: `langgraph dev` (port 2024)
2. Start frontend: `cd deep-agents-ui && yarn dev` (port 3000)
3. Configure UI with Deployment URL (`http://127.0.0.1:2024`) and Assistant ID (`research`)
## Required Environment Variables
Copy `env.example` to `.env`:
- `OPENAI_API_KEY` - For gpt-4.1 model
- `TAVILY_API_KEY` - For web search functionality
- `LANGSMITH_API_KEY` - Optional, format `lsv2_pt_...` for tracing
- `LANGSMITH_TRACING` / `LANGSMITH_PROJECT` - Optional tracing config
## Architecture
### Multi-SubAgent System
The system uses a three-tier agent hierarchy with two distinct SubAgent types:
```
Main Orchestrator Agent (agent.py)
├── FilesystemBackend (../research_workspace)
│ └── Persistent state via virtual filesystem
└── SubAgents
├── researcher (CompiledSubAgent) ─ Autonomous DeepAgent
│ └── "Breadth-first, then depth" research pattern
├── explorer (Simple SubAgent) ─ Fast read-only exploration
└── synthesizer (Simple SubAgent) ─ Research result integration
```
**CompiledSubAgent vs Simple SubAgent:**
| Type | Definition | Execution | Use Case |
|------|------------|-----------|----------|
| CompiledSubAgent | `{"runnable": CompiledStateGraph}` | Multi-turn autonomous | Complex research with self-planning |
| Simple SubAgent | `{"system_prompt": str}` | Single response | Quick tasks, file ops |
### Core Components
**`research_agent/agent.py`** - Orchestrator configuration:
- LLM: `ChatOpenAI(model="gpt-4.1", temperature=0.0)`
- Creates researcher via `get_researcher_subagent()` (CompiledSubAgent)
- Defines `explorer_agent`, `synthesizer_agent` (Simple SubAgents)
- Assembles `ALL_SUBAGENTS = [researcher_subagent, *SIMPLE_SUBAGENTS]`
**`research_agent/researcher/`** - Autonomous researcher module:
- `agent.py`: `create_researcher_agent()` factory and `get_researcher_subagent()` wrapper
- `prompts.py`: `AUTONOMOUS_RESEARCHER_INSTRUCTIONS` with three-phase workflow (Exploratory → Directed → Synthesis)
**Backend Factory Pattern** - The `backend_factory(rt: ToolRuntime)` function demonstrates the recommended pattern:
```python
CompositeBackend(
default=StateBackend(rt), # In-memory state (temporary files)
routes={"/": fs_backend} # Route "/" paths to FilesystemBackend
)
```
This enables routing: paths starting with "/" go to persistent local filesystem (`research_workspace/`), others use ephemeral state.
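The prefix rule behind this routing is small enough to sketch standalone. The function and route names below are hypothetical, for illustration only; neither the deepagents `CompositeBackend` nor the Rust crate exposes exactly this signature:

```rust
// Illustrative prefix routing: the first matching route prefix wins and the
// prefix is stripped from the path; unmatched paths fall through to default.
fn route<'a>(
    path: &str,
    routes: &'a [(&'a str, &'a str)],
    default: &'a str,
) -> (&'a str, String) {
    for (prefix, backend) in routes {
        let p = prefix.trim_end_matches('/');
        // Match the prefix exactly, or on a directory boundary ("/memories/...")
        if path == p || path.starts_with(&format!("{}/", p)) {
            let stripped = &path[p.len()..];
            let stripped = if stripped.is_empty() { "/" } else { stripped };
            return (backend, stripped.to_string());
        }
    }
    (default, path.to_string())
}

fn main() {
    let routes = [("/memories/", "fs")];
    // A routed path goes to the filesystem backend with the prefix stripped
    assert_eq!(
        route("/memories/notes.txt", &routes, "state"),
        ("fs", "/notes.txt".to_string())
    );
    // Everything else stays on the ephemeral state backend
    assert_eq!(
        route("/tmp.txt", &routes, "state"),
        ("state", "/tmp.txt".to_string())
    );
    println!("routing ok");
}
```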
**`research_agent/prompts.py`** - Prompt templates:
- `RESEARCH_WORKFLOW_INSTRUCTIONS` - Main workflow (plan → save → delegate → synthesize → write → verify)
- `SUBAGENT_DELEGATION_INSTRUCTIONS` - When to parallelize (comparisons) vs single agent (overviews)
- `EXPLORER_INSTRUCTIONS` - Fast read-only exploration with filesystem tools
- `SYNTHESIZER_INSTRUCTIONS` - Multi-source integration with confidence levels
**`research_agent/tools.py`** - Research tools:
- `tavily_search(query, max_results, topic)` - Searches web, fetches full page content, converts to markdown
- `think_tool(reflection)` - Explicit reflection step for deliberate research
**`langgraph.json`** - Deployment config pointing to `./research_agent/agent.py:agent`
### Context Engineering Pattern
The filesystem acts as long-term memory:
1. Agent reads/writes files in virtual `research_workspace/`
2. Structured outputs: reports, TODOs, request files
3. Middleware auto-injects filesystem and sub-agent tools
4. Automatic context summarization for token efficiency
### DeepAgents Auto-Injected Tools
The `create_deep_agent()` function automatically adds these tools via middleware:
- **TodoListMiddleware**: `write_todos` - Task planning and progress tracking
- **FilesystemMiddleware**: `ls`, `read_file`, `write_file`, `edit_file`, `glob`, `grep` - File operations
- **SubAgentMiddleware**: `task` - Delegate work to sub-agents
- **SkillsMiddleware**: Progressive skill disclosure via `skills/` directory
Custom tools (`tavily_search`, `think_tool`) are added explicitly in `agent.py`.
### Skills System
Project-level skills are located in `PROJECT_ROOT/skills/`:
- `academic-search/` - arXiv paper search with structured output
- `data-synthesis/` - Multi-source data integration and analysis
- `report-writing/` - Structured report generation
- `skill-creator/` - Meta-skill for creating new skills
Each skill has a `SKILL.md` file with YAML frontmatter (name, description) and detailed instructions. The SkillsMiddleware uses Progressive Disclosure: only skill metadata is injected into the system prompt at session start; full skill content is read on-demand when needed.
### Research Workflow
**Orchestrator workflow:**
```
Plan → Save Request → Delegate to Sub-agents → Synthesize → Write Report → Verify
```
**Autonomous Researcher workflow (breadth-first, then depth):**
```
Phase 1: Exploratory Search (1-2 searches) → Identify directions
Phase 2: Directed Research (1-2 searches per direction) → Deep dive
Phase 3: Synthesis → Combine findings with source agreement analysis
```
Sub-agents operate with token budgets (5-6 max searches) and explicit reflection loops (Search → think_tool → Decide → Repeat).
## Rust `rig-deepagents` Architecture
The Rust crate provides a Pregel-inspired graph execution runtime for agent workflows.
### Module Structure
```
rust-research-agent/crates/rig-deepagents/src/
├── lib.rs # Library entry point and re-exports
├── pregel/ # Pregel Runtime (graph execution engine)
│ ├── runtime.rs # Superstep orchestration, workflow timeout, retry policies
│ ├── vertex.rs # Vertex trait and compute context
│ ├── message.rs # Inter-vertex message passing
│ ├── config.rs # PregelConfig, RetryPolicy
│ ├── checkpoint/ # Fault tolerance via checkpointing
│ │ ├── mod.rs # Checkpointer trait and factory
│ │ └── file.rs # FileCheckpointer implementation
│ └── state.rs # WorkflowState trait, UnitState
├── workflow/ # Workflow Builder DSL
│ ├── node.rs # NodeKind (Agent, Tool, Router, SubAgent, FanOut/FanIn)
│ └── mod.rs # WorkflowGraph builder API
├── middleware/ # AgentMiddleware trait and MiddlewareStack
├── backends/ # Backend trait (Memory, Filesystem, Composite)
├── llm/ # LLMProvider abstraction (OpenAI, Anthropic)
└── tools/ # Tool implementations (read_file, write_file, grep, etc.)
```
### Pregel Execution Model
The runtime executes workflows using synchronized supersteps:
```
┌─────────────────────────────────────────────────────────────┐
│ PregelRuntime │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │Superstep│→ │Superstep│→ │Superstep│→ ... │
│ │ 0 │ │ 1 │ │ 2 │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ Per-Superstep: Deliver → Compute → Collect → Route │
└─────────────────────────────────────────────────────────────┘
```
- **Vertex**: Computation unit with `compute()` method (Agent, Tool, Router)
- **Message**: Communication between vertices across supersteps
- **Checkpointing**: Fault tolerance via periodic state snapshots
- **Retry Policy**: Exponential backoff with configurable max retries
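The superstep cycle above can be sketched as a toy synchronous loop. This is illustrative only; the real runtime is async, trait-based, and generic over state and message types:

```rust
use std::collections::HashMap;

// Toy Pregel loop: each superstep delivers queued messages, "computes" on
// them, collects outgoing messages, and halts when no messages remain.
fn run_supersteps(
    mut inbox: HashMap<&'static str, Vec<i64>>,
    max_supersteps: usize,
) -> (usize, i64) {
    let mut total = 0;
    let mut steps = 0;
    while steps < max_supersteps && !inbox.is_empty() {
        let mut outbox: HashMap<&'static str, Vec<i64>> = HashMap::new();
        // Deliver + Compute: each vertex consumes the messages addressed to it
        for (_vertex, msgs) in inbox.drain() {
            for m in msgs {
                total += m;
                // Collect + Route: forward halved values until they reach 0
                if m / 2 > 0 {
                    outbox.entry("next").or_default().push(m / 2);
                }
            }
        }
        inbox = outbox; // barrier: next superstep sees this round's output
        steps += 1;
    }
    (steps, total)
}

fn main() {
    let mut inbox = HashMap::new();
    inbox.insert("start", vec![8]);
    // 8 -> 4 -> 2 -> 1: four supersteps, running sum 15
    assert_eq!(run_supersteps(inbox, 100), (4, 15));
}
```

The key property mirrored here is the barrier between supersteps: messages produced in round N are only visible in round N+1, which is what makes Pregel-style execution deterministic and checkpointable.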
### Key Types
| Type | Purpose |
|------|---------|
| `PregelRuntime<S, M>` | Executes workflow graph with state S and message M |
| `Vertex<S, M>` | Trait for computation nodes |
| `WorkflowState` | Trait for workflow state (must be serializable) |
| `PregelConfig` | Runtime configuration (max supersteps, parallelism, timeout) |
| `Checkpointer` | Trait for state persistence (Memory, File, SQLite, Redis, Postgres) |
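As a rough illustration of the exponential-backoff retry schedule mentioned above (the parameter names here are assumptions, not the actual `RetryPolicy` fields):

```rust
// Compute the delay schedule for up to max_retries attempts:
// base * factor^attempt, capped at cap_ms.
fn backoff_ms(base_ms: u64, factor: u64, max_retries: u32, cap_ms: u64) -> Vec<u64> {
    (0..max_retries)
        .map(|attempt| (base_ms * factor.pow(attempt)).min(cap_ms))
        .collect()
}

fn main() {
    // 100ms base, doubling each attempt, capped at 1s
    assert_eq!(backoff_ms(100, 2, 5, 1_000), vec![100, 200, 400, 800, 1_000]);
    println!("backoff schedule ok");
}
```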
### Design Documents
- `docs/plans/2026-01-02-rig-deepagents-pregel-design.md` - Comprehensive Pregel runtime design
- `docs/plans/2026-01-02-rig-deepagents-implementation-tasks.md` - Implementation task breakdown
## Key Files for Understanding the System
**Python DeepAgents:**
1. `research_agent/agent.py` - Orchestrator creation and SubAgent assembly
2. `research_agent/researcher/agent.py` - Autonomous researcher factory (CompiledSubAgent pattern)
3. `research_agent/researcher/prompts.py` - Three-phase autonomous workflow
4. `research_agent/prompts.py` - Orchestrator and Simple SubAgent prompts
5. `research_agent/tools.py` - Tool implementations
6. `research_agent/skills/middleware.py` - SkillsMiddleware with progressive disclosure
**Rust rig-deepagents:**
7. `rust-research-agent/crates/rig-deepagents/src/pregel/runtime.rs` - Pregel execution engine
8. `rust-research-agent/crates/rig-deepagents/src/pregel/vertex.rs` - Vertex abstraction
9. `rust-research-agent/crates/rig-deepagents/src/workflow/node.rs` - Node type definitions
10. `rust-research-agent/crates/rig-deepagents/src/llm/provider.rs` - LLMProvider trait
**Documentation:**
11. `DeepAgents_Technical_Guide.md` - Python DeepAgents reference (Korean)
12. `docs/plans/2026-01-02-rig-deepagents-pregel-design.md` - Rust Pregel design
## Tech Stack
- **Python 3.13**: deepagents, langchain-openai, langgraph-cli, tavily-python
- **Rust**: rig-core 0.27, tokio, serde, async-trait, thiserror
- **Frontend**: Next.js 16, React, TailwindCSS, Radix UI
- **Package managers**: uv (Python), Yarn (Node.js), Cargo (Rust)
## External Resources
- [LangChain DeepAgent Docs](https://docs.langchain.com/oss/python/deepagents/overview)
- [LangGraph CLI Docs](https://docs.langchain.com/langsmith/cli#configuration-file)
- [DeepAgent UI](https://github.com/langchain-ai/deep-agents-ui)

File diff suppressed because it is too large
@@ -1,804 +0,0 @@
# Rig-DeepAgents Code Review Fix Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Fix the HIGH/MEDIUM issues identified by Codex and the code review to restore parity with Python DeepAgents and reach production readiness.
**Architecture:**
- Resolve the dual-source-of-truth problem between MemoryBackend and AgentState.files via the `files_update` return pattern (preserving the Python pattern)
- Unify path handling through a centralized `normalize_path` utility
- Strengthen CompositeBackend's glob/write/edit result aggregation and path-restoration logic
**Tech Stack:** Rust 1.75+, tokio, async-trait, glob crate
**Verification feedback sources:**
- Codex CLI (gpt-5.2-codex): 55,431-token analysis
- Code Reviewer Subagent: independent verification
---
## Phase 1: HIGH Severity Fixes (Required)
### Task 1.1: Apply base_path filtering in MemoryBackend glob()
**Files:**
- Modify: `rust-research-agent/crates/rig-deepagents/src/backends/memory.rs:191-213`
- Test: tests module at the bottom of the same file
**Problem:** `glob()` only validates `base_path` and never uses it for filtering, so every file is searched
**Step 1: Write a failing test**
Add to the tests module in `memory.rs`:
```rust
#[tokio::test]
async fn test_memory_backend_glob_respects_base_path() {
let backend = MemoryBackend::new();
backend.write("/src/main.rs", "fn main()").await.unwrap();
backend.write("/src/lib.rs", "pub mod").await.unwrap();
backend.write("/docs/readme.md", "# Readme").await.unwrap();
backend.write("/tests/test.rs", "test code").await.unwrap();
// Should search only under /src
let files = backend.glob("**/*.rs", "/src").await.unwrap();
// Only .rs files under /src should be included
assert_eq!(files.len(), 2);
assert!(files.iter().all(|f| f.path.starts_with("/src")));
// /tests/test.rs must not be included
assert!(!files.iter().any(|f| f.path.contains("/tests/")));
}
```
**Step 2: Confirm the test fails**
Run: `source ~/.cargo/env && cd rust-research-agent/crates/rig-deepagents && cargo test test_memory_backend_glob_respects_base_path`
Expected: FAIL - the current implementation returns all .rs files (4 of them)
**Step 3: Fix the glob() implementation**
Replace lines 191-213 of `memory.rs` with:
```rust
async fn glob(&self, pattern: &str, base_path: &str) -> Result<Vec<FileInfo>, BackendError> {
let base = Self::validate_path(base_path)?;
let files = self.files.read().await;
let glob_pattern = Pattern::new(pattern)
.map_err(|e| BackendError::Pattern(e.to_string()))?;
let mut results = Vec::new();
for (file_path, data) in files.iter() {
// Search only files under base_path
let normalized_base = base.trim_end_matches('/');
if !file_path.starts_with(normalized_base) {
continue;
}
// Must be exactly base_path or start with base_path/
if file_path.len() > normalized_base.len()
&& !file_path[normalized_base.len()..].starts_with('/') {
continue;
}
let match_path = file_path.trim_start_matches('/');
if glob_pattern.matches(match_path) {
let size = data.content.iter().map(|s| s.len()).sum::<usize>() as u64;
results.push(FileInfo::file_with_time(
file_path,
size,
&data.modified_at,
));
}
}
results.sort_by(|a, b| a.path.cmp(&b.path));
Ok(results)
}
```
**Step 4: Confirm the test passes**
Run: `source ~/.cargo/env && cd rust-research-agent/crates/rig-deepagents && cargo test test_memory_backend_glob_respects_base_path`
Expected: PASS
**Step 5: Run the full test suite**
Run: `source ~/.cargo/env && cd rust-research-agent/crates/rig-deepagents && cargo test`
Expected: all tests PASS
---
### Task 1.2: Restore files_update key paths in CompositeBackend
**Files:**
- Modify: `rust-research-agent/crates/rig-deepagents/src/backends/composite.rs:107-133`
- Test: tests module at the bottom of the same file
**Problem:** keys in the `files_update` map returned by a routed backend keep the stripped path, producing wrong paths when the state is updated
**Step 1: Write failing tests**
Add to the tests module in `composite.rs`:
```rust
#[tokio::test]
async fn test_composite_backend_write_files_update_path_restoration() {
let default = Arc::new(MemoryBackend::new());
let memories = Arc::new(MemoryBackend::new());
let composite = CompositeBackend::new(default.clone())
.with_route("/memories/", memories.clone());
// Write a file to the routed backend
let result = composite.write("/memories/notes.txt", "my notes").await.unwrap();
assert!(result.is_ok());
assert_eq!(result.path, Some("/memories/notes.txt".to_string()));
// If files_update is present, its keys must be restored as well
if let Some(files_update) = &result.files_update {
// The key must be /memories/notes.txt (not the stripped /notes.txt)
assert!(files_update.contains_key("/memories/notes.txt"),
"files_update key should be '/memories/notes.txt', got keys: {:?}",
files_update.keys().collect::<Vec<_>>());
}
}
#[tokio::test]
async fn test_composite_backend_edit_files_update_path_restoration() {
let default = Arc::new(MemoryBackend::new());
let memories = Arc::new(MemoryBackend::new());
let composite = CompositeBackend::new(default.clone())
.with_route("/memories/", memories.clone());
// Create the file first
composite.write("/memories/notes.txt", "hello world").await.unwrap();
// Edit it
let result = composite.edit("/memories/notes.txt", "hello", "goodbye", false).await.unwrap();
assert!(result.is_ok());
// If files_update is present, its keys must be restored as well
if let Some(files_update) = &result.files_update {
assert!(files_update.contains_key("/memories/notes.txt"),
"files_update key should be '/memories/notes.txt', got keys: {:?}",
files_update.keys().collect::<Vec<_>>());
}
}
```
**Step 2: Confirm the tests fail**
Run: `source ~/.cargo/env && cd rust-research-agent/crates/rig-deepagents && cargo test test_composite_backend_write_files_update`
Expected: FAIL - the key is `/notes.txt`
**Step 3: Fix the write() and edit() implementations**
Replace the write() method in `composite.rs` (lines 107-116) with:
```rust
async fn write(&self, path: &str, content: &str) -> Result<WriteResult, BackendError> {
let (backend, stripped) = self.get_backend_and_path(path);
let mut result = backend.write(&stripped, content).await?;
// Restore the path
if result.path.is_some() {
result.path = Some(path.to_string());
}
// Restore files_update keys as well
if let Some(ref mut files_update) = result.files_update {
let restored: std::collections::HashMap<String, crate::state::FileData> = files_update
.drain()
.map(|(k, v)| (self.restore_prefix(&k, path), v))
.collect();
*files_update = restored;
}
Ok(result)
}
```
Replace the edit() method in `composite.rs` (lines 118-133) with:
```rust
async fn edit(
&self,
path: &str,
old_string: &str,
new_string: &str,
replace_all: bool
) -> Result<EditResult, BackendError> {
let (backend, stripped) = self.get_backend_and_path(path);
let mut result = backend.edit(&stripped, old_string, new_string, replace_all).await?;
// Restore the path
if result.path.is_some() {
result.path = Some(path.to_string());
}
// Restore files_update keys as well
if let Some(ref mut files_update) = result.files_update {
let restored: std::collections::HashMap<String, crate::state::FileData> = files_update
.drain()
.map(|(k, v)| (self.restore_prefix(&k, path), v))
.collect();
*files_update = restored;
}
Ok(result)
}
```
**Step 4: Verify the import at the top**
Check that `use crate::state::FileData;` is present at the top of `composite.rs`; add it if missing.
**Step 5: Confirm the tests pass**
Run: `source ~/.cargo/env && cd rust-research-agent/crates/rig-deepagents && cargo test test_composite_backend`
Expected: all composite tests PASS
---
### Task 1.3: Add aggregation logic to CompositeBackend glob()
**Files:**
- Modify: `rust-research-agent/crates/rig-deepagents/src/backends/composite.rs:135-144`
- Test: tests module at the bottom of the same file
**Problem:** glob() queries only a single backend instead of aggregating results across all backends as the Python version does
**Step 1: Write a failing test**
```rust
#[tokio::test]
async fn test_composite_backend_glob_aggregates_all_backends() {
let default = Arc::new(MemoryBackend::new());
let docs = Arc::new(MemoryBackend::new());
let composite = CompositeBackend::new(default.clone())
.with_route("/docs/", docs.clone());
// Create files in each backend
composite.write("/src/main.rs", "fn main()").await.unwrap();
composite.write("/docs/guide.md", "# Guide").await.unwrap();
composite.write("/docs/api.md", "# API").await.unwrap();
// Search for all .md files from the root - must aggregate across all backends
let files = composite.glob("**/*.md", "/").await.unwrap();
// Both files from the docs backend must be included
assert_eq!(files.len(), 2, "Expected 2 .md files, got: {:?}", files);
assert!(files.iter().any(|f| f.path.contains("guide.md")));
assert!(files.iter().any(|f| f.path.contains("api.md")));
}
```
**Step 2: Confirm the test fails**
Run: `source ~/.cargo/env && cd rust-research-agent/crates/rig-deepagents && cargo test test_composite_backend_glob_aggregates`
Expected: FAIL - currently only the default backend is searched
**Step 3: Fix the glob() implementation**
Replace the glob() method in `composite.rs` (lines 135-144) with:
```rust
async fn glob(&self, pattern: &str, base_path: &str) -> Result<Vec<FileInfo>, BackendError> {
// If base_path falls under a specific route, search only that backend
for route in &self.routes {
let route_prefix = route.prefix.trim_end_matches('/');
if base_path.starts_with(route_prefix) &&
(base_path.len() == route_prefix.len() || base_path[route_prefix.len()..].starts_with('/')) {
let (backend, stripped) = self.get_backend_and_path(base_path);
let mut results = backend.glob(pattern, &stripped).await?;
for info in &mut results {
info.path = self.restore_prefix(&info.path, base_path);
}
return Ok(results);
}
}
// Root or unrouted path - aggregate across all backends
let mut all_results = self.default.glob(pattern, base_path).await?;
for route in &self.routes {
let mut route_results = route.backend.glob(pattern, "/").await?;
for info in &mut route_results {
let prefix = route.prefix.trim_end_matches('/');
info.path = format!("{}{}", prefix, info.path);
}
all_results.extend(route_results);
}
all_results.sort_by(|a, b| a.path.cmp(&b.path));
Ok(all_results)
}
```
**Step 4: Confirm the test passes**
Run: `source ~/.cargo/env && cd rust-research-agent/crates/rig-deepagents && cargo test test_composite_backend_glob`
Expected: PASS
---
### Task 1.4: Add a warning log for lost extensions in AgentState.clone()
**Files:**
- Modify: `rust-research-agent/crates/rig-deepagents/src/state.rs:169-180`
- Modify: `rust-research-agent/crates/rig-deepagents/Cargo.toml` (verify the tracing dependency)
**Problem:** clone() reinitializes extensions to an empty HashMap, silently losing middleware state
**Step 1: Inspect the current Clone implementation**
The Clone implementation in `state.rs`:
```rust
impl Clone for AgentState {
fn clone(&self) -> Self {
Self {
messages: self.messages.clone(),
todos: self.todos.clone(),
files: self.files.clone(),
structured_response: self.structured_response.clone(),
extensions: HashMap::new(), // data loss!
}
}
}
```
**Step 2: Add the tracing import**
Add at the top of `state.rs`:
```rust
use tracing::warn;
```
**Step 3: Add a warning log to the Clone implementation**
Replace the Clone implementation in `state.rs` with:
```rust
impl Clone for AgentState {
fn clone(&self) -> Self {
// Warn if extensions is non-empty
if !self.extensions.is_empty() {
warn!(
extension_count = self.extensions.len(),
extension_keys = ?self.extensions.keys().collect::<Vec<_>>(),
"AgentState.clone() called with non-empty extensions - extensions will be lost"
);
}
Self {
messages: self.messages.clone(),
todos: self.todos.clone(),
files: self.files.clone(),
structured_response: self.structured_response.clone(),
// extensions starts empty because Box<dyn Any> cannot be cloned
// Consider improving this with an Arc<RwLock<_>> pattern later
extensions: HashMap::new(),
}
}
}
```
**Step 4: Verify compilation**
Run: `source ~/.cargo/env && cd rust-research-agent/crates/rig-deepagents && cargo check`
Expected: success (warnings are acceptable)
---
## Phase 2: MEDIUM Severity Fixes
### Task 2.1: Add and apply a path-normalization utility
**Files:**
- Create: `rust-research-agent/crates/rig-deepagents/src/backends/path_utils.rs`
- Modify: `rust-research-agent/crates/rig-deepagents/src/backends/mod.rs`
- Modify: `rust-research-agent/crates/rig-deepagents/src/backends/memory.rs`
**Step 1: Create path_utils.rs**
```rust
// src/backends/path_utils.rs
//! Path normalization utilities
//!
//! Helper functions for consistent path handling across all backends
use crate::error::BackendError;
/// Normalize a path:
/// - prepend a leading `/`
/// - collapse consecutive slashes (`//` -> `/`)
/// - strip trailing slashes (except root)
/// - reject path-traversal attempts
pub fn normalize_path(path: &str) -> Result<String, BackendError> {
// Reject path traversal
if path.contains("..") || path.starts_with("~") {
return Err(BackendError::PathTraversal(path.to_string()));
}
// Empty path maps to root
if path.is_empty() {
return Ok("/".to_string());
}
// Collapse consecutive slashes
let parts: Vec<&str> = path.split('/')
.filter(|p| !p.is_empty())
.collect();
if parts.is_empty() {
return Ok("/".to_string());
}
Ok(format!("/{}", parts.join("/")))
}
/// Check whether path is under base_path.
/// `/dir` does not match `/dir2` (exact directory-boundary check)
pub fn is_under_path(path: &str, base_path: &str) -> bool {
let normalized_base = base_path.trim_end_matches('/');
if normalized_base.is_empty() || normalized_base == "/" {
return true; // root contains every path
}
if path == normalized_base {
return true; // exact match
}
// Must start with base_path + "/"
path.starts_with(&format!("{}/", normalized_base))
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_normalize_path_basic() {
assert_eq!(normalize_path("/test.txt").unwrap(), "/test.txt");
assert_eq!(normalize_path("test.txt").unwrap(), "/test.txt");
assert_eq!(normalize_path("/dir/file.txt").unwrap(), "/dir/file.txt");
}
#[test]
fn test_normalize_path_double_slashes() {
assert_eq!(normalize_path("/dir//file.txt").unwrap(), "/dir/file.txt");
assert_eq!(normalize_path("//dir///file.txt").unwrap(), "/dir/file.txt");
}
#[test]
fn test_normalize_path_trailing_slash() {
assert_eq!(normalize_path("/dir/").unwrap(), "/dir");
assert_eq!(normalize_path("/").unwrap(), "/");
}
#[test]
fn test_normalize_path_traversal_attack() {
assert!(normalize_path("../etc/passwd").is_err());
assert!(normalize_path("/dir/../etc/passwd").is_err());
assert!(normalize_path("~/.ssh/id_rsa").is_err());
}
#[test]
fn test_is_under_path() {
assert!(is_under_path("/dir/file.txt", "/dir"));
assert!(is_under_path("/dir/sub/file.txt", "/dir"));
assert!(is_under_path("/dir", "/dir"));
assert!(is_under_path("/anything", "/"));
// /dir does not contain /dir2
assert!(!is_under_path("/dir2/file.txt", "/dir"));
assert!(!is_under_path("/directory/file.txt", "/dir"));
}
}
```
**Step 2: Add the module to mod.rs**
Edit `backends/mod.rs`:
```rust
// src/backends/mod.rs
//! Backend module
//!
//! Provides filesystem abstractions.
pub mod protocol;
pub mod memory;
pub mod filesystem;
pub mod composite;
pub mod path_utils;
pub use protocol::{Backend, FileInfo, GrepMatch};
pub use memory::MemoryBackend;
pub use filesystem::FilesystemBackend;
pub use composite::CompositeBackend;
pub use path_utils::{normalize_path, is_under_path};
```
**Step 3: Use path_utils in MemoryBackend**
Add the import at the top of `memory.rs`:
```rust
use super::path_utils::{normalize_path, is_under_path};
```
Remove the `validate_path` function from `memory.rs` and switch to `normalize_path`.
Before:
```rust
fn validate_path(path: &str) -> Result<String, BackendError> { ... }
```
Replace every `Self::validate_path(path)?` call with `normalize_path(path)?`.
**Step 4: Run the tests**
Run: `source ~/.cargo/env && cd rust-research-agent/crates/rig-deepagents && cargo test path_utils`
Expected: PASS
Run: `source ~/.cargo/env && cd rust-research-agent/crates/rig-deepagents && cargo test`
Expected: all tests PASS
---
### Task 2.2: Normalize trailing slashes in CompositeBackend route matching
**Files:**
- Modify: `rust-research-agent/crates/rig-deepagents/src/backends/composite.rs:48-61`
**Problem:** the path `/memories` does not match the `/memories/` route
**Step 1: Write a failing test**
```rust
#[tokio::test]
async fn test_composite_backend_route_matching_without_trailing_slash() {
let default = Arc::new(MemoryBackend::new());
let memories = Arc::new(MemoryBackend::new());
let composite = CompositeBackend::new(default.clone())
.with_route("/memories/", memories.clone());
// Must route even without a trailing slash
composite.write("/memories/notes.txt", "my notes").await.unwrap();
// Read via /memories (no trailing slash)
let files = composite.ls("/memories").await.unwrap();
assert!(!files.is_empty(), "Should find files under /memories route");
}
```
**Step 2: Fix get_backend_and_path()**
Replace the `get_backend_and_path` method in `composite.rs` with:
```rust
/// Return the backend matching the path, along with the translated path
fn get_backend_and_path(&self, path: &str) -> (Arc<dyn Backend>, String) {
// Normalize the path (strip trailing slash)
let normalized_path = path.trim_end_matches('/');
for route in &self.routes {
let route_prefix = route.prefix.trim_end_matches('/');
// Exact match, or starts with route_prefix/
if normalized_path == route_prefix ||
normalized_path.starts_with(&format!("{}/", route_prefix)) {
let suffix = if normalized_path == route_prefix {
""
} else {
&normalized_path[route_prefix.len()..]
};
let stripped = if suffix.is_empty() {
"/".to_string()
} else {
suffix.to_string()
};
return (route.backend.clone(), stripped);
}
}
(self.default.clone(), path.to_string())
}
```
**Step 3: Confirm the test passes**
Run: `source ~/.cargo/env && cd rust-research-agent/crates/rig-deepagents && cargo test test_composite_backend_route_matching`
Expected: PASS
---
### Task 2.3: Use tokio::fs in FilesystemBackend grep
**Files:**
- Modify: `rust-research-agent/crates/rig-deepagents/src/backends/filesystem.rs:283-300`
**Problem:** using sync `std::fs::read_to_string` inside an async function blocks the runtime
**Step 1: Make file reads inside grep() async**
Modify the file-reading portion of the grep method in `filesystem.rs`.
Before:
```rust
let content = match std::fs::read_to_string(entry.path()) {
Ok(c) => c,
Err(_) => continue,
};
```
After:
```rust
let content = match fs::read_to_string(entry.path()).await {
Ok(c) => c,
Err(e) => {
tracing::debug!(path = ?entry.path(), error = %e, "Skipping file in grep due to read error");
continue;
}
};
```
**Step 2: Add the tracing import at the top of filesystem.rs** (strictly optional: `tracing::debug!` resolves via its full path as long as `tracing` is listed in `Cargo.toml`)
```rust
use tracing;
```
**Step 3: Verify compilation**
Run: `source ~/.cargo/env && cd rust-research-agent/crates/rig-deepagents && cargo check`
Expected: success
---
### Task 2.4: Make max_recursion configurable
**Files:**
- Modify: `rust-research-agent/crates/rig-deepagents/src/runtime.rs:41-48`
**Step 1: Bring the RuntimeConfig::new() default in line with Python**
```rust
impl RuntimeConfig {
pub fn new() -> Self {
Self {
debug: false,
max_recursion: 100, // adjusted closer to the Python default (1000 was too high)
current_recursion: 0,
}
}
/// Construct with a custom recursion limit
pub fn with_max_recursion(max_recursion: usize) -> Self {
Self {
debug: false,
max_recursion,
current_recursion: 0,
}
}
}
```
**Step 2: Update the test**
Modify the existing `test_recursion_limit` test in `runtime.rs`:
```rust
#[test]
fn test_recursion_limit() {
let state = AgentState::new();
let backend = Arc::new(MemoryBackend::new());
// Use a custom recursion limit
let config = RuntimeConfig::with_max_recursion(10);
let mut runtime = ToolRuntime::new(state, backend).with_config(config);
for _ in 0..10 {
runtime = runtime.with_increased_recursion();
}
assert!(runtime.is_recursion_limit_exceeded());
}
#[test]
fn test_default_recursion_limit() {
let state = AgentState::new();
let backend = Arc::new(MemoryBackend::new());
let runtime = ToolRuntime::new(state, backend);
// The default limit is 100
assert_eq!(runtime.config().max_recursion, 100);
}
```
**Step 3: Verify the tests pass**
Run: `source ~/.cargo/env && cd rust-research-agent/crates/rig-deepagents && cargo test runtime`
Expected: PASS
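The counter semantics those tests rely on (the limit is exceeded once increments reach the configured maximum) reduce to a few lines. `RecursionGuard` below is a hypothetical std-only stand-in for `RuntimeConfig`'s tracking, not the actual type:

```rust
/// Minimal stand-in for RuntimeConfig's recursion tracking.
#[derive(Debug, Clone)]
struct RecursionGuard {
    max: usize,
    current: usize,
}

impl RecursionGuard {
    fn new(max: usize) -> Self {
        Self { max, current: 0 }
    }

    /// Builder-style increment, mirroring with_increased_recursion().
    fn increased(mut self) -> Self {
        self.current += 1;
        self
    }

    /// Exceeded once the counter reaches the configured maximum.
    fn exceeded(&self) -> bool {
        self.current >= self.max
    }
}

fn main() {
    let mut guard = RecursionGuard::new(10);
    for _ in 0..9 {
        guard = guard.increased();
    }
    assert!(!guard.exceeded());
    // The 10th increment trips the limit, matching the test above.
    guard = guard.increased();
    assert!(guard.exceeded());
}
```

Note the `>=` comparison: with max 10, the 10th increment trips the check, which is what `test_recursion_limit` asserts.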
---
## Phase 3: Final Verification
### Task 3.1: Run and verify the full test suite
**Step 1: Run all tests**
Run: `source ~/.cargo/env && cd rust-research-agent/crates/rig-deepagents && cargo test`
Expected: all tests PASS (30 or more)
**Step 2: Check Clippy lints**
Run: `source ~/.cargo/env && cd rust-research-agent/crates/rig-deepagents && cargo clippy -- -D warnings`
Expected: no warnings (or only acceptable warnings)
**Step 3: Check the doc comments**
Run: `source ~/.cargo/env && cd rust-research-agent/crates/rig-deepagents && cargo doc --no-deps`
Expected: docs build successfully
---
### Task 3.2: Commit the changes
**Step 1: Review the changed files**
Run: `git status`
Expected: the modified Rust files are listed
**Step 2: Commit**
```bash
git add rust-research-agent/crates/rig-deepagents/
git commit -m "fix: address HIGH and MEDIUM severity issues from Codex/Code Review
HIGH fixes:
- glob() now respects base_path filtering (security fix)
- CompositeBackend write/edit restores files_update key paths
- CompositeBackend glob aggregates results from all backends
- AgentState.clone() logs warning when extensions lost
MEDIUM fixes:
- Add centralized path_utils for consistent path normalization
- CompositeBackend route matching normalizes trailing slashes
- FilesystemBackend grep uses async tokio::fs
- RuntimeConfig.max_recursion increased to 100 (was 10)
Verified by: Codex CLI (gpt-5.2-codex), Code Reviewer subagent"
```
---
## Fix Priority Summary
| Priority | Task | Est. time | Risk |
|---------|------|----------|--------|
| 1 | Task 1.1: glob base_path filter | 5 min | Low |
| 2 | Task 1.2: restore files_update keys | 5 min | Low |
| 3 | Task 1.3: glob aggregation logic | 10 min | Medium |
| 4 | Task 1.4: clone extensions warning | 5 min | Low |
| 5 | Task 2.1: path_utils utility | 15 min | Low |
| 6 | Task 2.2: route matching normalization | 5 min | Low |
| 7 | Task 2.3: async grep | 5 min | Low |
| 8 | Task 2.4: max_recursion config | 5 min | Low |
| 9 | Task 3.1: final verification | 5 min | N/A |
| 10 | Task 3.2: commit | 2 min | N/A |
**Total estimated time:** about 60 minutes

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -5,6 +5,12 @@ edition = "2021"
description = "DeepAgents-style middleware system for Rig framework"
license = "MIT"
[features]
default = []
checkpointer-sqlite = []
checkpointer-redis = []
checkpointer-postgres = []
[dependencies]
rig-core = { version = "0.27", features = ["derive"] }
tokio = { version = "1", features = ["full", "sync"] }
@@ -20,12 +26,18 @@ uuid = { version = "1", features = ["v4"] }
walkdir = "2" # Added: needed for FilesystemBackend recursive traversal
futures = "0.3" # Added: needed for LLMProvider streaming support
# Pregel runtime dependencies
num_cpus = "1" # For default parallelism configuration
humantime-serde = "1" # For Duration serialization in configs
zstd = "0.13" # For checkpoint compression
[dev-dependencies]
# OpenAI support is built into rig-core
tokio-test = "0.4"
dotenv = "0.15"
criterion = { version = "0.5", features = ["async_tokio"] }
tempfile = "3" # For filesystem tests
static_assertions = "1" # For compile-time trait checks
[[bench]]
name = "middleware_benchmark"


@@ -32,6 +32,8 @@ pub mod runtime;
pub mod executor;
pub mod tools;
pub mod llm;
pub mod pregel;
pub mod workflow;
// Re-exports for convenience
pub use error::{BackendError, MiddlewareError, DeepAgentError, WriteResult, EditResult};


@@ -0,0 +1,438 @@
//! File-based Checkpointer Implementation
//!
//! Stores checkpoints as JSON files in a directory structure.
//! Supports optional compression via zstd for reduced storage.
//!
//! # Directory Structure
//!
//! ```text
//! checkpoints/
//! └── {workflow_id}/
//! ├── checkpoint_00001.json[.zst]
//! ├── checkpoint_00005.json[.zst]
//! └── checkpoint_00010.json[.zst]
//! ```
use async_trait::async_trait;
use serde::{Deserialize, Serialize};
use std::io::Write;
use std::path::{Path, PathBuf};
use tokio::fs;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use super::{Checkpoint, Checkpointer};
use crate::pregel::error::PregelError;
use crate::pregel::state::WorkflowState;
/// File-based checkpointer that stores checkpoints as JSON files.
///
/// Each checkpoint is stored in a separate file, named by superstep number.
/// Atomic writes are ensured via temporary file + rename pattern.
#[derive(Debug)]
pub struct FileCheckpointer {
/// Workflow-specific subdirectory
workflow_path: PathBuf,
/// Whether to compress checkpoints with zstd
compression: bool,
}
impl FileCheckpointer {
/// Create a new file-based checkpointer.
///
/// # Arguments
///
/// * `base_path` - Base directory for storing checkpoints
/// * `workflow_id` - Unique identifier for this workflow
/// * `compression` - Whether to compress checkpoint data
pub fn new(base_path: impl Into<PathBuf>, workflow_id: impl AsRef<str>, compression: bool) -> Self {
let base_path = base_path.into();
let workflow_path = base_path.join(workflow_id.as_ref());
Self {
workflow_path,
compression,
}
}
/// Get the file path for a checkpoint at a given superstep
fn checkpoint_path(&self, superstep: usize) -> PathBuf {
let filename = if self.compression {
format!("checkpoint_{:05}.json.zst", superstep)
} else {
format!("checkpoint_{:05}.json", superstep)
};
self.workflow_path.join(filename)
}
/// Get the temporary file path for atomic writes
fn temp_path(&self, superstep: usize) -> PathBuf {
let filename = format!("checkpoint_{:05}.tmp", superstep);
self.workflow_path.join(filename)
}
/// Ensure the checkpoint directory exists
async fn ensure_dir(&self) -> Result<(), PregelError> {
fs::create_dir_all(&self.workflow_path)
.await
.map_err(|e| PregelError::checkpoint_error(format!("Failed to create directory: {}", e)))
}
/// Compress data using zstd
fn compress(data: &[u8]) -> Result<Vec<u8>, PregelError> {
let mut encoder = zstd::stream::Encoder::new(Vec::new(), 3)
.map_err(|e| PregelError::checkpoint_error(format!("Compression init failed: {}", e)))?;
encoder
.write_all(data)
.map_err(|e| PregelError::checkpoint_error(format!("Compression write failed: {}", e)))?;
encoder
.finish()
.map_err(|e| PregelError::checkpoint_error(format!("Compression finish failed: {}", e)))
}
/// Decompress data using zstd
fn decompress(data: &[u8]) -> Result<Vec<u8>, PregelError> {
zstd::stream::decode_all(data)
.map_err(|e| PregelError::checkpoint_error(format!("Decompression failed: {}", e)))
}
/// Parse superstep number from filename
fn parse_superstep(path: &Path) -> Option<usize> {
let filename = path.file_name()?.to_str()?;
if !filename.starts_with("checkpoint_") {
return None;
}
// Extract the number between "checkpoint_" and the extension
let num_part = filename
.strip_prefix("checkpoint_")?
.split('.')
.next()?;
num_part.parse().ok()
}
/// List all superstep numbers (non-generic helper method)
async fn list_supersteps(&self) -> Result<Vec<usize>, PregelError> {
if !self.workflow_path.exists() {
return Ok(Vec::new());
}
let mut entries = fs::read_dir(&self.workflow_path)
.await
.map_err(|e| PregelError::checkpoint_error(format!("Failed to read directory: {}", e)))?;
let mut supersteps = Vec::new();
while let Some(entry) = entries
.next_entry()
.await
.map_err(|e| PregelError::checkpoint_error(format!("Failed to read entry: {}", e)))?
{
if let Some(superstep) = Self::parse_superstep(&entry.path()) {
supersteps.push(superstep);
}
}
supersteps.sort();
Ok(supersteps)
}
}
#[async_trait]
impl<S> Checkpointer<S> for FileCheckpointer
where
S: WorkflowState + Clone + Send + Sync + Serialize + for<'de> Deserialize<'de>,
{
async fn save(&self, checkpoint: &Checkpoint<S>) -> Result<(), PregelError> {
self.ensure_dir().await?;
// Serialize checkpoint
let json = serde_json::to_vec_pretty(checkpoint)
.map_err(|e| PregelError::checkpoint_error(format!("Serialization failed: {}", e)))?;
// Optionally compress
let data = if self.compression {
Self::compress(&json)?
} else {
json
};
// Write to temp file first (atomic write pattern)
let temp_path = self.temp_path(checkpoint.superstep);
let final_path = self.checkpoint_path(checkpoint.superstep);
let mut file = fs::File::create(&temp_path)
.await
.map_err(|e| PregelError::checkpoint_error(format!("Failed to create temp file: {}", e)))?;
file.write_all(&data)
.await
.map_err(|e| PregelError::checkpoint_error(format!("Failed to write data: {}", e)))?;
file.sync_all()
.await
.map_err(|e| PregelError::checkpoint_error(format!("Failed to sync file: {}", e)))?;
// Atomic rename
fs::rename(&temp_path, &final_path)
.await
.map_err(|e| PregelError::checkpoint_error(format!("Failed to rename file: {}", e)))?;
Ok(())
}
async fn load(&self, superstep: usize) -> Result<Option<Checkpoint<S>>, PregelError> {
let path = self.checkpoint_path(superstep);
if !path.exists() {
return Ok(None);
}
let mut file = fs::File::open(&path)
.await
.map_err(|e| PregelError::checkpoint_error(format!("Failed to open file: {}", e)))?;
let mut data = Vec::new();
file.read_to_end(&mut data)
.await
.map_err(|e| PregelError::checkpoint_error(format!("Failed to read file: {}", e)))?;
// Decompress if needed
let json = if self.compression {
Self::decompress(&data)?
} else {
data
};
let checkpoint: Checkpoint<S> = serde_json::from_slice(&json)
.map_err(|e| PregelError::checkpoint_error(format!("Deserialization failed: {}", e)))?;
Ok(Some(checkpoint))
}
async fn latest(&self) -> Result<Option<Checkpoint<S>>, PregelError> {
// Use our own list_supersteps to avoid type inference issues
let supersteps = self.list_supersteps().await?;
match supersteps.last() {
Some(&superstep) => self.load(superstep).await,
None => Ok(None),
}
}
async fn list(&self) -> Result<Vec<usize>, PregelError> {
self.list_supersteps().await
}
async fn delete(&self, superstep: usize) -> Result<(), PregelError> {
let path = self.checkpoint_path(superstep);
if path.exists() {
fs::remove_file(&path)
.await
.map_err(|e| PregelError::checkpoint_error(format!("Failed to delete file: {}", e)))?;
}
Ok(())
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::pregel::state::UnitState;
use crate::pregel::vertex::{VertexId, VertexState};
use std::collections::HashMap;
use tempfile::tempdir;
#[tokio::test]
async fn test_file_checkpointer_save_load() {
let temp_dir = tempdir().unwrap();
let checkpointer = FileCheckpointer::new(temp_dir.path(), "test-workflow", false);
let checkpoint = Checkpoint::new(
"test-workflow",
5,
UnitState,
HashMap::new(),
HashMap::new(),
);
checkpointer.save(&checkpoint).await.unwrap();
let loaded: Checkpoint<UnitState> = checkpointer.load(5).await.unwrap().unwrap();
assert_eq!(loaded.superstep, 5);
assert_eq!(loaded.workflow_id, "test-workflow");
}
#[tokio::test]
async fn test_file_checkpointer_with_compression() {
let temp_dir = tempdir().unwrap();
let checkpointer = FileCheckpointer::new(temp_dir.path(), "compressed-workflow", true);
// Create a checkpoint with some data
let mut vertex_states = HashMap::new();
vertex_states.insert(VertexId::new("vertex1"), VertexState::Active);
vertex_states.insert(VertexId::new("vertex2"), VertexState::Halted);
let checkpoint = Checkpoint::new(
"compressed-workflow",
10,
UnitState,
vertex_states,
HashMap::new(),
);
checkpointer.save(&checkpoint).await.unwrap();
// Verify the file is compressed (has .zst extension)
let path = temp_dir.path().join("compressed-workflow/checkpoint_00010.json.zst");
assert!(path.exists());
// Load and verify
let loaded: Checkpoint<UnitState> = checkpointer.load(10).await.unwrap().unwrap();
assert_eq!(loaded.superstep, 10);
assert_eq!(loaded.vertex_states.len(), 2);
}
#[tokio::test]
async fn test_file_checkpointer_load_nonexistent() {
let temp_dir = tempdir().unwrap();
let checkpointer = FileCheckpointer::new(temp_dir.path(), "test-workflow", false);
let result: Option<Checkpoint<UnitState>> = checkpointer.load(999).await.unwrap();
assert!(result.is_none());
}
#[tokio::test]
async fn test_file_checkpointer_list() {
let temp_dir = tempdir().unwrap();
let checkpointer = FileCheckpointer::new(temp_dir.path(), "test-workflow", false);
// Save checkpoints at supersteps 5, 1, 10 (out of order)
for superstep in [5, 1, 10] {
let checkpoint = Checkpoint::new(
"test-workflow",
superstep,
UnitState,
HashMap::new(),
HashMap::new(),
);
checkpointer.save(&checkpoint).await.unwrap();
}
let list = <FileCheckpointer as Checkpointer<UnitState>>::list(&checkpointer).await.unwrap();
assert_eq!(list, vec![1, 5, 10]); // Should be sorted
}
#[tokio::test]
async fn test_file_checkpointer_latest() {
let temp_dir = tempdir().unwrap();
let checkpointer = FileCheckpointer::new(temp_dir.path(), "test-workflow", false);
// Save checkpoints
for superstep in [1, 5, 3] {
let checkpoint = Checkpoint::new(
"test-workflow",
superstep,
UnitState,
HashMap::new(),
HashMap::new(),
);
checkpointer.save(&checkpoint).await.unwrap();
}
let latest: Checkpoint<UnitState> = checkpointer.latest().await.unwrap().unwrap();
assert_eq!(latest.superstep, 5);
}
#[tokio::test]
async fn test_file_checkpointer_delete() {
let temp_dir = tempdir().unwrap();
let checkpointer = FileCheckpointer::new(temp_dir.path(), "test-workflow", false);
let checkpoint = Checkpoint::new(
"test-workflow",
5,
UnitState,
HashMap::new(),
HashMap::new(),
);
checkpointer.save(&checkpoint).await.unwrap();
// Verify exists
let exists: Option<Checkpoint<UnitState>> = checkpointer.load(5).await.unwrap();
assert!(exists.is_some());
// Delete
<FileCheckpointer as Checkpointer<UnitState>>::delete(&checkpointer, 5).await.unwrap();
// Verify gone
let gone: Option<Checkpoint<UnitState>> = checkpointer.load(5).await.unwrap();
assert!(gone.is_none());
}
#[tokio::test]
async fn test_file_checkpointer_prune() {
let temp_dir = tempdir().unwrap();
let checkpointer = FileCheckpointer::new(temp_dir.path(), "test-workflow", false);
// Create 5 checkpoints
for superstep in 1..=5 {
let checkpoint = Checkpoint::new(
"test-workflow",
superstep,
UnitState,
HashMap::new(),
HashMap::new(),
);
checkpointer.save(&checkpoint).await.unwrap();
}
// Prune to keep only 2
let deleted = <FileCheckpointer as Checkpointer<UnitState>>::prune(&checkpointer, 2).await.unwrap();
assert_eq!(deleted, 3);
let remaining = <FileCheckpointer as Checkpointer<UnitState>>::list(&checkpointer).await.unwrap();
assert_eq!(remaining, vec![4, 5]);
}
#[tokio::test]
async fn test_file_checkpointer_atomic_write() {
let temp_dir = tempdir().unwrap();
let checkpointer = FileCheckpointer::new(temp_dir.path(), "test-workflow", false);
let checkpoint = Checkpoint::new(
"test-workflow",
7,
UnitState,
HashMap::new(),
HashMap::new(),
);
checkpointer.save(&checkpoint).await.unwrap();
// Verify no temp file remains
let temp_path = temp_dir.path().join("test-workflow/checkpoint_00007.tmp");
assert!(!temp_path.exists());
// Verify final file exists
let final_path = temp_dir.path().join("test-workflow/checkpoint_00007.json");
assert!(final_path.exists());
}
#[test]
fn test_parse_superstep() {
assert_eq!(
FileCheckpointer::parse_superstep(Path::new("checkpoint_00005.json")),
Some(5)
);
assert_eq!(
FileCheckpointer::parse_superstep(Path::new("checkpoint_00123.json.zst")),
Some(123)
);
assert_eq!(
FileCheckpointer::parse_superstep(Path::new("other_file.json")),
None
);
}
}


@@ -0,0 +1,551 @@
//! Checkpointing System for Pregel Runtime
//!
//! Provides durable state persistence for fault-tolerant workflow execution.
//! Checkpoints capture the complete workflow state at superstep boundaries,
//! enabling recovery from failures without losing progress.
//!
//! # Architecture
//!
//! ```text
//! ┌─────────────────────────────────────────────────────────────┐
//! │ Checkpointer │
//! │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
//! │ │ File │ │ SQLite │ │ Redis │ │ Postgres │ │
//! │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
//! │ │ │ │ │ │
//! │ └────────────┴────────────┴────────────┘ │
//! │ │ │
//! │ ▼ │
//! │ Checkpoint<S: WorkflowState> │
//! └─────────────────────────────────────────────────────────────┘
//! ```
//!
//! # Usage
//!
//! ```ignore
//! use rig_deepagents::pregel::checkpoint::{Checkpointer, CheckpointerConfig, create_checkpointer};
//!
//! // Create a file-based checkpointer
//! let config = CheckpointerConfig::File {
//! path: PathBuf::from("./checkpoints"),
//! compression: true,
//! };
//! let checkpointer = create_checkpointer::<MyState>(config, "workflow-123")?;
//!
//! // Save a checkpoint
//! checkpointer.save(&checkpoint).await?;
//!
//! // Load the latest checkpoint
//! if let Some(checkpoint) = checkpointer.latest().await? {
//! // Resume from checkpoint
//! }
//! ```
mod file;
pub use file::FileCheckpointer;
use async_trait::async_trait;
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::path::PathBuf;
use super::error::PregelError;
use super::message::WorkflowMessage;
use super::state::WorkflowState;
use super::vertex::{VertexId, VertexState};
/// A checkpoint captures the complete workflow state at a superstep boundary.
///
/// Checkpoints are the foundation of fault tolerance in the Pregel runtime.
/// They capture:
/// - The workflow state (user-defined data)
/// - The state of each vertex (Active, Halted, Completed)
/// - Pending messages that haven't been delivered yet
/// - Metadata for debugging and recovery
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Checkpoint<S>
where
S: WorkflowState,
{
/// Unique identifier for the workflow instance
pub workflow_id: String,
/// The superstep number when this checkpoint was created
pub superstep: usize,
/// The workflow state at this superstep
pub state: S,
/// The state of each vertex (Active, Halted, Completed)
pub vertex_states: HashMap<VertexId, VertexState>,
/// Pending messages waiting to be delivered in the next superstep
pub pending_messages: HashMap<VertexId, Vec<WorkflowMessage>>,
/// When this checkpoint was created
pub timestamp: DateTime<Utc>,
/// Optional metadata for debugging or external tools
#[serde(default)]
pub metadata: HashMap<String, String>,
}
impl<S> Checkpoint<S>
where
S: WorkflowState,
{
/// Create a new checkpoint
pub fn new(
workflow_id: impl Into<String>,
superstep: usize,
state: S,
vertex_states: HashMap<VertexId, VertexState>,
pending_messages: HashMap<VertexId, Vec<WorkflowMessage>>,
) -> Self {
Self {
workflow_id: workflow_id.into(),
superstep,
state,
vertex_states,
pending_messages,
timestamp: Utc::now(),
metadata: HashMap::new(),
}
}
/// Add metadata to this checkpoint
pub fn with_metadata(mut self, key: impl Into<String>, value: impl Into<String>) -> Self {
self.metadata.insert(key.into(), value.into());
self
}
/// Check if this checkpoint is empty (no vertex states or messages)
pub fn is_empty(&self) -> bool {
self.vertex_states.is_empty() && self.pending_messages.is_empty()
}
/// Get the total number of pending messages across all vertices
pub fn pending_message_count(&self) -> usize {
self.pending_messages.values().map(|v| v.len()).sum()
}
}
/// Trait for checkpointing workflow state.
///
/// Implementations provide durable storage for checkpoints, enabling
/// recovery from failures and inspection of workflow history.
#[async_trait]
pub trait Checkpointer<S>: Send + Sync
where
S: WorkflowState + Send + Sync,
{
/// Save a checkpoint.
///
/// Implementations should ensure atomic writes to prevent corruption.
async fn save(&self, checkpoint: &Checkpoint<S>) -> Result<(), PregelError>;
/// Load a checkpoint by superstep number.
///
/// Returns `None` if no checkpoint exists for that superstep.
async fn load(&self, superstep: usize) -> Result<Option<Checkpoint<S>>, PregelError>;
/// Load the latest checkpoint.
///
/// Returns `None` if no checkpoints exist.
async fn latest(&self) -> Result<Option<Checkpoint<S>>, PregelError>;
/// List all available checkpoint superstep numbers, sorted ascending.
async fn list(&self) -> Result<Vec<usize>, PregelError>;
/// Delete a specific checkpoint.
async fn delete(&self, superstep: usize) -> Result<(), PregelError>;
/// Prune checkpoints, keeping only the most recent `keep` checkpoints.
///
/// This is useful for managing storage space in long-running workflows.
async fn prune(&self, keep: usize) -> Result<usize, PregelError> {
let checkpoints = self.list().await?;
let to_delete = checkpoints.len().saturating_sub(keep);
let mut deleted = 0;
for superstep in checkpoints.into_iter().take(to_delete) {
self.delete(superstep).await?;
deleted += 1;
}
Ok(deleted)
}
/// Clear all checkpoints for this workflow.
async fn clear(&self) -> Result<(), PregelError> {
for superstep in self.list().await? {
self.delete(superstep).await?;
}
Ok(())
}
}
/// Configuration for creating checkpointers.
///
/// Use with `create_checkpointer()` to instantiate the appropriate backend.
#[derive(Debug, Clone, Default)]
pub enum CheckpointerConfig {
/// In-memory checkpointing (for testing only, not durable)
#[default]
Memory,
/// File-based checkpointing
File {
/// Directory to store checkpoint files
path: PathBuf,
/// Whether to compress checkpoint data (uses zstd)
compression: bool,
},
/// SQLite-based checkpointing (requires `checkpointer-sqlite` feature)
#[cfg(feature = "checkpointer-sqlite")]
Sqlite {
/// Path to the SQLite database file, or `:memory:` for in-memory
path: String,
},
/// Redis-based checkpointing (requires `checkpointer-redis` feature)
#[cfg(feature = "checkpointer-redis")]
Redis {
/// Redis connection URL
url: String,
/// TTL for checkpoint keys (optional)
ttl_seconds: Option<u64>,
},
/// PostgreSQL-based checkpointing (requires `checkpointer-postgres` feature)
#[cfg(feature = "checkpointer-postgres")]
Postgres {
/// PostgreSQL connection URL
url: String,
},
}
/// In-memory checkpointer for testing.
///
/// This implementation stores checkpoints in memory and is not durable.
/// Use only for testing or development.
#[derive(Debug, Default)]
pub struct MemoryCheckpointer<S>
where
S: WorkflowState,
{
checkpoints: tokio::sync::RwLock<HashMap<usize, Checkpoint<S>>>,
}
impl<S> MemoryCheckpointer<S>
where
S: WorkflowState,
{
/// Create a new in-memory checkpointer
pub fn new() -> Self {
Self {
checkpoints: tokio::sync::RwLock::new(HashMap::new()),
}
}
}
#[async_trait]
impl<S> Checkpointer<S> for MemoryCheckpointer<S>
where
S: WorkflowState + Clone + Send + Sync,
{
async fn save(&self, checkpoint: &Checkpoint<S>) -> Result<(), PregelError> {
let mut checkpoints = self.checkpoints.write().await;
checkpoints.insert(checkpoint.superstep, checkpoint.clone());
Ok(())
}
async fn load(&self, superstep: usize) -> Result<Option<Checkpoint<S>>, PregelError> {
let checkpoints = self.checkpoints.read().await;
Ok(checkpoints.get(&superstep).cloned())
}
async fn latest(&self) -> Result<Option<Checkpoint<S>>, PregelError> {
let checkpoints = self.checkpoints.read().await;
let max_superstep = checkpoints.keys().max().copied();
match max_superstep {
Some(superstep) => Ok(checkpoints.get(&superstep).cloned()),
None => Ok(None),
}
}
async fn list(&self) -> Result<Vec<usize>, PregelError> {
let checkpoints = self.checkpoints.read().await;
let mut supersteps: Vec<usize> = checkpoints.keys().copied().collect();
supersteps.sort();
Ok(supersteps)
}
async fn delete(&self, superstep: usize) -> Result<(), PregelError> {
let mut checkpoints = self.checkpoints.write().await;
checkpoints.remove(&superstep);
Ok(())
}
}
/// Create a checkpointer from configuration.
///
/// This factory function creates the appropriate checkpointer backend
/// based on the provided configuration.
///
/// # Example
///
/// ```ignore
/// let config = CheckpointerConfig::File {
/// path: PathBuf::from("./checkpoints"),
/// compression: true,
/// };
/// let checkpointer = create_checkpointer::<MyState>(config, "workflow-123")?;
/// ```
pub fn create_checkpointer<S>(
config: CheckpointerConfig,
workflow_id: impl Into<String>,
) -> Result<Box<dyn Checkpointer<S>>, PregelError>
where
S: WorkflowState + Clone + Send + Sync + Serialize + for<'de> Deserialize<'de> + 'static,
{
let workflow_id = workflow_id.into();
match config {
CheckpointerConfig::Memory => Ok(Box::new(MemoryCheckpointer::<S>::new())),
CheckpointerConfig::File { path, compression } => {
let checkpointer = FileCheckpointer::new(path, workflow_id, compression);
Ok(Box::new(checkpointer))
}
#[cfg(feature = "checkpointer-sqlite")]
CheckpointerConfig::Sqlite { .. } => {
// SQLite checkpointer will be implemented in Task 8.2.3
Err(PregelError::not_implemented("SQLite checkpointer"))
}
#[cfg(feature = "checkpointer-redis")]
CheckpointerConfig::Redis { .. } => {
// Redis checkpointer will be implemented in Task 8.2.4
Err(PregelError::not_implemented("Redis checkpointer"))
}
#[cfg(feature = "checkpointer-postgres")]
CheckpointerConfig::Postgres { .. } => {
// PostgreSQL checkpointer will be implemented in Task 8.2.5
Err(PregelError::not_implemented("PostgreSQL checkpointer"))
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::pregel::state::UnitState;
#[test]
fn test_checkpoint_creation() {
let checkpoint = Checkpoint::new(
"test-workflow",
5,
UnitState,
HashMap::new(),
HashMap::new(),
);
assert_eq!(checkpoint.workflow_id, "test-workflow");
assert_eq!(checkpoint.superstep, 5);
assert!(checkpoint.is_empty());
assert_eq!(checkpoint.pending_message_count(), 0);
}
#[test]
fn test_checkpoint_with_metadata() {
let checkpoint = Checkpoint::new(
"test-workflow",
10,
UnitState,
HashMap::new(),
HashMap::new(),
)
.with_metadata("version", "1.0")
.with_metadata("creator", "test");
assert_eq!(checkpoint.metadata.get("version"), Some(&"1.0".to_string()));
assert_eq!(checkpoint.metadata.get("creator"), Some(&"test".to_string()));
}
#[test]
fn test_checkpoint_with_vertex_states() {
let mut vertex_states = HashMap::new();
vertex_states.insert(VertexId::new("a"), VertexState::Active);
vertex_states.insert(VertexId::new("b"), VertexState::Halted);
let checkpoint = Checkpoint::new(
"test-workflow",
3,
UnitState,
vertex_states,
HashMap::new(),
);
assert!(!checkpoint.is_empty());
assert_eq!(checkpoint.vertex_states.len(), 2);
}
#[test]
fn test_checkpoint_pending_message_count() {
let mut pending_messages = HashMap::new();
pending_messages.insert(
VertexId::new("a"),
vec![WorkflowMessage::Activate, WorkflowMessage::Activate],
);
pending_messages.insert(
VertexId::new("b"),
vec![WorkflowMessage::Activate],
);
let checkpoint = Checkpoint::new(
"test-workflow",
7,
UnitState,
HashMap::new(),
pending_messages,
);
assert_eq!(checkpoint.pending_message_count(), 3);
}
#[tokio::test]
async fn test_memory_checkpointer_save_load() {
let checkpointer = MemoryCheckpointer::<UnitState>::new();
let checkpoint = Checkpoint::new(
"test-workflow",
5,
UnitState,
HashMap::new(),
HashMap::new(),
);
checkpointer.save(&checkpoint).await.unwrap();
let loaded = checkpointer.load(5).await.unwrap().unwrap();
assert_eq!(loaded.superstep, 5);
assert_eq!(loaded.workflow_id, "test-workflow");
}
#[tokio::test]
async fn test_memory_checkpointer_latest() {
let checkpointer = MemoryCheckpointer::<UnitState>::new();
// Save checkpoints at supersteps 1, 3, 5
for superstep in [1, 3, 5] {
let checkpoint = Checkpoint::new(
"test-workflow",
superstep,
UnitState,
HashMap::new(),
HashMap::new(),
);
checkpointer.save(&checkpoint).await.unwrap();
}
let latest = checkpointer.latest().await.unwrap().unwrap();
assert_eq!(latest.superstep, 5);
}
#[tokio::test]
async fn test_memory_checkpointer_list() {
let checkpointer = MemoryCheckpointer::<UnitState>::new();
// Save checkpoints at supersteps 5, 1, 3 (out of order)
for superstep in [5, 1, 3] {
let checkpoint = Checkpoint::new(
"test-workflow",
superstep,
UnitState,
HashMap::new(),
HashMap::new(),
);
checkpointer.save(&checkpoint).await.unwrap();
}
let list = checkpointer.list().await.unwrap();
assert_eq!(list, vec![1, 3, 5]); // Should be sorted
}
#[tokio::test]
async fn test_memory_checkpointer_delete() {
let checkpointer = MemoryCheckpointer::<UnitState>::new();
let checkpoint = Checkpoint::new(
"test-workflow",
5,
UnitState,
HashMap::new(),
HashMap::new(),
);
checkpointer.save(&checkpoint).await.unwrap();
checkpointer.delete(5).await.unwrap();
let loaded = checkpointer.load(5).await.unwrap();
assert!(loaded.is_none());
}
#[tokio::test]
async fn test_memory_checkpointer_prune() {
let checkpointer = MemoryCheckpointer::<UnitState>::new();
// Save 5 checkpoints
for superstep in 1..=5 {
let checkpoint = Checkpoint::new(
"test-workflow",
superstep,
UnitState,
HashMap::new(),
HashMap::new(),
);
checkpointer.save(&checkpoint).await.unwrap();
}
// Prune to keep only 2
let deleted = checkpointer.prune(2).await.unwrap();
assert_eq!(deleted, 3);
let remaining = checkpointer.list().await.unwrap();
assert_eq!(remaining, vec![4, 5]);
}
#[tokio::test]
async fn test_memory_checkpointer_clear() {
let checkpointer = MemoryCheckpointer::<UnitState>::new();
for superstep in 1..=3 {
let checkpoint = Checkpoint::new(
"test-workflow",
superstep,
UnitState,
HashMap::new(),
HashMap::new(),
);
checkpointer.save(&checkpoint).await.unwrap();
}
checkpointer.clear().await.unwrap();
let list = checkpointer.list().await.unwrap();
assert!(list.is_empty());
}
#[test]
fn test_checkpointer_config_default() {
let config = CheckpointerConfig::default();
assert!(matches!(config, CheckpointerConfig::Memory));
}
}


@@ -0,0 +1,273 @@
//! Pregel runtime configuration
//!
//! Configuration for the Pregel execution engine including
//! parallelism, timeouts, checkpointing, and retry policies.
use serde::{Deserialize, Serialize};
use std::time::Duration;
/// Pregel runtime configuration
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct PregelConfig {
/// Maximum supersteps before forced termination
pub max_supersteps: usize,
/// Maximum concurrent vertex computations
pub parallelism: usize,
/// Checkpoint frequency (every N supersteps, 0 = disabled)
pub checkpoint_interval: usize,
/// Timeout for individual vertex computation
#[serde(with = "humantime_serde")]
pub vertex_timeout: Duration,
/// Timeout for entire workflow
#[serde(with = "humantime_serde")]
pub workflow_timeout: Duration,
/// Enable detailed tracing
pub tracing_enabled: bool,
/// Retry policy for failed vertices
pub retry_policy: RetryPolicy,
}
impl Default for PregelConfig {
fn default() -> Self {
Self {
max_supersteps: 100,
parallelism: num_cpus::get(),
checkpoint_interval: 10,
vertex_timeout: Duration::from_secs(300), // 5 min per vertex
workflow_timeout: Duration::from_secs(3600), // 1 hour total
tracing_enabled: true,
retry_policy: RetryPolicy::default(),
}
}
}
impl PregelConfig {
/// Create a new config with defaults
pub fn new() -> Self {
Self::default()
}
/// Set maximum supersteps
pub fn with_max_supersteps(mut self, max: usize) -> Self {
self.max_supersteps = max;
self
}
/// Set parallelism level
pub fn with_parallelism(mut self, parallelism: usize) -> Self {
self.parallelism = parallelism.max(1);
self
}
/// Set checkpoint interval (0 to disable)
pub fn with_checkpoint_interval(mut self, interval: usize) -> Self {
self.checkpoint_interval = interval;
self
}
/// Set vertex timeout
pub fn with_vertex_timeout(mut self, timeout: Duration) -> Self {
self.vertex_timeout = timeout;
self
}
/// Set workflow timeout
pub fn with_workflow_timeout(mut self, timeout: Duration) -> Self {
self.workflow_timeout = timeout;
self
}
/// Enable or disable tracing
pub fn with_tracing(mut self, enabled: bool) -> Self {
self.tracing_enabled = enabled;
self
}
/// Set retry policy
pub fn with_retry_policy(mut self, policy: RetryPolicy) -> Self {
self.retry_policy = policy;
self
}
/// Check if checkpointing is enabled
pub fn checkpointing_enabled(&self) -> bool {
self.checkpoint_interval > 0
}
/// Check if a checkpoint should be taken at this superstep
#[allow(clippy::manual_is_multiple_of)] // Using % for compatibility with older Rust versions
pub fn should_checkpoint(&self, superstep: usize) -> bool {
self.checkpointing_enabled() && superstep > 0 && superstep % self.checkpoint_interval == 0
}
}
/// Retry policy for failed vertex computations
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RetryPolicy {
/// Maximum retry attempts
pub max_retries: usize,
/// Base delay for exponential backoff
#[serde(with = "humantime_serde")]
pub backoff_base: Duration,
/// Maximum delay between retries
#[serde(with = "humantime_serde")]
pub backoff_max: Duration,
}
impl Default for RetryPolicy {
fn default() -> Self {
Self {
max_retries: 3,
backoff_base: Duration::from_millis(100),
backoff_max: Duration::from_secs(10),
}
}
}
impl RetryPolicy {
/// Create a new retry policy
pub fn new(max_retries: usize) -> Self {
Self {
max_retries,
..Default::default()
}
}
/// Set backoff base duration
pub fn with_backoff_base(mut self, base: Duration) -> Self {
self.backoff_base = base;
self
}
/// Set maximum backoff duration
pub fn with_backoff_max(mut self, max: Duration) -> Self {
self.backoff_max = max;
self
}
/// Calculate delay for a given retry attempt (exponential backoff)
pub fn delay_for_attempt(&self, attempt: usize) -> Duration {
let multiplier = 2u32.saturating_pow(attempt as u32);
let delay = self.backoff_base.saturating_mul(multiplier);
delay.min(self.backoff_max)
}
/// Check if more retries are allowed
pub fn should_retry(&self, attempts: usize) -> bool {
attempts < self.max_retries
}
/// Create a no-retry policy
pub fn no_retry() -> Self {
Self {
max_retries: 0,
..Default::default()
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_default_config() {
let config = PregelConfig::default();
assert_eq!(config.max_supersteps, 100);
assert!(config.parallelism > 0);
assert_eq!(config.checkpoint_interval, 10);
assert!(config.tracing_enabled);
}
#[test]
fn test_config_builder() {
let config = PregelConfig::default()
.with_max_supersteps(50)
.with_parallelism(4)
.with_checkpoint_interval(5);
assert_eq!(config.max_supersteps, 50);
assert_eq!(config.parallelism, 4);
assert_eq!(config.checkpoint_interval, 5);
}
#[test]
fn test_parallelism_minimum() {
let config = PregelConfig::default().with_parallelism(0);
assert_eq!(config.parallelism, 1);
}
#[test]
fn test_checkpointing_enabled() {
let config = PregelConfig::default();
assert!(config.checkpointing_enabled());
let disabled = config.with_checkpoint_interval(0);
assert!(!disabled.checkpointing_enabled());
}
#[test]
fn test_should_checkpoint() {
let config = PregelConfig::default().with_checkpoint_interval(5);
assert!(!config.should_checkpoint(0));
assert!(!config.should_checkpoint(1));
assert!(config.should_checkpoint(5));
assert!(config.should_checkpoint(10));
assert!(!config.should_checkpoint(7));
}
#[test]
fn test_retry_policy_default() {
let policy = RetryPolicy::default();
assert_eq!(policy.max_retries, 3);
assert!(policy.should_retry(0));
assert!(policy.should_retry(2));
assert!(!policy.should_retry(3));
}
#[test]
fn test_retry_backoff() {
let policy = RetryPolicy::default();
let delay0 = policy.delay_for_attempt(0);
let delay1 = policy.delay_for_attempt(1);
let delay2 = policy.delay_for_attempt(2);
assert_eq!(delay0, Duration::from_millis(100));
assert_eq!(delay1, Duration::from_millis(200));
assert_eq!(delay2, Duration::from_millis(400));
}
#[test]
fn test_retry_backoff_max() {
let policy = RetryPolicy::default().with_backoff_max(Duration::from_millis(300));
let delay_high = policy.delay_for_attempt(10);
assert_eq!(delay_high, Duration::from_millis(300));
}
#[test]
fn test_no_retry_policy() {
let policy = RetryPolicy::no_retry();
assert!(!policy.should_retry(0));
}
#[test]
fn test_config_with_timeouts() {
let config = PregelConfig::default()
.with_vertex_timeout(Duration::from_secs(60))
.with_workflow_timeout(Duration::from_secs(120));
assert_eq!(config.vertex_timeout, Duration::from_secs(60));
assert_eq!(config.workflow_timeout, Duration::from_secs(120));
}
}

@@ -0,0 +1,243 @@
//! Error types for Pregel runtime
//!
//! Comprehensive error handling for the Pregel execution engine.
use super::vertex::VertexId;
use thiserror::Error;
/// Errors that can occur during Pregel runtime execution
#[derive(Debug, Error)]
pub enum PregelError {
/// Maximum supersteps exceeded
#[error("Max supersteps exceeded: {0}")]
MaxSuperstepsExceeded(usize),
/// Vertex computation timed out
#[error("Vertex timeout: {0:?}")]
VertexTimeout(VertexId),
/// Error during vertex computation
#[error("Vertex error in {vertex_id:?}: {message}")]
VertexError {
vertex_id: VertexId,
message: String,
#[source]
source: Option<Box<dyn std::error::Error + Send + Sync>>,
},
/// Error during routing decision
#[error("Routing error in {vertex_id:?}: {decision}")]
RoutingError { vertex_id: VertexId, decision: String },
/// Recursion depth limit exceeded
#[error("Recursion limit in {vertex_id:?}: depth {depth}, limit {limit}")]
RecursionLimit {
vertex_id: VertexId,
depth: usize,
limit: usize,
},
/// Error in workflow state management
#[error("State error: {0}")]
StateError(String),
/// Error in checkpointing
#[error("Checkpoint error: {0}")]
CheckpointError(String),
/// Feature not yet implemented
#[error("Not implemented: {0}")]
NotImplemented(String),
/// Invalid workflow configuration
#[error("Configuration error: {0}")]
ConfigError(String),
/// Message delivery failed
#[error("Message delivery failed: {0}")]
MessageDeliveryError(String),
/// Workflow terminated by user
#[error("Workflow cancelled")]
Cancelled,
/// Workflow execution timed out
#[error("Workflow timeout after {0:?}")]
WorkflowTimeout(std::time::Duration),
/// Maximum retry attempts exceeded for a vertex
#[error("Max retries exceeded for vertex {vertex_id:?}: {attempts} attempts")]
MaxRetriesExceeded { vertex_id: VertexId, attempts: usize },
}
impl PregelError {
/// Create a vertex error with a message
pub fn vertex_error(vertex_id: impl Into<VertexId>, message: impl Into<String>) -> Self {
Self::VertexError {
vertex_id: vertex_id.into(),
message: message.into(),
source: None,
}
}
/// Create a vertex error with source
pub fn vertex_error_with_source(
vertex_id: impl Into<VertexId>,
message: impl Into<String>,
source: impl std::error::Error + Send + Sync + 'static,
) -> Self {
Self::VertexError {
vertex_id: vertex_id.into(),
message: message.into(),
source: Some(Box::new(source)),
}
}
/// Create a routing error
pub fn routing_error(vertex_id: impl Into<VertexId>, decision: impl Into<String>) -> Self {
Self::RoutingError {
vertex_id: vertex_id.into(),
decision: decision.into(),
}
}
/// Create a recursion limit error
pub fn recursion_limit(vertex_id: impl Into<VertexId>, depth: usize, limit: usize) -> Self {
Self::RecursionLimit {
vertex_id: vertex_id.into(),
depth,
limit,
}
}
/// Check if the error is recoverable
pub fn is_recoverable(&self) -> bool {
matches!(
self,
PregelError::VertexTimeout(_)
| PregelError::VertexError { .. }
| PregelError::MessageDeliveryError(_)
)
}
/// Check if the error is a timeout
pub fn is_timeout(&self) -> bool {
matches!(self, PregelError::VertexTimeout(_))
}
/// Create a checkpoint error
pub fn checkpoint_error(message: impl Into<String>) -> Self {
Self::CheckpointError(message.into())
}
/// Create a not implemented error
pub fn not_implemented(feature: impl Into<String>) -> Self {
Self::NotImplemented(feature.into())
}
/// Create a state error
pub fn state_error(message: impl Into<String>) -> Self {
Self::StateError(message.into())
}
/// Create a config error
pub fn config_error(message: impl Into<String>) -> Self {
Self::ConfigError(message.into())
}
/// Create a workflow timeout error
pub fn workflow_timeout(duration: std::time::Duration) -> Self {
Self::WorkflowTimeout(duration)
}
/// Create a max retries exceeded error
pub fn max_retries_exceeded(vertex_id: impl Into<VertexId>, attempts: usize) -> Self {
Self::MaxRetriesExceeded {
vertex_id: vertex_id.into(),
attempts,
}
}
}
#[cfg(test)]
mod tests {
// Ensure errors are Send + Sync (compile-time check)
static_assertions::assert_impl_all!(super::PregelError: Send, Sync);
use super::*;
#[test]
fn test_error_display() {
let err = PregelError::MaxSuperstepsExceeded(100);
assert_eq!(format!("{}", err), "Max supersteps exceeded: 100");
}
#[test]
fn test_vertex_error() {
let err = PregelError::vertex_error("node1", "computation failed");
match err {
PregelError::VertexError {
vertex_id,
message,
source,
} => {
assert_eq!(vertex_id.0, "node1");
assert_eq!(message, "computation failed");
assert!(source.is_none());
}
_ => panic!("Wrong error type"),
}
}
#[test]
fn test_vertex_timeout() {
let err = PregelError::VertexTimeout(VertexId::from("slow_node"));
assert!(format!("{}", err).contains("slow_node"));
assert!(err.is_timeout());
}
#[test]
fn test_routing_error() {
let err = PregelError::routing_error("router", "no matching branch");
match err {
PregelError::RoutingError { vertex_id, decision } => {
assert_eq!(vertex_id.0, "router");
assert_eq!(decision, "no matching branch");
}
_ => panic!("Wrong error type"),
}
}
#[test]
fn test_recursion_limit() {
let err = PregelError::recursion_limit("nested_agent", 6, 5);
match err {
PregelError::RecursionLimit {
vertex_id,
depth,
limit,
} => {
assert_eq!(vertex_id.0, "nested_agent");
assert_eq!(depth, 6);
assert_eq!(limit, 5);
}
_ => panic!("Wrong error type"),
}
}
#[test]
fn test_is_recoverable() {
assert!(PregelError::VertexTimeout(VertexId::from("x")).is_recoverable());
assert!(PregelError::vertex_error("x", "err").is_recoverable());
assert!(PregelError::MessageDeliveryError("err".into()).is_recoverable());
assert!(!PregelError::MaxSuperstepsExceeded(100).is_recoverable());
assert!(!PregelError::Cancelled.is_recoverable());
assert!(!PregelError::recursion_limit("x", 5, 3).is_recoverable());
}
#[test]
fn test_errors_are_send_sync() {
fn assert_send_sync<T: Send + Sync>() {}
assert_send_sync::<PregelError>();
}
}

@@ -0,0 +1,230 @@
//! Message types for Pregel vertex communication
//!
//! Vertices communicate by sending messages to each other.
//! Messages are delivered at the start of each superstep.
use serde::{Deserialize, Serialize};
use super::vertex::VertexId;
/// Trait bound for vertex messages
pub trait VertexMessage: Clone + Send + Sync + 'static {}
/// Priority level for research directions
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default, Serialize, Deserialize)]
pub enum Priority {
High,
#[default]
Medium,
Low,
}
/// Source information for research findings
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Source {
/// URL of the source
pub url: String,
/// Title of the source
pub title: String,
/// Relevance score (0.0 to 1.0)
pub relevance: f32,
}
impl Source {
/// Create a new source
pub fn new(url: impl Into<String>, title: impl Into<String>, relevance: f32) -> Self {
Self {
url: url.into(),
title: title.into(),
relevance: relevance.clamp(0.0, 1.0),
}
}
}
/// Standard message types for workflow coordination
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum WorkflowMessage {
/// Trigger vertex activation
Activate,
/// Pass data between vertices
Data {
key: String,
value: serde_json::Value,
},
/// Signal completion of upstream work
Completed {
source: VertexId,
result: Option<String>,
},
/// Request vertex to halt
Halt,
/// Research-specific: share findings
ResearchFinding {
query: String,
sources: Vec<Source>,
summary: String,
},
/// Research-specific: suggest new direction
ResearchDirection {
topic: String,
priority: Priority,
rationale: String,
},
}
impl VertexMessage for WorkflowMessage {}
impl WorkflowMessage {
/// Create a Data message
pub fn data(key: impl Into<String>, value: impl Serialize) -> Self {
Self::Data {
key: key.into(),
value: serde_json::to_value(value).unwrap_or(serde_json::Value::Null),
}
}
/// Create a Completed message
pub fn completed(source: impl Into<VertexId>, result: Option<String>) -> Self {
Self::Completed {
source: source.into(),
result,
}
}
/// Create a ResearchFinding message
pub fn research_finding(
query: impl Into<String>,
sources: Vec<Source>,
summary: impl Into<String>,
) -> Self {
Self::ResearchFinding {
query: query.into(),
sources,
summary: summary.into(),
}
}
/// Create a ResearchDirection message
pub fn research_direction(
topic: impl Into<String>,
priority: Priority,
rationale: impl Into<String>,
) -> Self {
Self::ResearchDirection {
topic: topic.into(),
priority,
rationale: rationale.into(),
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use serde_json::json;
#[test]
fn test_workflow_message_serialization() {
let msg = WorkflowMessage::Data {
key: "query".into(),
value: json!("test query"),
};
let json_str = serde_json::to_string(&msg).unwrap();
let deserialized: WorkflowMessage = serde_json::from_str(&json_str).unwrap();
// Verify roundtrip
match deserialized {
WorkflowMessage::Data { key, value } => {
assert_eq!(key, "query");
assert_eq!(value, json!("test query"));
}
_ => panic!("Wrong variant"),
}
}
#[test]
fn test_research_finding_message() {
let msg = WorkflowMessage::ResearchFinding {
query: "rust async".into(),
sources: vec![Source {
url: "https://example.com".into(),
title: "Example".into(),
relevance: 0.95,
}],
summary: "Rust async is great".into(),
};
assert!(matches!(msg, WorkflowMessage::ResearchFinding { .. }));
}
#[test]
fn test_research_direction_message() {
let msg = WorkflowMessage::research_direction(
"async runtimes",
Priority::High,
"Important for understanding concurrency",
);
match msg {
WorkflowMessage::ResearchDirection {
topic,
priority,
rationale,
} => {
assert_eq!(topic, "async runtimes");
assert_eq!(priority, Priority::High);
assert!(!rationale.is_empty());
}
_ => panic!("Wrong variant"),
}
}
#[test]
fn test_source_relevance_clamping() {
let source = Source::new("https://test.com", "Test", 1.5);
assert_eq!(source.relevance, 1.0);
let source = Source::new("https://test.com", "Test", -0.5);
assert_eq!(source.relevance, 0.0);
}
#[test]
fn test_completed_message() {
let msg = WorkflowMessage::completed("planner", Some("Plan complete".to_string()));
match msg {
WorkflowMessage::Completed { source, result } => {
assert_eq!(source.0, "planner");
assert_eq!(result, Some("Plan complete".to_string()));
}
_ => panic!("Wrong variant"),
}
}
#[test]
fn test_data_message_helper() {
let msg = WorkflowMessage::data("count", 42);
match msg {
WorkflowMessage::Data { key, value } => {
assert_eq!(key, "count");
assert_eq!(value, json!(42));
}
_ => panic!("Wrong variant"),
}
}
#[test]
fn test_priority_default() {
assert_eq!(Priority::default(), Priority::Medium);
}
#[test]
fn test_activate_and_halt() {
let activate = WorkflowMessage::Activate;
let halt = WorkflowMessage::Halt;
assert!(matches!(activate, WorkflowMessage::Activate));
assert!(matches!(halt, WorkflowMessage::Halt));
}
}

@@ -0,0 +1,45 @@
//! Pregel Runtime for Graph-Based Agent Orchestration
//!
//! This module implements a Pregel-inspired runtime for executing agent workflows.
//! Key concepts:
//!
//! - **Vertex**: Computation unit (Agent, Tool, Router, etc.)
//! - **Edge**: Connection between vertices (Direct, Conditional)
//! - **Superstep**: Synchronized execution phase
//! - **Message**: Communication between vertices
//!
//! # Architecture
//!
//! ```text
//! ┌─────────────────────────────────────────────────────────────┐
//! │ PregelRuntime │
//! │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
//! │ │Superstep│→ │Superstep│→ │Superstep│→ ... │
//! │ │ 0 │ │ 1 │ │ 2 │ │
//! │ └─────────┘ └─────────┘ └─────────┘ │
//! │ │ │ │ │
//! │ ▼ ▼ ▼ │
//! │ ┌─────────────────────────────────────────────────────┐ │
//! │ │ Per-Superstep: Deliver → Compute → Collect → Route │ │
//! │ └─────────────────────────────────────────────────────┘ │
//! └─────────────────────────────────────────────────────────────┘
//! ```
pub mod vertex;
pub mod message;
pub mod config;
pub mod error;
pub mod state;
pub mod runtime;
pub mod checkpoint;
// Re-exports
pub use vertex::{
BoxedVertex, ComputeContext, ComputeResult, StateUpdate, Vertex, VertexId, VertexState,
};
pub use message::{Priority, Source, VertexMessage, WorkflowMessage};
pub use config::{PregelConfig, RetryPolicy};
pub use error::PregelError;
pub use state::{UnitState, UnitUpdate, WorkflowState};
pub use runtime::{PregelRuntime, WorkflowResult};
pub use checkpoint::{Checkpoint, Checkpointer, CheckpointerConfig, MemoryCheckpointer, FileCheckpointer, create_checkpointer};

@@ -0,0 +1,871 @@
//! Pregel Runtime - Core execution engine for workflow graphs
//!
//! The runtime executes workflows through synchronized supersteps.
//! Each superstep follows the sequence: Deliver → Compute → Collect → Route.
use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::{Mutex, Semaphore};
use tokio::time::timeout;
use super::config::PregelConfig;
use super::error::PregelError;
use super::message::VertexMessage;
use super::state::WorkflowState;
use super::vertex::{BoxedVertex, ComputeContext, ComputeResult, VertexId, VertexState};
/// Result of a workflow execution
#[derive(Debug, Clone)]
pub struct WorkflowResult<S: WorkflowState> {
/// Final workflow state
pub state: S,
/// Number of supersteps executed
pub supersteps: usize,
/// Whether the workflow completed successfully
pub completed: bool,
/// Final states of all vertices
pub vertex_states: HashMap<VertexId, VertexState>,
}
/// Pregel Runtime for executing workflow graphs
///
/// Manages the execution of vertices through synchronized supersteps,
/// handling message passing, state updates, and termination detection.
pub struct PregelRuntime<S, M>
where
S: WorkflowState,
M: VertexMessage,
{
/// Configuration for the runtime
config: PregelConfig,
/// Vertices in the workflow graph
vertices: HashMap<VertexId, BoxedVertex<S, M>>,
/// Current state of each vertex
vertex_states: HashMap<VertexId, VertexState>,
/// Pending messages for each vertex (delivered at start of next superstep)
message_queues: HashMap<VertexId, Vec<M>>,
/// Edges defining message routing (source -> targets)
edges: HashMap<VertexId, Vec<VertexId>>,
/// Retry attempt counts per vertex (for retry policy enforcement)
retry_counts: HashMap<VertexId, usize>,
}
impl<S, M> PregelRuntime<S, M>
where
S: WorkflowState,
M: VertexMessage,
{
/// Create a new runtime with default configuration
pub fn new() -> Self {
Self::with_config(PregelConfig::default())
}
/// Create a new runtime with custom configuration
pub fn with_config(config: PregelConfig) -> Self {
Self {
config,
vertices: HashMap::new(),
vertex_states: HashMap::new(),
message_queues: HashMap::new(),
edges: HashMap::new(),
retry_counts: HashMap::new(),
}
}
/// Add a vertex to the runtime
pub fn add_vertex(&mut self, vertex: BoxedVertex<S, M>) -> &mut Self {
let id = vertex.id().clone();
self.vertex_states.insert(id.clone(), VertexState::Active);
self.message_queues.insert(id.clone(), Vec::new());
self.vertices.insert(id, vertex);
self
}
/// Add an edge between vertices
pub fn add_edge(&mut self, from: impl Into<VertexId>, to: impl Into<VertexId>) -> &mut Self {
let from = from.into();
let to = to.into();
self.edges.entry(from).or_default().push(to);
self
}
/// Set the entry point (activate this vertex on start)
pub fn set_entry(&mut self, entry: impl Into<VertexId>) -> &mut Self {
let entry = entry.into();
if let Some(state) = self.vertex_states.get_mut(&entry) {
*state = VertexState::Active;
}
self
}
/// Get the configuration
pub fn config(&self) -> &PregelConfig {
&self.config
}
/// Run the workflow to completion
///
/// Enforces the configured `workflow_timeout`: if the workflow runs longer
/// than this duration, a `WorkflowTimeout` error is returned.
pub async fn run(&mut self, initial_state: S) -> Result<WorkflowResult<S>, PregelError> {
let workflow_timeout = self.config.workflow_timeout;
// Wrap the entire run loop with the workflow timeout
match timeout(workflow_timeout, self.run_inner(initial_state)).await {
Ok(result) => result,
Err(_) => Err(PregelError::WorkflowTimeout(workflow_timeout)),
}
}
/// Internal run loop (extracted for timeout wrapping)
async fn run_inner(&mut self, initial_state: S) -> Result<WorkflowResult<S>, PregelError> {
let mut state = initial_state;
let mut superstep = 0;
loop {
// Check max supersteps limit
if superstep >= self.config.max_supersteps {
return Err(PregelError::MaxSuperstepsExceeded(superstep));
}
// Check if workflow should terminate
if self.should_terminate(&state) {
return Ok(WorkflowResult {
state,
supersteps: superstep,
completed: true,
vertex_states: self.vertex_states.clone(),
});
}
// Execute one superstep
let updates = self.execute_superstep(superstep, &state).await?;
// Apply state updates
state = state.apply_updates(updates);
superstep += 1;
}
}
/// Check if the workflow should terminate
fn should_terminate(&self, state: &S) -> bool {
// Terminal state check
if state.is_terminal() {
return true;
}
// All vertices halted or completed AND no pending messages
let all_inactive = self
.vertex_states
.values()
.all(|s| !s.is_active());
let no_pending_messages = self
.message_queues
.values()
.all(|q| q.is_empty());
all_inactive && no_pending_messages
}
/// Execute a single superstep
async fn execute_superstep(
&mut self,
superstep: usize,
state: &S,
) -> Result<Vec<S::Update>, PregelError> {
// 1. Deliver messages - move pending messages to vertex inboxes
let inboxes = self.deliver_messages();
// 2. Reactivate halted vertices that received messages
for (vertex_id, messages) in &inboxes {
if !messages.is_empty() {
if let Some(vertex_state) = self.vertex_states.get_mut(vertex_id) {
if vertex_state.is_halted() {
if let Some(vertex) = self.vertices.get(vertex_id) {
*vertex_state = vertex.on_reactivation(messages);
}
}
}
}
}
// 3. Compute active vertices in parallel
let (updates, outboxes) = self.compute_vertices(superstep, state, &inboxes).await?;
// 4. Route messages to next superstep queues
self.route_messages(outboxes);
Ok(updates)
}
/// Deliver pending messages to vertex inboxes
fn deliver_messages(&mut self) -> HashMap<VertexId, Vec<M>> {
let mut inboxes = HashMap::new();
for (vertex_id, queue) in &mut self.message_queues {
if !queue.is_empty() {
inboxes.insert(vertex_id.clone(), std::mem::take(queue));
} else {
inboxes.insert(vertex_id.clone(), Vec::new());
}
}
inboxes
}
/// Compute all active vertices in parallel
async fn compute_vertices(
&mut self,
superstep: usize,
state: &S,
inboxes: &HashMap<VertexId, Vec<M>>,
) -> Result<(Vec<S::Update>, HashMap<VertexId, HashMap<VertexId, Vec<M>>>), PregelError> {
let semaphore = Arc::new(Semaphore::new(self.config.parallelism));
let updates = Arc::new(Mutex::new(Vec::new()));
let outboxes = Arc::new(Mutex::new(HashMap::new()));
let vertex_timeout = self.config.vertex_timeout;
// Collect active vertices to compute
let active_vertices: Vec<_> = self
.vertex_states
.iter()
.filter(|(_, state)| state.is_active())
.map(|(id, _)| id.clone())
.collect();
// Execute vertices in parallel
let mut handles = Vec::new();
for vertex_id in active_vertices {
let vertex = match self.vertices.get(&vertex_id) {
Some(v) => Arc::clone(v),
None => continue,
};
let messages = inboxes.get(&vertex_id).cloned().unwrap_or_default();
let state_clone = state.clone();
let sem_clone = Arc::clone(&semaphore);
let vid = vertex_id.clone();
let handle = tokio::spawn(async move {
// Acquire semaphore permit for parallelism control
let _permit = sem_clone.acquire().await.unwrap();
// Create compute context
let mut ctx = ComputeContext::new(vid.clone(), &messages, superstep, &state_clone);
// Execute with timeout
let result: Result<ComputeResult<S::Update>, PregelError> = match timeout(
vertex_timeout,
vertex.compute(&mut ctx),
)
.await
{
Ok(result) => result,
Err(_) => Err(PregelError::VertexTimeout(vid.clone())),
};
let outbox = ctx.into_outbox();
(vid, result, outbox)
});
handles.push(handle);
}
// Collect results
let mut new_vertex_states = HashMap::new();
for handle in handles {
let (vid, result, outbox) = handle.await.map_err(|e| {
PregelError::vertex_error_with_source(
"unknown",
"task join error",
std::io::Error::other(e.to_string()),
)
})?;
match result {
Ok(compute_result) => {
// Success: reset retry count for this vertex
self.retry_counts.remove(&vid);
updates.lock().await.push(compute_result.update);
new_vertex_states.insert(vid.clone(), compute_result.state);
outboxes.lock().await.insert(vid, outbox);
}
Err(e) => {
if e.is_recoverable() {
// Track retry attempts and enforce max_retries;
// retry_count holds how many retries have already been attempted.
let retry_count = self.retry_counts.entry(vid.clone()).or_insert(0);
// Check if we can retry BEFORE incrementing
if self.config.retry_policy.should_retry(*retry_count) {
// Apply backoff delay before next retry
let delay = self.config.retry_policy.delay_for_attempt(*retry_count);
tokio::time::sleep(delay).await;
// Track this retry attempt
*retry_count += 1;
// Keep vertex active for retry
new_vertex_states.insert(vid, VertexState::Active);
} else {
// Max retries exceeded (current attempt is retry_count + 1 total)
return Err(PregelError::MaxRetriesExceeded {
vertex_id: vid,
attempts: *retry_count + 1, // +1 for the current failed attempt
});
}
} else {
return Err(e);
}
}
}
}
// Update vertex states
for (vid, new_state) in new_vertex_states {
self.vertex_states.insert(vid, new_state);
}
// Use an async-safe lock here instead of blocking_lock
let final_updates = match Arc::try_unwrap(updates) {
Ok(mutex) => mutex.into_inner(),
Err(arc) => arc.lock().await.clone(),
};
let final_outboxes = match Arc::try_unwrap(outboxes) {
Ok(mutex) => mutex.into_inner(),
Err(arc) => arc.lock().await.clone(),
};
Ok((final_updates, final_outboxes))
}
/// Route outgoing messages to target vertex queues
fn route_messages(&mut self, outboxes: HashMap<VertexId, HashMap<VertexId, Vec<M>>>) {
for (_source, outbox) in outboxes {
for (target, messages) in outbox {
if let Some(queue) = self.message_queues.get_mut(&target) {
queue.extend(messages);
}
}
}
}
}
impl<S, M> Default for PregelRuntime<S, M>
where
S: WorkflowState,
M: VertexMessage,
{
fn default() -> Self {
Self::new()
}
}
#[cfg(test)]
mod tests {
use super::*;
use super::super::message::WorkflowMessage;
use super::super::vertex::{StateUpdate, Vertex};
use async_trait::async_trait;
use tokio::time::Duration;
#[allow(unused_imports)]
use super::super::state::WorkflowState as _;
// Test state
#[derive(Clone, Default, Debug)]
struct TestState {
counter: i32,
messages_received: i32,
}
#[derive(Clone, Debug)]
struct TestUpdate {
counter_delta: i32,
messages_delta: i32,
}
impl StateUpdate for TestUpdate {
fn empty() -> Self {
TestUpdate {
counter_delta: 0,
messages_delta: 0,
}
}
fn is_empty(&self) -> bool {
self.counter_delta == 0 && self.messages_delta == 0
}
}
impl WorkflowState for TestState {
type Update = TestUpdate;
fn apply_update(&self, update: Self::Update) -> Self {
TestState {
counter: self.counter + update.counter_delta,
messages_received: self.messages_received + update.messages_delta,
}
}
fn merge_updates(updates: Vec<Self::Update>) -> Self::Update {
TestUpdate {
counter_delta: updates.iter().map(|u| u.counter_delta).sum(),
messages_delta: updates.iter().map(|u| u.messages_delta).sum(),
}
}
fn is_terminal(&self) -> bool {
self.counter >= 10
}
}
// Simple vertex that increments counter and halts
struct IncrementVertex {
id: VertexId,
#[allow(dead_code)]
increment: i32,
}
#[async_trait]
impl Vertex<TestState, WorkflowMessage> for IncrementVertex {
fn id(&self) -> &VertexId {
&self.id
}
async fn compute(
&self,
_ctx: &mut ComputeContext<'_, TestState, WorkflowMessage>,
) -> Result<ComputeResult<TestUpdate>, PregelError> {
// Just halt immediately
Ok(ComputeResult::halt(TestUpdate::empty()))
}
}
// Vertex that sends a message then halts
struct MessageSenderVertex {
id: VertexId,
target: VertexId,
}
#[async_trait]
impl Vertex<TestState, WorkflowMessage> for MessageSenderVertex {
fn id(&self) -> &VertexId {
&self.id
}
async fn compute(
&self,
ctx: &mut ComputeContext<'_, TestState, WorkflowMessage>,
) -> Result<ComputeResult<TestUpdate>, PregelError> {
if ctx.is_first_superstep() {
ctx.send_message(self.target.clone(), WorkflowMessage::Activate);
}
Ok(ComputeResult::halt(TestUpdate::empty()))
}
}
// Vertex that counts messages received
struct MessageReceiverVertex {
id: VertexId,
}
#[async_trait]
impl Vertex<TestState, WorkflowMessage> for MessageReceiverVertex {
fn id(&self) -> &VertexId {
&self.id
}
async fn compute(
&self,
_ctx: &mut ComputeContext<'_, TestState, WorkflowMessage>,
) -> Result<ComputeResult<TestUpdate>, PregelError> {
// Just halt after receiving messages
Ok(ComputeResult::halt(TestUpdate::empty()))
}
}
#[tokio::test]
async fn test_runtime_creation() {
let runtime: PregelRuntime<TestState, WorkflowMessage> = PregelRuntime::new();
assert_eq!(runtime.config().max_supersteps, 100);
}
#[tokio::test]
async fn test_runtime_single_vertex_halts() {
let mut runtime: PregelRuntime<TestState, WorkflowMessage> = PregelRuntime::new();
runtime.add_vertex(Arc::new(IncrementVertex {
id: VertexId::new("a"),
increment: 1,
}));
let result = runtime.run(TestState::default()).await;
assert!(result.is_ok());
let result = result.unwrap();
assert!(result.completed);
// Single vertex computes once then halts, workflow terminates
assert!(result.supersteps <= 2);
}
#[tokio::test]
async fn test_runtime_message_delivery() {
let mut runtime: PregelRuntime<TestState, WorkflowMessage> = PregelRuntime::new();
runtime.add_vertex(Arc::new(MessageSenderVertex {
id: VertexId::new("sender"),
target: VertexId::new("receiver"),
}));
runtime.add_vertex(Arc::new(MessageReceiverVertex {
id: VertexId::new("receiver"),
}));
let result = runtime.run(TestState::default()).await.unwrap();
assert!(result.completed);
// Sender sends in superstep 0, receiver gets it in superstep 1
assert!(result.supersteps >= 1);
}
#[tokio::test]
async fn test_runtime_termination_all_halted() {
let mut runtime: PregelRuntime<TestState, WorkflowMessage> = PregelRuntime::new();
// Add two vertices that halt immediately
runtime.add_vertex(Arc::new(IncrementVertex {
id: VertexId::new("a"),
increment: 1,
}));
runtime.add_vertex(Arc::new(IncrementVertex {
id: VertexId::new("b"),
increment: 1,
}));
let result = runtime.run(TestState::default()).await.unwrap();
assert!(result.completed);
// All vertices halt, no messages pending -> terminate
assert!(result.vertex_states.values().all(|s| !s.is_active()));
}
#[tokio::test]
async fn test_runtime_max_supersteps_exceeded() {
struct InfiniteLoopVertex {
id: VertexId,
}
#[async_trait]
impl Vertex<TestState, WorkflowMessage> for InfiniteLoopVertex {
fn id(&self) -> &VertexId {
&self.id
}
async fn compute(
&self,
ctx: &mut ComputeContext<'_, TestState, WorkflowMessage>,
) -> Result<ComputeResult<TestUpdate>, PregelError> {
// Always stay active
ctx.send_message(self.id.clone(), WorkflowMessage::Activate);
Ok(ComputeResult::active(TestUpdate::empty()))
}
}
let config = PregelConfig::default().with_max_supersteps(5);
let mut runtime: PregelRuntime<TestState, WorkflowMessage> =
PregelRuntime::with_config(config);
runtime.add_vertex(Arc::new(InfiniteLoopVertex {
id: VertexId::new("loop"),
}));
let result = runtime.run(TestState::default()).await;
assert!(result.is_err());
assert!(matches!(
result.unwrap_err(),
PregelError::MaxSuperstepsExceeded(5)
));
}
#[tokio::test]
async fn test_runtime_terminal_state() {
struct CounterVertex {
id: VertexId,
}
#[async_trait]
impl Vertex<TestState, WorkflowMessage> for CounterVertex {
fn id(&self) -> &VertexId {
&self.id
}
async fn compute(
&self,
ctx: &mut ComputeContext<'_, TestState, WorkflowMessage>,
) -> Result<ComputeResult<TestUpdate>, PregelError> {
// Keep running until terminal state
ctx.send_message(self.id.clone(), WorkflowMessage::Activate);
Ok(ComputeResult::active(TestUpdate::empty()))
}
}
let mut runtime: PregelRuntime<TestState, WorkflowMessage> = PregelRuntime::new();
runtime.add_vertex(Arc::new(CounterVertex {
id: VertexId::new("counter"),
}));
// Start with counter at 10, which is terminal
let result = runtime
.run(TestState {
counter: 10,
messages_received: 0,
})
.await
.unwrap();
assert!(result.completed);
assert_eq!(result.supersteps, 0); // Terminates immediately
}
#[tokio::test]
async fn test_runtime_parallel_execution() {
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Instant;
static EXECUTION_COUNT: AtomicUsize = AtomicUsize::new(0);
struct SlowVertex {
id: VertexId,
}
#[async_trait]
impl Vertex<TestState, WorkflowMessage> for SlowVertex {
fn id(&self) -> &VertexId {
&self.id
}
async fn compute(
&self,
_ctx: &mut ComputeContext<'_, TestState, WorkflowMessage>,
) -> Result<ComputeResult<TestUpdate>, PregelError> {
EXECUTION_COUNT.fetch_add(1, Ordering::SeqCst);
tokio::time::sleep(Duration::from_millis(50)).await;
Ok(ComputeResult::halt(TestUpdate::empty()))
}
}
EXECUTION_COUNT.store(0, Ordering::SeqCst);
let config = PregelConfig::default().with_parallelism(4);
let mut runtime: PregelRuntime<TestState, WorkflowMessage> =
PregelRuntime::with_config(config);
// Add 4 slow vertices
for i in 0..4 {
runtime.add_vertex(Arc::new(SlowVertex {
id: VertexId::new(format!("slow_{}", i)),
}));
}
let start = Instant::now();
let result = runtime.run(TestState::default()).await.unwrap();
let elapsed = start.elapsed();
assert!(result.completed);
assert_eq!(EXECUTION_COUNT.load(Ordering::SeqCst), 4);
// With parallelism=4, should take ~50ms, not ~200ms
assert!(elapsed < Duration::from_millis(150));
}
#[tokio::test]
async fn test_runtime_add_edge() {
let mut runtime: PregelRuntime<TestState, WorkflowMessage> = PregelRuntime::new();
runtime
.add_vertex(Arc::new(IncrementVertex {
id: VertexId::new("a"),
increment: 1,
}))
.add_edge("a", "b");
assert!(runtime.edges.contains_key(&VertexId::new("a")));
}
// ============================================
// C2: Workflow Timeout Tests (RED - should fail)
// ============================================
#[tokio::test]
async fn test_workflow_timeout_enforced() {
// Vertex that runs forever (simulates slow LLM calls)
struct SlowForeverVertex {
id: VertexId,
}
#[async_trait]
impl Vertex<TestState, WorkflowMessage> for SlowForeverVertex {
fn id(&self) -> &VertexId {
&self.id
}
async fn compute(
&self,
ctx: &mut ComputeContext<'_, TestState, WorkflowMessage>,
) -> Result<ComputeResult<TestUpdate>, PregelError> {
// Sleep for a long time but stay active
tokio::time::sleep(Duration::from_secs(10)).await;
ctx.send_message(self.id.clone(), WorkflowMessage::Activate);
Ok(ComputeResult::active(TestUpdate::empty()))
}
}
// Set a very short workflow timeout (100ms)
let config = PregelConfig::default()
.with_workflow_timeout(Duration::from_millis(100))
.with_vertex_timeout(Duration::from_secs(60)) // vertex timeout is longer
.with_max_supersteps(1000); // high limit so it doesn't hit this first
let mut runtime: PregelRuntime<TestState, WorkflowMessage> =
PregelRuntime::with_config(config);
runtime.add_vertex(Arc::new(SlowForeverVertex {
id: VertexId::new("slow"),
}));
let start = std::time::Instant::now();
let result = runtime.run(TestState::default()).await;
let elapsed = start.elapsed();
        // Should time out well before the 10s vertex sleep; allow generous tolerance (< 500ms)
assert!(elapsed < Duration::from_millis(500), "Took too long: {:?}", elapsed);
// Should return WorkflowTimeout error
assert!(result.is_err());
let err = result.unwrap_err();
assert!(
matches!(err, PregelError::WorkflowTimeout(_)),
"Expected WorkflowTimeout, got {:?}",
err
);
}
// ============================================
// C3: Retry Policy Tests (RED - should fail)
// ============================================
#[tokio::test]
async fn test_retry_policy_with_backoff() {
use std::sync::atomic::{AtomicUsize, Ordering};
static ATTEMPT_COUNT: AtomicUsize = AtomicUsize::new(0);
// Vertex that fails first 2 times, then succeeds
struct FailingThenSuccessVertex {
id: VertexId,
}
#[async_trait]
impl Vertex<TestState, WorkflowMessage> for FailingThenSuccessVertex {
fn id(&self) -> &VertexId {
&self.id
}
async fn compute(
&self,
_ctx: &mut ComputeContext<'_, TestState, WorkflowMessage>,
) -> Result<ComputeResult<TestUpdate>, PregelError> {
let attempt = ATTEMPT_COUNT.fetch_add(1, Ordering::SeqCst);
if attempt < 2 {
// Fail with recoverable error
Err(PregelError::vertex_error(self.id.clone(), format!("transient failure {}", attempt)))
} else {
// Succeed on 3rd attempt
Ok(ComputeResult::halt(TestUpdate {
counter_delta: 1,
messages_delta: 0,
}))
}
}
}
ATTEMPT_COUNT.store(0, Ordering::SeqCst);
let config = PregelConfig::default()
.with_retry_policy(
super::super::config::RetryPolicy::new(3)
.with_backoff_base(Duration::from_millis(10))
)
.with_max_supersteps(20);
let mut runtime: PregelRuntime<TestState, WorkflowMessage> =
PregelRuntime::with_config(config);
runtime.add_vertex(Arc::new(FailingThenSuccessVertex {
id: VertexId::new("flaky"),
}));
let result = runtime.run(TestState::default()).await;
// Should succeed after retries
assert!(result.is_ok(), "Expected success after retries, got {:?}", result);
// Should have attempted 3 times (2 failures + 1 success)
assert_eq!(ATTEMPT_COUNT.load(Ordering::SeqCst), 3);
}
#[tokio::test]
async fn test_retry_policy_max_exceeded() {
use std::sync::atomic::{AtomicUsize, Ordering};
static FAIL_COUNT: AtomicUsize = AtomicUsize::new(0);
// Vertex that always fails
struct AlwaysFailsVertex {
id: VertexId,
}
#[async_trait]
impl Vertex<TestState, WorkflowMessage> for AlwaysFailsVertex {
fn id(&self) -> &VertexId {
&self.id
}
async fn compute(
&self,
_ctx: &mut ComputeContext<'_, TestState, WorkflowMessage>,
) -> Result<ComputeResult<TestUpdate>, PregelError> {
FAIL_COUNT.fetch_add(1, Ordering::SeqCst);
Err(PregelError::vertex_error(self.id.clone(), "always fails"))
}
}
FAIL_COUNT.store(0, Ordering::SeqCst);
let config = PregelConfig::default()
.with_retry_policy(super::super::config::RetryPolicy::new(3))
.with_max_supersteps(100);
let mut runtime: PregelRuntime<TestState, WorkflowMessage> =
PregelRuntime::with_config(config);
runtime.add_vertex(Arc::new(AlwaysFailsVertex {
id: VertexId::new("failing"),
}));
let result = runtime.run(TestState::default()).await;
// Should fail with MaxRetriesExceeded
assert!(result.is_err());
let err = result.unwrap_err();
assert!(
matches!(err, PregelError::MaxRetriesExceeded { .. }),
"Expected MaxRetriesExceeded, got {:?}",
err
);
// Should have tried exactly max_retries + 1 times (initial + retries)
assert_eq!(FAIL_COUNT.load(Ordering::SeqCst), 4); // 1 initial + 3 retries
}
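    // Hedged sketch of the exponential backoff schedule the retry tests above
    // assume (delay = base * 2^attempt); plain arithmetic for illustration,
    // not the crate's actual RetryPolicy implementation.
    #[test]
    fn test_backoff_schedule_sketch() {
        let base = Duration::from_millis(10);
        let delays: Vec<Duration> = (0u32..3).map(|attempt| base * 2u32.pow(attempt)).collect();
        assert_eq!(
            delays,
            vec![
                Duration::from_millis(10),
                Duration::from_millis(20),
                Duration::from_millis(40),
            ]
        );
    }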
}


@@ -0,0 +1,297 @@
//! Workflow state abstraction for Pregel runtime
//!
//! Defines how workflow state is updated and merged during supersteps.
//! The runtime collects updates from all vertices and applies them atomically
//! at the end of each superstep.
use super::vertex::StateUpdate;
/// Trait for workflow state managed by the Pregel runtime
///
/// The workflow state represents the shared data that vertices can read
/// and update during computation. Updates are collected and merged at the
/// end of each superstep.
///
/// # Example
///
/// ```ignore
/// #[derive(Clone, Default)]
/// struct ResearchState {
/// findings: Vec<Finding>,
/// phase: ResearchPhase,
/// completed_topics: HashSet<String>,
/// }
///
/// impl WorkflowState for ResearchState {
/// type Update = ResearchUpdate;
///
/// fn apply_update(&self, update: Self::Update) -> Self {
/// let mut new = self.clone();
/// new.findings.extend(update.new_findings);
/// new.completed_topics.extend(update.completed);
/// if let Some(phase) = update.phase_transition {
/// new.phase = phase;
/// }
/// new
/// }
///
/// fn merge_updates(updates: Vec<Self::Update>) -> Self::Update {
/// ResearchUpdate {
/// new_findings: updates.iter().flat_map(|u| u.new_findings.clone()).collect(),
/// completed: updates.iter().flat_map(|u| u.completed.clone()).collect(),
/// phase_transition: updates.iter().find_map(|u| u.phase_transition.clone()),
/// }
/// }
///
/// fn is_terminal(&self) -> bool {
/// self.phase == ResearchPhase::Complete
/// }
/// }
/// ```
pub trait WorkflowState: Clone + Send + Sync + 'static {
/// The update type produced by vertices
type Update: StateUpdate;
/// Apply an update to produce a new state
///
/// This should be a pure function - the original state is not modified.
fn apply_update(&self, update: Self::Update) -> Self;
/// Merge multiple updates into a single update
///
/// Called when multiple vertices produce updates in the same superstep.
/// The merge should be deterministic (order-independent for correctness).
fn merge_updates(updates: Vec<Self::Update>) -> Self::Update;
/// Check if the state represents a terminal condition
///
/// When true, the workflow will terminate regardless of vertex states.
fn is_terminal(&self) -> bool {
false
}
/// Apply multiple updates in sequence
///
/// Default implementation merges updates then applies the result.
fn apply_updates(&self, updates: Vec<Self::Update>) -> Self {
if updates.is_empty() {
return self.clone();
}
let merged = Self::merge_updates(updates);
self.apply_update(merged)
}
}
/// A simple unit state for workflows that don't need shared state
///
/// Useful for workflows where all communication is via messages.
#[derive(Debug, Clone, Default, serde::Serialize, serde::Deserialize)]
pub struct UnitState;
/// Unit update that has no effect
#[derive(Debug, Clone, Default, serde::Serialize, serde::Deserialize)]
pub struct UnitUpdate;
impl StateUpdate for UnitUpdate {
fn empty() -> Self {
UnitUpdate
}
fn is_empty(&self) -> bool {
true
}
}
impl WorkflowState for UnitState {
type Update = UnitUpdate;
fn apply_update(&self, _update: Self::Update) -> Self {
UnitState
}
fn merge_updates(_updates: Vec<Self::Update>) -> Self::Update {
UnitUpdate
}
fn is_terminal(&self) -> bool {
false
}
}
#[cfg(test)]
mod tests {
use super::*;
use std::collections::HashSet;
// Counter-based state for testing
#[derive(Clone, Default, Debug, PartialEq)]
struct CounterState {
count: i32,
}
#[derive(Clone, Debug)]
struct CounterUpdate {
delta: i32,
}
impl StateUpdate for CounterUpdate {
fn empty() -> Self {
CounterUpdate { delta: 0 }
}
fn is_empty(&self) -> bool {
self.delta == 0
}
}
impl WorkflowState for CounterState {
type Update = CounterUpdate;
fn apply_update(&self, update: Self::Update) -> Self {
CounterState {
count: self.count + update.delta,
}
}
fn merge_updates(updates: Vec<Self::Update>) -> Self::Update {
CounterUpdate {
delta: updates.iter().map(|u| u.delta).sum(),
}
}
fn is_terminal(&self) -> bool {
self.count >= 100
}
}
#[test]
fn test_state_update_merge() {
let updates = vec![
CounterUpdate { delta: 5 },
CounterUpdate { delta: 3 },
CounterUpdate { delta: -2 },
];
let merged = CounterState::merge_updates(updates);
assert_eq!(merged.delta, 6);
}
#[test]
fn test_state_apply_update() {
let state = CounterState { count: 10 };
let update = CounterUpdate { delta: 5 };
let new_state = state.apply_update(update);
assert_eq!(new_state.count, 15);
// Original state unchanged (immutable)
assert_eq!(state.count, 10);
}
#[test]
fn test_state_apply_updates() {
let state = CounterState { count: 0 };
let updates = vec![
CounterUpdate { delta: 10 },
CounterUpdate { delta: 20 },
CounterUpdate { delta: 5 },
];
let new_state = state.apply_updates(updates);
assert_eq!(new_state.count, 35);
}
#[test]
fn test_state_terminal_condition() {
let non_terminal = CounterState { count: 50 };
assert!(!non_terminal.is_terminal());
let terminal = CounterState { count: 100 };
assert!(terminal.is_terminal());
let over_terminal = CounterState { count: 150 };
assert!(over_terminal.is_terminal());
}
#[test]
fn test_empty_updates() {
let state = CounterState { count: 42 };
let new_state = state.apply_updates(vec![]);
assert_eq!(new_state.count, 42);
}
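    // A sketch checking the merge-then-apply contract documented on
    // `WorkflowState`: applying updates one by one must agree with
    // `apply_updates`, since merging sums the deltas.
    #[test]
    fn test_merge_then_apply_equivalence() {
        let state = CounterState { count: 7 };
        let updates = vec![CounterUpdate { delta: 2 }, CounterUpdate { delta: -1 }];
        // Apply sequentially...
        let sequential = updates
            .iter()
            .cloned()
            .fold(state.clone(), |s, u| s.apply_update(u));
        // ...and via merge-then-apply; both paths must agree.
        let merged = state.apply_updates(updates);
        assert_eq!(sequential.count, merged.count);
        assert_eq!(merged.count, 8);
    }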
// More complex state for testing
#[derive(Clone, Default, Debug)]
struct CollectionState {
items: Vec<String>,
seen: HashSet<String>,
}
#[derive(Clone, Debug)]
struct CollectionUpdate {
new_items: Vec<String>,
}
impl StateUpdate for CollectionUpdate {
fn empty() -> Self {
CollectionUpdate { new_items: vec![] }
}
fn is_empty(&self) -> bool {
self.new_items.is_empty()
}
}
impl WorkflowState for CollectionState {
type Update = CollectionUpdate;
fn apply_update(&self, update: Self::Update) -> Self {
let mut items = self.items.clone();
let mut seen = self.seen.clone();
for item in update.new_items {
if !seen.contains(&item) {
seen.insert(item.clone());
items.push(item);
}
}
CollectionState { items, seen }
}
fn merge_updates(updates: Vec<Self::Update>) -> Self::Update {
CollectionUpdate {
new_items: updates.into_iter().flat_map(|u| u.new_items).collect(),
}
}
}
#[test]
fn test_collection_state_dedup() {
let state = CollectionState::default();
let updates = vec![
CollectionUpdate {
new_items: vec!["a".to_string(), "b".to_string()],
},
CollectionUpdate {
new_items: vec!["b".to_string(), "c".to_string()],
},
];
let new_state = state.apply_updates(updates);
// "b" should only appear once due to dedup in apply_update
assert_eq!(new_state.items.len(), 3);
assert_eq!(new_state.seen.len(), 3);
}
#[test]
fn test_unit_state() {
let state = UnitState;
let update = UnitUpdate;
assert!(update.is_empty());
assert!(UnitUpdate::empty().is_empty());
let new_state = state.apply_update(update);
assert!(!new_state.is_terminal());
let merged = UnitState::merge_updates(vec![UnitUpdate, UnitUpdate]);
assert!(merged.is_empty());
}
}


@@ -0,0 +1,562 @@
//! Vertex (Node) abstractions for Pregel runtime
//!
//! A Vertex represents a computation unit in the workflow graph.
//! Vertices communicate via messages and execute in synchronized supersteps.
use async_trait::async_trait;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::Arc;
use super::error::PregelError;
use super::message::VertexMessage;
/// Unique identifier for a vertex in the workflow graph
#[derive(Debug, Clone, PartialEq, Eq, Hash, Serialize, Deserialize)]
pub struct VertexId(pub String);
impl VertexId {
/// Create a new VertexId
pub fn new(id: impl Into<String>) -> Self {
Self(id.into())
}
}
impl From<&str> for VertexId {
fn from(s: &str) -> Self {
Self(s.to_string())
}
}
impl From<String> for VertexId {
fn from(s: String) -> Self {
Self(s)
}
}
impl std::fmt::Display for VertexId {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(f, "{}", self.0)
}
}
/// Vertex execution state (Pregel's "vote to halt" mechanism)
///
/// - `Active`: Vertex will compute in the next superstep
/// - `Halted`: Vertex has voted to halt (will reactivate on message receipt)
/// - `Completed`: Vertex has finished and will not reactivate
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default, Serialize, Deserialize)]
pub enum VertexState {
/// Vertex is active and will compute in next superstep
#[default]
Active,
/// Vertex has voted to halt (will reactivate on message receipt)
Halted,
/// Vertex has completed and will not reactivate
Completed,
}
impl VertexState {
/// Check if the vertex is active
pub fn is_active(&self) -> bool {
matches!(self, VertexState::Active)
}
/// Check if the vertex is halted (can be reactivated)
pub fn is_halted(&self) -> bool {
matches!(self, VertexState::Halted)
}
/// Check if the vertex is completed (cannot be reactivated)
pub fn is_completed(&self) -> bool {
matches!(self, VertexState::Completed)
}
}
/// Trait for state updates produced by vertex computation
///
/// State updates are collected from all vertices and merged at the end of each superstep.
pub trait StateUpdate: Clone + Send + Sync + 'static {
/// Create an empty (no-op) update
fn empty() -> Self;
/// Check if this update has no effect
fn is_empty(&self) -> bool;
}
/// Context provided to a vertex during computation
///
/// Provides access to:
/// - Incoming messages from other vertices
/// - Outbox for sending messages
/// - Current superstep number
/// - Workflow state (read-only)
pub struct ComputeContext<'a, S, M: VertexMessage> {
/// Messages received from other vertices
pub messages: &'a [M],
/// Current superstep number (0-indexed)
pub superstep: usize,
/// Read-only access to workflow state
pub state: &'a S,
/// Outgoing messages (target vertex -> messages)
outbox: HashMap<VertexId, Vec<M>>,
/// Current vertex ID
vertex_id: VertexId,
}
impl<'a, S, M: VertexMessage> ComputeContext<'a, S, M> {
/// Create a new compute context
pub fn new(vertex_id: VertexId, messages: &'a [M], superstep: usize, state: &'a S) -> Self {
Self {
messages,
superstep,
state,
outbox: HashMap::new(),
vertex_id,
}
}
/// Get the current vertex ID
pub fn id(&self) -> &VertexId {
&self.vertex_id
}
/// Send a message to another vertex
///
/// Messages will be delivered at the start of the next superstep.
pub fn send_message(&mut self, target: impl Into<VertexId>, message: M) {
let target = target.into();
self.outbox.entry(target).or_default().push(message);
}
/// Send a message to multiple targets
pub fn broadcast(&mut self, targets: impl IntoIterator<Item = impl Into<VertexId>>, message: M) {
for target in targets {
self.send_message(target.into(), message.clone());
}
}
/// Check if this is the first superstep
pub fn is_first_superstep(&self) -> bool {
self.superstep == 0
}
/// Check if any messages were received
pub fn has_messages(&self) -> bool {
!self.messages.is_empty()
}
/// Get the count of received messages
pub fn message_count(&self) -> usize {
self.messages.len()
}
/// Consume the context and return the outbox
pub fn into_outbox(self) -> HashMap<VertexId, Vec<M>> {
self.outbox
}
}
use super::state::WorkflowState;
/// The core vertex trait for Pregel computation
///
/// Each vertex in the workflow graph implements this trait.
/// During each superstep, active vertices have their `compute` method called.
///
/// # Type Parameters
///
/// - `S`: The workflow state type (must implement WorkflowState)
/// - `M`: The message type used for vertex communication
///
/// # Example
///
/// ```ignore
/// struct EchoVertex {
/// id: VertexId,
/// }
///
/// #[async_trait]
/// impl Vertex<MyState, WorkflowMessage> for EchoVertex {
/// fn id(&self) -> &VertexId {
/// &self.id
/// }
///
/// async fn compute(
/// &self,
/// ctx: &mut ComputeContext<'_, MyState, WorkflowMessage>,
/// ) -> Result<ComputeResult<MyUpdate>, PregelError> {
/// for msg in ctx.messages {
/// if let WorkflowMessage::Data { key, value } = msg {
/// ctx.send_message("output", WorkflowMessage::data(format!("echo_{}", key), value.clone()));
/// }
/// }
/// Ok(ComputeResult::halt(MyUpdate::empty()))
/// }
/// }
/// ```
#[async_trait]
pub trait Vertex<S, M>: Send + Sync
where
S: WorkflowState,
M: VertexMessage,
{
/// Get the vertex's unique identifier
fn id(&self) -> &VertexId;
/// Execute the vertex's computation
///
/// Called during each superstep for active vertices.
/// Returns a state update and the next vertex state.
async fn compute(
&self,
ctx: &mut ComputeContext<'_, S, M>,
) -> Result<ComputeResult<S::Update>, PregelError>;
/// Combine multiple messages into one (optional optimization)
///
/// Default implementation returns messages unchanged.
/// Override to reduce message traffic for commutative/associative operations.
fn combine_messages(&self, messages: Vec<M>) -> Vec<M> {
messages
}
/// Called when the vertex receives messages while halted
///
/// By default, returns `Active` to reactivate the vertex.
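    ///
    /// A sketch of an override that only wakes for data messages and stays
    /// halted on bare `Activate` pings (illustrative, assuming the concrete
    /// message type is `WorkflowMessage`):
    ///
    /// ```ignore
    /// fn on_reactivation(&self, messages: &[WorkflowMessage]) -> VertexState {
    ///     if messages.iter().any(|m| matches!(m, WorkflowMessage::Data { .. })) {
    ///         VertexState::Active
    ///     } else {
    ///         VertexState::Halted
    ///     }
    /// }
    /// ```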
fn on_reactivation(&self, _messages: &[M]) -> VertexState {
VertexState::Active
}
}
/// Result of a vertex computation
#[derive(Debug, Clone)]
pub struct ComputeResult<U: StateUpdate> {
/// State update to apply
pub update: U,
/// New vertex state
pub state: VertexState,
}
impl<U: StateUpdate> ComputeResult<U> {
/// Create a result that keeps the vertex active
pub fn active(update: U) -> Self {
Self {
update,
state: VertexState::Active,
}
}
/// Create a result that halts the vertex
pub fn halt(update: U) -> Self {
Self {
update,
state: VertexState::Halted,
}
}
/// Create a result that completes the vertex
pub fn complete(update: U) -> Self {
Self {
update,
state: VertexState::Completed,
}
}
/// Create a result with a specific state
pub fn with_state(update: U, state: VertexState) -> Self {
Self { update, state }
}
}
/// Boxed vertex for dynamic dispatch
pub type BoxedVertex<S, M> = Arc<dyn Vertex<S, M>>;
#[cfg(test)]
mod tests {
use super::*;
use super::super::message::WorkflowMessage;
// Test StateUpdate implementation for tests
#[derive(Clone, Debug, Default)]
struct TestUpdate {
delta: i32,
}
impl StateUpdate for TestUpdate {
fn empty() -> Self {
TestUpdate { delta: 0 }
}
fn is_empty(&self) -> bool {
self.delta == 0
}
}
// Test state for tests
#[derive(Clone, Debug, Default)]
#[allow(dead_code)]
struct TestState {
value: i32,
}
impl super::super::state::WorkflowState for TestState {
type Update = TestUpdate;
fn apply_update(&self, update: Self::Update) -> Self {
TestState {
value: self.value + update.delta,
}
}
fn merge_updates(updates: Vec<Self::Update>) -> Self::Update {
TestUpdate {
delta: updates.iter().map(|u| u.delta).sum(),
}
}
}
// Mock vertex for testing
struct EchoVertex {
id: VertexId,
}
#[async_trait]
impl Vertex<TestState, WorkflowMessage> for EchoVertex {
fn id(&self) -> &VertexId {
&self.id
}
async fn compute(
&self,
ctx: &mut ComputeContext<'_, TestState, WorkflowMessage>,
) -> Result<ComputeResult<TestUpdate>, PregelError> {
// Echo back any data messages
for msg in ctx.messages {
if let WorkflowMessage::Data { key, value } = msg {
ctx.send_message(
"output",
WorkflowMessage::data(format!("echo_{}", key), value.clone()),
);
}
}
Ok(ComputeResult::halt(TestUpdate::empty()))
}
}
#[tokio::test]
async fn test_mock_vertex_compute() {
let vertex = EchoVertex {
id: VertexId::new("echo"),
};
let state = TestState { value: 42 };
let messages = vec![WorkflowMessage::data("test", "hello")];
let mut ctx = ComputeContext::new(
VertexId::new("echo"),
&messages,
0,
&state,
);
let result = vertex.compute(&mut ctx).await;
assert!(result.is_ok());
let result = result.unwrap();
assert!(result.state.is_halted());
let outbox = ctx.into_outbox();
assert!(outbox.contains_key(&VertexId::new("output")));
assert_eq!(outbox.get(&VertexId::new("output")).unwrap().len(), 1);
}
#[test]
fn test_compute_context_send_message() {
let state = TestState { value: 0 };
let messages: Vec<WorkflowMessage> = vec![];
let mut ctx = ComputeContext::new(
VertexId::new("test"),
&messages,
0,
&state,
);
ctx.send_message("target1", WorkflowMessage::Activate);
ctx.send_message("target1", WorkflowMessage::Halt);
ctx.send_message("target2", WorkflowMessage::Activate);
let outbox = ctx.into_outbox();
assert_eq!(outbox.get(&VertexId::new("target1")).unwrap().len(), 2);
assert_eq!(outbox.get(&VertexId::new("target2")).unwrap().len(), 1);
}
#[test]
fn test_compute_context_broadcast() {
let state = TestState { value: 0 };
let messages: Vec<WorkflowMessage> = vec![];
let mut ctx = ComputeContext::new(
VertexId::new("broadcaster"),
&messages,
0,
&state,
);
let targets = vec!["a", "b", "c"];
ctx.broadcast(targets, WorkflowMessage::Activate);
let outbox = ctx.into_outbox();
assert_eq!(outbox.len(), 3);
assert!(outbox.contains_key(&VertexId::new("a")));
assert!(outbox.contains_key(&VertexId::new("b")));
assert!(outbox.contains_key(&VertexId::new("c")));
}
#[test]
fn test_compute_context_helpers() {
let state = TestState { value: 0 };
let messages = vec![WorkflowMessage::Activate, WorkflowMessage::Halt];
let ctx = ComputeContext::<TestState, WorkflowMessage>::new(
VertexId::new("test"),
&messages,
0,
&state,
);
assert!(ctx.is_first_superstep());
assert!(ctx.has_messages());
assert_eq!(ctx.message_count(), 2);
assert_eq!(ctx.id(), &VertexId::new("test"));
let ctx2 = ComputeContext::<TestState, WorkflowMessage>::new(
VertexId::new("test2"),
&[],
5,
&state,
);
assert!(!ctx2.is_first_superstep());
assert!(!ctx2.has_messages());
assert_eq!(ctx2.message_count(), 0);
}
#[test]
fn test_compute_result_constructors() {
let update = TestUpdate { delta: 5 };
let active = ComputeResult::active(update.clone());
assert!(active.state.is_active());
let halted = ComputeResult::halt(update.clone());
assert!(halted.state.is_halted());
let completed = ComputeResult::complete(update.clone());
assert!(completed.state.is_completed());
let custom = ComputeResult::with_state(update, VertexState::Halted);
assert!(custom.state.is_halted());
}
#[test]
fn test_state_update_trait() {
let empty = TestUpdate::empty();
assert!(empty.is_empty());
let non_empty = TestUpdate { delta: 10 };
assert!(!non_empty.is_empty());
}
#[test]
fn test_vertex_state_helpers() {
assert!(VertexState::Active.is_active());
assert!(!VertexState::Active.is_halted());
assert!(!VertexState::Active.is_completed());
assert!(!VertexState::Halted.is_active());
assert!(VertexState::Halted.is_halted());
assert!(!VertexState::Halted.is_completed());
assert!(!VertexState::Completed.is_active());
assert!(!VertexState::Completed.is_halted());
assert!(VertexState::Completed.is_completed());
}
#[test]
fn test_vertex_id_from_str() {
let id: VertexId = "planner".into();
assert_eq!(id.0, "planner");
}
#[test]
fn test_vertex_id_from_string() {
let id: VertexId = String::from("router").into();
assert_eq!(id.0, "router");
}
#[test]
fn test_vertex_id_new() {
let id = VertexId::new("explorer");
assert_eq!(id.0, "explorer");
}
#[test]
fn test_vertex_id_equality() {
let id1: VertexId = "node1".into();
let id2: VertexId = "node1".into();
let id3: VertexId = "node2".into();
assert_eq!(id1, id2);
assert_ne!(id1, id3);
}
#[test]
fn test_vertex_id_hash() {
use std::collections::HashSet;
let mut set = HashSet::new();
set.insert(VertexId::from("a"));
set.insert(VertexId::from("b"));
set.insert(VertexId::from("a")); // duplicate
assert_eq!(set.len(), 2);
}
#[test]
fn test_vertex_id_display() {
let id = VertexId::new("test_node");
assert_eq!(format!("{}", id), "test_node");
}
#[test]
fn test_vertex_state_default_is_active() {
assert_eq!(VertexState::default(), VertexState::Active);
}
#[test]
fn test_vertex_state_variants() {
let active = VertexState::Active;
let halted = VertexState::Halted;
let completed = VertexState::Completed;
assert_ne!(active, halted);
assert_ne!(halted, completed);
assert_ne!(active, completed);
}
#[test]
fn test_vertex_id_serialization() {
let id = VertexId::new("test");
let json = serde_json::to_string(&id).unwrap();
let deserialized: VertexId = serde_json::from_str(&json).unwrap();
assert_eq!(id, deserialized);
}
#[test]
fn test_vertex_state_serialization() {
let state = VertexState::Halted;
let json = serde_json::to_string(&state).unwrap();
let deserialized: VertexState = serde_json::from_str(&json).unwrap();
assert_eq!(state, deserialized);
}
}


@@ -0,0 +1,57 @@
//! Workflow Graph System for Pregel-Based Agent Orchestration
//!
//! This module provides the building blocks for constructing and executing
//! agent workflows using a Pregel-inspired graph execution model.
//!
//! # Overview
//!
//! ```text
//! ┌─────────────────────────────────────────────────────────────┐
//! │ WorkflowGraph │
//! │ ┌─────────────────────────────────────────────────────┐ │
//! │ │ Nodes │ │
//! │ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
//! │ │ │ Agent │→ │ Router │→ │ SubAgent│ │ │
//! │ │ └─────────┘ └────┬────┘ └─────────┘ │ │
//! │ │ │ │ │
//! │ │ ┌─────▼─────┐ │ │
//! │ │ │ Tool │ │ │
//! │ │ └───────────┘ │ │
//! │ └─────────────────────────────────────────────────────┘ │
//! │ │
//! │ Compile via WorkflowBuilder → CompiledWorkflow │
//! │ Execute via PregelRuntime │
//! └─────────────────────────────────────────────────────────────┘
//! ```
//!
//! # Usage
//!
//! ```ignore
//! use rig_deepagents::workflow::{WorkflowGraph, NodeKind, AgentNodeConfig};
//!
//! let workflow = WorkflowGraph::<MyState>::new()
//! .name("research_agent")
//! .node("planner", NodeKind::Agent(AgentNodeConfig {
//! system_prompt: "Plan the research...".into(),
//! ..Default::default()
//! }))
//! .node("researcher", NodeKind::Agent(AgentNodeConfig {
//! system_prompt: "Execute research...".into(),
//! ..Default::default()
//! }))
//! .entry("planner")
//! .edge("planner", "researcher")
//! .edge("researcher", END)
//! .build()?;
//! ```
pub mod node;
pub mod vertices;
pub use node::{
AgentNodeConfig, Branch, BranchCondition, FanInNodeConfig, FanOutNodeConfig, MergeStrategy,
NodeKind, RouterNodeConfig, RoutingStrategy, SplitStrategy, StopCondition, SubAgentNodeConfig,
ToolNodeConfig,
};
pub use vertices::agent::AgentVertex;


@@ -0,0 +1,513 @@
//! Node Types and Configuration for Workflow Graphs
//!
//! Defines the different types of nodes that can exist in a workflow graph.
//! Each node type has specific behavior and configuration options.
//!
//! # Node Types
//!
//! - **Agent**: LLM-based processing with tool calling capabilities
//! - **Tool**: Single tool execution with static or dynamic arguments
//! - **Router**: Conditional branching based on state or LLM decisions
//! - **SubAgent**: Delegation to nested workflows with recursion protection
//! - **FanOut**: Parallel dispatch to multiple targets
//! - **FanIn**: Synchronization point waiting for multiple sources
//! - **Passthrough**: Simple data forwarding (identity transformation)
use serde::{Deserialize, Serialize};
use std::collections::{HashMap, HashSet};
use std::time::Duration;
/// The kind of node in a workflow graph.
///
/// Each variant represents a different computation pattern.
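///
/// # Serialized form
///
/// With `#[serde(tag = "type", rename_all = "snake_case")]`, node kinds
/// round-trip as internally tagged JSON; a sketch (field values illustrative):
///
/// ```json
/// { "type": "agent", "system_prompt": "Plan the research...", "max_iterations": 10 }
/// ```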
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum NodeKind {
/// An LLM-based agent that can process messages and call tools
Agent(AgentNodeConfig),
/// A single tool execution node
Tool(ToolNodeConfig),
/// Conditional routing based on state or LLM decisions
Router(RouterNodeConfig),
/// Delegation to a sub-workflow
SubAgent(SubAgentNodeConfig),
/// Parallel dispatch to multiple targets
FanOut(FanOutNodeConfig),
/// Synchronization point waiting for multiple sources
FanIn(FanInNodeConfig),
/// Simple passthrough (identity transformation)
#[default]
Passthrough,
}
/// Configuration for an Agent node.
///
/// Agents use LLMs to process messages and can optionally call tools.
/// They iterate until a stop condition is met.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AgentNodeConfig {
/// System prompt for the agent
pub system_prompt: String,
/// Maximum iterations before forcing termination
#[serde(default = "default_max_iterations")]
pub max_iterations: usize,
/// Conditions that cause the agent to stop iterating
#[serde(default)]
pub stop_conditions: Vec<StopCondition>,
/// Tools the agent is allowed to use (None = all tools)
#[serde(default)]
pub allowed_tools: Option<HashSet<String>>,
/// Timeout for each LLM call
#[serde(default, with = "humantime_serde")]
pub llm_timeout: Option<Duration>,
/// Temperature for LLM calls
#[serde(default)]
pub temperature: Option<f32>,
}
impl Default for AgentNodeConfig {
fn default() -> Self {
Self {
system_prompt: String::new(),
max_iterations: 10,
stop_conditions: vec![StopCondition::NoToolCalls],
allowed_tools: None,
llm_timeout: None,
temperature: None,
}
}
}
fn default_max_iterations() -> usize {
10
}
/// Conditions that cause an agent to stop iterating.
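///
/// Serialized with an internal `type` tag; a sketch (values illustrative):
///
/// ```json
/// { "type": "contains_text", "pattern": "FINAL ANSWER" }
/// ```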
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum StopCondition {
/// Stop when the LLM produces no tool calls
NoToolCalls,
/// Stop when a specific tool is called
OnTool { tool_name: String },
/// Stop when the message contains specific text
ContainsText { pattern: String },
/// Stop when a state field matches a condition
StateMatch { field: String, value: serde_json::Value },
/// Stop after a certain number of iterations
MaxIterations { count: usize },
}
/// Configuration for a Tool node.
///
/// Executes a single tool with arguments from static config or state.
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
pub struct ToolNodeConfig {
/// Name of the tool to execute
#[serde(default)]
pub tool_name: String,
/// Static arguments (can be overridden by state paths)
#[serde(default)]
pub static_args: HashMap<String, serde_json::Value>,
/// Map from argument name to state path for dynamic arguments
#[serde(default)]
pub state_arg_paths: HashMap<String, String>,
/// Path in state where the result should be stored
#[serde(default)]
pub result_path: Option<String>,
/// Timeout for tool execution
#[serde(default, with = "humantime_serde")]
pub timeout: Option<Duration>,
}
/// Configuration for a Router node.
///
/// Determines next node based on state inspection or LLM decision.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RouterNodeConfig {
/// How to make the routing decision
pub strategy: RoutingStrategy,
/// Branches to evaluate (in order for StateField strategy)
pub branches: Vec<Branch>,
/// Default branch if no conditions match (required for StateField)
#[serde(default)]
pub default: Option<String>,
}
impl Default for RouterNodeConfig {
fn default() -> Self {
Self {
strategy: RoutingStrategy::StateField {
field: String::new(),
},
branches: Vec::new(),
default: None,
}
}
}
/// Strategy for making routing decisions.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum RoutingStrategy {
/// Route based on a state field value
StateField {
/// Path to the state field to inspect
field: String,
},
    /// Route based on LLM classification
    // serde's `snake_case` rule would render this variant as "l_l_m_decision",
    // so pin the intended tag explicitly.
    #[serde(rename = "llm_decision")]
    LLMDecision {
/// Prompt describing the options to the LLM
prompt: String,
/// Model to use (optional, uses default if not specified)
model: Option<String>,
},
}
/// A branch in a routing decision.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Branch {
/// Target node for this branch
pub target: String,
/// Condition that must be true for this branch
pub condition: BranchCondition,
}
/// Condition for a routing branch.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "op", rename_all = "snake_case")]
pub enum BranchCondition {
/// Value equals expected
Equals { value: serde_json::Value },
/// Value is in set of options
In { values: Vec<serde_json::Value> },
/// Value matches regex pattern
Matches { pattern: String },
/// Value is truthy (non-null, non-empty, non-false)
IsTruthy,
/// Value is falsy
IsFalsy,
/// Always true (used for catch-all branches)
Always,
}
/// Configuration for a SubAgent node.
///
/// Delegates work to a nested workflow with recursion protection.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SubAgentNodeConfig {
/// Name of the sub-agent to invoke
pub agent_name: String,
/// Maximum recursion depth (prevents infinite nesting)
#[serde(default = "default_max_recursion")]
pub max_recursion: usize,
/// Mapping from parent state paths to sub-agent input
#[serde(default)]
pub input_mapping: HashMap<String, String>,
/// Mapping from sub-agent output to parent state paths
#[serde(default)]
pub output_mapping: HashMap<String, String>,
/// Timeout for the entire sub-agent execution
#[serde(default, with = "humantime_serde")]
pub timeout: Option<Duration>,
}
impl Default for SubAgentNodeConfig {
fn default() -> Self {
Self {
agent_name: String::new(),
max_recursion: 5,
input_mapping: HashMap::new(),
output_mapping: HashMap::new(),
timeout: None,
}
}
}
fn default_max_recursion() -> usize {
5
}
/// Configuration for a FanOut node.
///
/// Broadcasts messages to multiple targets in parallel.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FanOutNodeConfig {
/// Target nodes to send to
pub targets: Vec<String>,
/// How to split the work among targets
#[serde(default)]
pub split_strategy: SplitStrategy,
/// Path to array in state to split (for Split strategy)
#[serde(default)]
pub split_path: Option<String>,
}
impl Default for FanOutNodeConfig {
fn default() -> Self {
Self {
targets: Vec::new(),
split_strategy: SplitStrategy::Broadcast,
split_path: None,
}
}
}
/// Strategy for splitting work in a FanOut node.
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum SplitStrategy {
/// Send the same message to all targets
#[default]
Broadcast,
/// Split an array and send one element to each target
Split,
/// Round-robin distribution
RoundRobin,
}
/// Configuration for a FanIn node.
///
/// Waits for messages from multiple sources and merges them.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FanInNodeConfig {
/// Source nodes to wait for
pub sources: Vec<String>,
/// How to merge results from sources
#[serde(default)]
pub merge_strategy: MergeStrategy,
/// Path in state where merged results are stored
#[serde(default)]
pub result_path: Option<String>,
/// Timeout for waiting for all sources
#[serde(default, with = "humantime_serde")]
pub timeout: Option<Duration>,
}
impl Default for FanInNodeConfig {
fn default() -> Self {
Self {
sources: Vec::new(),
merge_strategy: MergeStrategy::Collect,
result_path: None,
timeout: None,
}
}
}
/// Strategy for merging results in a FanIn node.
#[derive(Debug, Clone, Default, Serialize, Deserialize)]
#[serde(rename_all = "snake_case")]
pub enum MergeStrategy {
/// Collect all results into an array
#[default]
Collect,
/// Use first result that arrives
First,
/// Use last result (all must complete)
Last,
/// Concatenate string results
Concat,
/// Merge object results (later values overwrite)
Merge,
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_node_kind_serialization() {
let agent = NodeKind::Agent(AgentNodeConfig {
system_prompt: "You are helpful.".into(),
..Default::default()
});
let json = serde_json::to_string(&agent).unwrap();
let deserialized: NodeKind = serde_json::from_str(&json).unwrap();
match deserialized {
NodeKind::Agent(config) => {
assert_eq!(config.system_prompt, "You are helpful.");
}
_ => panic!("Wrong variant"),
}
}
#[test]
fn test_tool_node_config() {
let tool = ToolNodeConfig {
tool_name: "search".into(),
static_args: [("query".into(), serde_json::json!("test"))].into(),
state_arg_paths: [("max_results".into(), "config.limit".into())].into(),
result_path: Some("search_results".into()),
timeout: Some(Duration::from_secs(30)),
};
let json = serde_json::to_string(&tool).unwrap();
assert!(json.contains("search"));
assert!(json.contains("query"));
}
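    #[test]
    fn test_tool_node_timeout_roundtrip() {
        // Timeouts are (de)serialized via humantime_serde, so a whole-second
        // duration should survive a JSON round trip unchanged.
        let tool = ToolNodeConfig {
            timeout: Some(Duration::from_secs(30)),
            ..Default::default()
        };
        let json = serde_json::to_string(&tool).unwrap();
        let back: ToolNodeConfig = serde_json::from_str(&json).unwrap();
        assert_eq!(back.timeout, Some(Duration::from_secs(30)));
    }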
#[test]
fn test_router_config_with_branches() {
let router = RouterNodeConfig {
strategy: RoutingStrategy::StateField {
field: "phase".into(),
},
branches: vec![
Branch {
target: "explore".into(),
condition: BranchCondition::Equals {
value: serde_json::json!("exploratory"),
},
},
Branch {
target: "synthesize".into(),
condition: BranchCondition::Equals {
value: serde_json::json!("synthesis"),
},
},
],
default: Some("done".into()),
};
let json = serde_json::to_string(&router).unwrap();
let deserialized: RouterNodeConfig = serde_json::from_str(&json).unwrap();
assert_eq!(deserialized.branches.len(), 2);
assert_eq!(deserialized.default, Some("done".into()));
}
#[test]
fn test_stop_conditions() {
let conditions = vec![
StopCondition::NoToolCalls,
StopCondition::OnTool {
tool_name: "submit".into(),
},
StopCondition::ContainsText {
pattern: "DONE".into(),
},
StopCondition::MaxIterations { count: 5 },
];
let json = serde_json::to_string(&conditions).unwrap();
let deserialized: Vec<StopCondition> = serde_json::from_str(&json).unwrap();
assert_eq!(deserialized.len(), 4);
assert_eq!(deserialized[0], StopCondition::NoToolCalls);
}
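    #[test]
    fn test_stop_condition_wire_format() {
        // StopCondition is internally tagged with `type` in snake_case, so
        // OnTool should serialize as {"type":"on_tool","tool_name":...}.
        let json = serde_json::to_string(&StopCondition::OnTool {
            tool_name: "submit".into(),
        })
        .unwrap();
        assert!(json.contains("\"type\":\"on_tool\""));
        assert!(json.contains("\"tool_name\":\"submit\""));
    }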
#[test]
fn test_subagent_config() {
let config = SubAgentNodeConfig {
agent_name: "researcher".into(),
max_recursion: 3,
input_mapping: [("query".into(), "research.topic".into())].into(),
output_mapping: [("findings".into(), "research.findings".into())].into(),
timeout: Some(Duration::from_secs(300)),
};
assert_eq!(config.agent_name, "researcher");
assert_eq!(config.max_recursion, 3);
}
#[test]
fn test_fanout_fanin_config() {
let fanout = FanOutNodeConfig {
targets: vec!["a".into(), "b".into(), "c".into()],
split_strategy: SplitStrategy::Broadcast,
split_path: None,
};
let fanin = FanInNodeConfig {
sources: vec!["a".into(), "b".into(), "c".into()],
merge_strategy: MergeStrategy::Collect,
result_path: Some("results".into()),
timeout: Some(Duration::from_secs(60)),
};
assert_eq!(fanout.targets, fanin.sources);
}
#[test]
fn test_branch_conditions() {
let conditions = vec![
BranchCondition::Equals {
value: serde_json::json!("active"),
},
BranchCondition::In {
values: vec![serde_json::json!(1), serde_json::json!(2)],
},
BranchCondition::Matches {
pattern: "^done.*".into(),
},
BranchCondition::IsTruthy,
BranchCondition::Always,
];
for condition in &conditions {
let json = serde_json::to_string(condition).unwrap();
let _: BranchCondition = serde_json::from_str(&json).unwrap();
}
}
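    #[test]
    fn test_branch_condition_wire_format() {
        // BranchCondition is tagged with `op` in snake_case; internally tagged
        // unit variants collapse to just the tag object.
        let json = serde_json::to_string(&BranchCondition::Always).unwrap();
        assert_eq!(json, r#"{"op":"always"}"#);
        let json = serde_json::to_string(&BranchCondition::IsTruthy).unwrap();
        assert_eq!(json, r#"{"op":"is_truthy"}"#);
    }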
#[test]
fn test_node_kind_variants() {
// Ensure all 7 variants can be created
let _agent = NodeKind::Agent(Default::default());
let _tool = NodeKind::Tool(Default::default());
let _router = NodeKind::Router(Default::default());
let _subagent = NodeKind::SubAgent(Default::default());
let _fanout = NodeKind::FanOut(Default::default());
let _fanin = NodeKind::FanIn(Default::default());
let _passthrough = NodeKind::Passthrough;
// Ensure default is Passthrough
assert!(matches!(NodeKind::default(), NodeKind::Passthrough));
}
}


@@ -0,0 +1,354 @@
//! AgentVertex: LLM-based agent node with tool calling capabilities
//!
//! Implements the Vertex trait for agent nodes that use LLMs to process
//! messages and can iteratively call tools until a stop condition is met.
use async_trait::async_trait;
use std::sync::Arc;
use crate::llm::{LLMConfig, LLMProvider};
use crate::middleware::ToolDefinition;
use crate::pregel::error::PregelError;
use crate::pregel::message::WorkflowMessage;
use crate::pregel::state::WorkflowState;
use crate::pregel::vertex::{ComputeContext, ComputeResult, StateUpdate, Vertex, VertexId};
use crate::state::{Message, Role};
use crate::workflow::node::{AgentNodeConfig, StopCondition};
/// An agent vertex that uses an LLM to process messages and call tools
pub struct AgentVertex<S: WorkflowState> {
id: VertexId,
config: AgentNodeConfig,
llm: Arc<dyn LLMProvider>,
tools: Vec<ToolDefinition>,
_phantom: std::marker::PhantomData<S>,
}
impl<S: WorkflowState> AgentVertex<S> {
/// Create a new agent vertex
pub fn new(
id: impl Into<VertexId>,
config: AgentNodeConfig,
llm: Arc<dyn LLMProvider>,
tools: Vec<ToolDefinition>,
) -> Self {
Self {
id: id.into(),
config,
llm,
tools,
_phantom: std::marker::PhantomData,
}
}
/// Check if any stop condition is met
fn check_stop_conditions(&self, message: &Message, iteration: usize) -> bool {
for condition in &self.config.stop_conditions {
match condition {
StopCondition::NoToolCalls => {
                    if message.tool_calls.as_ref().map_or(true, |tc| tc.is_empty()) {
return true;
}
}
StopCondition::OnTool { tool_name } => {
if let Some(tool_calls) = &message.tool_calls {
if tool_calls.iter().any(|tc| &tc.name == tool_name) {
return true;
}
}
}
StopCondition::ContainsText { pattern } => {
if message.content.contains(pattern) {
return true;
}
}
StopCondition::MaxIterations { count } => {
if iteration >= *count {
return true;
}
}
StopCondition::StateMatch { .. } => {
// TODO: Implement state matching
continue;
}
}
}
false
}
/// Filter tools based on allowed list
fn filter_tools(&self) -> Vec<ToolDefinition> {
if let Some(allowed) = &self.config.allowed_tools {
self.tools
.iter()
.filter(|t| allowed.contains(&t.name))
.cloned()
.collect()
} else {
self.tools.clone()
}
}
/// Build LLM config from agent config
fn build_llm_config(&self) -> Option<LLMConfig> {
self.config.temperature.map(|temp| LLMConfig::new("").with_temperature(temp as f64))
}
}
#[async_trait]
impl<S: WorkflowState> Vertex<S, WorkflowMessage> for AgentVertex<S> {
fn id(&self) -> &VertexId {
&self.id
}
async fn compute(
&self,
ctx: &mut ComputeContext<'_, S, WorkflowMessage>,
) -> Result<ComputeResult<S::Update>, PregelError> {
// Build message history starting with system prompt
let mut messages = vec![Message {
role: Role::System,
content: self.config.system_prompt.clone(),
tool_calls: None,
tool_call_id: None,
}];
// Add any incoming workflow messages as user messages
for msg in ctx.messages {
if let WorkflowMessage::Data { key: _, value } = msg {
messages.push(Message {
role: Role::User,
content: value.to_string(),
tool_calls: None,
tool_call_id: None,
});
}
}
// If no user messages, add a default activation message
if messages.len() == 1 {
messages.push(Message {
role: Role::User,
content: "Begin processing.".to_string(),
tool_calls: None,
tool_call_id: None,
});
}
let filtered_tools = self.filter_tools();
let llm_config = self.build_llm_config();
// Agent loop: iterate until stop condition or max iterations
for iteration in 0..self.config.max_iterations {
// Call LLM
let response = self
.llm
.complete(&messages, &filtered_tools, llm_config.as_ref())
.await
.map_err(|e| PregelError::VertexError {
vertex_id: self.id.clone(),
message: e.to_string(),
source: Some(Box::new(e)),
})?;
let assistant_message = response.message.clone();
messages.push(assistant_message.clone());
// Check stop conditions
if self.check_stop_conditions(&assistant_message, iteration) {
// Send final response as output message
ctx.send_message(
"output",
WorkflowMessage::Data {
key: "response".to_string(),
value: serde_json::Value::String(assistant_message.content),
},
);
return Ok(ComputeResult::halt(S::Update::empty()));
}
// If there are tool calls, execute them
if let Some(tool_calls) = &assistant_message.tool_calls {
for tool_call in tool_calls {
// TODO: Execute tool calls
// For now, just add a mock tool result
messages.push(Message::tool(
"Tool executed successfully",
&tool_call.id,
));
}
} else {
// No tool calls and no stop condition matched, halt anyway
ctx.send_message(
"output",
WorkflowMessage::Data {
key: "response".to_string(),
value: serde_json::Value::String(assistant_message.content),
},
);
return Ok(ComputeResult::halt(S::Update::empty()));
}
}
// Max iterations reached
Err(PregelError::vertex_error(
self.id.clone(),
"Max iterations reached",
))
}
}
#[cfg(test)]
mod tests {
use super::*;
use crate::error::DeepAgentError;
use crate::llm::LLMResponse;
use crate::pregel::state::UnitState;
use crate::pregel::vertex::VertexState;
use crate::state::ToolCall;
use std::sync::Mutex;
// Mock LLM provider for testing
struct MockLLMProvider {
responses: Arc<Mutex<Vec<Message>>>,
}
impl MockLLMProvider {
fn new() -> Self {
Self {
responses: Arc::new(Mutex::new(Vec::new())),
}
}
fn with_response(self, content: impl Into<String>) -> Self {
let message = Message {
role: Role::Assistant,
content: content.into(),
tool_calls: None,
tool_call_id: None,
};
self.responses.lock().unwrap().push(message);
self
}
fn with_tool_call(self, content: impl Into<String>, tool_name: impl Into<String>) -> Self {
let message = Message {
role: Role::Assistant,
content: content.into(),
tool_calls: Some(vec![ToolCall {
id: "test_call_1".to_string(),
name: tool_name.into(),
arguments: serde_json::json!({}),
}]),
tool_call_id: None,
};
self.responses.lock().unwrap().push(message);
self
}
}
#[async_trait]
impl LLMProvider for MockLLMProvider {
async fn complete(
&self,
_messages: &[Message],
_tools: &[ToolDefinition],
_config: Option<&LLMConfig>,
) -> Result<LLMResponse, DeepAgentError> {
let mut responses = self.responses.lock().unwrap();
if responses.is_empty() {
return Err(DeepAgentError::AgentExecution(
"No more mock responses".to_string(),
));
}
let message = responses.remove(0);
Ok(LLMResponse::new(message))
}
fn name(&self) -> &str {
"mock"
}
fn default_model(&self) -> &str {
"mock-model"
}
}
#[tokio::test]
async fn test_agent_vertex_single_response() {
let mock_llm = MockLLMProvider::new().with_response("Hello! How can I help?");
let vertex = AgentVertex::<UnitState>::new(
"agent",
AgentNodeConfig {
system_prompt: "You are helpful.".into(),
stop_conditions: vec![StopCondition::NoToolCalls],
..Default::default()
},
Arc::new(mock_llm),
vec![],
);
let mut ctx =
ComputeContext::<UnitState, WorkflowMessage>::new("agent".into(), &[], 0, &UnitState);
let result = vertex.compute(&mut ctx).await.unwrap();
assert_eq!(result.state, VertexState::Halted);
assert!(ctx.has_messages() || !ctx.into_outbox().is_empty());
}
#[tokio::test]
async fn test_agent_vertex_stop_on_tool() {
let mock_llm = MockLLMProvider::new().with_tool_call("Let me search for that", "search");
let vertex = AgentVertex::<UnitState>::new(
"agent",
AgentNodeConfig {
system_prompt: "You are a researcher.".into(),
stop_conditions: vec![StopCondition::OnTool {
tool_name: "search".to_string(),
}],
..Default::default()
},
Arc::new(mock_llm),
vec![],
);
let mut ctx =
ComputeContext::<UnitState, WorkflowMessage>::new("agent".into(), &[], 0, &UnitState);
let result = vertex.compute(&mut ctx).await.unwrap();
assert_eq!(result.state, VertexState::Halted);
}
#[tokio::test]
async fn test_agent_vertex_max_iterations() {
// Mock LLM that always returns tool calls (would loop forever without limit)
let mut mock_llm = MockLLMProvider::new();
for _ in 0..15 {
mock_llm = mock_llm.with_tool_call("Still thinking...", "think");
}
let vertex = AgentVertex::<UnitState>::new(
"agent",
AgentNodeConfig {
system_prompt: "You are helpful.".into(),
max_iterations: 3,
stop_conditions: vec![], // No stop conditions, relies on max_iterations
..Default::default()
},
Arc::new(mock_llm),
vec![],
);
let mut ctx =
ComputeContext::<UnitState, WorkflowMessage>::new("agent".into(), &[], 0, &UnitState);
let result = vertex.compute(&mut ctx).await;
// Should hit max iterations and return error
assert!(result.is_err());
}
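    #[tokio::test]
    async fn test_agent_vertex_stop_on_contains_text() {
        // ContainsText stops on a substring match in the assistant message.
        let mock_llm = MockLLMProvider::new().with_response("Analysis complete. DONE");
        let vertex = AgentVertex::<UnitState>::new(
            "agent",
            AgentNodeConfig {
                system_prompt: "You are helpful.".into(),
                stop_conditions: vec![StopCondition::ContainsText {
                    pattern: "DONE".into(),
                }],
                ..Default::default()
            },
            Arc::new(mock_llm),
            vec![],
        );
        let mut ctx =
            ComputeContext::<UnitState, WorkflowMessage>::new("agent".into(), &[], 0, &UnitState);
        let result = vertex.compute(&mut ctx).await.unwrap();
        assert_eq!(result.state, VertexState::Halted);
    }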
}


@@ -0,0 +1,11 @@
//! Vertex implementations for workflow nodes
//!
//! Each vertex type implements the Vertex trait and corresponds to a NodeKind variant.
pub mod agent;
// Future vertex implementations:
// pub mod tool;
// pub mod router;
// pub mod subagent;
// pub mod parallel;