Update docs

Author: HyunjunJeon
Date: 2026-01-13 18:57:35 +09:00
Commit: 6a4f3dcdd9 (parent fa67114806)
6 changed files with 631 additions and 144 deletions

AGENTS.md

# PROJECT KNOWLEDGE BASE
## Project Structure & Module Organization
- `research_agent/` contains the core Python agents, prompts, tools, and subagent utilities.
- `skills/` holds project-level skills as `SKILL.md` files (YAML frontmatter + instructions).
- `research_workspace/` is the agents working filesystem for generated outputs; keep it clean or example-only.
- `deep-agents-ui/` is the Next.js/React UI with source under `deep-agents-ui/src/`.
- `deepagents_sourcecode/` vendors upstream library sources for reference and comparison.
- `rust-research-agent/` is a standalone Rust tutorial agent with its own build/test flow.
- `langgraph.json` defines the LangGraph deployment entrypoint for the research agent.
**Generated:** 2026-01-13
---
## OVERVIEW
Multi-agent research system demonstrating **FileSystem-based Context Engineering** using LangChain's DeepAgents framework. Includes Python orchestrator, Rust port via Rig framework, and Next.js chat UI.
---
## STRUCTURE
```
/
  research_agent/                        # Python DeepAgent orchestrator (see AGENTS.md)
  context_engineering_research_agent/    # Extended agent with 5 strategies
  deep-agents-ui/                        # Next.js React frontend (see AGENTS.md)
  rust-research-agent/                   # Rust implementation (see AGENTS.md)
    rig-deepagents/                      # Pregel-based middleware runtime
    rig-rlm/                             # Recursive Language Model agent
  tests/                                 # pytest test suite (see AGENTS.md)
  skills/                                # Project-level skills (SKILL.md per skill)
  research_workspace/                    # Agent output directory (ephemeral)
  deepagents_sourcecode/                 # Vendor: upstream library reference
```
## Coding Style & Naming Conventions
- Python: follow ruff defaults and Google-style docstrings (see `pyproject.toml`); prefer `snake_case` modules and functions.
- TypeScript/React: keep `PascalCase` for components, `camelCase` for hooks/utilities; rely on ESLint + Prettier (Tailwind plugin).
- Skill definitions: keep one skill per directory with a `SKILL.md` entrypoint and clear, task-focused naming.
---
## WHERE TO LOOK
| Task | Location | Notes |
|------|----------|-------|
| Modify orchestrator | `research_agent/agent.py` | SubAgent assembly, tools, middleware |
| Add research tool | `research_agent/tools.py` | `tavily_search`, `think_tool` |
| Autonomous researcher logic | `research_agent/researcher/` | Three-phase workflow |
| Context strategies | `context_engineering_research_agent/context_strategies/` | 5 patterns |
| Frontend components | `deep-agents-ui/src/app/components/` | Chat UI |
| Rust Pregel runtime | `rust-research-agent/rig-deepagents/src/pregel/` | Graph execution |
| Rust middleware | `rust-research-agent/rig-deepagents/src/middleware/` | Tool injection |
| Add new skill | `skills/{skill-name}/SKILL.md` | YAML frontmatter + instructions |
---
## CONVENTIONS
### Deviations from Standard Patterns
- **Backend factory pattern**: Always use `backend_factory(rt: ToolRuntime)` - middleware depends on this signature
- **SubAgent naming**: Use `researcher`, `explorer`, `synthesizer` - hardcoded in prompts
- **File paths**: Paths starting with "/" route to `research_workspace/`; others are in-memory
- **Korean comments**: Docstrings and some comments in Korean (bilingual codebase)
---
## ANTI-PATTERNS
- **DO NOT** commit `.env` files (contains API keys)
- **DO NOT** instantiate researcher directly - use `get_researcher_subagent()`
- **DO NOT** skip `think_tool()` between searches - explicit reflection required
- **NEVER** write raw JSON to user - always format responses (see `deepagents_cli/tools.py`)
- **NEVER** lie to exit early - complete TODO items fully (see `researcher/runner.py`)
---
## COMMANDS
### Python Development
```bash
uv sync # Install dependencies
langgraph dev # Start backend (port 2024)
uv run ruff format . # Format code
uv run ruff check . # Lint
uv run mypy . # Type check
uv run pytest tests/ # Run tests
```
### Frontend Development
```bash
cd deep-agents-ui
yarn install && yarn dev # Dev server (port 3000)
yarn build # Production build
yarn lint && yarn format # Lint + format
```
### Rust Development
```bash
cd rust-research-agent/rig-deepagents
cargo test # Run tests (159)
cargo clippy -- -D warnings # Lint (strict)
cargo build --features checkpointer-sqlite # Build with features
```
---
## ENVIRONMENT VARIABLES
Copy `env.example` to `.env`:
| Variable | Required | Purpose |
|----------|----------|---------|
| `OPENAI_API_KEY` | Yes | gpt-4.1 model |
| `TAVILY_API_KEY` | Yes | Web search |
| `LANGSMITH_API_KEY` | No | Tracing (`lsv2_pt_...`) |
| `ANTHROPIC_API_KEY` | No | Claude models + caching |
---
## SUBDIRECTORY KNOWLEDGE
See AGENTS.md files in:
- `research_agent/AGENTS.md` - Orchestrator details
- `context_engineering_research_agent/AGENTS.md` - Context strategies
- `deep-agents-ui/AGENTS.md` - Frontend architecture
- `tests/AGENTS.md` - Test organization
- `rust-research-agent/AGENTS.md` - Rust overview (3 tiers)
- `rust-research-agent/rig-deepagents/AGENTS.md` - Middleware architecture
- `rust-research-agent/rig-rlm/AGENTS.md` - Recursive LLM pattern

CLAUDE.md

A multi-agent research system demonstrating **FileSystem-based Context Engineering** using LangChain's DeepAgents framework. The system includes:
- **Python DeepAgents**: LangChain-based multi-agent orchestration with web research capabilities
- **Context Engineering Module**: Experimental platform with 5 context optimization strategies
- **Rust `rig-deepagents`**: Pregel-inspired graph execution runtime using Rig framework
## Development Commands
```bash
uv sync
langgraph dev

# Linting and formatting
uv run ruff check .
uv run ruff format .

# Type checking
uv run mypy .

# Run tests
uv run pytest tests/                   # All tests
uv run pytest tests/test_agent.py -v   # Single file
uv run pytest -k "test_researcher" -v  # Pattern match
```
### Frontend UI (deep-agents-ui/)
```bash
yarn lint    # ESLint
yarn format  # Prettier
```
### Rust `rig-deepagents`
```bash
cd rust-research-agent/rig-deepagents

# Run all tests
cargo test

# Linting (strict, treats warnings as errors)
cargo clippy -- -D warnings

# Build with features
cargo build --features checkpointer-postgres
```
## Required Environment Variables
Copy `env.example` to `.env`:
| Variable | Required | Purpose |
|----------|----------|---------|
| `OPENAI_API_KEY` | Yes | gpt-4.1 model |
| `TAVILY_API_KEY` | Yes | Web search |
| `ANTHROPIC_API_KEY` | No | Claude models + prompt caching |
| `LANGSMITH_API_KEY` | No | Tracing (`lsv2_pt_...`) |
## Architecture
### Multi-SubAgent System
The system uses a three-tier agent hierarchy with two distinct SubAgent types:
```
Main Orchestrator Agent (agent.py)
 |
 +-- researcher (CompiledSubAgent)
 +-- explorer (Simple SubAgent)
 +-- synthesizer (Simple SubAgent)
```

| Type | Definition | Execution | Use Case |
|------|------------|-----------|----------|
| CompiledSubAgent | `{"runnable": CompiledStateGraph}` | Multi-turn autonomous | Complex research with self-planning |
| Simple SubAgent | `{"system_prompt": str}` | Single response | Quick tasks, file ops |
### Context Engineering Strategies (context_engineering_research_agent/)
Five strategies for optimizing LLM context window usage:
| Strategy | File | Trigger |
|----------|------|---------|
| **Offloading** | `context_strategies/offloading.py` | Tool result > 20,000 tokens |
| **Reduction** | `context_strategies/reduction.py` | Context usage > 85% |
| **Retrieval** | grep/glob/read_file tools | Always available |
| **Isolation** | SubAgent `task()` tool | Complex subtasks |
| **Caching** | `context_strategies/caching.py` | Anthropic provider |
Middleware stack order matters: Offloading → Reduction → Caching → Telemetry
### Backend Factory Pattern
The `backend_factory(rt: ToolRuntime)` function demonstrates the recommended pattern:
```python
CompositeBackend(
default=StateBackend(rt), # In-memory state (temporary files)
routes={"/": fs_backend} # Route "/" paths to FilesystemBackend
)
```
Paths starting with "/" go to persistent local filesystem (`research_workspace/`), others use ephemeral state.
### DeepAgents Auto-Injected Tools
The `create_deep_agent()` function automatically adds these tools via middleware:
- **TodoListMiddleware**: `write_todos` - Task planning and progress tracking
- **FilesystemMiddleware**: `ls`, `read_file`, `write_file`, `edit_file`, `glob`, `grep`
- **SubAgentMiddleware**: `task` - Delegate work to sub-agents
- **SkillsMiddleware**: Progressive skill disclosure via `skills/` directory
Custom tools (`tavily_search`, `think_tool`) are added explicitly in `agent.py`.
### Skills System
Project-level skills in `skills/`:
- `academic-search/` - arXiv paper search
- `data-synthesis/` - Multi-source data integration
- `report-writing/` - Structured report generation
- `skill-creator/` - Meta-skill for creating new skills
Each skill has `SKILL.md` with YAML frontmatter. SkillsMiddleware uses Progressive Disclosure: only metadata injected at session start, full content read on-demand.
## Rust `rig-deepagents` Architecture
Pregel-inspired graph execution runtime for agent workflows.
### Module Structure
```
rust-research-agent/rig-deepagents/src/
├── lib.rs                    # Library entry point and re-exports
├── pregel/                   # Pregel Runtime (graph execution engine)
│   ├── runtime.rs            # Superstep orchestration, CheckpointingRuntime
│   ├── vertex.rs             # Vertex trait and compute context
│   ├── message.rs            # Inter-vertex message passing
│   ├── config.rs             # PregelConfig, RetryPolicy
│   ├── checkpoint/           # Fault tolerance via checkpointing
│   │   ├── mod.rs            # Checkpointer trait
│   │   ├── file.rs           # FileCheckpointer
│   │   ├── sqlite.rs         # SQLiteCheckpointer
│   │   ├── redis.rs          # RedisCheckpointer
│   │   └── postgres.rs       # PostgresCheckpointer
│   └── state.rs              # WorkflowState trait
├── workflow/                 # Workflow Builder DSL
│   ├── node.rs               # NodeKind (Agent, Tool, Router, SubAgent, FanOut/FanIn)
│   ├── compiled.rs           # CompiledWorkflow with checkpoint support
│   ├── graph.rs              # WorkflowGraph builder API
│   └── vertices/             # Node implementations (Agent, Tool, Router, etc.)
├── compat/                   # Rig Framework Compatibility Layer
│   ├── rig_agent_adapter.rs  # RigAgentAdapter (primary LLM integration)
│   └── rig_tool_adapter.rs   # RigToolAdapter for Rig Tool compatibility
├── middleware/               # AgentMiddleware trait and MiddlewareStack
│   └── summarization/        # Token counting and context summarization
├── backends/                 # Backend trait (Memory, Filesystem, Composite)
├── llm/                      # LLMProvider abstraction (uses RigAgentAdapter)
└── tools/                    # Tool implementations (read_file, write_file, etc.)
```
### LLM Integration
**Use `RigAgentAdapter`** to wrap Rig's native providers (OpenAI, Anthropic, etc.):
```rust
use rig::providers::openai::Client;
use rig_deepagents::{RigAgentAdapter, AgentExecutor};
let client = Client::from_env();
let agent = client.agent("gpt-4").build();
let provider = RigAgentAdapter::new(agent);
```
Legacy `OpenAIProvider` and `AnthropicProvider` have been removed.
### Pregel Execution Model
The runtime executes workflows using synchronized supersteps:
- **Vertex**: Computation unit with `compute()` method (Agent, Tool, Router)
- **Message**: Communication between vertices across supersteps
- **Checkpointing**: Fault tolerance via periodic state snapshots (File, SQLite, Redis, Postgres)
- **Retry Policy**: Exponential backoff with configurable max retries
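The superstep model above can be sketched in a few lines of toy Python (illustrative only; the actual rig-deepagents runtime is Rust and its API differs):

```python
# Toy sketch of the Pregel superstep model (illustrative only; the real
# rig-deepagents runtime is Rust with Vertex traits and retry policies).
def run_supersteps(vertices, inboxes, max_supersteps=10):
    """vertices: {name: compute(msgs) -> {dest: msg}}.
    Runs until no messages are pending or the superstep budget is hit."""
    for _ in range(max_supersteps):
        if not any(inboxes.values()):
            break  # every vertex is idle: workflow halts
        outboxes = {}
        for name, compute in vertices.items():
            msgs = inboxes.get(name, [])
            outboxes[name] = compute(msgs) if msgs else {}
        # Messages sent in superstep N are delivered in superstep N+1.
        inboxes = {name: [] for name in vertices}
        for outbox in outboxes.values():
            for dest, msg in outbox.items():
                inboxes[dest].append(msg)
    return inboxes

# Two-vertex pipeline: "double" transforms, "report" collects.
collected = []
vertices = {
    "double": lambda msgs: {"report": sum(msgs) * 2},
    "report": lambda msgs: (collected.extend(msgs), {})[1],
}
run_supersteps(vertices, {"double": [21], "report": []})
```

The key property, as in the diagram's bullets, is that message delivery is synchronized at superstep boundaries, which is what makes periodic checkpointing between supersteps safe.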
### Key Types
| Type | Purpose |
|------|---------|
| `PregelRuntime<S, M>` | Executes workflow graph with state S and message M |
| `CheckpointingRuntime<S, M>` | PregelRuntime with checkpoint/resume support |
| `RigAgentAdapter` | Wraps any Rig Agent for LLMProvider compatibility |
| `CompiledWorkflow` | Builder result with optional checkpointing |
## Key Files for Understanding the System
2. `research_agent/researcher/agent.py` - Autonomous researcher factory (CompiledSubAgent pattern)
3. `research_agent/researcher/prompts.py` - Three-phase autonomous workflow
4. `research_agent/prompts.py` - Orchestrator and Simple SubAgent prompts
**Context Engineering:**
5. `context_engineering_research_agent/agent.py` - Context-aware agent factory
6. `context_engineering_research_agent/context_strategies/` - 5 optimization strategies
**Rust rig-deepagents:**
7. `rust-research-agent/rig-deepagents/src/pregel/runtime.rs` - Pregel + Checkpointing
8. `rust-research-agent/rig-deepagents/src/compat/rig_agent_adapter.rs` - LLM integration
9. `rust-research-agent/rig-deepagents/src/workflow/compiled.rs` - Workflow compilation
**Documentation:**
11. `DeepAgents_Technical_Guide.md` - Python DeepAgents reference (Korean)
12. `docs/plans/2026-01-02-rig-deepagents-pregel-design.md` - Rust Pregel design
## Critical Patterns
### SubAgent Creation
Always use factory functions, never instantiate directly:
```python
# Correct
researcher_subagent = get_researcher_subagent()
# Wrong - bypasses middleware setup
researcher = create_researcher_agent()
```
### File Path Routing
Paths starting with "/" persist to `research_workspace/`, others are in-memory:
```python
write_file("/reports/summary.md", content) # Persists
write_file("temp/scratch.txt", content) # Ephemeral
```
### Reflection Loop
Always use `think_tool()` between web searches - explicit reflection is required:
```
Search → think_tool() → Decide → Search → think_tool() → Synthesize
```
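The loop above can be sketched as a minimal Python driver; `search` and `think` are hypothetical stand-ins for `tavily_search` and `think_tool`:

```python
# Toy sketch of the Search -> think_tool -> Decide loop described above.
# `search` and `think` are hypothetical stand-ins for the real tools
# in research_agent/tools.py; the actual agent decides via the LLM.
def research_loop(query, search, think, max_searches=5):
    notes = []
    for i in range(max_searches):
        results = search(query)                                   # 1. search
        reflection = think(f"pass {i}: {len(results)} results")   # 2. reflect
        notes.append((results, reflection))
        if "enough" in reflection:                                # 3. decide
            break
    return notes

notes = research_loop(
    "pregel supersteps",
    search=lambda q: [f"result about {q}"],
    think=lambda summary: "enough evidence gathered",
)
```

The `max_searches` cap mirrors the token budget (5-6 max searches) that sub-agents operate under.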
## Tech Stack

context_engineering_research_agent/AGENTS.md
# AGENTS.md - Context Engineering Research Agent
> **Component**: `context_engineering_research_agent/`
> **Type**: Extended DeepAgent with Context Strategies
> **Role**: Experimental platform for 5 Context Engineering patterns
---
## 1. Module Purpose
This module extends the base research agent with explicit **Context Engineering** strategies. It serves as a research testbed for optimizing LLM context window usage.
---
## 2. The 5 Context Engineering Strategies
| Strategy | Implementation | Trigger |
|----------|----------------|---------|
| **Offloading** | `context_strategies/offloading.py` | Tool result > 20,000 tokens |
| **Reduction** | `context_strategies/reduction.py` | Context usage > 85% |
| **Retrieval** | grep/glob/read_file tools | Always available |
| **Isolation** | SubAgent `task()` tool | Complex subtasks |
| **Caching** | `context_strategies/caching.py` | Anthropic provider |
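The offloading trigger in the first row can be sketched as follows (a minimal sketch with assumed names; `ContextOffloadingStrategy`'s real middleware hooks differ):

```python
# Sketch of the offloading trigger: tool results over a token budget are
# written to the filesystem backend and replaced by a small pointer.
# (Names and the 4-chars-per-token heuristic are assumptions.)
def estimate_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic: ~4 characters per token

def maybe_offload(result: str, store: dict, path: str, limit: int = 20_000) -> str:
    if estimate_tokens(result) <= limit:
        return result  # small enough: keep inline in context
    store[path] = result  # offload full content to the backend
    return f"[offloaded to {path}: {estimate_tokens(result)} tokens; use read_file/grep]"

fs = {}
big = "x" * 100_000            # ~25,000 tokens, over the 20,000 limit
kept = maybe_offload("short result", fs, "/offload/a.txt")
pointer = maybe_offload(big, fs, "/offload/b.txt")
```

The Retrieval strategy (grep/glob/read_file) is what makes the pointer useful: the agent can pull back only the relevant slice of the offloaded content.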
---
## 3. Key Files
| File | Purpose |
|------|---------|
| `agent.py` | Main factory: `create_context_aware_agent()` |
| `context_strategies/__init__.py` | Re-exports all strategy classes |
| `context_strategies/offloading.py` | `ContextOffloadingStrategy` middleware |
| `context_strategies/reduction.py` | `ContextReductionStrategy` middleware |
| `context_strategies/caching.py` | `ContextCachingStrategy` + provider detection |
| `context_strategies/caching_telemetry.py` | `PromptCachingTelemetryMiddleware` |
| `context_strategies/isolation.py` | State isolation utilities for SubAgents |
| `context_strategies/retrieval.py` | Selective file loading patterns |
| `backends/docker_sandbox.py` | Sandboxed execution backend |
| `backends/pyodide_sandbox.py` | Browser-based Python sandbox |
---
## 4. Agent Factory Pattern
```python
# Simple usage (defaults)
agent = get_agent()
# Customized configuration
agent = create_context_aware_agent(
    model="anthropic/claude-sonnet-4",
    enable_offloading=True,
    enable_reduction=True,
    enable_caching=True,
    offloading_token_limit=20000,
    reduction_threshold=0.85,
)
```
---
## 5. Multi-Provider Support
Provider detection is automatic via `detect_provider(model)`:
| Provider | Features |
|----------|----------|
| Anthropic | Full cache_control markers |
| OpenAI | Standard caching |
| OpenRouter | Pass `openrouter_model_name` for specific routing |
| Gemini | Standard caching |
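A hedged sketch of what prefix-based detection might look like (the real `detect_provider` in `context_strategies/caching.py` may differ):

```python
# Illustrative sketch of provider detection from a model string.
# (Assumed prefixes; the actual detect_provider implementation may differ.)
def detect_provider(model: str) -> str:
    prefixes = {
        "anthropic": "anthropic",
        "claude": "anthropic",
        "openrouter": "openrouter",
        "gemini": "gemini",
        "google": "gemini",
        "openai": "openai",
        "gpt": "openai",
    }
    name = model.lower()
    for prefix, provider in prefixes.items():
        if name.startswith(prefix):
            return provider
    return "openai"  # assumed fallback

provider = detect_provider("anthropic/claude-sonnet-4")
```

Only the Anthropic path would then attach full `cache_control` markers; the other providers fall back to their standard caching behavior.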
---
## 6. Middleware Stack Order
Middlewares execute in registration order. The recommended stack:
```python
middleware=[
    ContextOffloadingStrategy,         # 1. Evict large results FIRST
    ContextReductionStrategy,          # 2. Compress if still too large
    ContextCachingStrategy,            # 3. Mark cacheable sections
    PromptCachingTelemetryMiddleware,  # 4. Collect metrics
]
```
**Order matters:** Offloading before reduction prevents unnecessary summarization.
---
## 7. Sandbox Backends
For secure code execution:
| Backend | Environment | Isolation Level |
|---------|-------------|-----------------|
| `DockerSandbox` | Container | High (network isolated) |
| `PyodideSandbox` | WASM | Medium (browser-like) |
| `DockerSession` | Persistent container | High + state persistence |
---
## 8. Testing
Tests are in `tests/context_engineering/`:
- `test_caching.py` - Cache strategy unit tests
- `test_offloading.py` - Eviction threshold tests
- `test_reduction.py` - Summarization trigger tests
- `test_integration.py` - Full agent integration tests

deep-agents-ui/AGENTS.md

# AGENTS.md - Deep Agents UI
> **Component**: `deep-agents-ui/`
> **Type**: Next.js 16 React Frontend
> **Role**: Chat interface for LangGraph DeepAgents
---
## 1. Module Purpose
React-based chat UI that connects to a LangGraph backend. Displays agent messages, tool calls, SubAgent activity, tasks/files sidebar, and handles human-in-the-loop interrupts.
---
## 2. Quick Start
```bash
yarn install
yarn dev # localhost:3000
```
Configure via Settings dialog:
- **Deployment URL**: `http://127.0.0.1:2024` (LangGraph dev server)
- **Assistant ID**: `research` (or UUID)
---
## 3. Directory Structure
```
src/
  app/
    page.tsx                    # Main entry, config handling
    layout.tsx                  # Root layout with providers
    components/
      ChatInterface.tsx         # Message input/display area
      ChatMessage.tsx           # Individual message rendering
      ToolCallBox.tsx           # Tool invocation display
      SubAgentIndicator.tsx     # Active SubAgent status
      TasksFilesSidebar.tsx     # TODO list + file tree
      ThreadList.tsx            # Conversation history
      ToolApprovalInterrupt.tsx # HITL approval UI
      ConfigDialog.tsx          # Settings modal
      FileViewDialog.tsx        # File content viewer
      MarkdownContent.tsx       # Markdown renderer
    components/ui/              # Radix UI primitives (shadcn)
  providers/
    ChatProvider.tsx            # Chat state context
    ClientProvider.tsx          # LangGraph SDK client
  lib/
    config.ts                   # LocalStorage config persistence
```
---
## 4. Key Components
| Component | Function |
|-----------|----------|
| `ChatProvider` | Manages message state, streaming, thread lifecycle |
| `ClientProvider` | Wraps `@langchain/langgraph-sdk` client |
| `ChatInterface` | Main chat view with input area |
| `ToolCallBox` | Renders tool name, args, result with syntax highlighting |
| `SubAgentIndicator` | Shows which SubAgent is currently active |
| `ToolApprovalInterrupt` | Human-in-the-loop approval/rejection UI |
---
## 5. State Management
| State | Location | Persistence |
|-------|----------|-------------|
| Config | `lib/config.ts` | LocalStorage |
| Thread ID | URL query param `?threadId=` | URL |
| Sidebar | URL query param `?sidebar=` | URL |
| Messages | `ChatProvider` context | Server (LangGraph) |
---
## 6. Styling
- **TailwindCSS** with custom theme
- **shadcn/ui** components (Radix primitives)
- **Dark mode** via CSS variables
---
## 7. Development Commands
```bash
yarn dev # Start dev server (port 3000)
yarn build # Production build
yarn lint # ESLint
yarn format # Prettier
```
---
## 8. Backend Connection
The UI connects to LangGraph API endpoints:
- `POST /threads` - Create thread
- `POST /threads/{id}/runs` - Stream messages
- `GET /assistants/{id}` - Fetch assistant config
Configured via `ClientProvider` with deployment URL and optional LangSmith API key.
---
## 9. Extension Points
| Task | Where to Modify |
|------|-----------------|
| Add new message type | `ChatMessage.tsx` + type in `ChatProvider` |
| Custom tool rendering | `ToolCallBox.tsx` |
| New sidebar panel | `TasksFilesSidebar.tsx` |
| Theme customization | `tailwind.config.js` + `globals.css` |

research_agent/AGENTS.md

# AGENTS.md - Research Agent Module
> **Component**: `research_agent/`
> **Type**: Python DeepAgent Orchestrator
> **Role**: Multi-SubAgent Research System with Skills Integration
---
## 1. Module Purpose
This module implements the main research orchestrator using LangChain's DeepAgents framework. It coordinates three specialized SubAgents and integrates a project-level skills system.
---
## 2. Architecture: Three-Tier SubAgent System
```
Orchestrator (agent.py)
 |
 +-- researcher (CompiledSubAgent)
 |      Autonomous, self-planning DeepAgent
 |      "Breadth-first, then depth" research pattern
 |
 +-- explorer (Simple SubAgent)
 |      Fast read-only filesystem exploration
 |
 +-- synthesizer (Simple SubAgent)
        Multi-source result integration
```
### SubAgent Types
| Type | Definition | Execution | Use Case |
|------|------------|-----------|----------|
| CompiledSubAgent | `{"runnable": CompiledStateGraph}` | Multi-turn autonomous | Complex research |
| Simple SubAgent | `{"system_prompt": str}` | Single response | Quick tasks |
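The two definition shapes in the table can be illustrated with minimal dicts (keys follow the table; the exact deepagents SubAgent schema, and the `description` field, are assumptions):

```python
# Hypothetical shapes of the two SubAgent definitions from the table above.
# Keys mirror the table; the real deepagents schema may carry more fields.
explorer_agent = {
    "name": "explorer",
    "description": "Fast read-only filesystem exploration",
    "system_prompt": "You are a read-only filesystem explorer...",
}

def get_researcher_subagent(compiled_graph):
    # CompiledSubAgent: wraps a pre-compiled LangGraph state graph,
    # enabling multi-turn autonomous execution.
    return {
        "name": "researcher",
        "description": "Autonomous multi-turn researcher",
        "runnable": compiled_graph,
    }
```

The presence of `runnable` versus `system_prompt` is what distinguishes the multi-turn CompiledSubAgent from the single-response Simple SubAgent.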
---
## 3. Key Files
| File | Purpose |
|------|---------|
| `agent.py` | Orchestrator assembly: model, backend, SubAgents, middleware |
| `prompts.py` | Orchestrator prompts: workflow, delegation, explorer, synthesizer |
| `tools.py` | `tavily_search()`, `think_tool()` implementations |
| `researcher/agent.py` | `get_researcher_subagent()` factory for CompiledSubAgent |
| `researcher/prompts.py` | Three-phase autonomous workflow (Exploratory -> Directed -> Synthesis) |
| `researcher/depth.py` | Research depth configuration (shallow/medium/deep) |
| `researcher/ralph_loop.py` | Iterative refinement loop pattern |
| `skills/middleware.py` | SkillsMiddleware with Progressive Disclosure |
---
## 4. Backend Configuration
The module uses a **CompositeBackend** pattern:
```python
CompositeBackend(
default=StateBackend(rt), # In-memory (temporary files)
routes={"/": fs_backend} # "/" paths -> research_workspace/
)
```
- Paths starting with "/" persist to `research_workspace/`
- Other paths use ephemeral in-memory state
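The routing rule above can be sketched in a few lines (illustrative only; the real deepagents backends expose richer read/write/list interfaces, and the dict-based stand-ins are assumptions):

```python
# Minimal sketch of the CompositeBackend prefix-routing rule described above.
# (Illustrative; the real deepagents backend interface differs.)
class CompositeBackend:
    def __init__(self, default, routes):
        self.default = default  # e.g. in-memory StateBackend
        self.routes = routes    # e.g. {"/": filesystem_backend}

    def backend_for(self, path: str):
        for prefix, backend in self.routes.items():
            if path.startswith(prefix):
                return backend  # "/" paths -> research_workspace/
        return self.default     # everything else -> ephemeral state

    def write_file(self, path: str, content: str):
        self.backend_for(path)[path] = content

memory, disk = {}, {}  # stand-ins for StateBackend / FilesystemBackend
cb = CompositeBackend(default=memory, routes={"/": disk})
cb.write_file("/reports/summary.md", "persists")  # routed to disk
cb.write_file("temp/scratch.txt", "ephemeral")    # stays in memory
```

This is also why the `backend_factory` signature must not change: middleware constructs this composite per tool runtime.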
---
## 5. Skills System
Skills are loaded from `PROJECT_ROOT/skills/` via `SkillsMiddleware`.
**Progressive Disclosure Pattern:**
1. Session start: Only skill metadata injected
2. Agent request: Full SKILL.md content loaded on-demand
3. Token efficiency: ~90% reduction in initial context
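The three steps above can be sketched with a toy frontmatter parser (a hypothetical helper; the real SkillsMiddleware handles YAML properly and differs in detail):

```python
# Sketch of Progressive Disclosure: only frontmatter metadata is surfaced
# at session start; the full SKILL.md body is read on demand.
# (Toy parser for illustration; real SKILL.md files use proper YAML.)
def split_frontmatter(skill_md: str):
    _, frontmatter, body = skill_md.split("---", 2)
    meta = dict(line.split(":", 1) for line in frontmatter.strip().splitlines())
    return {k.strip(): v.strip() for k, v in meta.items()}, body.strip()

skill = """---
name: academic-search
description: arXiv paper search
---
Full, token-heavy instructions go here..."""

meta, body = split_frontmatter(skill)
# Step 1: only this stub enters the system prompt at session start.
system_prompt_stub = f"Skill available: {meta['name']} - {meta['description']}"
# Step 2: `body` is loaded only when the agent requests the skill.
```

Keeping bodies out of the initial prompt is what yields the ~90% reduction in initial context.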
---
## 6. Anti-Patterns
- **DO NOT** directly instantiate researcher - use `get_researcher_subagent()`
- **DO NOT** skip `think_tool()` between searches - explicit reflection required
- **DO NOT** modify `backend_factory` signature - middleware depends on it
---
## 7. Extension Points
| Task | Where to Modify |
|------|-----------------|
| Add new SubAgent | Define in `agent.py`, add to `SIMPLE_SUBAGENTS` or `ALL_SUBAGENTS` |
| New research tool | Add to `tools.py`, include in `create_deep_agent(tools=[...])` |
| Custom middleware | Create in `skills/`, add to middleware list in `agent.py` |
| Modify researcher behavior | Edit `researcher/prompts.py` |

tests/AGENTS.md

# AGENTS.md - Test Suite
> **Component**: `tests/`
> **Type**: pytest Test Suite
> **Role**: Unit and integration tests for all Python modules
---
## 1. Test Organization
```
tests/
  context_engineering/           # Context strategy tests
    test_caching.py              # Cache control marker tests
    test_offloading.py           # Token eviction tests
    test_reduction.py            # Summarization trigger tests
    test_isolation.py            # SubAgent state isolation
    test_retrieval.py            # Selective file loading
    test_integration.py          # Full agent integration
    test_openrouter_models.py    # Multi-provider tests
  researcher/                    # Research agent tests
    test_depth.py                # Research depth configuration
    test_ralph_loop.py           # Iterative refinement loop
    test_runner.py               # Agent runner tests
    test_tools.py                # Tool unit tests
    test_integration.py          # End-to-end research flow
  backends/                      # Backend implementation tests
    test_docker_sandbox_integration.py
  conftest.py                    # Shared fixtures
```
---
## 2. Running Tests
```bash
# All tests
uv run pytest tests/
# Specific module
uv run pytest tests/context_engineering/
# Single test file
uv run pytest tests/researcher/test_depth.py
# With coverage
uv run pytest --cov=research_agent tests/
```
---
## 3. Key Fixtures
Located in `conftest.py` files:
- `mock_model` - Mocked LLM for unit tests
- `temp_workspace` - Temporary filesystem backend
- `docker_sandbox` - Docker container fixture (integration)
---
## 4. Test Categories
| Category | Marker | Speed |
|----------|--------|-------|
| Unit | (default) | Fast |
| Integration | `@pytest.mark.integration` | Slow |
| Docker | `@pytest.mark.docker` | Requires Docker |
| LLM | `@pytest.mark.llm` | Requires API keys |
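The markers above are typically registered and applied like this (a hedged sketch; the repo's actual `conftest.py` registration may differ):

```python
# Sketch of custom pytest marker registration and use.
# (Assumed to live in conftest.py / a test module; adapt to the repo.)
import pytest

def pytest_configure(config):  # in conftest.py
    config.addinivalue_line("markers", "integration: slow integration test")
    config.addinivalue_line("markers", "docker: requires a Docker daemon")
    config.addinivalue_line("markers", "llm: requires real API keys")

@pytest.mark.integration
def test_full_research_flow():
    assert True  # placeholder for an end-to-end assertion
```

With markers registered, slow categories can be excluded from a fast local run via `uv run pytest -m "not integration"`.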
---
## 5. Environment Variables
For integration tests requiring real APIs:
- `OPENAI_API_KEY` - OpenAI tests
- `TAVILY_API_KEY` - Search tool tests
- `ANTHROPIC_API_KEY` - Caching tests with Claude