deepagent/Context_Engineering_Research.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "4c897bc9",
   "metadata": {},
   "source": [
    "# Context Engineering Research Notebook\n",
    "\n",
    "This notebook analyzes and experiments with the five Context Engineering strategies used in the DeepAgents library.\n",
    "\n",
    "## References\n",
    "\n",
    "- YouTube: https://www.youtube.com/watch?v=6_BcCthVvb8\n",
    "- PDF: Context Engineering Meetup.pdf\n",
    "- PDF: Manus Context Engineering LangChain Webinar.pdf\n",
    "\n",
    "## Problem Statement (Why It Matters)\n",
    "\n",
    "- In an agent, tool calls and observations accumulate, so the context keeps growing (tens to hundreds of turns).\n",
    "- Performance has been observed to degrade as the context grows longer (context rot).\n",
    "- Failure modes: Poisoning / Distraction / Confusion / Clash\n",
    "\n",
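    "## The Manus Perspective (Model vs. Application Boundary)\n",
    "\n",
    "- Context Engineering is treated as the practical boundary between the application and the model.\n",
    "- Rather than jumping first into training or fine-tuning a separate model, the emphasis is on using context design to keep product iteration fast.\n",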
    "\n",
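    "## Additional Topic: Tool Offloading\n",
    "\n",
    "- Tool definitions themselves can clutter the context, so consider a hierarchical action space or limiting which tools are loaded (expose only the tools the current step needs).\n",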
    "\n",
    "## Five Core Context Engineering Strategies\n",
    "\n",
    "| Strategy | Description | DeepAgents Implementation |\n",
    "|------|------|----------------|\n",
    "| **1. Offloading** | Evict large tool results to files | FilesystemMiddleware |\n",
    "| **2. Reduction** | Compaction + Summarization | SummarizationMiddleware |\n",
    "| **3. Retrieval** | grep/glob-based search | FilesystemMiddleware |\n",
    "| **4. Isolation** | Isolate context in SubAgents | SubAgentMiddleware |\n",
    "| **5. Caching** | Prompt Caching | AnthropicPromptCachingMiddleware |\n",
    "\n",
    "## Architecture Overview\n",
    "\n",
    "```\n",
    "┌─────────────────────────────────────────────────────────────────┐\n",
    "│                       Context Engineering                       │\n",
    "├─────────────────────────────────────────────────────────────────┤\n",
    "│                                                                 │\n",
    "│   ┌────────────┐    ┌────────────┐    ┌────────────┐            │\n",
    "│   │ Offloading │    │ Reduction  │    │  Caching   │            │\n",
    "│   │(20k tokens)│    │(85% thresh)│    │ (Anthropic)│            │\n",
    "│   └─────┬──────┘    └─────┬──────┘    └─────┬──────┘            │\n",
    "│         │                 │                 │                   │\n",
    "│         ▼                 ▼                 ▼                   │\n",
    "│   ┌─────────────────────────────────────────────────────┐       │\n",
    "│   │              Middleware Stack                       │       │\n",
    "│   └─────────────────────────────────────────────────────┘       │\n",
    "│                          │                                      │\n",
    "│         ┌────────────────┼────────────────┐                     │\n",
    "│         ▼                ▼                ▼                     │\n",
    "│   ┌────────────┐   ┌────────────┐   ┌────────────┐              │\n",
    "│   │ Retrieval  │   │ Isolation  │   │  Backend   │              │\n",
    "│   │(grep/glob) │   │ (SubAgent) │   │(FileSystem)│              │\n",
    "│   └────────────┘   └────────────┘   └────────────┘              │\n",
    "│                                                                 │\n",
    "└─────────────────────────────────────────────────────────────────┘\n",
    "```"
   ]
  },
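  {
   "cell_type": "markdown",
   "id": "tool_loading_sketch_md",
   "metadata": {},
   "source": [
    "The tool-loading limit mentioned under Tool Offloading can be sketched as a small registry that exposes only the tool group for the current phase. This is a minimal illustration under assumptions: `TOOL_GROUPS`, `select_tools`, and the phase names are hypothetical and not part of DeepAgents, and a real agent would pass the selected tool objects (not just their names) to the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "tool_loading_sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical tool registry grouped by task phase (not a DeepAgents API).\n",
    "TOOL_GROUPS: dict[str, list[str]] = {\n",
    "    \"research\": [\"web_search\", \"read_file\", \"grep\"],\n",
    "    \"planning\": [\"write_todos\"],\n",
    "    \"writing\": [\"write_file\", \"edit_file\"],\n",
    "}\n",
    "\n",
    "\n",
    "def select_tools(phase: str, always_on: tuple[str, ...] = (\"read_file\",)) -> list[str]:\n",
    "    \"\"\"Return only the tools for `phase`, plus a few tools that are always exposed.\"\"\"\n",
    "    selected = list(TOOL_GROUPS.get(phase, []))\n",
    "    for name in always_on:\n",
    "        if name not in selected:\n",
    "            selected.append(name)\n",
    "    return selected\n",
    "\n",
    "\n",
    "all_tools = sorted({t for group in TOOL_GROUPS.values() for t in group})\n",
    "print(f\"{len(all_tools)} tools registered; the planning phase exposes {select_tools('planning')}\")"
   ]
  },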
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fecc3e39",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "from pathlib import Path\n",
    "\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "load_dotenv(\".env\", override=True)\n",
    "\n",
    "PROJECT_ROOT = Path.cwd()\n",
    "if str(PROJECT_ROOT) not in sys.path:\n",
    "    sys.path.insert(0, str(PROJECT_ROOT))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "strategy1",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
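    "## Strategy 1: Context Offloading\n",
    "\n",
    "Evicts large tool results to the filesystem so they do not overflow the context window.\n",
    "\n",
    "### Core Principles\n",
    "- A tool result is evicted automatically when it exceeds `tool_token_limit_before_evict` (default: 20,000 tokens)\n",
    "- It is stored at `/large_tool_results/{tool_call_id}`\n",
    "- Only a preview of the first 10 lines stays in the context\n",
    "- The agent loads the full result with `read_file` when it needs it"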
   ]
  },
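  {
   "cell_type": "markdown",
   "id": "offloading_sketch_md",
   "metadata": {},
   "source": [
    "The eviction rule above can be sketched in plain Python. This is a minimal illustration, not the FilesystemMiddleware code: it assumes a rough 4-characters-per-token estimate, uses an in-memory dict (`fake_fs`) as a stand-in for the filesystem backend, and `offload_if_large` and its preview message format are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "offloading_sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical sketch of size-based eviction (not the FilesystemMiddleware implementation).\n",
    "CHARS_PER_TOKEN = 4  # assumption: rough heuristic; real tokenizers differ\n",
    "\n",
    "\n",
    "def offload_if_large(\n",
    "    content: str,\n",
    "    tool_call_id: str,\n",
    "    fs: dict[str, str],\n",
    "    token_limit: int = 20_000,\n",
    "    prefix: str = \"/large_tool_results\",\n",
    "    preview_lines: int = 10,\n",
    ") -> str:\n",
    "    \"\"\"Return small results unchanged; store large ones in `fs` and return a short preview.\"\"\"\n",
    "    if len(content) // CHARS_PER_TOKEN <= token_limit:\n",
    "        return content\n",
    "    path = f\"{prefix}/{tool_call_id}\"\n",
    "    fs[path] = content  # the full result now lives outside the context window\n",
    "    preview = \"\\n\".join(content.splitlines()[:preview_lines])\n",
    "    return f\"[Result saved to {path}; use read_file to load it]\\n{preview}\"\n",
    "\n",
    "\n",
    "fake_fs: dict[str, str] = {}  # in-memory stand-in for the filesystem backend\n",
    "big_result = \"\\n\".join(f\"row {i}: \" + \"x\" * 200 for i in range(1_000))\n",
    "message = offload_if_large(big_result, \"call_123\", fake_fs)\n",
    "print(f\"{len(big_result):,} chars -> {len(message):,} chars in context; stored at {list(fake_fs)}\")"
   ]
  },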
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "offloading_demo",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Token threshold: 20,000\n",
      "Eviction path: /large_tool_results\n",
      "Preview lines: 10\n"
     ]
    }
   ],
   "source": [
    "from context_engineering_research_agent.context_strategies.offloading import (\n",
    "    ContextOffloadingStrategy,\n",
    "    OffloadingConfig,\n",
    ")\n",
    "\n",
    "config = OffloadingConfig(\n",
    "    token_limit_before_evict=20000,\n",
    "    eviction_path_prefix=\"/large_tool_results\",\n",
    "    preview_lines=10,\n",
    ")\n",
    "\n",
    "print(f\"Token threshold: {config.token_limit_before_evict:,}\")\n",
    "print(f\"Eviction path: {config.eviction_path_prefix}\")\n",
    "print(f\"Preview lines: {config.preview_lines}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "offloading_test",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Short content: 600 chars → evict: False\n",
      "Large content: 210,000 chars → evict: True\n"
     ]
    }
   ],
   "source": [
    "strategy = ContextOffloadingStrategy(config=config)\n",
    "\n",
    "# Korean sample strings: 6 chars x 100 and 7 chars x 30,000\n",
    "small_content = \"짧은 텍스트\" * 100\n",
    "large_content = \"대용량 텍스트\" * 30000\n",
    "\n",
    "print(f\"Short content: {len(small_content)} chars → evict: {strategy._should_offload(small_content)}\")\n",
    "print(f\"Large content: {len(large_content):,} chars → evict: {strategy._should_offload(large_content)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "strategy2",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
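    "## Strategy 2: Context Reduction\n",
    "\n",
    "Automatically compresses the conversation when context-window usage exceeds a threshold.\n",
    "\n",
    "### Two Techniques\n",
    "\n",
    "| Technique | Description | Cost |\n",
    "|------|------|------|\n",
    "| **Compaction** | Remove old tool calls/results | Free (no LLM call) |\n",
    "| **Summarization** | An LLM summarizes the conversation | Incurs API cost |\n",
    "\n",
    "Priority: Compaction → Summarization (the free technique is tried first)"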
   ]
  },
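  {
   "cell_type": "markdown",
   "id": "reduction_sketch_md",
   "metadata": {},
   "source": [
    "The Compaction → Summarization priority can be sketched as follows. Assumptions: tokens are estimated at about 4 characters each, messages are plain dicts rather than LangChain message objects, and `summarize` is a placeholder for an LLM call; `reduce_context` and its parameters are hypothetical names, not the `ContextReductionStrategy` API. Compaction here replaces the content of old tool results with a short stub instead of deleting the messages, which keeps the message sequence intact."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "reduction_sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical sketch: free compaction first, LLM summarization only if still over budget.\n",
    "CHARS_PER_TOKEN = 4  # assumption: rough heuristic; real tokenizers differ\n",
    "\n",
    "\n",
    "def estimate_tokens(messages: list[dict]) -> int:\n",
    "    return sum(len(m[\"content\"]) for m in messages) // CHARS_PER_TOKEN\n",
    "\n",
    "\n",
    "def summarize(messages: list[dict]) -> dict:\n",
    "    # Placeholder for an LLM summarization call (the step that incurs API cost).\n",
    "    return {\"role\": \"system\", \"content\": f\"[summary of {len(messages)} earlier messages]\"}\n",
    "\n",
    "\n",
    "def reduce_context(\n",
    "    messages: list[dict], window: int = 200_000, threshold: float = 0.85, keep_recent: int = 10\n",
    ") -> list[dict]:\n",
    "    budget = int(window * threshold)\n",
    "    if estimate_tokens(messages) <= budget:\n",
    "        return messages\n",
    "    split = max(len(messages) - keep_recent, 0)\n",
    "    old, recent = messages[:split], messages[split:]\n",
    "    # Step 1 (free): stub out old tool results; the messages themselves stay in place.\n",
    "    compacted = [\n",
    "        {**m, \"content\": \"[tool result cleared]\"} if m[\"role\"] == \"tool\" else m for m in old\n",
    "    ] + recent\n",
    "    if estimate_tokens(compacted) <= budget:\n",
    "        return compacted\n",
    "    # Step 2 (costly): replace everything except the recent messages with a summary.\n",
    "    return [summarize(old)] + recent\n",
    "\n",
    "\n",
    "history = []\n",
    "for i in range(40):\n",
    "    history.append({\"role\": \"user\", \"content\": f\"question {i}\"})\n",
    "    history.append({\"role\": \"tool\", \"content\": \"x\" * 4_000})\n",
    "compacted_only = reduce_context(history, window=20_000)  # compaction alone fits the budget\n",
    "summarized = reduce_context(history, window=1_400, keep_recent=1)  # compaction is not enough\n",
    "print(estimate_tokens(history), \"->\", estimate_tokens(compacted_only), \"->\", estimate_tokens(summarized))"
   ]
  },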
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "reduction_demo",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Threshold: 85.0%\n",
      "Context window: 200,000 tokens\n",
      "Compaction age threshold: 10 messages\n",
      "Minimum messages to keep: 5\n"
     ]
    }
   ],
   "source": [
    "from context_engineering_research_agent.context_strategies.reduction import (\n",
    "    ContextReductionStrategy,\n",
    "    ReductionConfig,\n",
    ")\n",
    "\n",
    "config = ReductionConfig(\n",
    "    context_threshold=0.85,\n",
    "    model_context_window=200000,\n",
    "    compaction_age_threshold=10,\n",
    "    min_messages_to_keep=5,\n",
    ")\n",
    "\n",
    "print(f\"Threshold: {config.context_threshold * 100}%\")\n",
    "print(f\"Context window: {config.model_context_window:,} tokens\")\n",
    "print(f\"Compaction age threshold: {config.compaction_age_threshold} messages\")\n",
    "print(f\"Minimum messages to keep: {config.min_messages_to_keep}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "reduction_test",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Context usage: 25.0%\n",
      "Reduction needed: False\n"
     ]
    }
   ],
   "source": [
    "from langchain_core.messages import AIMessage, HumanMessage\n",
    "\n",
    "strategy = ContextReductionStrategy(config=config)\n",
    "\n",
    "# 40 messages of 5,000 characters each (200,000 characters in total)\n",
    "messages = [\n",
    "    HumanMessage(content=\"안녕하세요\" * 1000),\n",
    "    AIMessage(content=\"안녕하세요\" * 1000),\n",
    "] * 20\n",
    "\n",
    "usage_ratio = strategy._get_context_usage_ratio(messages)\n",
    "print(f\"Context usage: {usage_ratio * 100:.1f}%\")\n",
    "print(f\"Reduction needed: {strategy._should_reduce(messages)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "strategy3",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
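    "## Strategy 3: Context Retrieval\n",
    "\n",
    "Uses simple, fast grep/glob-based search to load only the information that is needed.\n",
    "\n",
    "### Why Not Vector Search\n",
    "\n",
    "| Property | Direct search | Vector search |\n",
    "|------|----------|----------|\n",
    "| Determinism | ✅ Exact matching | ❌ Similarity-based (approximate) |\n",
    "| Infrastructure | ✅ None needed | ❌ Embedding index / vector DB |\n",
    "| Speed | ✅ Fast | ❌ Indexing overhead |\n",
    "| Debugging | ✅ Predictable | ❌ Hard to inspect (black box) |"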
   ]
  },
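  {
   "cell_type": "markdown",
   "id": "retrieval_sketch_md",
   "metadata": {},
   "source": [
    "A minimal sketch of grep/glob-style retrieval with the kinds of limits shown in `RetrievalConfig` below (result caps and per-line truncation), using only the Python standard library. `glob_files`, `grep_files`, and the `path:line: text` output format are assumptions for illustration, not the FilesystemMiddleware tools; the sample files live in a temporary directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "retrieval_sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "import re\n",
    "import tempfile\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def glob_files(root: Path, pattern: str, max_results: int = 100) -> list[str]:\n",
    "    \"\"\"Return file paths (relative to root) matching a glob pattern, capped at max_results.\"\"\"\n",
    "    matches = sorted(p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file())\n",
    "    return matches[:max_results]\n",
    "\n",
    "\n",
    "def grep_files(\n",
    "    root: Path, regex: str, file_glob: str = \"**/*\", max_results: int = 100, max_line_len: int = 2000\n",
    ") -> list[str]:\n",
    "    \"\"\"Return 'path:line_no: text' hits, capped in number and truncated per line.\"\"\"\n",
    "    pattern = re.compile(regex)\n",
    "    hits: list[str] = []\n",
    "    for rel in glob_files(root, file_glob, max_results=10_000):\n",
    "        lines = (root / rel).read_text(encoding=\"utf-8\").splitlines()\n",
    "        for line_no, line in enumerate(lines, start=1):\n",
    "            if pattern.search(line):\n",
    "                hits.append(f\"{rel}:{line_no}: {line[:max_line_len]}\")\n",
    "                if len(hits) >= max_results:\n",
    "                    return hits\n",
    "    return hits\n",
    "\n",
    "\n",
    "with tempfile.TemporaryDirectory() as tmp:\n",
    "    root = Path(tmp)\n",
    "    (root / \"notes\").mkdir()\n",
    "    (root / \"notes\" / \"a.md\").write_text(\"context rot\\nunrelated line\\n\", encoding=\"utf-8\")\n",
    "    (root / \"notes\" / \"b.md\").write_text(\"prompt caching\\ncontext offloading\\n\", encoding=\"utf-8\")\n",
    "    found_files = glob_files(root, \"**/*.md\")\n",
    "    found_hits = grep_files(root, r\"context\")\n",
    "    capped_hits = grep_files(root, r\"context\", max_results=1)\n",
    "print(found_files)\n",
    "print(found_hits)"
   ]
  },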
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "retrieval_demo",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Default read limit: 500 lines\n",
      "Max grep results: 100\n",
      "Max glob results: 100\n",
      "Line length limit: 2000 chars\n"
     ]
    }
   ],
   "source": [
    "from context_engineering_research_agent.context_strategies.retrieval import (\n",
    "    ContextRetrievalStrategy,\n",
    "    RetrievalConfig,\n",
    ")\n",
    "\n",
    "config = RetrievalConfig(\n",
    "    default_read_limit=500,\n",
    "    max_grep_results=100,\n",
    "    max_glob_results=100,\n",
    "    truncate_line_length=2000,\n",
    ")\n",
    "\n",
    "print(f\"Default read limit: {config.default_read_limit} lines\")\n",
    "print(f\"Max grep results: {config.max_grep_results}\")\n",
    "print(f\"Max glob results: {config.max_glob_results}\")\n",
    "print(f\"Line length limit: {config.truncate_line_length} chars\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "strategy4",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Strategy 4: Context Isolation\n",
    "\n",
    "Runs work in a separate context window through SubAgents.\n",
    "\n",
    "### Advantages\n",
    "- Keeps the main agent's context from being polluted\n",
    "- Handles complex tasks in isolation\n",
    "- Allows parallel execution\n",
    "\n",
    "### SubAgent Types\n",
    "\n",
    "| Type | Structure | Characteristics |\n",
    "|------|------|------|\n",
    "| Simple | `{name, description, system_prompt, tools}` | Single response |\n",
    "| Compiled | `{name, description, runnable}` | Its own DeepAgent, multi-turn |"
   ]
  },
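  {
   "cell_type": "markdown",
   "id": "isolation_sketch_md",
   "metadata": {},
   "source": [
    "The isolation idea can be sketched with plain dicts: the sub-agent starts from the parent's state minus the excluded keys, gets a fresh message list containing only its task, and hands back only its final answer. `run_subagent` and `noisy_worker` are hypothetical stand-ins; a real SubAgent would run its own model loop, and DeepAgents may differ in exactly which keys it passes through."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "isolation_sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Any, Callable\n",
    "\n",
    "# Same keys as excluded_state_keys in the IsolationConfig demo below.\n",
    "EXCLUDED_STATE_KEYS = (\"messages\", \"todos\", \"structured_response\")\n",
    "State = dict[str, Any]\n",
    "\n",
    "\n",
    "def run_subagent(parent_state: State, task: str, worker: Callable[[State], State]) -> State:\n",
    "    \"\"\"Run `worker` in an isolated context; return the parent state plus one result message.\"\"\"\n",
    "    # The child inherits the parent's state minus the excluded keys...\n",
    "    child_state = {k: v for k, v in parent_state.items() if k not in EXCLUDED_STATE_KEYS}\n",
    "    # ...and gets a fresh message list containing only the delegated task.\n",
    "    child_state[\"messages\"] = [{\"role\": \"user\", \"content\": task}]\n",
    "    child_result = worker(child_state)\n",
    "    # Only the child's final message flows back; its intermediate steps stay isolated.\n",
    "    final_answer = child_result[\"messages\"][-1][\"content\"]\n",
    "    returned = {\"role\": \"tool\", \"content\": final_answer}\n",
    "    return {**parent_state, \"messages\": parent_state[\"messages\"] + [returned]}\n",
    "\n",
    "\n",
    "def noisy_worker(state: State) -> State:\n",
    "    # Stand-in for a sub-agent loop that produces many intermediate messages.\n",
    "    assert \"todos\" not in state  # excluded keys are not visible to the child\n",
    "    steps = [{\"role\": \"assistant\", \"content\": f\"intermediate step {i}\"} for i in range(50)]\n",
    "    final = {\"role\": \"assistant\", \"content\": \"summary: done\"}\n",
    "    return {\"messages\": state[\"messages\"] + steps + [final]}\n",
    "\n",
    "\n",
    "parent = {\"messages\": [{\"role\": \"user\", \"content\": \"research X\"}], \"todos\": [\"draft report\"], \"files\": {}}\n",
    "updated = run_subagent(parent, \"look up X\", noisy_worker)\n",
    "print(len(updated[\"messages\"]), updated[\"messages\"][-1])"
   ]
  },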
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "isolation_demo",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Default model: gpt-4.1\n",
      "Include general-purpose agent: True\n",
      "Excluded state keys: ('messages', 'todos', 'structured_response')\n"
     ]
    }
   ],
   "source": [
    "from context_engineering_research_agent.context_strategies.isolation import (\n",
    "    ContextIsolationStrategy,\n",
    "    IsolationConfig,\n",
    ")\n",
    "\n",
    "config = IsolationConfig(\n",
    "    default_model=\"gpt-4.1\",\n",
    "    include_general_purpose_agent=True,\n",
    "    excluded_state_keys=(\"messages\", \"todos\", \"structured_response\"),\n",
    ")\n",
    "\n",
    "print(f\"Default model: {config.default_model}\")\n",
    "print(f\"Include general-purpose agent: {config.include_general_purpose_agent}\")\n",
    "print(f\"Excluded state keys: {config.excluded_state_keys}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "strategy5",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
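    "## Strategy 5: Context Caching\n",
    "\n",
    "Uses Anthropic Prompt Caching to cache the system prompt and other repeated context.\n",
    "\n",
    "### Benefits\n",
    "- Lower API cost\n",
    "- Faster responses\n",
    "- Cheaper repeated calls within a session (the cache lives for 5 minutes by default and is refreshed on each hit)\n",
    "\n",
    "### Caching Conditions\n",
    "- The cached prefix must be at least 1,024 tokens (the minimum is model-dependent)\n",
    "- A `cache_control: {\"type\": \"ephemeral\"}` marker is added"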
   ]
  },
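  {
   "cell_type": "markdown",
   "id": "caching_sketch_md",
   "metadata": {},
   "source": [
    "A sketch of where the marker goes: in Anthropic's Messages API the system prompt can be sent as a list of content blocks, and a block carrying `\"cache_control\": {\"type\": \"ephemeral\"}` marks the end of the cacheable prefix. The helper below only builds that payload (no API call is made); `build_system_blocks` is a hypothetical name, and the 4-characters-per-token estimate is a rough assumption, since the real minimum is counted in model tokens."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "caching_sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical helper: builds an Anthropic-style system prompt payload (no API call is made).\n",
    "CHARS_PER_TOKEN = 4  # assumption: rough heuristic; the real minimum is counted in model tokens\n",
    "MIN_CACHEABLE_TOKENS = 1024  # model-dependent minimum, as described above\n",
    "\n",
    "\n",
    "def build_system_blocks(system_prompt: str) -> list[dict]:\n",
    "    block = {\"type\": \"text\", \"text\": system_prompt}\n",
    "    if len(system_prompt) // CHARS_PER_TOKEN >= MIN_CACHEABLE_TOKENS:\n",
    "        # Marks the end of the cacheable prefix (tools and system prompt up to this block).\n",
    "        block[\"cache_control\"] = {\"type\": \"ephemeral\"}\n",
    "    return [block]\n",
    "\n",
    "\n",
    "short_blocks = build_system_blocks(\"You are a helpful research agent.\")\n",
    "long_blocks = build_system_blocks(\"Follow these research guidelines. \" * 200)\n",
    "print(\"short prompt cached:\", \"cache_control\" in short_blocks[0])\n",
    "print(\"long prompt cached:\", \"cache_control\" in long_blocks[0])"
   ]
  },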
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "caching_demo",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Min cacheable tokens: 1,024\n",
      "Cache control type: ephemeral\n",
      "Cache system prompt: True\n",
      "Cache tools: True\n"
     ]
    }
   ],
   "source": [
    "from context_engineering_research_agent.context_strategies.caching import (\n",
    "    ContextCachingStrategy,\n",
    "    CachingConfig,\n",
    ")\n",
    "\n",
    "config = CachingConfig(\n",
    "    min_cacheable_tokens=1024,\n",
    "    cache_control_type=\"ephemeral\",\n",
    "    enable_for_system_prompt=True,\n",
    "    enable_for_tools=True,\n",
    ")\n",
    "\n",
    "print(f\"Min cacheable tokens: {config.min_cacheable_tokens:,}\")\n",
    "print(f\"Cache control type: {config.cache_control_type}\")\n",
    "print(f\"Cache system prompt: {config.enable_for_system_prompt}\")\n",
    "print(f\"Cache tools: {config.enable_for_tools}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "caching_test",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Short content: 11 chars → cache: False\n",
      "Long content: 5,500 chars → cache: True\n"
     ]
    }
   ],
   "source": [
    "strategy = ContextCachingStrategy(config=config)\n",
    "\n",
    "short_content = \"짧은 시스템 프롬프트\"\n",
    "long_content = \"긴 시스템 프롬프트 \" * 500\n",
    "\n",
    "print(f\"Short content: {len(short_content)} chars → cache: {strategy._should_cache(short_content)}\")\n",
    "print(f\"Long content: {len(long_content):,} chars → cache: {strategy._should_cache(long_content)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "integration",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
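    "## Integrated Agent\n",
    "\n",
    "Creates an agent with all five strategies applied. Anthropic prompt caching only takes effect with Anthropic models, so with `model=\"gpt-4.1\"` as below the caching flag may have no effect (OpenAI models have their own automatic prompt caching)."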
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "agent_create",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Agent type: CompiledStateGraph\n"
     ]
    }
   ],
   "source": [
    "from context_engineering_research_agent import create_context_aware_agent\n",
    "\n",
    "agent = create_context_aware_agent(\n",
    "    model=\"gpt-4.1\",\n",
    "    enable_offloading=True,\n",
    "    enable_reduction=True,\n",
    "    enable_caching=True,\n",
    "    offloading_token_limit=20000,\n",
    "    reduction_threshold=0.85,\n",
    ")\n",
    "\n",
    "print(f\"Agent type: {type(agent).__name__}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "comparison_intro",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
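    "## Strategy On/Off Comparison Experiments\n",
    "\n",
    "These experiments examine how each strategy changes the outcome when it is enabled or disabled.\n",
    "\n",
    "### Experimental Design\n",
    "\n",
    "| Configuration | Offloading | Reduction | Caching | Purpose |\n",
    "|------|------------|-----------|---------|------|\n",
    "| Baseline | ❌ | ❌ | ❌ | Reference point |\n",
    "| Offloading only | ✅ | ❌ | ❌ | Effect of evicting large results |\n",
    "| Reduction only | ❌ | ✅ | ❌ | Effect of context compression |\n",
    "| All enabled | ✅ | ✅ | ✅ | Combined effect |\n",
    "\n",
    "### Failure-Mode Simulations (When the Context Grows)\n",
    "\n",
    "Experiments 5 to 8 below run **without an API key** (pure Python or a scripted fake chat model); they reproduce context failure modes and demonstrate mitigations.\n",
    "\n",
    "| Experiment | Failure mode | What it reproduces | Mitigation (example) |\n",
    "|------|----------|-----------------|--------------|\n",
    "| 5 | Confusion | Tool selection wavers as tools become more numerous and similar | Limit tool loading / hierarchical action space |\n",
    "| 6 | Clash | Consecutive observations contradict each other | Conflict detection / re-verification / flag uncertainty |\n",
    "| 7 | Distraction | A long log pulls the agent into repeating the same action | Refresh the plan/goal / force a different next action |\n",
    "| 8 | Poisoning | Unverified facts contaminate memory | Source tagging / verification gate / isolation |\n"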
   ]
  },
  {
   "cell_type": "markdown",
   "id": "exp1_offloading",
   "metadata": {},
   "source": [
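    "### Experiment 1: Effect of the Offloading Strategy\n",
    "\n",
    "Compares Offloading enabled vs. disabled for a large tool result (disabling is simulated with an effectively unreachable token limit)"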
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "exp1_code",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Small result size: 23 chars\n",
      "Large result size: 305,889 chars\n",
      "\n",
      "[Offloading disabled]\n",
      "  Small result evicted: False\n",
      "  Large result evicted: False\n",
      "  → The large result stays in the context as-is\n",
      "\n",
      "[Offloading enabled]\n",
      "  Small result evicted: False\n",
      "  Large result evicted: True\n",
      "  → The large result is saved to a file; only a preview stays in the context\n",
      "\n",
      "Preview size: 6,159 chars (2.0% of the original)\n"
     ]
    }
   ],
   "source": [
    "small_result = \"검색 결과: 항목 1, 항목 2, 항목 3\"\n",
    "large_result = \"\\n\".join([f\"검색 결과 {i}: \" + \"상세 내용 \" * 100 for i in range(500)])\n",
    "\n",
    "print(f\"Small result size: {len(small_result):,} chars\")\n",
    "print(f\"Large result size: {len(large_result):,} chars\")\n",
    "print()\n",
    "\n",
    "# Offloading is \"disabled\" by using a threshold that is effectively unreachable.\n",
    "offloading_disabled = ContextOffloadingStrategy(\n",
    "    config=OffloadingConfig(token_limit_before_evict=999999999)\n",
    ")\n",
    "offloading_enabled = ContextOffloadingStrategy(\n",
    "    config=OffloadingConfig(token_limit_before_evict=20000)\n",
    ")\n",
    "\n",
    "print(\"[Offloading disabled]\")\n",
    "print(f\"  Small result evicted: {offloading_disabled._should_offload(small_result)}\")\n",
    "print(f\"  Large result evicted: {offloading_disabled._should_offload(large_result)}\")\n",
    "print(\"  → The large result stays in the context as-is\")\n",
    "print()\n",
    "\n",
    "print(\"[Offloading enabled]\")\n",
    "print(f\"  Small result evicted: {offloading_enabled._should_offload(small_result)}\")\n",
    "print(f\"  Large result evicted: {offloading_enabled._should_offload(large_result)}\")\n",
    "print(\"  → The large result is saved to a file; only a preview stays in the context\")\n",
    "print()\n",
    "\n",
    "preview = offloading_enabled._create_preview(large_result)\n",
    "print(f\"Preview size: {len(preview):,} chars ({len(preview)/len(large_result)*100:.1f}% of the original)\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "exp2_reduction",
   "metadata": {},
   "source": [
    "### Experiment 2: Effect of the Reduction Strategy\n",
    "\n",
    "Compares a long conversation before and after Compaction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "exp2_code",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Reduction disabled]\n",
      "  Message count: 85\n",
      "  Estimated tokens: 2,972\n",
      "  → All tool calls/results remain in the context\n",
      "\n",
      "[Reduction enabled - Compaction]\n",
      "  Message count: 85 → 60\n",
      "  Estimated tokens: 2,972 → 2,350\n",
      "  Tokens saved: 622 (20.9%)\n",
      "  → Old tool calls/results are removed, leaving a leaner context\n"
     ]
    }
   ],
   "source": [
    "from langchain_core.messages import AIMessage, HumanMessage, ToolMessage\n",
    "\n",
    "# 30 turns; the first 25 AI messages each make a tool call answered by a ToolMessage.\n",
    "messages_with_tools = []\n",
    "for i in range(30):\n",
    "    messages_with_tools.append(HumanMessage(content=f\"질문 {i}: \" + \"내용 \" * 50))\n",
    "    ai_msg = AIMessage(\n",
    "        content=f\"답변 {i}: \" + \"응답 \" * 50,\n",
    "        tool_calls=[{'id': f'call_{i}', 'name': 'search', 'args': {'q': 'test'}}] if i < 25 else []\n",
    "    )\n",
    "    messages_with_tools.append(ai_msg)\n",
    "    if i < 25:\n",
    "        messages_with_tools.append(ToolMessage(content=f\"도구 결과 {i}: \" + \"결과 \" * 30, tool_call_id=f'call_{i}'))\n",
    "\n",
    "reduction = ContextReductionStrategy(\n",
    "    config=ReductionConfig(compaction_age_threshold=10)\n",
    ")\n",
    "\n",
    "original_tokens = reduction._estimate_tokens(messages_with_tools)\n",
    "print(\"[Reduction disabled]\")\n",
    "print(f\"  Message count: {len(messages_with_tools)}\")\n",
    "print(f\"  Estimated tokens: {original_tokens:,}\")\n",
    "print(\"  → All tool calls/results remain in the context\")\n",
    "print()\n",
    "\n",
    "compacted, result = reduction.apply_compaction(messages_with_tools)\n",
    "compacted_tokens = reduction._estimate_tokens(compacted)\n",
    "\n",
    "print(\"[Reduction enabled - Compaction]\")\n",
    "print(f\"  Message count: {len(messages_with_tools)} → {len(compacted)}\")\n",
    "print(f\"  Estimated tokens: {original_tokens:,} → {compacted_tokens:,}\")\n",
    "print(f\"  Tokens saved: {result.estimated_tokens_saved:,} ({result.estimated_tokens_saved/original_tokens*100:.1f}%)\")\n",
    "print(\"  → Old tool calls/results are removed, leaving a leaner context\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "exp3_combined",
   "metadata": {},
   "source": [
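    "### Experiment 3: Simulating the Combined Effect\n",
    "\n",
    "A back-of-the-envelope estimate of what all strategies save together; the sizes are assumptions, not measurements"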
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "exp3_code",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "============================================================\n",
      "Scenario: a complex research task\n",
      "============================================================\n",
      "\n",
      "[Scenario settings]\n",
      "  Conversation turns: 50\n",
      "  Tool calls: 40\n",
      "  Large results: 5\n",
      "  Average large-result size: 100k chars\n",
      "\n",
      "[All strategies disabled]\n",
      "  Estimated context size: 537,000 chars (~134,250 tokens)\n",
      "  Risk: larger than a 128k-token context window\n",
      "\n",
      "[Offloading only]\n",
      "  Estimated context size: 42,000 chars (~10,500 tokens)\n",
      "  Saved: 495,000 chars (92.2%)\n",
      "\n",
      "[Offloading + Reduction]\n",
      "  Estimated context size: 25,200 chars (~6,300 tokens)\n",
      "  Total saved: 511,800 chars (95.3%)\n",
      "\n",
      "[Additional effect of Caching]\n",
      "  Cached prompt prefix billed at ~10% of the base input price on cache hits (Anthropic)\n",
      "  Lower latency for repeated prefixes\n"
     ]
    }
   ],
   "source": [
    "print(\"=\" * 60)\n",
    "print(\"Scenario: a complex research task\")\n",
    "print(\"=\" * 60)\n",
    "print()\n",
    "\n",
    "scenario = {\n",
    "    \"Conversation turns\": 50,\n",
    "    \"Tool calls\": 40,\n",
    "    \"Large results\": 5,\n",
    "    \"Average large-result size\": \"100k chars\",\n",
    "}\n",
    "\n",
    "print(\"[Scenario settings]\")\n",
    "for k, v in scenario.items():\n",
    "    print(f\"  {k}: {v}\")\n",
    "print()\n",
    "\n",
    "# Assumed sizes (chars): 500 per turn, 300 per normal tool result, 100,000 per large result; ~4 chars per token.\n",
    "baseline_context = 50 * 500 + 40 * 300 + 5 * 100000\n",
    "print(\"[All strategies disabled]\")\n",
    "print(f\"  Estimated context size: {baseline_context:,} chars (~{baseline_context//4:,} tokens)\")\n",
    "print(\"  Risk: larger than a 128k-token context window\")\n",
    "print()\n",
    "\n",
    "# Offloading: each large result is replaced by a ~1,000-char preview.\n",
    "with_offloading = 50 * 500 + 40 * 300 + 5 * 1000\n",
    "print(\"[Offloading only]\")\n",
    "print(f\"  Estimated context size: {with_offloading:,} chars (~{with_offloading//4:,} tokens)\")\n",
    "print(f\"  Saved: {(baseline_context - with_offloading):,} chars ({(baseline_context - with_offloading)/baseline_context*100:.1f}%)\")\n",
    "print()\n",
    "\n",
    "# Assumption: reduction trims a further 40% of what remains.\n",
    "with_reduction = with_offloading * 0.6\n",
    "print(\"[Offloading + Reduction]\")\n",
    "print(f\"  Estimated context size: {int(with_reduction):,} chars (~{int(with_reduction)//4:,} tokens)\")\n",
    "print(f\"  Total saved: {int(baseline_context - with_reduction):,} chars ({(baseline_context - with_reduction)/baseline_context*100:.1f}%)\")\n",
    "print()\n",
    "\n",
    "print(\"[Additional effect of Caching]\")\n",
    "print(\"  Cached prompt prefix billed at ~10% of the base input price on cache hits (Anthropic)\")\n",
    "print(\"  Lower latency for repeated prefixes\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "exp4_live",
   "metadata": {},
   "source": [
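    "### Experiment 4: Building Real Agents for Comparison\n",
    "\n",
    "Builds real agents with different settings and compares them; the agents are only created here, not invoked."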
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "exp4_code",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Agent creation comparison\n",
      "============================================================\n",
      "\n",
      "[Baseline (all disabled)]\n",
      "  Offloading: ❌\n",
      "  Reduction:  ❌\n",
      "  Caching:    ❌\n",
      "  Agent type: CompiledStateGraph\n",
      "\n",
      "[Offloading only]\n",
      "  Offloading: ✅\n",
      "  Reduction:  ❌\n",
      "  Caching:    ❌\n",
      "  Agent type: CompiledStateGraph\n",
      "\n",
      "[Reduction only]\n",
      "  Offloading: ❌\n",
      "  Reduction:  ✅\n",
      "  Caching:    ❌\n",
      "  Agent type: CompiledStateGraph\n",
      "\n",
      "[All enabled]\n",
      "  Offloading: ✅\n",
      "  Reduction:  ✅\n",
      "  Caching:    ✅\n",
      "  Agent type: CompiledStateGraph\n",
      "\n",
      "============================================================\n",
      "All agents were created successfully.\n"
     ]
    }
   ],
   "source": [
    "from context_engineering_research_agent import create_context_aware_agent\n",
    "\n",
    "print(\"Agent creation comparison\")\n",
    "print(\"=\" * 60)\n",
    "\n",
    "configs = [\n",
    "    {\"name\": \"Baseline (all disabled)\", \"offloading\": False, \"reduction\": False, \"caching\": False},\n",
    "    {\"name\": \"Offloading only\", \"offloading\": True, \"reduction\": False, \"caching\": False},\n",
    "    {\"name\": \"Reduction only\", \"offloading\": False, \"reduction\": True, \"caching\": False},\n",
    "    {\"name\": \"All enabled\", \"offloading\": True, \"reduction\": True, \"caching\": True},\n",
    "]\n",
    "\n",
    "for cfg in configs:\n",
    "    agent = create_context_aware_agent(\n",
    "        model=\"gpt-4.1\",\n",
    "        enable_offloading=cfg[\"offloading\"],\n",
    "        enable_reduction=cfg[\"reduction\"],\n",
    "        enable_caching=cfg[\"caching\"],\n",
    "    )\n",
    "    print(f\"\\n[{cfg['name']}]\")\n",
    "    print(f\"  Offloading: {'✅' if cfg['offloading'] else '❌'}\")\n",
    "    print(f\"  Reduction:  {'✅' if cfg['reduction'] else '❌'}\")\n",
    "    print(f\"  Caching:    {'✅' if cfg['caching'] else '❌'}\")\n",
    "    print(f\"  Agent type: {type(agent).__name__}\")\n",
    "\n",
    "print(\"\\n\" + \"=\" * 60)\n",
    "print(\"All agents were created successfully.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "exp7_distraction_real_md",
   "metadata": {},
   "source": [
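    "### Experiment 7: Suppressing Repeated Actions (Distraction) with ToolCallLimitMiddleware\n",
    "\n",
    "Both runs use a scripted fake chat model (`LoopingSearchModel`), so no API key is needed.\n",
    "\n",
    "- Baseline: the model keeps calling the same `web_search`\n",
    "- With `ToolCallLimitMiddleware(tool_name='web_search', run_limit=1, exit_behavior='continue')`: from the second call onward, `web_search` is blocked with an error message and the model switches to a different action\n"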
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "exp7_distraction_real_code",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "============================================================\n",
      "[Baseline] no limit\n",
      "============================================================\n",
      "00 HUMAN: Please research context engineering\n",
      "01 AI: search loop 1\n",
      "     tool_call: name=web_search id=call_76fe1482 args={'query': 'context engineering'}\n",
      "02 TOOL: name=web_search status=success id=call_76fe1482\n",
      "     content: (dummy) result for 'context engineering'\n",
      "03 AI: search loop 2\n",
      "     tool_call: name=web_search id=call_9e523f0a args={'query': 'context engineering'}\n",
      "04 TOOL: name=web_search status=success id=call_9e523f0a\n",
      "     content: (dummy) result for 'context engineering'\n",
      "05 AI: search loop 3\n",
      "     tool_call: name=web_search id=call_ea484b66 args={'query': 'context engineering'}\n",
      "06 TOOL: name=web_search status=success id=call_ea484b66\n",
      "     content: (dummy) result for 'context engineering'\n",
      "07 AI: switch to todos\n",
      "     tool_call: name=write_todos id=call_61a16615 args={'todos': ['summarize findings']}\n",
      "08 TOOL: name=write_todos status=success id=call_61a16615\n",
      "     content: {\"todos\": [\"summarize findings\"]}\n",
      "09 AI: FINAL todo list written\n",
      "\n",
      "============================================================\n",
      "[With ToolCallLimitMiddleware] web_search run_limit=1\n",
      "============================================================\n",
      "00 HUMAN: Please research context engineering\n",
      "01 AI: search loop 1\n",
      "     tool_call: name=web_search id=call_281f3c81 args={'query': 'context engineering'}\n",
      "02 TOOL: name=web_search status=success id=call_281f3c81\n",
      "     content: (dummy) result for 'context engineering'\n",
      "03 AI: search loop 2\n",
      "     tool_call: name=web_search id=call_9b876f09 args={'query': 'context engineering'}\n",
      "04 TOOL: name=web_search status=error id=call_9b876f09\n",
      "     content: Tool call limit exceeded. Do not call 'web_search' again.\n",
      "05 AI: switch to todos\n",
      "     tool_call: name=write_todos id=call_5a2def37 args={'todos': ['summarize findings']}\n",
      "06 TOOL: name=write_todos status=success id=call_5a2def37\n",
      "     content: {\"todos\": [\"summarize findings\"]}\n",
      "07 AI: FINAL todo list written\n"
     ]
    }
   ],
   "source": [
    "from __future__ import annotations\n",
    "\n",
    "import json\n",
    "import uuid\n",
    "from typing import Any\n",
    "\n",
    "from langchain.agents import create_agent\n",
    "from langchain.agents.middleware import ToolCallLimitMiddleware\n",
    "from langchain_core.language_models import BaseChatModel\n",
    "from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, ToolMessage\n",
    "from langchain_core.outputs import ChatGeneration, ChatResult\n",
    "from langchain_core.tools import tool\n",
    "\n",
    "# `_print_messages` is assumed to be a message-printing helper defined in an earlier cell (not shown here).\n",
    "\n",
    "\n",
    "@tool(description=\"Dummy web search tool\")\n",
    "def web_search(query: str) -> str:\n",
    "    return f\"(dummy) result for {query!r}\"\n",
    "\n",
    "\n",
    "@tool(description=\"Write a todo list\")\n",
    "def write_todos(todos: list[str]) -> str:\n",
    "    return json.dumps({\"todos\": todos})\n",
    "\n",
    "\n",
    "class LoopingSearchModel(BaseChatModel):\n",
    "    def bind_tools(self, tools: list[Any], **kwargs: Any):  # noqa: ANN401\n",
    "        _ = kwargs\n",
    "        self._tool_names = [t.name for t in tools if hasattr(t, 'name')]\n",
    "        return self\n",
    "\n",
    "    @property\n",
    "    def _llm_type(self) -> str:\n",
    "        return 'looping-search'\n",
    "\n",
    "    @property\n",
    "    def _identifying_params(self) -> dict[str, Any]:\n",
    "        return {}\n",
    "\n",
    "    def _generate(self, messages: list[BaseMessage], stop=None, run_manager=None, **kwargs: Any) -> ChatResult:\n",
    "        _ = (stop, run_manager, kwargs)\n",
    "\n",
    "        # Count tool outcomes (robust stop conditions).\n",
    "        ok_search_results = [\n",
    "            m\n",
    "            for m in messages\n",
    "            if isinstance(m, ToolMessage) and m.name == 'web_search' and (m.status is None or m.status == 'success')\n",
    "        ]\n",
    "        error_search_results = [\n",
    "            m\n",
    "            for m in messages\n",
    "            if isinstance(m, ToolMessage) and m.name == 'web_search' and m.status == 'error'\n",
    "        ]\n",
    "        has_todo_result = any(isinstance(m, ToolMessage) and m.name == 'write_todos' for m in messages)\n",
    "\n",
    "        # If we already wrote a todo list, end the run (avoid infinite tool-call loops).\n",
    "        if has_todo_result:\n",
    "            return ChatResult(generations=[ChatGeneration(message=AIMessage(content='FINAL todo list written'))])\n",
    "\n",
    "        if len(ok_search_results) < 3 and not error_search_results:\n",
    "            tcid = f\"call_{uuid.uuid4().hex[:8]}\"\n",
    "            msg = AIMessage(\n",
    "                content=f\"search loop {len(ok_search_results)+1}\",\n",
    "                tool_calls=[{'id': tcid, 'name': 'web_search', 'args': {'query': 'context engineering'}, 'type': 'tool_call'}],\n",
    "            )\n",
    "            return ChatResult(generations=[ChatGeneration(message=msg)])\n",
    "\n",
    "        # If blocked/error occurred (or we reached 3 searches), switch to planning once.\n",
    "        tcid = f\"call_{uuid.uuid4().hex[:8]}\"\n",
    "        msg = AIMessage(\n",
    "            content='switch to todos',\n",
    "            tool_calls=[{'id': tcid, 'name': 'write_todos', 'args': {'todos': ['summarize findings']}, 'type': 'tool_call'}],\n",
    "        )\n",
    "        return ChatResult(generations=[ChatGeneration(message=msg)])\n",
    "\n",
    "\n",
    "user = HumanMessage(content=\"Please research context engineering\")\n",
    "state = {\"messages\": [user]}\n",
    "\n",
    "print(\"=\" * 60)\n",
    "print(\"[Baseline] no limit\")\n",
    "print(\"=\" * 60)\n",
    "agent_baseline = create_agent(model=LoopingSearchModel(), tools=[web_search, write_todos], middleware=[])\n",
    "res1 = agent_baseline.invoke(state, {\"configurable\": {\"thread_id\": \"exp7_baseline\"}})\n",
    "_print_messages(res1[\"messages\"])\n",
    "\n",
    "print(\"\\n\" + \"=\" * 60)\n",
    "print(\"[With ToolCallLimitMiddleware] web_search run_limit=1\")\n",
    "print(\"=\" * 60)\n",
    "limiter = ToolCallLimitMiddleware(tool_name='web_search', run_limit=1, exit_behavior='continue')\n",
    "agent_limited = create_agent(model=LoopingSearchModel(), tools=[web_search, write_todos], middleware=[limiter])\n",
    "res2 = agent_limited.invoke(state, {\"configurable\": {\"thread_id\": \"exp7_limited\"}})\n",
    "_print_messages(res2[\"messages\"])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "exp8_poisoning",
   "metadata": {},
   "source": [
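    "### Experiment 8: Context Poisoning (Contamination by Unverified Facts)\n",
    "\n",
    "When unverified information enters the context or memory, later decisions can harden around that “poisoned fact”.\n",
    "\n",
    "This experiment simulates:\n",
    "\n",
    "- A memory item with no source (unverified) creeping into later decisions\n",
    "- The mitigation: **source tagging + a verification gate (verified only)**\n"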
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "exp8_poisoning_code",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "============================================================\n",
      "[A] Clean memory\n",
      "============================================================\n",
      "blind plan: Install the package.\n",
      "verified-only plan: Install the package.\n",
      "\n",
      "============================================================\n",
      "[B] Poisoned memory\n",
      "============================================================\n",
      "blind plan: Skip install; proceed to use the package.\n",
      "verified-only plan: Install the package.\n",
      "\n",
      "============================================================\n",
      "[C] Mitigation: route unsourced facts to a verification request\n",
      "============================================================\n",
      "needs_verification:\n",
      "  - package_installed='yes' source=None verified=False\n",
      "\n",
      "→ Policy: write to state/memory only after re-checking with a tool\n"
     ]
    }
   ],
   "source": [
    "from __future__ import annotations\n",
    "\n",
    "from dataclasses import dataclass\n",
    "\n",
    "\n",
    "@dataclass(frozen=True)\n",
    "class MemoryItem:\n",
    "    key: str\n",
    "    value: str\n",
    "    source: str | None  # e.g. a tool_call_id; None means no source\n",
    "    verified: bool\n",
    "\n",
    "\n",
    "def choose_plan(memory: list[MemoryItem]) -> str:\n",
    "    # Toy planner that trusts memory as-is (bad example). The most recent entry for a key wins,\n",
    "    # so a later unverified write silently overrides an earlier verified one.\n",
    "    installed = next((m.value for m in reversed(memory) if m.key == \"package_installed\"), \"unknown\")\n",
    "    if installed == \"yes\":\n",
    "        return \"Skip install; proceed to use the package.\"\n",
    "    if installed == \"no\":\n",
    "        return \"Install the package.\"\n",
    "    return \"Check whether the package is installed.\"\n",
    "\n",
    "\n",
    "def choose_plan_verified_only(memory: list[MemoryItem]) -> str:\n",
    "    # Same planner, but only verified entries are visible to it\n",
    "    verified = [m for m in memory if m.verified]\n",
    "    return choose_plan(verified)\n",
    "\n",
    "\n",
    "memory_clean = [\n",
    "    MemoryItem(key=\"package_installed\", value=\"no\", source=\"tool_call_1\", verified=True),\n",
    "]\n",
    "\n",
    "memory_poisoned = [\n",
    "    MemoryItem(key=\"package_installed\", value=\"no\", source=\"tool_call_1\", verified=True),\n",
    "    # Poisoned: no source and not verified, written after the verified entry\n",
    "    MemoryItem(key=\"package_installed\", value=\"yes\", source=None, verified=False),\n",
    "]\n",
    "\n",
    "print(\"=\" * 60)\n",
    "print(\"[A] Clean memory\")\n",
    "print(\"=\" * 60)\n",
    "print(\"blind plan:\", choose_plan(memory_clean))\n",
    "print(\"verified-only plan:\", choose_plan_verified_only(memory_clean))\n",
    "print()\n",
    "\n",
    "print(\"=\" * 60)\n",
    "print(\"[B] Poisoned memory\")\n",
    "print(\"=\" * 60)\n",
    "print(\"blind plan:\", choose_plan(memory_poisoned))\n",
    "print(\"verified-only plan:\", choose_plan_verified_only(memory_poisoned))\n",
    "print()\n",
    "\n",
    "print(\"=\" * 60)\n",
    "print(\"[C] Mitigation: route unsourced facts to a verification request\")\n",
    "print(\"=\" * 60)\n",
    "needs_verification = [m for m in memory_poisoned if (m.source is None or not m.verified)]\n",
    "print(\"needs_verification:\")\n",
    "for item in needs_verification:\n",
    "    print(f\"  - {item.key}='{item.value}' source={item.source} verified={item.verified}\")\n",
    "print(\"\\n→ Policy: write to state/memory only after re-checking with a tool\")\n"
   ]
  },
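  {
   "cell_type": "markdown",
   "id": "exp8_write_gate_md",
   "metadata": {},
   "source": [
    "The policy at the end of [C] (commit to state/memory only after a tool re-check) can also be enforced at write time instead of being filtered at read time. The cell below is a minimal sketch under stated assumptions: `Fact`, `GatedMemory`, and `fake_scan` are hypothetical names, and `fake_scan` merely stands in for a verified tool like the scan tool used in the next experiment. None of them are DeepAgents or LangChain APIs.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "exp8_write_gate_code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from __future__ import annotations\n",
    "\n",
    "from collections.abc import Callable\n",
    "from dataclasses import dataclass, field\n",
    "\n",
    "\n",
    "@dataclass(frozen=True)\n",
    "class Fact:\n",
    "    key: str\n",
    "    value: str\n",
    "    source: str | None  # e.g. a tool_call_id; None means unsourced\n",
    "    verified: bool\n",
    "\n",
    "\n",
    "@dataclass\n",
    "class GatedMemory:\n",
    "    \"\"\"Hypothetical write-time gate: only sourced, verified facts are committed.\"\"\"\n",
    "\n",
    "    # recheck(key) returns a Fact re-derived by a trusted tool, or None if no check is possible\n",
    "    recheck: Callable[[str], Fact | None]\n",
    "    committed: dict[str, Fact] = field(default_factory=dict)\n",
    "    rejected: list[Fact] = field(default_factory=list)\n",
    "\n",
    "    def write(self, fact: Fact) -> Fact | None:\n",
    "        if fact.verified and fact.source is not None:\n",
    "            self.committed[fact.key] = fact\n",
    "            return fact\n",
    "        # Unsourced or unverified: never commit it directly; ask the trusted tool instead\n",
    "        self.rejected.append(fact)\n",
    "        checked = self.recheck(fact.key)\n",
    "        if checked is not None and checked.verified and checked.source is not None:\n",
    "            self.committed[checked.key] = checked\n",
    "            return checked\n",
    "        return None\n",
    "\n",
    "\n",
    "def fake_scan(key: str) -> Fact | None:\n",
    "    # Hypothetical stand-in for a verified tool call (compare scan_install_status)\n",
    "    if key == \"package_installed\":\n",
    "        return Fact(key, \"no\", source=\"scan_call_1\", verified=True)\n",
    "    return None\n",
    "\n",
    "\n",
    "mem = GatedMemory(recheck=fake_scan)\n",
    "mem.write(Fact(\"package_installed\", \"yes\", source=None, verified=False))\n",
    "print(mem.committed[\"package_installed\"].value)  # → no\n",
    "print(len(mem.rejected))  # → 1\n"
   ]
  },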
  {
   "cell_type": "markdown",
   "id": "exp8_poisoning_real_md",
   "metadata": {},
   "source": [
    "#### (Run) Blocking Poisoning with a Verification Gate\n",
    "\n",
    "- Baseline: trusts the `verified=false` result as-is and builds the wrong plan\n",
    "- With the gate middleware: blocks `verified=false` facts and forces a re-check with the verified tool\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "exp8_poisoning_real_code",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "============================================================\n",
      "[Baseline] no verification gate\n",
      "============================================================\n",
      "00 HUMAN: Decide whether package X needs to be installed\n",
      "01 AI: guess\n",
      "     tool_call: name=guess_install_status id=call_ccef8e23 args={}\n",
      "02 TOOL: name=guess_install_status status=success id=call_ccef8e23\n",
      "     content: {\"package_installed\": \"yes\", \"verified\": false, \"source\": \"guess\"}\n",
      "03 AI: FINAL decision=SKIP (source=guess)\n",
      "\n",
      "============================================================\n",
      "[With VerificationGateMiddleware]\n",
      "============================================================\n",
      "00 HUMAN: Decide whether package X needs to be installed\n",
      "01 AI: guess\n",
      "     tool_call: name=guess_install_status id=call_70cc8071 args={}\n",
      "02 TOOL: name=guess_install_status status=success id=call_70cc8071\n",
      "     content: {\"package_installed\": \"yes\", \"verified\": false, \"source\": \"guess\"}\n",
      "03 SYSTEM: UNVERIFIED_FACT_BLOCKED: Do not trust guess_install_status. Call scan_install_status and decide based on verified=true only.\n",
      "04 AI: scan\n",
      "     tool_call: name=scan_install_status id=call_9e23662c args={}\n",
      "05 TOOL: name=scan_install_status status=success id=call_9e23662c\n",
      "     content: {\"package_installed\": \"no\", \"verified\": true, \"source\": \"scan\"}\n",
      "06 AI: FINAL decision=INSTALL (source=scan)\n"
     ]
    }
   ],
   "source": [
    "from __future__ import annotations\n",
    "\n",
    "from langchain.agents.middleware.types import AgentState\n",
    "from langgraph.runtime import Runtime\n",
    "\n",
    "\n",
    "@tool(description=\"Unverified guess of install status\")\n",
    "def guess_install_status() -> str:\n",
    "    # Poisoned / unverified\n",
    "    return json.dumps({\"package_installed\": \"yes\", \"verified\": False, \"source\": \"guess\"})\n",
    "\n",
    "\n",
    "@tool(description=\"Verified scan of install status\")\n",
    "def scan_install_status() -> str:\n",
    "    # Verified\n",
    "    return json.dumps({\"package_installed\": \"no\", \"verified\": True, \"source\": \"scan\"})\n",
    "\n",
    "\n",
    "class VerificationGateMiddleware(AgentMiddleware):\n",
    "    def before_model(self, state: AgentState, runtime: Runtime[Any]) -> dict[str, Any] | None:  # noqa: ARG002\n",
    "        messages = state.get('messages', [])\n",
    "\n",
    "        # Avoid repeatedly injecting the same constraint.\n",
    "        if any(isinstance(m, SystemMessage) and 'UNVERIFIED_FACT_BLOCKED' in m.content for m in messages):\n",
    "            return None\n",
    "        if any(isinstance(m, ToolMessage) and m.name == 'scan_install_status' for m in messages):\n",
    "            return None\n",
    "\n",
    "        # If we see an unverified tool result, inject a system constraint.\n",
    "        for m in reversed(messages):\n",
    "            if isinstance(m, ToolMessage) and m.name == 'guess_install_status':\n",
    "                try:\n",
    "                    data = json.loads(str(m.content))\n",
    "                except json.JSONDecodeError:\n",
    "                    continue\n",
    "                if data.get('verified') is False:\n",
    "                    patched = list(messages)\n",
    "                    patched.append(\n",
    "                        SystemMessage(\n",
    "                            content=(\n",
    "                                'UNVERIFIED_FACT_BLOCKED: Do not trust guess_install_status. '\n",
    "                                'Call scan_install_status and decide based on verified=true only.'\n",
    "                            )\n",
    "                        )\n",
    "                    )\n",
    "                    return {'messages': Overwrite(patched)}\n",
    "        return None\n",
    "\n",
    "\n",
    "class InstallPlannerModel(BaseChatModel):\n",
    "    def bind_tools(self, tools: list[Any], **kwargs: Any):  # noqa: ANN401\n",
    "        _ = kwargs\n",
    "        self._tool_names = [t.name for t in tools if hasattr(t, 'name')]\n",
    "        return self\n",
    "\n",
    "    @property\n",
    "    def _llm_type(self) -> str:\n",
    "        return 'install-planner'\n",
    "\n",
    "    @property\n",
    "    def _identifying_params(self) -> dict[str, Any]:\n",
    "        return {}\n",
    "\n",
    "    def _generate(self, messages: list[BaseMessage], stop=None, run_manager=None, **kwargs: Any) -> ChatResult:\n",
    "        _ = (stop, run_manager, kwargs)\n",
    "\n",
    "        # If scan result exists, finalize.\n",
    "        for m in reversed(messages):\n",
    "            if isinstance(m, ToolMessage) and m.name == 'scan_install_status':\n",
    "                data = json.loads(str(m.content))\n",
    "                decision = 'INSTALL' if data.get('package_installed') == 'no' else 'SKIP'\n",
    "                return ChatResult(generations=[ChatGeneration(message=AIMessage(content=f\"FINAL decision={decision} (source=scan)\"))])\n",
    "\n",
    "        blocked = any(\n",
    "            isinstance(m, SystemMessage) and 'UNVERIFIED_FACT_BLOCKED' in m.content for m in messages\n",
    "        )\n",
    "\n",
    "        if blocked:\n",
    "            tcid = f\"call_{uuid.uuid4().hex[:8]}\"\n",
    "            msg = AIMessage(content='scan', tool_calls=[{'id': tcid, 'name': 'scan_install_status', 'args': {}, 'type': 'tool_call'}])\n",
    "            return ChatResult(generations=[ChatGeneration(message=msg)])\n",
    "\n",
    "        # Baseline behavior: trust guess first.\n",
    "        if not any(isinstance(m, ToolMessage) and m.name == 'guess_install_status' for m in messages):\n",
    "            tcid = f\"call_{uuid.uuid4().hex[:8]}\"\n",
    "            msg = AIMessage(content='guess', tool_calls=[{'id': tcid, 'name': 'guess_install_status', 'args': {}, 'type': 'tool_call'}])\n",
    "            return ChatResult(generations=[ChatGeneration(message=msg)])\n",
    "\n",
    "        # If guess exists and no gate, finalize (poisoned).\n",
    "        for m in reversed(messages):\n",
    "            if isinstance(m, ToolMessage) and m.name == 'guess_install_status':\n",
    "                data = json.loads(str(m.content))\n",
    "                decision = 'SKIP' if data.get('package_installed') == 'yes' else 'INSTALL'\n",
    "                return ChatResult(generations=[ChatGeneration(message=AIMessage(content=f\"FINAL decision={decision} (source=guess)\"))])\n",
    "\n",
    "        return ChatResult(generations=[ChatGeneration(message=AIMessage(content='FINAL no decision'))])\n",
    "\n",
    "\n",
    "user = HumanMessage(content='Decide whether package X needs to be installed')\n",
    "state = {\"messages\": [user]}\n",
    "\n",
    "tools = [guess_install_status, scan_install_status]\n",
    "\n",
    "print(\"=\" * 60)\n",
    "print(\"[Baseline] no verification gate\")\n",
    "print(\"=\" * 60)\n",
    "agent_baseline = create_agent(model=InstallPlannerModel(), tools=tools, middleware=[])\n",
    "res1 = agent_baseline.invoke(state, {\"configurable\": {\"thread_id\": \"exp8_baseline\"}})\n",
    "_print_messages(res1['messages'])\n",
    "\n",
    "print(\"\\n\" + \"=\" * 60)\n",
    "print(\"[With VerificationGateMiddleware]\")\n",
    "print(\"=\" * 60)\n",
    "agent_gated = create_agent(model=InstallPlannerModel(), tools=tools, middleware=[VerificationGateMiddleware()])\n",
    "res2 = agent_gated.invoke(state, {\"configurable\": {\"thread_id\": \"exp8_gated\"}})\n",
    "_print_messages(res2['messages'])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "summary",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Summary\n",
    "\n",
    "### Summary of the 5 Context Engineering Strategies\n",
    "\n",
    "| Strategy | Trigger condition | Effect |\n",
    "|------|------------|------|\n",
    "| **Offloading** | Tool result over 20k tokens | Evicted to a file |\n",
    "| **Reduction** | Context window usage over 85% | Compaction/Summarization |\n",
    "| **Retrieval** | File access needed | grep/glob search |\n",
    "| **Isolation** | Complex task | Delegated to a SubAgent |\n",
    "| **Caching** | System prompt of 1k+ tokens | Prompt Caching |\n",
    "\n",
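    "The trigger conditions in the table can be read as simple threshold checks. The sketch below is a minimal illustration, not DeepAgents' implementation: the function names are hypothetical, \"1k+\" is assumed to mean 1,024 tokens, and tokens are estimated with a rough 4-characters-per-token heuristic instead of a real tokenizer.\n",
    "\n",
    "```python\n",
    "# Hypothetical helpers; thresholds mirror the table above (assumption: not DeepAgents' internals).\n",
    "OFFLOAD_TOKENS = 20_000  # Offloading: tool result size\n",
    "REDUCTION_FRACTION = 0.85  # Reduction: share of the context window in use\n",
    "CACHE_MIN_TOKENS = 1_024  # Caching: \"1k+\" read as 1,024 tokens\n",
    "\n",
    "\n",
    "def approx_tokens(text: str) -> int:\n",
    "    # Rough heuristic (~4 characters per token), not a real tokenizer\n",
    "    return len(text) // 4\n",
    "\n",
    "\n",
    "def triggered_strategies(tool_result: str, context_tokens: int, window_tokens: int, system_prompt: str) -> list[str]:\n",
    "    hits: list[str] = []\n",
    "    if approx_tokens(tool_result) > OFFLOAD_TOKENS:\n",
    "        hits.append(\"Offloading\")\n",
    "    if context_tokens > REDUCTION_FRACTION * window_tokens:\n",
    "        hits.append(\"Reduction\")\n",
    "    if approx_tokens(system_prompt) >= CACHE_MIN_TOKENS:\n",
    "        hits.append(\"Caching\")\n",
    "    return hits\n",
    "\n",
    "\n",
    "print(triggered_strategies(\"x\" * 100_000, 180_000, 200_000, \"s\" * 8_000))\n",
    "# → ['Offloading', 'Reduction', 'Caching']\n",
    "```\n",
    "\n",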
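    "### Key Insights\n",
    "\n",
    "1. **Filesystem = external memory**: the context window is limited, while the filesystem is effectively unbounded by comparison\n",
    "2. **Progressive disclosure**: instead of loading everything at once, load information only when it is needed\n",
    "3. **Isolated execution**: SubAgents keep their working context from polluting the main agent's context\n",
    "4. **Automated management**: design middleware so the agent manages its own context without manual intervention\n",
    "5. **Failure-mode defense**: rules are needed to detect and mitigate Poisoning/Distraction/Confusion/Clash\n",
    "6. **Tools are context too**: keep tool descriptions and tool lists minimal, and load tools only when needed (or organize them hierarchically)\n",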
    "\n",
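    "Insights 1 and 2 can be sketched together: a large tool result is written to a file, only a short pointer plus preview stays in the context, and later steps read back just the slice they need. This is a minimal sketch with hypothetical `offload`/`read_lines` helpers (not the DeepAgents `FilesystemMiddleware` API); the 80,000-character limit is an assumption that roughly corresponds to 20k tokens at ~4 characters per token.\n",
    "\n",
    "```python\n",
    "import tempfile\n",
    "from pathlib import Path\n",
    "\n",
    "PREVIEW_CHARS = 200  # hypothetical preview size kept in the context\n",
    "\n",
    "\n",
    "def offload(result: str, workdir: Path, name: str, limit_chars: int = 80_000) -> str:\n",
    "    # Small results stay inline; large ones are evicted to a file (hypothetical helper)\n",
    "    if len(result) <= limit_chars:\n",
    "        return result\n",
    "    path = workdir / f\"{name}.txt\"\n",
    "    path.write_text(result, encoding=\"utf-8\")\n",
    "    return f\"[offloaded {len(result)} chars to {path}] preview: {result[:PREVIEW_CHARS]}\"\n",
    "\n",
    "\n",
    "def read_lines(path: Path, offset: int, limit: int) -> list[str]:\n",
    "    # Progressive disclosure: load only the lines needed right now\n",
    "    lines = path.read_text(encoding=\"utf-8\").splitlines()\n",
    "    return lines[offset : offset + limit]\n",
    "\n",
    "\n",
    "with tempfile.TemporaryDirectory() as d:\n",
    "    big = \"\\n\".join(f\"row {i}\" for i in range(50_000))\n",
    "    stub = offload(big, Path(d), \"search_result\")\n",
    "    window = read_lines(Path(d) / \"search_result.txt\", offset=10, limit=3)\n",
    "\n",
    "print(stub[:40])\n",
    "print(window)  # → ['row 10', 'row 11', 'row 12']\n",
    "```\n",
    "\n",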
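    "### Additional Experiments (Failure Modes)\n",
    "\n",
    "- **Confusion**: the more similar tools there are, the less stable tool selection becomes (mitigated by limiting or layering which tools are loaded)\n",
    "- **Clash**: record contradictory observations as conflicts and resolve them by re-verification\n",
    "- **Distraction**: in long logs, behavior drifts toward repeating the same action (mitigated by forcing a plan or an explicit next action)\n",
    "- **Poisoning**: block unsourced facts and control them with a verification gate\n",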
    "\n",
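    "The Clash mitigation (record contradictions as conflicts, then re-verify) can be sketched in plain Python. This is a hypothetical illustration, not the notebook's conflict-detection middleware: observations are assumed to be `(key, value, verified)` tuples, and `reverify` stands in for a trusted tool call.\n",
    "\n",
    "```python\n",
    "from collections import defaultdict\n",
    "from collections.abc import Callable\n",
    "\n",
    "Observation = tuple[str, str, bool]  # (key, value, verified): assumed shape\n",
    "\n",
    "\n",
    "def find_conflicts(observations: list[Observation]) -> dict[str, set[str]]:\n",
    "    values: dict[str, set[str]] = defaultdict(set)\n",
    "    for key, value, _verified in observations:\n",
    "        values[key].add(value)\n",
    "    # A key clashes when tools reported more than one distinct value for it\n",
    "    return {k: v for k, v in values.items() if len(v) > 1}\n",
    "\n",
    "\n",
    "def resolve(observations: list[Observation], conflicts: dict[str, set[str]], reverify: Callable[[str], str]) -> dict[str, str]:\n",
    "    resolved: dict[str, str] = {}\n",
    "    for key in conflicts:\n",
    "        verified_values = {v for k, v, ok in observations if k == key and ok}\n",
    "        # Trust a single verified value; otherwise fall back to a fresh trusted check\n",
    "        resolved[key] = next(iter(verified_values)) if len(verified_values) == 1 else reverify(key)\n",
    "    return resolved\n",
    "\n",
    "\n",
    "obs = [(\"package_installed\", \"no\", True), (\"package_installed\", \"yes\", False), (\"python\", \"3.13\", True)]\n",
    "conflicts = find_conflicts(obs)\n",
    "print(sorted(conflicts))  # → ['package_installed']\n",
    "print(resolve(obs, conflicts, reverify=lambda key: \"unknown\"))  # → {'package_installed': 'no'}\n",
    "```\n",
    "\n",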
    "### Additional Experiments (Failure Modes): Real Runs\n",
    "\n",
    "- **Tool Selection**: shrink the tool set with `LLMToolSelectorMiddleware` to mitigate Confusion\n",
    "- **Tool Call Limiting**: cap repeated tool calls with `ToolCallLimitMiddleware` to mitigate Distraction\n",
    "- **Filesystem Tools**: inspect real execution logs of `ls/read_file/glob/grep` via deepagents' `FilesystemMiddleware`\n",
    "- **Custom Guards**: (for experimentation) implement conflict detection and the verification gate as `AgentMiddleware` subclasses to mitigate Clash/Poisoning\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "deepagent-context-engineering (3.13.9)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}