KNOWLEDGE.md
Accumulated Insights, Best Practices, and Troubleshooting for SuperClaude Framework
This document captures lessons learned, common pitfalls, and solutions discovered during development. Consult this when encountering issues or learning project patterns.
Last Updated: 2025-11-12
🧠 Core Insights
PM Agent ROI: 25-250x Token Savings
Finding: Pre-execution confidence checking has exceptional ROI.
Evidence:
- Spending 100-200 tokens on a confidence check saves 5,000-50,000 tokens of wrong-direction work
- Real example: checking for duplicate implementations before coding (2 minutes of research) vs. implementing a duplicate feature (2 hours of work)
When it works best:
- Unclear requirements → Ask questions first
- New codebase → Search for existing patterns
- Complex features → Verify architecture compliance
- Bug fixes → Identify root cause before coding
When to skip:
- Trivial changes (typo fixes)
- Well-understood tasks with clear path
- Emergency hotfixes (but document learnings after)
Hallucination Detection: 94% Accuracy
Finding: The Four Questions catch most AI hallucinations.
The Four Questions:
- Are all tests passing? → REQUIRE actual output
- Are all requirements met? → LIST each requirement
- No assumptions without verification? → SHOW documentation
- Is there evidence? → PROVIDE test results, code changes, validation
Red flags that indicate hallucination:
- "Tests pass" (without showing output) 🚩
- "Everything works" (without evidence) 🚩
- "Implementation complete" (with failing tests) 🚩
- Skipping error messages 🚩
- Ignoring warnings 🚩
- "Probably works" language 🚩
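The red flags above are mechanical enough to pre-screen automatically. Below is a minimal Python sketch, assuming a completion report arrives as plain text. `find_red_flags` and its regex patterns are hypothetical illustrations, not part of SuperClaude's SelfCheckProtocol, and they will miss paraphrased claims.

```python
import re

# Hypothetical patterns modeled on the red flags listed above; tune for your reports.
# Claims in this group are acceptable only when passing test output is attached.
CLAIMS_NEEDING_EVIDENCE = {
    "test claim without output": re.compile(r"\btests?\s+(?:all\s+)?pass(?:ed|es)?\b", re.I),
    "completion claim without evidence": re.compile(r"\bcomplete\b", re.I),
    "success claim without evidence": re.compile(r"\beverything works\b", re.I),
}
# Hedged language is flagged even when evidence is present.
ALWAYS_FLAGGED = {
    "hedged language": re.compile(r"\bprobably works\b", re.I),
}
# A pytest-style summary ("3 passed", "3 tests passed") counts as evidence...
EVIDENCE = re.compile(r"\b\d+\s+(?:tests?\s+)?passed\b", re.I)
# ...unless the same output also reports failures (case-sensitive on purpose).
FAILURES = re.compile(r"\b[1-9]\d*\s+(?:tests?\s+)?failed\b|\bFAILED\b")


def find_red_flags(report: str) -> list:
    """Return the names of red flags found in a completion report."""
    has_evidence = bool(EVIDENCE.search(report)) and not FAILURES.search(report)
    flags = [name for name, rx in ALWAYS_FLAGGED.items() if rx.search(report)]
    if not has_evidence:
        flags += [name for name, rx in CLAIMS_NEEDING_EVIDENCE.items() if rx.search(report)]
    return flags
```

Applied to the BAD example below, the sketch flags a completion claim. For the GOOD example, the attached "3 tests passed" summary suppresses that flag. Treat a hit as a prompt to ask for evidence, not as proof of hallucination.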
Real example:
❌ BAD: "The API integration is complete and working correctly."
✅ GOOD: "The API integration is complete. Test output:
✅ test_api_connection: PASSED
✅ test_api_authentication: PASSED
✅ test_api_data_fetch: PASSED
All 3 tests passed in 1.2s"
Parallel Execution: 3.5x Speedup
Finding: Wave → Checkpoint → Wave pattern dramatically improves performance.
Pattern:
```python
# Wave 1: Independent reads (parallel)
files = [Read(f1), Read(f2), Read(f3)]

# Checkpoint: Analyze together (sequential)
analysis = analyze_files(files)

# Wave 2: Independent edits (parallel)
edits = [Edit(f1), Edit(f2), Edit(f3)]
```
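The same shape applies outside Claude Code's tool calls. Below is a minimal sketch in plain Python, assuming local file I/O stands in for the Read/Edit tools and a thread pool stands in for issuing calls in parallel. `run_wave`, `add_missing_headers` and the header-adding task are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def run_wave(fn: Callable, items: Iterable, max_workers: int = 10) -> List:
    """Run independent operations concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))


def add_missing_headers(paths: List[Path], header: str) -> Dict[str, int]:
    # Wave 1: independent reads; no read depends on another.
    contents = run_wave(lambda p: p.read_text(encoding="utf-8"), paths)

    # Checkpoint: sequential analysis over everything Wave 1 returned.
    to_edit = [(p, text) for p, text in zip(paths, contents) if not text.startswith(header)]

    # Wave 2: independent edits; each one writes a different file.
    run_wave(lambda item: item[0].write_text(header + "\n" + item[1], encoding="utf-8"), to_edit)
    return {"read": len(paths), "edited": len(to_edit)}
```

The checkpoint is the only sequential step, because it needs every Wave 1 result before deciding which edits Wave 2 runs. The default of 10 workers mirrors the ~10-operations-per-wave observation in Performance Insights. It is an assumption, not a tuned value.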
When to use:
- ✅ Reading multiple independent files
- ✅ Editing multiple unrelated files
- ✅ Running multiple independent searches
- ✅ Parallel test execution
When NOT to use:
- ❌ Operations with dependencies (file2 needs data from file1)
- ❌ Sequential analysis (building context step-by-step)
- ❌ Operations that modify shared state
Performance data:
- Sequential: 10 file reads = 10 API calls = ~30 seconds
- Parallel: 10 file reads = 1 API call = ~3 seconds
- Speedup: 3.5x average, up to 10x for large batches
🛠️ Common Pitfalls and Solutions
Pitfall 1: Implementing Before Checking for Duplicates
Problem: Spent hours implementing a feature that already existed in the codebase.
Solution: ALWAYS use Glob/Grep before implementing:
```bash
# Search for similar functions
uv run python -c "from pathlib import Path; print([f for f in Path('src').rglob('*.py') if 'feature_name' in f.read_text()])"

# Or use grep
grep -r "def feature_name" src/
```
Prevention: Run confidence check, ensure duplicate_check_complete=True
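Grep also matches comments and strings that merely mention a name. Below is a minimal sketch of a stricter check using Python's standard `ast` module, assuming sources live under a `src/`-style root. `find_definitions` is a hypothetical helper, not the ConfidenceChecker implementation.

```python
import ast
from pathlib import Path
from typing import List, Tuple, Union


def find_definitions(root: Union[str, Path], name: str) -> List[Tuple[Path, int]]:
    """Return (file, line) for each function or class named `name` under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, ValueError, UnicodeDecodeError):
            continue  # skip files that are not parseable Python source
        for node in ast.walk(tree):
            is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
            if is_def and node.name == name:
                hits.append((path, node.lineno))
    return hits
```

For example, `find_definitions("src", "feature_name")` returns an empty list when nothing by that name exists yet. A non-empty result is the cue to reuse rather than reimplement. Files that fail to parse are skipped, so an empty result is strong evidence rather than proof.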
Pitfall 2: Assuming Architecture Without Verification
Problem: Implemented a custom API when the project uses Supabase.
Solution: READ CLAUDE.md and PLANNING.md before implementing:
```python
# Check the project tech stack
with open("CLAUDE.md", encoding="utf-8") as f:
    claude_md = f.read()

if "Supabase" in claude_md:
    # Use Supabase APIs, not a custom implementation
    ...
```
Prevention: Run confidence check, ensure architecture_check_complete=True
Pitfall 3: Skipping Test Output
Problem: Claimed tests passed but they were actually failing.
Solution: ALWAYS show actual test output:
```bash
# Run tests and capture output
uv run pytest -v > test_output.txt

# Show in validation
echo "Test Results:"
cat test_output.txt
```
Prevention: Use SelfCheckProtocol, require evidence
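Evidence should include the exit code as well as the text, because pytest exits with 0 only when all collected tests pass. Below is a minimal sketch, assuming the suite runs via `python -m pytest`. `run_with_evidence` and the returned keys are hypothetical and not the SelfCheckProtocol API.

```python
import subprocess
import sys
from typing import Any, Dict, List, Optional


def run_with_evidence(cmd: Optional[List[str]] = None) -> Dict[str, Any]:
    """Run a command and keep its exit code and full output as evidence."""
    if cmd is None:
        cmd = [sys.executable, "-m", "pytest", "-v"]  # default: the project's test suite
    result = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "command": " ".join(cmd),
        "exit_code": result.returncode,
        "passed": result.returncode == 0,
        "output": result.stdout + result.stderr,
    }
```

Calling `run_with_evidence()` with no arguments runs `python -m pytest -v` with the current interpreter. Paste `output` into the validation report, and check `passed` rather than reading the log by eye.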
Pitfall 4: Version Inconsistency
Problem: VERSION file says 4.1.9, but package.json says 4.1.5, pyproject.toml says 0.4.0.
Solution: Understand the versioning strategy:
- Framework version (VERSION file): User-facing version (4.1.9)
- Python package (pyproject.toml): Library semantic version (0.4.0)
- NPM package (package.json): Should match framework version (4.1.9)
When updating versions:
- Update VERSION file first
- Update package.json to match
- Update README badges
- Consider if pyproject.toml needs bump (breaking changes?)
- Update CHANGELOG.md
Prevention: Create a release checklist
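A release checklist can include an automated drift check. Below is a minimal sketch, assuming the three files sit in the repository root and following the strategy above: package.json must match VERSION, while pyproject.toml may differ. `collect_versions` and `version_mismatches` are hypothetical helpers, and the pyproject.toml lookup is a regex shortcut rather than a full TOML parse.

```python
import json
import re
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Union


def collect_versions(root: Union[str, Path]) -> Dict[str, Optional[str]]:
    """Read the version declared in VERSION, package.json and pyproject.toml."""
    root = Path(root)
    versions: Dict[str, Optional[str]] = {"VERSION": None, "package.json": None, "pyproject.toml": None}
    if (root / "VERSION").exists():
        versions["VERSION"] = (root / "VERSION").read_text(encoding="utf-8").strip()
    if (root / "package.json").exists():
        package = json.loads((root / "package.json").read_text(encoding="utf-8"))
        versions["package.json"] = package.get("version")
    if (root / "pyproject.toml").exists():
        # Simplification: first line of the form `version = "..."`, not a full TOML parse.
        text = (root / "pyproject.toml").read_text(encoding="utf-8")
        match = re.search(r'^version\s*=\s*"([^"]+)"', text, re.M)
        versions["pyproject.toml"] = match.group(1) if match else None
    return versions


def version_mismatches(root: Union[str, Path], must_match: Iterable[str] = ("package.json",)) -> List[str]:
    """List files that should track VERSION but do not."""
    versions = collect_versions(root)
    expected = versions["VERSION"]
    return [
        f"{name}: {versions.get(name)} (VERSION is {expected})"
        for name in must_match
        if versions.get(name) != expected
    ]
```

If the project decides to keep pyproject.toml in lockstep with VERSION as well, pass `must_match=("package.json", "pyproject.toml")`. A CI step that exits non-zero when the list is non-empty turns a silent mismatch like 4.1.9 vs 4.1.5 into a failing check.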
Pitfall 5: UV Not Installed
Problem: The Makefile requires uv, but users may not have it installed.
Solution: Install UV:
```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# With pip
pip install uv
```
Alternative: Provide fallback commands:
```bash
# With UV (preferred)
uv run pytest

# Without UV (fallback)
python -m pytest
```
Prevention: Document UV requirement in README
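Scripts that must work on machines with and without uv can pick the command at runtime. Below is a minimal sketch, assuming pytest is installed in the current interpreter's environment when uv is absent. `build_test_command` is a hypothetical helper.

```python
import shutil
import sys
from typing import List, Sequence


def build_test_command(extra_args: Sequence[str] = ()) -> List[str]:
    """Prefer `uv run pytest`; otherwise run pytest with the current interpreter."""
    if shutil.which("uv"):  # None when uv is not on PATH
        return ["uv", "run", "pytest", *extra_args]
    return [sys.executable, "-m", "pytest", *extra_args]
```

Using `sys.executable` instead of a bare `python` avoids picking up a different interpreter from PATH. The name deliberately avoids the `pytest_` prefix, which pytest reserves for hook functions in conftest.py and plugins.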
📚 Best Practices
Testing Best Practices
1. Use pytest markers for organization:
```python
import pytest


@pytest.mark.unit
def test_individual_function():
    pass


@pytest.mark.integration
def test_component_interaction():
    pass


@pytest.mark.confidence_check
def test_with_pre_check(confidence_checker):
    pass
```
2. Use fixtures for shared setup:
```python
# conftest.py
import pytest


@pytest.fixture
def sample_context():
    return {...}  # placeholder: fill in the context your tests need


# test_file.py
def test_feature(sample_context):
    # Use sample_context
    ...
```
3. Test both happy path and edge cases:
```python
def test_feature_success():
    # Normal operation
    ...


def test_feature_with_empty_input():
    # Edge case
    ...


def test_feature_with_invalid_data():
    # Error handling
    ...
```
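Custom markers such as `unit`, `integration` and `confidence_check` should be registered. Otherwise pytest emits `PytestUnknownMarkWarning`, and `--strict-markers` turns that warning into an error. The SuperClaude pytest plugin may already register some of them. If it does not, here is a minimal conftest.py sketch using pytest's `pytest_configure` hook; the marker descriptions are placeholders.

```python
# conftest.py
# Marker names come from the examples above; descriptions are placeholders.
CUSTOM_MARKERS = {
    "unit": "fast, isolated tests of a single function",
    "integration": "tests of several components working together",
    "confidence_check": "tests that run a pre-execution confidence check",
}


def pytest_configure(config):
    """Register custom markers so pytest does not warn about unknown marks."""
    for name, description in CUSTOM_MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

The same registration can live in the `markers` list under `[tool.pytest.ini_options]` in pyproject.toml. The hook form keeps it next to the fixtures.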
Git Workflow Best Practices
1. Conventional commits:
```bash
git commit -m "feat: add confidence checking to PM Agent"
git commit -m "fix: resolve version inconsistency"
git commit -m "docs: update CLAUDE.md with plugin warnings"
git commit -m "test: add unit tests for reflexion pattern"
```
2. Small, focused commits:
- Each commit should do ONE thing
- Commit message should explain WHY, not WHAT
- Code changes should be reviewable in <500 lines
3. Branch naming:
```text
feature/add-confidence-check
fix/version-inconsistency
docs/update-readme
refactor/simplify-cli
test/add-unit-tests
```
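Both conventions are regular enough to check mechanically. Below is a minimal sketch, assuming the commit types and branch prefixes shown above, plus `refactor` and `chore` for commits. The patterns and function names are hypothetical. The optional `(scope)` and `!` follow the Conventional Commits specification.

```python
import re

# Types taken from the examples above plus "refactor" and "chore"; adjust as needed.
COMMIT_TYPES = ("feat", "fix", "docs", "test", "refactor", "chore")
BRANCH_PREFIXES = ("feature", "fix", "docs", "refactor", "test")

# "type: summary" or "type(scope): summary"; "!" marks a breaking change.
COMMIT_RE = re.compile(r"(?:" + "|".join(COMMIT_TYPES) + r")(?:\([\w.-]+\))?!?: \S.*")
# "prefix/kebab-case-description"
BRANCH_RE = re.compile(r"(?:" + "|".join(BRANCH_PREFIXES) + r")/[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_commit_subject(message: str) -> bool:
    """Check only the first line of a commit message."""
    lines = message.splitlines()
    return bool(lines) and COMMIT_RE.fullmatch(lines[0]) is not None


def is_valid_branch_name(name: str) -> bool:
    """Check a branch name against the prefix/kebab-case convention."""
    return BRANCH_RE.fullmatch(name) is not None
```

A `commit-msg` Git hook receives the path of the file holding the proposed message as its only argument, and it aborts the commit on a non-zero exit. A few extra lines let `is_valid_commit_subject` back such a hook.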
Documentation Best Practices
1. Code documentation:
```python
from typing import Any, Dict


# Method of ConfidenceChecker (excerpt)
def assess(self, context: Dict[str, Any]) -> float:
    """
    Assess confidence level (0.0 - 1.0)

    Investigation Phase Checks:
    1. No duplicate implementations? (25%)
    2. Architecture compliance? (25%)
    3. Official documentation verified? (20%)
    4. Working OSS implementations referenced? (15%)
    5. Root cause identified? (15%)

    Args:
        context: Context dict with task details

    Returns:
        float: Confidence score (0.0 = no confidence, 1.0 = absolute certainty)

    Example:
        >>> checker = ConfidenceChecker()
        >>> confidence = checker.assess(context)
        >>> if confidence >= 0.9:
        ...     proceed_with_implementation()
    """
```
2. README structure:
- Start with clear value proposition
- Quick installation instructions
- Usage examples
- Link to detailed docs
- Contribution guidelines
- License
3. Keep docs synchronized with code:
- Update docs in same PR as code changes
- Review docs during code review
- Use automated doc generation where possible
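The weights in the `assess()` docstring above sum to 100%, so one reading of the score is a weighted sum of passed checks. Below is a minimal sketch under that reading. Only `duplicate_check_complete` and `architecture_check_complete` appear elsewhere in this document. The other three context keys are hypothetical, and this is not ConfidenceChecker's actual implementation.

```python
from typing import Any, Dict

# Weights from the assess() docstring; they sum to 1.0.
# Keys after the first two are hypothetical names for the remaining checks.
CHECK_WEIGHTS = {
    "duplicate_check_complete": 0.25,
    "architecture_check_complete": 0.25,
    "official_docs_verified": 0.20,
    "oss_reference_complete": 0.15,
    "root_cause_identified": 0.15,
}


def weighted_confidence(context: Dict[str, Any]) -> float:
    """Sum the weights of every check the context marks as passed."""
    score = sum(weight for key, weight in CHECK_WEIGHTS.items() if context.get(key))
    return round(float(score), 2)  # strip binary floating-point noise from the sum
```

With everything except root-cause analysis done, the score is 0.85. That is still below the 0.9 threshold in the docstring example, so implementation would wait.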
🔧 Troubleshooting Guide
Issue: Tests Not Found
Symptoms:
```text
$ uv run pytest
ERROR: file or directory not found: tests/
```
Cause: tests/ directory doesn't exist
Solution:
```bash
# Create tests structure
mkdir -p tests/unit tests/integration

# Add __init__.py files
touch tests/__init__.py
touch tests/unit/__init__.py
touch tests/integration/__init__.py

# Add conftest.py
touch tests/conftest.py
```
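The commands above assume a POSIX shell; `touch` does not exist in cmd.exe or PowerShell by default. Below is a cross-platform sketch of the same scaffold using `pathlib`. `scaffold_tests` is a hypothetical helper, and it is safe to re-run because it never overwrites existing files.

```python
from pathlib import Path
from typing import List, Union


def scaffold_tests(root: Union[str, Path] = ".") -> List[Path]:
    """Create the tests/ layout above; return only the files newly created."""
    root = Path(root)
    for package in ("tests", "tests/unit", "tests/integration"):
        (root / package).mkdir(parents=True, exist_ok=True)
    created = []
    for rel in ("tests/__init__.py", "tests/unit/__init__.py",
                "tests/integration/__init__.py", "tests/conftest.py"):
        path = root / rel
        if not path.exists():
            path.touch()
            created.append(path)
    return created
```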
Issue: Plugin Not Loaded
Symptoms:
```text
$ uv run pytest --trace-config
# superclaude not listed in plugins
```
Cause: Package not installed or entry point not configured
Solution:
```bash
# Reinstall in editable mode
uv pip install -e ".[dev]"

# Verify entry point in pyproject.toml
# Should have:
# [project.entry-points.pytest11]
# superclaude = "superclaude.pytest_plugin"

# Test plugin loaded
uv run pytest --trace-config 2>&1 | grep superclaude
```
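The same entry-point check can be done from Python without parsing `--trace-config` output. Below is a minimal sketch using the standard library's `importlib.metadata`. `pytest11_plugins` is a hypothetical helper. The `TypeError` fallback covers Python 3.8/3.9, where `entry_points()` takes no arguments and returns a dict.

```python
from importlib.metadata import entry_points
from typing import Dict


def pytest11_plugins() -> Dict[str, str]:
    """Map plugin name -> module for every installed pytest11 entry point."""
    try:
        eps = entry_points(group="pytest11")  # Python 3.10+
    except TypeError:
        eps = entry_points().get("pytest11", [])  # Python 3.8/3.9: dict of groups
    return {ep.name: ep.value for ep in eps}
```

If the package is installed correctly, `pytest11_plugins().get("superclaude")` should return the module path configured in pyproject.toml, which is `superclaude.pytest_plugin` in the snippet above. `None` means the entry point is not registered in the active environment.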
Issue: ImportError in Tests
Symptoms:
```text
ModuleNotFoundError: No module named 'superclaude'
```
Cause: Package not installed in the test environment
Solution:
```bash
# Install package in editable mode
uv pip install -e .

# Or use uv run (creates venv automatically)
uv run pytest
```
Issue: Fixtures Not Available
Symptoms:
```text
fixture 'confidence_checker' not found
```
Cause: pytest plugin not loaded or fixture not defined
Solution:
```bash
# Check plugin loaded
uv run pytest --fixtures | grep confidence_checker

# Verify pytest_plugin.py has fixture
# Should have:
# @pytest.fixture
# def confidence_checker():
#     return ConfidenceChecker()

# Reinstall package
uv pip install -e .
```
Issue: .gitignore Not Working
Symptoms: Files listed in .gitignore are still tracked by git
Cause: Files were tracked before they were added to .gitignore
Solution:
```bash
# Remove from git but keep in filesystem
git rm --cached <file>

# OR remove entire directory
git rm -r --cached <directory>

# Commit the change
git commit -m "chore: stop tracking files listed in .gitignore"
```
💡 Advanced Techniques
Technique 1: Dynamic Fixture Configuration
```python
import pytest

# TokenBudgetManager is assumed to be importable from the SuperClaude package;
# adjust the import for your installation.


@pytest.fixture
def token_budget(request):
    """Fixture that adapts based on test markers"""
    marker = request.node.get_closest_marker("complexity")
    complexity = marker.args[0] if marker else "medium"
    return TokenBudgetManager(complexity=complexity)


# Usage
@pytest.mark.complexity("simple")
def test_simple_feature(token_budget):
    assert token_budget.limit == 200
```
Technique 2: Confidence-Driven Test Execution
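```python
import pytest

# conftest.py hook. ConfidenceChecker and build_context are assumed to be
# importable; build_context is a project-specific helper that turns a test
# item into the context dict the checker expects.


def pytest_runtest_setup(item):
    """Skip tests if confidence is too low"""
    marker = item.get_closest_marker("confidence_check")
    if marker:
        checker = ConfidenceChecker()
        context = build_context(item)
        confidence = checker.assess(context)
        if confidence < 0.7:
            pytest.skip(f"Confidence too low: {confidence:.0%}")
```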
Technique 3: Reflexion-Powered Error Learning
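```python
import pytest

# conftest.py hook. ReflexionPattern is assumed to be importable from the
# SuperClaude package.


def pytest_runtest_makereport(item, call):
    """Record failed tests for future learning"""
    if call.when != "call" or call.excinfo is None:
        return
    # pytest.skip()/pytest.xfail() inside a test body also set excinfo; they are not failures.
    if call.excinfo.errisinstance((pytest.skip.Exception, pytest.xfail.Exception)):
        return
    reflexion = ReflexionPattern()
    error_info = {
        "test_name": item.name,
        "error_type": type(call.excinfo.value).__name__,
        "error_message": str(call.excinfo.value),
    }
    reflexion.record_error(error_info)
```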
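A minimal backing store for `record_error` could append one JSON object per line and support lookup by error type, as in the sketch below. `JsonlErrorLog` and its file layout are hypothetical. SuperClaude's ReflexionPattern has its own storage, so treat this as an illustration of the idea, not its implementation.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


class JsonlErrorLog:
    """Hypothetical reflexion store: one JSON object per line."""

    def __init__(self, path: Union[str, Path]):
        self.path = Path(path)

    def record_error(self, error_info: Dict[str, Any]) -> None:
        """Append one error record, creating the log directory if needed."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(error_info, sort_keys=True) + "\n")

    def similar_errors(self, error_type: str, message_fragment: str = "") -> List[Dict[str, Any]]:
        """Return past records with the same type whose message contains the fragment."""
        if not self.path.exists():
            return []
        matches = []
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                if (record.get("error_type") == error_type
                        and message_fragment in record.get("error_message", "")):
                    matches.append(record)
        return matches
```

Calling `similar_errors(error_type, fragment)` before retrying a fix is the learning step. A hit means this failure has occurred before, which is a cue to check what changed since then.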
📊 Performance Insights
Token Usage Patterns
Based on real usage data:
| Task Type | Typical Tokens | With PM Agent | Savings |
|---|---|---|---|
| Typo fix | 200-500 | 200-300 | 40% |
| Bug fix | 2,000-5,000 | 1,000-2,000 | 50% |
| Feature | 10,000-50,000 | 5,000-15,000 | 60% |
| Wrong direction | 50,000+ | 100-200 (prevented) | 99%+ |
Key insight: Prevention (confidence check) saves more tokens than optimization
Execution Time Patterns
| Operation | Sequential | Parallel | Speedup |
|---|---|---|---|
| 5 file reads | 15s | 3s | 5x |
| 10 file reads | 30s | 3s | 10x |
| 20 file edits | 60s | 15s | 4x |
| Mixed ops | 45s | 12s | 3.75x |
Key insight: Parallel execution has diminishing returns after ~10 operations per wave
🎓 Lessons Learned
Lesson 1: Documentation Drift is Real
What happened: README described v2.0 plugin system that didn't exist in v4.1.9
Impact: Users spent hours trying to install non-existent features
Solution:
- Add warnings about planned vs implemented features
- Review docs during every release
- Link to tracking issues for planned features
Prevention: Documentation review checklist in release process
Lesson 2: Version Management is Hard
What happened: Three different version numbers across files
Impact: Confusion about which version is installed
Solution:
- Define version sources of truth
- Document versioning strategy
- Automate version updates in release script
Prevention: A single source of truth for versions (for example, managed with a tool such as bumpversion)
Lesson 3: Tests Are Non-Negotiable
What happened: Framework provided testing tools but had no tests itself
Impact: No confidence in code quality, regression bugs
Solution:
- Create comprehensive test suite
- Require tests for all new code
- Add CI/CD to run tests automatically
Prevention: Make tests a requirement in PR template
🔮 Future Explorations
Ideas worth investigating:
- Automated confidence checking - AI analyzes context and suggests improvements
- Visual reflexion patterns - Graph view of error patterns over time
- Predictive token budgeting - ML model predicts token usage based on task
- Collaborative learning - Share reflexion patterns across projects (opt-in)
- Real-time hallucination detection - Streaming analysis during generation
📞 Getting Help
When stuck:
- Check this KNOWLEDGE.md for similar issues
- Read PLANNING.md for architecture context
- Check TASK.md for known issues
- Search GitHub issues for solutions
- Ask in GitHub discussions
When sharing knowledge:
- Document solution in this file
- Update relevant section
- Add to troubleshooting guide if applicable
- Consider adding to FAQ
🔌 Claude Code Integration Gap Analysis (March 2026)
Key Finding: SuperClaude Under-uses Claude Code's Extension Points
Claude Code provides 60+ built-in commands, 28 hook events, a full skills system, 5 settings scopes, agent teams, plan mode, extended thinking, and 60+ MCP servers in its registry. SuperClaude currently uses only a fraction of these.
Biggest Gaps (High Impact)
1. Skills System (CRITICAL)
- Claude Code skills support YAML frontmatter with `model`, `effort`, `allowed-tools`, and `context: fork`, auto-triggering via `description`, and argument substitution
- SuperClaude has only 1 skill (`confidence-check`); 30 commands could be reimplemented as skills for better auto-triggering and tool restrictions
- Action: Migrate key commands to skills format in v4.3+
2. Hooks System (HIGH)
- Claude Code has 28 hook events (`SessionStart`, `Stop`, `PostToolUse`, `TaskCompleted`, `SubagentStop`, `PreCompact`, etc.)
- SuperClaude defines hooks but doesn't leverage most events
- Action: Use `SessionStart` for PM Agent auto-restore, `Stop` for session persistence, `PostToolUse` for self-check, and `TaskCompleted` for reflexion
3. Plan Mode Integration (MEDIUM)
- Claude Code's plan mode provides read-only exploration with visual markdown plans
- SuperClaude's confidence checks could block transition from plan to implementation when confidence < 70%
- Action: Connect confidence checker to plan mode exit gate
4. Settings Profiles (MEDIUM)
- Claude Code has 5 settings scopes with granular permission rules (`Bash(pattern)`, `Edit(path)`, `mcp__server__tool`)
- SuperClaude could provide recommended settings profiles per workflow (strict security, autonomous dev, research)
- Action: Create `.claude/settings.json` templates for common workflows
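The `SessionStart` and plan-mode actions above could share one command hook. Below is a minimal sketch based on three assumptions about Claude Code: it passes the hook payload as JSON on stdin with `hook_event_name` and (for `PreToolUse`) `tool_name` fields; it adds a `SessionStart` hook's stdout to the context; and it blocks a tool call when a `PreToolUse` hook exits with code 2. Verify these against the current Claude Code hooks documentation. The notes and confidence file paths are hypothetical placeholders.

```python
import json
import sys
from pathlib import Path
from typing import Any, Dict, Tuple

# Hypothetical placeholder paths; adjust to wherever your project keeps them.
SESSION_NOTES = Path("docs/memory/session_notes.md")
CONFIDENCE_FILE = Path(".claude/confidence.json")
THRESHOLD = 0.7  # block leaving plan mode below 70% confidence


def handle(payload: Dict[str, Any], notes: Path = SESSION_NOTES,
           confidence_file: Path = CONFIDENCE_FILE) -> Tuple[int, str, str]:
    """Return (exit_code, stdout, stderr) for one hook invocation."""
    event = payload.get("hook_event_name")

    if event == "SessionStart":
        # Assumed behavior: stdout of a SessionStart hook is added to Claude's context.
        text = notes.read_text(encoding="utf-8") if notes.exists() else ""
        return 0, text, ""

    if event == "PreToolUse" and payload.get("tool_name") == "ExitPlanMode":
        score = 0.0
        if confidence_file.exists():
            data = json.loads(confidence_file.read_text(encoding="utf-8"))
            score = float(data.get("confidence", 0.0))
        if score < THRESHOLD:
            # Assumed behavior: exit code 2 blocks the tool call and shows stderr to Claude.
            return 2, "", f"Confidence {score:.0%} is below {THRESHOLD:.0%}; keep planning."

    return 0, "", ""


def main() -> int:
    """Read the hook payload from stdin and emit the result."""
    code, out, err = handle(json.load(sys.stdin))
    sys.stdout.write(out)
    sys.stderr.write(err)
    return code
```

To run it as a hook, end the script with `raise SystemExit(main())`. Then register it for `SessionStart` and for `PreToolUse` with an `ExitPlanMode` matcher in `.claude/settings.json`. The 0.7 threshold matches the 70% gate proposed above.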
What's Working Well
- Commands (30): Well-integrated as custom commands in `~/.claude/commands/sc/`
- Agents (20): Properly installed to `~/.claude/agents/` as subagents
- MCP Servers (8+): Good coverage of common tools; the AIRIS gateway unifies them
- Pytest Plugin: Clean auto-loading, good fixture/marker system
- Behavioral Modes (7): Effective context injection even without native support
Reference
See docs/user-guide/claude-code-integration.md for the complete feature mapping and gap analysis.
This document grows with the project. Everyone who encounters a problem and finds a solution should document it here.
Contributors: SuperClaude development team and community
Maintained by: Project maintainers
Review frequency: Quarterly or after major insights