diff --git a/wiki/session-V112-infra-health-report.md b/wiki/session-V112-infra-health-report.md
new file mode 100644
index 000000000..3f97d725f
--- /dev/null
+++ b/wiki/session-V112-infra-health-report.md
@@ -0,0 +1,116 @@
+# V112 - infra_health_report consolidated single-query agent - 2026-04-21
+
+## Objective
+Create an Orchestrator agent that aggregates, IN A SINGLE QUERY, the full infra
+health of WEVIA (FPM, tokens, orphans, V83 KPIs, L99, Docker, load, disk).
+
+Premium UX (doctrine 60): Yacine can ask for `"bilan infra complet"` (full infra
+report) and see everything in 8 lines.
+
+## Aggregated sources
+
+| Source | Origin version | Info |
+|---|---|---|
+| FPM | V110 `fpm-monitor.sh` | load/workers/mem/connections |
+| Tokens | V111 `token-health-monitor.sh` | 11 providers, expired_list |
+| Orphans | V108 `architecture_quality.orphans_count` | LIVE value/status |
+| V83 summary | V100 category catalog | kpis/ok/warn/fail/complete |
+| L99 NonReg | `nonreg-latest.json` | pass/score/ts |
+| Docker | `docker ps` | containers_running |
+| Load | `/proc/loadavg` | 1/5/15 min |
+| Disk | `df -h /` | used/avail/pct |
+| Uptime | `uptime -p` | human readable |
+
+## Architecture V112
+
+### Script: `/var/www/html/api/scripts/infra-health-report.sh` (1490 bytes)
+Calls the V110 and V111 scripts, fetches V83 with curl, parses L99 with jq, and uses shell primitives for the rest.
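To make the "shell primitives" part concrete, here is a minimal sketch of how two of the report lines could be produced. It is not the actual `infra-health-report.sh`: the function names `load_line`, `disk_line` and `section` are hypothetical, and the output format is only modelled on the sample report.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real infra-health-report.sh.

# Print the [LOAD] line from a loadavg-style file (default: /proc/loadavg).
load_line() {
  local src="${1:-/proc/loadavg}"
  local one five fifteen _rest
  read -r one five fifteen _rest < "$src" || return 1
  printf '[LOAD] %s %s %s\n' "$one" "$five" "$fifteen"
}

# Print the [DISK] line from `df -hP /` style output read on stdin.
# Row 2 is the filesystem row: columns 3, 4, 5 are used, avail, use%.
disk_line() {
  awk 'NR==2 { printf "[DISK] used=%s avail=%s pct=%s\n", $3, $4, $5 }'
}

# Run one section; if it fails, print a placeholder line instead of
# aborting, so one broken source does not hide the others.
section() {
  local tag="$1"; shift
  "$@" 2>/dev/null || printf '[%s] unavailable\n' "$tag"
}
```

Under these assumptions, `section LOAD load_line` and `df -hP / | disk_line` would each emit one line of the report. The `-P` flag keeps each filesystem on a single row, which is what the `NR==2` selection relies on.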
+
+Standalone execution (~3 seconds):
+```
+=== WEVIA Infra Health Report 2026-04-21T10:34:06 ===
+
+[FPM] load=1.68 3.40 3.56 fpm_workers=73/150 mem_used=38% connections=181
+[TOKENS] providers=11 ok=7 expired=4 health=63% expired_list=sambanova,groq,alibaba,github
+[ORPHANS] orphans=0 status=ok
+[V83] kpis=64 ok=39 warn=25 fail=0 complete=100%
+[L99] pass=153/153 score=100 ts=20260421_102743
+[DOCKER] containers_running=19
+[LOAD] 1.65 3.34 3.54
+[DISK] used=116G avail=29G pct=81%
+```
+
+### Orchestrator agent
+**File**: `/var/www/html/api/wevia-autonomous.php`
+**Position**: after `token_health` (V111), before `screens_s204`
+
+```php
+"infra_health_report" => [
+  "cmd"=>"bash /var/www/html/api/scripts/infra-health-report.sh 2>/dev/null",
+  "keywords"=>["infra","health","report","bilan infra","sante","global status"],
+  "timeout"=>45
+],
+```
+
+Note: not `default=>true`, because the V110 and V111 scripts already run in the systematic plan.
+This is an ON-DEMAND agent, triggered by keywords, for an exhaustive consolidated view.
+
+## Live validation
+
+Query `"multiagent bilan infra health complet tout"` →
+```
+### plan
+15 agents: reconcile, providers, wiki, nonreg, ethica, docker, disk, git,
+  ports, load, architecture_quality, fpm_monitor, token_health,
+  infra_health_report, machines_all
+```
+
+**14 → 15 agents** in the multi-agent plan.
+
+## L99 NonReg
+```
+153/153 PASS | 0 FAIL | 100% | 56.1s
+TS: 20260421_103613
+```
+
+## Additional finding
+
+During the test, **sambanova also switched to EXPIRED** (multiple probes in
+rapid succession → burst rate limit on the SambaNova side).
+Expired list: sambanova, groq, alibaba, github (4/11, 63% health)
+
+Hypothesis: our backend probe was sending too many requests in quick succession.
+V113+ idea: rate-limit our own probe (max 1 call / 5 min per provider).
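The rate-limit idea above (at most one probe per provider every 5 minutes) could look roughly like the following. This is a sketch only: the helper `should_probe`, the state directory, and the stamp-file layout are assumptions for illustration and are not part of `token-health-monitor.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical throttle helper, NOT part of the existing probe scripts.

# Assumed location for per-provider timestamps; override via environment.
PROBE_STATE_DIR="${PROBE_STATE_DIR:-/tmp/token-probe-state}"
# Minimum spacing between two probes of the same provider, in seconds.
PROBE_MIN_INTERVAL="${PROBE_MIN_INTERVAL:-300}"

# should_probe PROVIDER [NOW_EPOCH]
# Returns 0 and records the timestamp if the provider may be probed now,
# returns 1 if the last probe was less than PROBE_MIN_INTERVAL seconds ago.
# If the state directory cannot be created, it also returns 1 (skip probe).
should_probe() {
  local provider="$1"
  local now="${2:-$(date +%s)}"
  local stamp="$PROBE_STATE_DIR/$provider.last"
  local last=0
  mkdir -p "$PROBE_STATE_DIR" || return 1
  if [ -f "$stamp" ]; then
    last=$(cat "$stamp")
  fi
  if [ $(( now - last )) -lt "$PROBE_MIN_INTERVAL" ]; then
    return 1
  fi
  printf '%s\n' "$now" > "$stamp"
  return 0
}
```

A probe loop could then guard each call with `should_probe "$provider" || continue`, reusing the last known status for throttled providers rather than issuing a new request.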
+
+## Chain V96→V112
+
+| Version | Commit | Subject |
+|---|---|---|
+| V96-V108 | cd86b19f9 | Orphans Rescue + ZERO ORPHANS |
+| V110 | ede9a5197 | fpm_monitor |
+| V111 | 5e98086e7 | token_health |
+| **V112** | TBD | **consolidated infra_health_report** |
+
+## Doctrines applied
+- Doctrine 0: Root cause, consolidated visibility (not just FPM OR tokens in isolation)
+- Doctrine 2: Zero overwrite (additive agent)
+- Doctrine 3: Zero deletion
+- Doctrine 4: Zero regression (L99 stable)
+- Doctrine 14: Test-driven (standalone + live multi-agent)
+- Doctrine 16: External script (proven V110/V111 pattern)
+- Doctrine 54: chattr unlock/lock
+- Doctrine 60: Premium UX (ONE query = full health)
+- Doctrine 95: Traceability, wiki + vault
+- Doctrine 100: Train commit release
+
+## Other recent Claudes
+- V9.63 CrowdSec self-ban fix (502 resolved)
+- ab78c3a0d playwright-selenium-intents 34/38 triggers match
+- 524c25690 create_tool intent promoted (proposal 08:15 → 10:30)
+
+## Next V113+ pending
+- [ ] Rate-limit our token-health probe (1 call / 5 min)
+- [ ] V86 Auth Guard E2E Playwright test
+- [ ] CloudFlare rate-limit monitor
+- [ ] NPS validation by Yacine (still pending)
+- [ ] Create token-apply.sh (queue → secrets.env, needs Yacine's authorization)