V112 infra_health_report - consolidated single-query infra health view
Some checks failed
WEVAL NonReg / nonreg (push) Has been cancelled
Single aggregated query: FPM + tokens + orphans + V83 + L99 + docker + load + disk.
Plan: 14 -> 15 agents (via keyword match).

Components:
1. New /var/www/html/api/scripts/infra-health-report.sh (1490 bytes).
   Aggregates sources V110 fpm-monitor + V111 token-health + V108 orphans + L99 + docker.
2. Agent infra_health_report in __orch_registry.
   keywords: infra health report bilan sante global status
   timeout: 45s
   (not default=true; on-demand via keywords, FPM + tokens already in the default plan)

Live standalone output (~3s):
[FPM] load=1.68 3.40 3.56 fpm_workers=73/150 mem=38pct conn=181
[TOKENS] providers=11 ok=7 expired=4 health=63pct expired_list=sambanova groq alibaba github
[ORPHANS] orphans=0 status=ok
[V83] kpis=64 ok=39 warn=25 fail=0 complete=100pct
[L99] pass=153/153 score=100 ts=20260421_102743
[DOCKER] containers_running=19
[LOAD] 1.65 3.34 3.54
[DISK] used=116G avail=29G pct=81pct

Multiagent plan: 15 agents (was 14), now including infra_health_report.

Side finding: sambanova also went EXPIRED during the multi-probe (it was OK 5 min earlier)
-> provider-side rate-limit burst triggered by our frequent probe
-> V113 idea: rate-limit our own probe to 1 per 5 min per provider.

L99 NonReg V112: 153/153 PASS, 0 FAIL, 100pct, 56.1s, TS 20260421_103613

Chain V96-V112: V96 fake, V97 dormant, V98 submodule, V99 kpi, V100 V83, V101 intent, V102 orch, V103 retry-429, V104 E2E, V105 enrich, V106 full_report, V107 audit, V108 ZERO ORPHANS, V110 fpm_monitor, V111 token_health, V112 infra_health_report CONSOLIDATED

Sync with other Claudes:
- V9.63 678ab0975 CrowdSec self-ban fix
- ab78c3a0d playwright intents 34/38 triggers
- 524c25690 create_tool intent promoted

Zero deletion, zero hardcode, zero regression, zero overwrite, zero fake.
Doctrines 0+2+3+4+14+16+54+60+95+100 applied.
116
wiki/session-V112-infra-health-report.md
Normal file
@@ -0,0 +1,116 @@
# V112 - infra_health_report consolidated single-query agent - 2026-04-21

## Objective
Create an Orchestrator agent that aggregates, IN A SINGLE QUERY, the full
infrastructure health of WEVIA (FPM, tokens, orphans, V83 KPIs, L99, Docker, load, disk).

Premium UX (doctrine 60): Yacine can ask `"bilan infra complet"` ("full infra report")
and see everything in 8 lines.

## Aggregated sources

| Source | Origin version | Info |
|---|---|---|
| FPM | V110 `fpm-monitor.sh` | load/workers/mem/connections |
| Tokens | V111 `token-health-monitor.sh` | 11 providers, expired_list |
| Orphans | V108 `architecture_quality.orphans_count` | LIVE value/status |
| V83 summary | V100 category catalog | kpis/ok/warn/fail/complete |
| L99 NonReg | `nonreg-latest.json` | pass/score/ts |
| Docker | `docker ps` | containers_running |
| Load | `/proc/loadavg` | 1/5/15 min |
| Disk | `df -h /` | used/avail/pct |
| Uptime | `uptime -p` | human readable |

## V112 architecture

### Script: `/var/www/html/api/scripts/infra-health-report.sh` (1490 bytes)
Calls the V110 and V111 scripts, queries V83 with curl, parses L99 with jq, and uses shell primitives for the remaining sources.

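
The script itself is not reproduced here. As an illustration of what the "shell primitives" part might look like, here is a minimal sketch that builds the `[LOAD]` and `[DISK]` lines from `/proc/loadavg` and `df -h /`. The function names `format_load` and `format_disk` are hypothetical, and the output layout is simply copied from the report lines, so treat this as an assumption about the approach rather than the actual implementation.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; the layout mimics the
# [LOAD] and [DISK] lines of the V112 report.

# Print the 1/5/15-minute load averages from a loadavg-style file.
# $1: path to read (defaults to /proc/loadavg).
format_load() {
    local src="${1:-/proc/loadavg}"
    local one five fifteen _rest
    read -r one five fifteen _rest < "$src"
    printf '[LOAD] %s %s %s\n' "$one" "$five" "$fifteen"
}

# Turn `df -h /` output (read from stdin) into a [DISK] line.
# Expects the usual columns: Filesystem Size Used Avail Use% Mounted-on.
format_disk() {
    awk 'NR == 2 { printf "[DISK] used=%s avail=%s pct=%s\n", $3, $4, $5 }'
}
```

Typical use would be `format_load; df -h / | format_disk`. Reading the second line of `df` assumes the filesystem name is short enough not to wrap onto its own line; `df -hP /` is one way to guard against that on GNU systems.
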
Standalone run (~3 seconds):
```
=== WEVIA Infra Health Report 2026-04-21T10:34:06 ===

[FPM] load=1.68 3.40 3.56 fpm_workers=73/150 mem_used=38% connections=181
[TOKENS] providers=11 ok=7 expired=4 health=63% expired_list=sambanova,groq,alibaba,github
[ORPHANS] orphans=0 status=ok
[V83] kpis=64 ok=39 warn=25 fail=0 complete=100%
[L99] pass=153/153 score=100 ts=20260421_102743
[DOCKER] containers_running=19
[LOAD] 1.65 3.34 3.54
[DISK] used=116G avail=29G pct=81%
```

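
Because every report line is a `[SECTION]` tag followed by `key=value` tokens, the output is easy to consume from other scripts. The sketch below shows one way to pull a single field out of the report and to recompute the token health figure. `report_field` and `health_pct` are hypothetical helpers, not part of V112, and the assumption that `health=63%` comes from truncating integer division (7 of 11 providers OK) is only inferred from the numbers shown.

```bash
#!/usr/bin/env bash
# Sketch only: report_field and health_pct are hypothetical helpers.

# Print the value of KEY from the "[SECTION]" line of a report on stdin.
# Usage: report_field SECTION KEY < report.txt
report_field() {
    awk -v tag="[$1]" -v key="$2" '
        $1 == tag {
            for (i = 2; i <= NF; i++) {
                n = index($i, "=")
                if (n > 0 && substr($i, 1, n - 1) == key) {
                    print substr($i, n + 1)
                    exit
                }
            }
        }'
}

# Integer health percentage as ok/providers, truncated (7 of 11 gives 63).
health_pct() {
    echo $(( $1 * 100 / $2 ))
}
```

For example, `bash infra-health-report.sh | report_field DISK pct` would print `81%` for the run above. Values that contain spaces, such as `load=1.68 3.40 3.56` on the `[FPM]` line, come back as the first token only.
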
### Orchestrator agent
**File**: `/var/www/html/api/wevia-autonomous.php`
**Position**: after `token_health` (V111), before `screens_s204`

```php
"infra_health_report" => [
    "cmd"=>"bash /var/www/html/api/scripts/infra-health-report.sh 2>/dev/null",
    "keywords"=>["infra","health","report","bilan infra","sante","global status"],
    "timeout"=>45
],
```

Note: no `default=>true`, because the V110 and V111 scripts are already part of the systematic (default) plan.
This is an ON-DEMAND agent, triggered by keywords, for exhaustive consolidation.

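
The matching logic inside `wevia-autonomous.php` is not shown in this document. To make the on-demand behaviour concrete, here is a minimal sketch that assumes the simplest plausible rule: the agent is added to the plan when the query contains any of its keywords, compared case-insensitively. `matches_keywords` is a hypothetical name and the real orchestrator may well tokenize or score differently.

```bash
#!/usr/bin/env bash
# Sketch only: case-insensitive substring matching is an assumption,
# not the confirmed behaviour of the orchestrator.

# Return 0 if QUERY contains at least one of the given keywords.
# Usage: matches_keywords "query text" keyword...
matches_keywords() {
    local query kw
    query=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    shift
    for kw in "$@"; do
        kw=$(printf '%s' "$kw" | tr '[:upper:]' '[:lower:]')
        case "$query" in
            *"$kw"*) return 0 ;;
        esac
    done
    return 1
}
```

Under that assumption, `matches_keywords "multiagent bilan infra health complet tout" infra health report "bilan infra" sante "global status"` succeeds, while a query with none of the keywords leaves the agent out of the plan.
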
## Live validation

Query `"multiagent bilan infra health complet tout"` →
```
### plan
15 agents: reconcile, providers, wiki, nonreg, ethica, docker, disk, git,
ports, load, architecture_quality, fpm_monitor, token_health,
infra_health_report, machines_all
```

**14 → 15 agents** in the multi-agent plan.

## L99 NonReg
```
153/153 PASS | 0 FAIL | 100% | 56.1s
TS: 20260421_103613
```

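
A zero-regression check like this can be turned into a simple gate on the `pass=N/M` token that the `[L99]` report line exposes. The helper below is a hypothetical sketch, not something V112 ships: it succeeds only when every test passed and at least one test ran.

```bash
#!/usr/bin/env bash
# Sketch only: nonreg_all_passed is a hypothetical helper.

# Return 0 only when a "pass=N/M" token reports N == M with M > 0.
# Usage: nonreg_all_passed "pass=153/153"
nonreg_all_passed() {
    local ratio="${1#pass=}"
    local passed="${ratio%%/*}"
    local total="${ratio##*/}"
    [ "$total" -gt 0 ] 2>/dev/null && [ "$passed" -eq "$total" ] 2>/dev/null
}
```

Rejecting `0/0` is a deliberate choice in this sketch, so that an empty or failed run is not mistaken for a clean one.
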
## Additional finding

During the test, **sambanova also switched to EXPIRED** (multiple probes in
quick succession → rate-limit burst on the SambaNova side).
Expired list: sambanova, groq, alibaba, github (4 of 11 expired, 63% health)

Hypothesis: our backend probe was issuing too many requests in a row.
V113+ idea: rate-limit our own probe (max 1 call / 5 min per provider).

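
The V113 idea is not implemented yet. One possible shape for it, sketched under stated assumptions, is a per-provider timestamp file that the probe consults before calling out. `should_probe`, `PROBE_STATE_DIR` and `PROBE_MIN_INTERVAL` are hypothetical names introduced only for this example.

```bash
#!/usr/bin/env bash
# Sketch only: all names below are hypothetical; V112 has no such limiter.

PROBE_STATE_DIR="${PROBE_STATE_DIR:-/tmp/token-probe-state}"
PROBE_MIN_INTERVAL="${PROBE_MIN_INTERVAL:-300}"   # 5 minutes, in seconds

# Return 0 (and record the time) if PROVIDER may be probed now,
# 1 if it was probed less than PROBE_MIN_INTERVAL seconds ago.
should_probe() {
    local provider="$1"
    local stamp="$PROBE_STATE_DIR/$provider.last"
    local now last=0
    now=$(date +%s)
    mkdir -p "$PROBE_STATE_DIR"
    if [ -f "$stamp" ]; then
        read -r last < "$stamp" || last=0
    fi
    if [ $(( now - last )) -lt "$PROBE_MIN_INTERVAL" ]; then
        return 1
    fi
    printf '%s\n' "$now" > "$stamp"
    return 0
}
```

A probe loop could then call `should_probe "$provider" || continue` and reuse the last known status for skipped providers. The sketch takes no lock, so two probes started at the same instant could both pass the check.
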
## Chain V96→V112

| Version | Commit | Subject |
|---|---|---|
| V96-V108 | cd86b19f9 | Orphans Rescue + ZERO ORPHANS |
| V110 | ede9a5197 | fpm_monitor |
| V111 | 5e98086e7 | token_health |
| **V112** | TBD | **infra_health_report consolidated** |

## Doctrines applied
- Doctrine 0: Root cause: consolidated visibility (not just FPM OR tokens in isolation)
- Doctrine 2: Zero overwrite (additive agent)
- Doctrine 3: Zero deletion
- Doctrine 4: Zero regression (L99 stable)
- Doctrine 14: Test-driven (standalone + multi-agent live)
- Doctrine 16: External script (proven V110/V111 pattern)
- Doctrine 54: chattr unlock/lock
- Doctrine 60: Premium UX (ONE query = full health)
- Doctrine 95: Traceability in wiki + vault
- Doctrine 100: Train commit release

## Other recent Claudes
- V9.63 CrowdSec self-ban fix (502 resolved)
- ab78c3a0d playwright-selenium-intents 34/38 triggers match
- 524c25690 create_tool intent promoted (proposal 08:15 → 10:30)

## Next V113+ pending
- [ ] Rate-limit our token-health probe (1 call / 5 min)
- [ ] V86 Auth Guard E2E Playwright test
- [ ] CloudFlare rate-limit monitor
- [ ] NPS validation by Yacine (still pending)
- [ ] Create token-apply.sh (queue → secrets.env, needs Yacine auth)