V112 infra_health_report - consolidated single-query infra health view

Aggregation in a single query: FPM + tokens + orphans + V83 + L99 + Docker + load + disk

Plan: 14 -> 15 agents (via keyword match).

Components:
1. New /var/www/html/api/scripts/infra-health-report.sh (1490 bytes)
   Aggregates sources: V110 fpm-monitor + V111 token-health + V108 orphans + L99 + docker
2. Agent infra_health_report in __orch_registry
   keywords: infra health report bilan sante global status
   timeout 45s
   (not default=true; on-demand via keywords, FPM+tokens already in default plan)

Live standalone output (~3s):
[FPM]     load=1.68 3.40 3.56 fpm_workers=73/150 mem=38pct conn=181
[TOKENS]  providers=11 ok=7 expired=4 health=63pct expired_list=sambanova groq alibaba github
[ORPHANS] orphans=0 status=ok
[V83]     kpis=64 ok=39 warn=25 fail=0 complete=100pct
[L99]     pass=153/153 score=100 ts=20260421_102743
[DOCKER]  containers_running=19
[LOAD]    1.65 3.34 3.54
[DISK]    used=116G avail=29G pct=81pct

Multiagent Plan: 15 agents (was 14) include infra_health_report.

Side finding: sambanova also went EXPIRED during the multi-probe (it was OK 5 min earlier)
  -> rate-limit burst on the provider side, triggered by our frequent probe
  -> V113 idea: rate-limit our own probe to 1 call per 5 min per provider

L99 NonReg V112: 153/153 PASS 0 FAIL 100pct 56.1s TS 20260421_103613

Chain V96-V112:
V96 fake, V97 dormant, V98 submodule, V99 kpi, V100 V83, V101 intent,
V102 orch, V103 retry-429, V104 E2E, V105 enrich, V106 full_report,
V107 audit, V108 ZERO ORPHANS, V110 fpm_monitor, V111 token_health,
V112 infra_health_report CONSOLIDATED

Sync with other Claudes:
- V9.63 678ab0975 CrowdSec self-ban fix
- ab78c3a0d playwright intents 34/38 triggers
- 524c25690 create_tool intent promoted

Zero deletion, zero hardcoding, zero regression, zero overwrite, zero fake
Doctrines 0+2+3+4+14+16+54+60+95+100 applied

2026-04-21 10:38:46 +02:00

4.0 KiB


V112 - infra_health_report consolidated single-query agent - 2026-04-21

Objective

Create an Orchestrator agent that aggregates all of WEVIA's infra health IN A SINGLE QUERY (FPM, tokens, orphans, V83 KPIs, L99, Docker, load, disk).

Premium UX (doctrine 60): Yacine can ask for a "bilan infra complet" (full infra report) and see everything in 8 lines.

Aggregated sources

Source	Origin version	Info
FPM	V110 `fpm-monitor.sh`	load/workers/mem/connections
Tokens	V111 `token-health-monitor.sh`	11 providers, expired_list
Orphans	V108 `architecture_quality.orphans_count`	LIVE value/status
V83 summary	V100 category catalog	kpis/ok/warn/fail/complete
L99 NonReg	`nonreg-latest.json`	pass/score/ts
Docker	`docker ps`	containers_running
Load	`/proc/loadavg`	1/5/15 min
Disk	`df -h /`	used/avail/pct
Uptime	`uptime -p`	human readable
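The LOAD and DISK rows come straight from shell primitives. A minimal sketch of how those two lines could be produced in the report's `[SECTION] key=value` format; `load_line` and `disk_line` are hypothetical names, not taken from `infra-health-report.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical collectors for the [LOAD] and [DISK] lines (not the actual V112 script).

# Print the 1/5/15-minute load averages: the first three fields of /proc/loadavg.
load_line() {
  local src=${1:-/proc/loadavg} l1 l5 l15 _rest
  read -r l1 l5 l15 _rest < "$src"
  printf '[LOAD]    %s %s %s\n' "$l1" "$l5" "$l15"
}

# Print used/avail/pct from `df -hP` output read on stdin.
# Row 1 is the header; row 2 is the filesystem (columns 3, 4, 5 = Used, Avail, Capacity).
disk_line() {
  awk 'NR == 2 { printf "[DISK]    used=%s avail=%s pct=%s\n", $3, $4, $5 }'
}
```

Usage: `load_line` and `df -hP / | disk_line`. The `-P` flag selects POSIX output, which keeps each filesystem on a single line, so row 2 is always the data row.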

Architecture V112

Script: `/var/www/html/api/scripts/infra-health-report.sh` (1490 bytes)

Calls the V110 and V111 scripts, queries V83 via curl, parses L99 with jq, and uses shell primitives for the rest.
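A minimal sketch of that aggregation pattern, assuming each source is a command whose output is already a formatted report line. `run_section`, `SECTION_TIMEOUT`, and the sibling-script paths are assumptions (this note only gives the script directory); the V83 curl call and the L99 jq parse are left out because their endpoint and file path are not given here. Wrapping each source in `timeout` with a fallback line keeps one slow or failing source from blanking the whole report or pushing it past the agent's 45 s timeout.

```bash
#!/usr/bin/env bash
# Hypothetical aggregation skeleton; not the actual infra-health-report.sh.
SECTION_TIMEOUT=${SECTION_TIMEOUT:-10}   # seconds per source, well under the agent's 45 s

# Run one source command and print its output; print a fallback line instead
# if the command fails, times out, or prints nothing.
run_section() {
  local label=$1 out
  shift
  if out=$(timeout "$SECTION_TIMEOUT" "$@" 2>/dev/null) && [[ -n $out ]]; then
    printf '%s\n' "$out"
  else
    printf '[%s] unavailable\n' "$label"
  fi
}

main() {
  printf '=== WEVIA Infra Health Report %s ===\n\n' "$(date +%Y-%m-%dT%H:%M:%S)"
  run_section FPM    bash /var/www/html/api/scripts/fpm-monitor.sh           # assumed path
  run_section TOKENS bash /var/www/html/api/scripts/token-health-monitor.sh  # assumed path
  # V83 (curl) and L99 (jq on nonreg-latest.json) omitted: endpoint and path not given here.
  run_section DOCKER bash -o pipefail -c 'n=$(docker ps -q | wc -l) && echo "[DOCKER]  containers_running=$n"'
  run_section LOAD   sh -c 'read -r a b c rest < /proc/loadavg && echo "[LOAD]    $a $b $c"'
}
```

In a standalone script, `main` would be called at the bottom. Note that `timeout` executes an external program, so each source must be a script or binary rather than a shell function.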

Standalone run (~3 seconds):

=== WEVIA Infra Health Report 2026-04-21T10:34:06 ===

[FPM]     load=1.68 3.40 3.56 fpm_workers=73/150 mem_used=38% connections=181
[TOKENS]  providers=11 ok=7 expired=4 health=63% expired_list=sambanova,groq,alibaba,github
[ORPHANS] orphans=0 status=ok
[V83]     kpis=64 ok=39 warn=25 fail=0 complete=100%
[L99]     pass=153/153 score=100 ts=20260421_102743
[DOCKER]  containers_running=19
[LOAD]    1.65 3.34 3.54
[DISK]    used=116G avail=29G pct=81%
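Because every line follows `[SECTION] key=value ...`, a consumer (alert, dashboard) can pull single fields out of the report. A hypothetical helper, not part of V112:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one key=value field from a report read on stdin.
# Usage: report_field SECTION KEY < report.txt
# Limitation: values containing spaces (e.g. FPM load=1.68 3.40 3.56) return only their first token.
report_field() {
  awk -v s="[$1]" -v k="$2" '
    $1 == s {
      for (i = 2; i <= NF; i++) {
        # Match fields that start with "KEY=" and print everything after the "=".
        if (index($i, k "=") == 1) { print substr($i, length(k) + 2); exit }
      }
    }'
}
```

Fed the report above, `report_field TOKENS health` prints `63%` and `report_field DISK pct` prints `81%`.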

Orchestrator agent

File: `/var/www/html/api/wevia-autonomous.php`. Position: after token_health (V111), before screens_s204.

"infra_health_report" => [
  "cmd"=>"bash /var/www/html/api/scripts/infra-health-report.sh 2>/dev/null",
  "keywords"=>["infra","health","report","bilan infra","sante","global status"],
  "timeout"=>45
],

Note: not default=>true, because the V110 and V111 scripts already run in the systematic plan. This agent is ON-DEMAND, triggered via keywords, for exhaustive consolidation.
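How the on-demand trigger might work: a hypothetical sketch assuming case-insensitive substring matching against the registry keywords. The actual matching logic in `wevia-autonomous.php` is not shown in this note and may differ.

```bash
#!/usr/bin/env bash
# Hypothetical keyword matcher (bash 4+ for ${var,,}); the real matcher is PHP and not shown here.
# Returns 0 if any keyword occurs, case-insensitively, as a substring of the query.
agent_matches() {
  local query=${1,,} kw
  shift
  for kw in "$@"; do
    [[ $query == *"${kw,,}"* ]] && return 0
  done
  return 1
}

# Keywords copied from the infra_health_report registry entry above.
INFRA_HEALTH_KEYWORDS=("infra" "health" "report" "bilan infra" "sante" "global status")
```

Usage: `agent_matches "$query" "${INFRA_HEALTH_KEYWORDS[@]}" && echo "add infra_health_report to plan"`. Under this rule the validation query below matches (on `infra`, among others); a generic keyword such as `report` would also pull the agent into unrelated report requests, which is worth checking if the real matcher behaves the same way.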

Live validation

Query "multiagent bilan infra health complet tout" →

### plan
15 agents: reconcile, providers, wiki, nonreg, ethica, docker, disk, git, 
           ports, load, architecture_quality, fpm_monitor, token_health, 
           infra_health_report, machines_all

14 → 15 agents in the multi-agent plan.

L99 NonReg

153/153 PASS | 0 FAIL | 100% | 56.1s
TS: 20260421_103613

Additional finding

During the test, sambanova also went EXPIRED (several probes in quick succession → rate-limit burst on SambaNova's side). Expired list: sambanova, groq, alibaba, github (4/11 expired, 63% health).
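The 63% figure is consistent with integer (floor) division: 7 OK providers out of 11 gives 700 / 11 = 63, while rounding would give 64. A hypothetical helper reproducing that computation, assuming the token monitor floors:

```bash
#!/usr/bin/env bash
# Hypothetical helper: token health as floor(ok * 100 / providers); 0 when there are no providers.
token_health_pct() {
  local ok=$1 total=$2
  if (( total <= 0 )); then
    echo 0
    return
  fi
  echo $(( ok * 100 / total ))   # bash arithmetic truncates toward zero
}
```

Usage: `token_health_pct 7 11` prints `63`.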

Hypothesis: our backend probe sent too many requests in succession. V113+ idea: rate-limit our own probe (at most 1 call / 5 min per provider).
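A minimal sketch of the V113 idea (not part of V112): a per-provider throttle that allows at most one probe every 5 minutes, using one timestamp file per provider. `should_probe`, `PROBE_STATE_DIR`, and its default path are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical per-provider probe throttle (V113 idea); state location is an assumption.
PROBE_STATE_DIR=${PROBE_STATE_DIR:-/tmp/token-probe-state}
PROBE_MIN_INTERVAL=${PROBE_MIN_INTERVAL:-300}   # 5 minutes

# $1 = provider, $2 = current epoch seconds (defaults to now).
# Returns 0 and records the timestamp if the provider may be probed; returns 1 otherwise.
should_probe() {
  local provider=$1 now=${2:-$(date +%s)} last
  local stamp="$PROBE_STATE_DIR/$provider.last"
  mkdir -p "$PROBE_STATE_DIR"
  if [[ -f $stamp ]]; then
    last=$(<"$stamp")
    (( now - last < PROBE_MIN_INTERVAL )) && return 1
  fi
  printf '%s\n' "$now" > "$stamp"
  return 0
}
```

A probe loop would call `should_probe "$provider"` and reuse the last known status when it returns 1. Two overlapping runs could still both probe; wrapping the check in `flock` would serialize them.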

Chain V96→V112

Version	Commit	Subject
V96-V108	`cd86b19f9`	Orphans Rescue + ZERO ORPHANS
V110	`ede9a5197`	fpm_monitor
V111	`5e98086e7`	token_health
V112	TBD	infra_health_report consolidated

Applied doctrines

Doctrine 0: Root cause, consolidated visibility (not just FPM OR tokens in isolation)
Doctrine 2: Zero overwrite (additive agent)
Doctrine 3: Zero deletion
Doctrine 4: Zero regression (L99 stable)
Doctrine 14: Test-driven (standalone + live multi-agent)
Doctrine 16: External script (proven V110/V111 pattern)
Doctrine 54: chattr unlock/lock
Doctrine 60: Premium UX (ONE query = full health)
Doctrine 95: Traceability (wiki + vault)
Doctrine 100: Train commit release

Recent work by other Claudes

V9.63 CrowdSec self-ban fix (502 resolved)
ab78c3a0d playwright-selenium-intents 34/38 triggers match
524c25690 create_tool intent promoted (proposal 08:15 → 10:30)

Next: V113+ pending

Rate-limit our token-health probe (1 call / 5 min)
V86 Auth Guard E2E test (Playwright)
Cloudflare rate-limit monitor
NPS validation by Yacine (still pending)
Create token-apply.sh (queue → secrets.env; needs Yacine's authorization)
