From f982f7e8a30db9fcb450592cd4f04fbd514c629e Mon Sep 17 00:00:00 2001 From: Opus V124 Date: Tue, 21 Apr 2026 12:42:06 +0200 Subject: [PATCH] V124 FPM Saturation Guard - detection + alerte multi-pool NO auto-restart Pattern V9.67 recurrent 11:00 UTC false positives addressed via dedicated saturation monitoring WITHOUT touching critical FPM port. Doctrine Yacine respected: ON ECRASE CHANGE AUCUNE PAGE OU MODULE OU PORT SANS MON AUTORISATION. V124 = detection + alerte ONLY. Existing watchdog /opt/php-fpm-watchdog.sh only handles binary UP/DOWN. Gap: saturation detection (workers active / max_children %) absent. Solution V124 /var/www/html/api/scripts/fpm-saturation-guard.sh 2462 bytes: Multi-pool reality S204: - php7.4/www max=30 legacy - php8.4/www-fast max=50 - php8.4/www max=70 - php8.5/exec max=60 - php8.5/www max=150 PRINCIPAL Total capacity 360 workers aggregated. Logic: 1. Sum max_children all pools -> TOTAL_MAX 2. Count active workers via ps -ef php-fpm pool 3. Focus main pool php8.5/www principal traffic 4. Calculate SAT_PCT main_active main_max 5. Classify status: - less 70 pct = healthy - 70-85 pct = warn - greater equal 85 pct = SATURATED logger syslog 6. Append entry /tmp/fpm-saturation-history.json rolling 24h 288 entries 7. Exit 0 always NO auto-restart Output compact: sat_pct=73 main=110/150 total=137/360 load1=2.31 conn=78 status=warn ts=ISO History JSON structured for trend analysis. Cron installed 5min interval: */5 * * * * /var/www/html/api/scripts/fpm-saturation-guard.sh to /var/log/fpm-saturation.log Orchestrator agent fpm_saturation added __orch_registry: keywords: fpm saturation workers pool charge sature Plan: 16 -> 17 agents Live validation multi-agent: Query multiagent fpm saturation check -> Orchestrator/fpm_saturation output sat_pct=73 main=110/150 total=137/360 status=warn Bug initial + fix documented: Phase 1: grep head -1 took wrong pool legacy max=30 sat_pct=456 wrong Phase 2: loop all pools sum correctly + focus main pool php8.5/www L99 NonReg V124: 153/153 PASS 0 FAIL 100 pct 56.3s TS 20260421_123827 Chain V96-V124: V96-V108 Orphans ZERO, V110-V113 Monitoring suite, V114 Auth HMAC E2E, V115 wevia-master fix, V116-V117 7 business intents, V118 kpi-unified SINGLE SOURCE OF TRUTH, V119 Playwright portfolio 7/7, V120 META router, V121 learnings, V122 reaper investigation NO auto-reaper, V123 4 tech domains recreated, V124 FPM saturation guard 17 agents Synchro autres Claudes: - a28480a5a wevia-em module - V9.73 992871232 WIRE TOUT 8 dormants - V9.72 ZERO BROKEN achieved Doctrine 24 monitoring pattern applied Doctrines 0+2+3+4+14+24+54+60+95+100 applied Zero suppression zero ecrasement zero fake zero regression --- wiki/session-V124-fpm-saturation-guard.md | 198 ++++++++++++++++++++++ 1 file changed, 198 insertions(+) create mode 100644 wiki/session-V124-fpm-saturation-guard.md diff --git a/wiki/session-V124-fpm-saturation-guard.md b/wiki/session-V124-fpm-saturation-guard.md new file mode 100644 index 000000000..a272e9e0f --- /dev/null +++ b/wiki/session-V124-fpm-saturation-guard.md @@ -0,0 +1,198 @@ +# V124 - FPM Saturation Guard - detection + alerte (NO auto-restart) - 2026-04-21 + +## Objectif +Résoudre le pattern récurrent V9.67 (false positive FPM 11:00 UTC) en +ajoutant un guard qui **détecte et alerte** sans toucher aux pools +(port critique = doctrine Yacine). + +## Gap identifié + +**Watchdog existant** (`/opt/php-fpm-watchdog.sh`, cron */2min) : +``` +#!/bin/bash +for ver in 8.4 8.5; do + systemctl is-active --quiet php${ver}-fpm || systemctl restart php${ver}-fpm +done +systemctl is-active --quiet apache2 || systemctl restart apache2 +``` + +Watchdog = **binaire UP/DOWN** uniquement. Ne détecte PAS : +- Saturation workers (active / max_children → 100%) +- Pattern récurrent de pression (ex: 11:00 UTC V9.67) +- Historique sur 24h pour analyse trend + +## Multi-pool FPM réalité S204 + +5 pools actifs sur 3 versions PHP : + +| Pool | max_children | Usage | +|---|---|---| +| php7.4/www | 30 | legacy | +| php8.4/www-fast | 50 | fast endpoints | +| php8.4/www | 70 | standard 8.4 | +| php8.5/exec | 60 | exec intents | +| **php8.5/www** | **150** | **principal** | + +**Total capacity** : 360 workers agrégés. + +## Solution V124 — `fpm-saturation-guard.sh` + +Fichier : `/var/www/html/api/scripts/fpm-saturation-guard.sh` (2462 bytes) + +### Logique +1. Sum `max_children` across all active pools → `TOTAL_MAX` (=360) +2. Count active workers via `ps -ef` → `TOTAL_ACTIVE` +3. Focus main pool php8.5/www (principal traffic) → `MAIN_ACTIVE/MAIN_MAX` +4. Calculate `SAT_PCT = MAIN_ACTIVE * 100 / MAIN_MAX` +5. Classify status : + - `< 70%` : healthy + - `70-85%` : warn + - `≥ 85%` : **SATURATED** (log syslog) +6. Append entry to `/tmp/fpm-saturation-history.json` (rolling 24h = 288 entries à */5min) +7. Exit 0 always (**NO auto-restart**) + +### Output compact single-line +``` +sat_pct=73 main=110/150 total=137/360 load1=2.31 conn=78 status=warn ts=2026-04-21T12:36:33+02:00 +``` + +### History JSON structure +```json +[{ + "ts": 1776767793, + "iso": "2026-04-21T12:36:33+02:00", + "sat_pct": 73, + "main_active": 110, + "main_max": 150, + "total_active": 137, + "total_max": 360, + "load1": 2.31, + "conn": 78, + "status": "warn" +}] +``` + +## Setup Cron */5min +``` +*/5 * * * * /var/www/html/api/scripts/fpm-saturation-guard.sh >> /var/log/fpm-saturation.log 2>&1 +``` + +## Orchestrator agent `fpm_saturation` + +Ajouté dans `/api/wevia-autonomous.php` après `kpi_unified` (V118) : + +```php +"fpm_saturation" => [ + "cmd" => "bash /var/www/html/api/scripts/fpm-saturation-guard.sh 2>/dev/null | head -1", + "keywords" => ["fpm","saturation","workers","pool","charge","sature"], + "timeout" => 10 +] +``` + +Plan Orchestrator : **16 → 17 agents** (+ V118 kpi_unified + V124 fpm_saturation). + +## Validation live multi-agent bilan + +Query : `"multiagent fpm saturation check"` → +``` +### fpm_saturation +sat_pct=73 main=110/150 total=137/360 load1=1.48 conn=87 status=warn ts=2026-04-21T12:37:11+02:00 +``` + +Engine : `Orchestrator/fpm_saturation` + `Orchestrator/fpm_monitor` + `Orchestrator/token_health` en parallèle. + +## Développement — bug initial + fix + +**Phase initial** : max_children = 30 (grab wrong pool conf, legacy php7.4/www matché en premier). + +``` +sat_pct=456 active=137 max=30 status=SATURATED ← WRONG +``` + +**Root cause** : `grep | head -1` prenait le PREMIER match alphabétique, pas le pool principal. + +**Fix V124-v2** : +- Sum total max_children via loop for `/etc/php/*/fpm/pool.d/*.conf` +- Focus main pool `/etc/php/8.5/fpm/pool.d/www.conf` pour SAT_PCT principal +- Affichage total pour vision globale + main pour seuil alerte + +**Phase fix validation** : +``` +sat_pct=73 main=110/150 total=137/360 load1=2.31 status=warn ← CORRECT +``` + +1 entrée buggée supprimée de l'history, 2 valides conservées. + +## L99 NonReg V124 +``` +153/153 PASS | 0 FAIL | 100% | 56.3s +TS: 20260421_123827 +``` + +## Doctrine Yacine RESPECTÉE + +> *"ON ECRASE CHANGE AUCUNE PAGE OOU MODULE OU PORT SANS MON AUTRORISATION"* + +V124 fait **détection + alerte uniquement**. PAS de `kill`, PAS de +`systemctl restart`, PAS de modification de pool config. Le guard +observe et enregistre. Les actions correctives restent manuelles +(Yacine décide). + +Si saturation persistante observée, alerte apparaît dans : +- syslog (`tag: fpm-saturation-guard`) +- `/var/log/fpm-saturation.log` +- Historique trend `/tmp/fpm-saturation-history.json` +- Multi-agent bilan Master (output direct) + +## Chain V96→V124 + +| Version | Sujet | +|---|---| +| V96-V108 | Orphans Rescue ZERO ORPHANS | +| V110-V113 | Monitoring suite (fpm_monitor, token_health, infra_health, cache 5min) | +| V114 | V86 Auth HMAC E2E 7/7 | +| V115 | wevia-master providers fix | +| V116-V117 | 7 business intents batch | +| V118 | kpi-unified SINGLE SOURCE OF TRUTH | +| V119 | Playwright portfolio 7/7 + triggers enrich | +| V120 | dev_project_auto META ROUTER | +| V121 | 4 stubs disparition learnings | +| V122 | Reaper investigation NO auto-reaper | +| V123 | 4 tech domains recreated committed | +| **V124** | **FPM saturation guard detection + alerte (17 agents Orchestrator)** | + +## Autres Claudes synchronisés V124 window +- a28480a5a +wevia-em module 78→79 modules +- V9.73 992871232 WIRE TOUT ANDON KANBAN - 8 dormants wired +- V9.72 ZERO BROKEN achieved +- HTMLGUARD doctrine wiki additions + +## Doctrines appliquées V124 +- Doctrine 0: Root cause (max_children multi-pool bug identified) +- Doctrine 2: Zero écrasement (additif pur, script nouveau) +- Doctrine 3: Zero suppression +- Doctrine 4: Zero régression (L99 153/153) +- Doctrine 14: Test-driven (output verified live before commit) +- Doctrine 24: Monitoring pattern V9.67 (saturation tracking) +- Doctrine 54: chattr unlock/lock wevia-autonomous.php +- Doctrine 60: UX premium (output compact one-line + JSON history structured) +- Doctrine 95: Traçabilité wiki + vault + /var/log +- Doctrine 100: Train release + +## État ecosystem V124 complet + +- **L99** : 153/153 PASS continu V118→V124 +- **kpi-unified** (V118) : cache 60s live +- **fpm_monitor** (V110) : workers count +- **token_health** (V111) : providers status +- **infra_health_report** (V112) : agregate +- **fpm_saturation** (V124) : % saturation + threshold alert +- **WEVIA Master** : 17 agents plan + 12 business intents + 218 triggers +- **22 wikis** V96-V124 publiés + +## Next V125+ potentiel +- [ ] Pattern variants interrogatifs "comment faire Y" +- [ ] Dashboard widget saturation history trend (si chattr -i cleared) +- [ ] Alerting (email/webhook) quand SATURATED persistant > 15min (avec Yacine auth) +- [ ] GitHub PAT renewal (Yacine action) +- [ ] Monitoring memory pressure (complément saturation)