html/wiki/session-V124-fpm-saturation-guard.md
Opus V124 f982f7e8a3
V124 FPM Saturation Guard - detection + alert, multi-pool, NO auto-restart
Pattern V9.67 recurrent 11:00 UTC false positives addressed via dedicated
saturation monitoring WITHOUT touching critical FPM port.

Yacine doctrine respected: ON ECRASE CHANGE AUCUNE PAGE OU MODULE OU PORT
SANS MON AUTORISATION (no page, module, or port is overwritten or changed
without my authorization). V124 = detection + alert ONLY.

Existing watchdog /opt/php-fpm-watchdog.sh only handles binary UP/DOWN.
Gap: saturation detection (workers active / max_children %) absent.

Solution V124 /var/www/html/api/scripts/fpm-saturation-guard.sh 2462 bytes:

Multi-pool reality S204:
- php7.4/www max=30 legacy
- php8.4/www-fast max=50
- php8.4/www max=70
- php8.5/exec max=60
- php8.5/www max=150 PRINCIPAL

Total capacity 360 workers aggregated.

Logic:
1. Sum max_children all pools -> TOTAL_MAX
2. Count active workers via ps -ef php-fpm pool
3. Focus main pool php8.5/www principal traffic
4. Calculate SAT_PCT = main_active * 100 / main_max
5. Classify status:
   - less than 70 pct = healthy
   - 70-85 pct = warn
   - 85 pct or more = SATURATED, logged to syslog
6. Append entry /tmp/fpm-saturation-history.json rolling 24h 288 entries
7. Exit 0 always NO auto-restart

Output compact:
sat_pct=73 main=110/150 total=137/360 load1=2.31 conn=78 status=warn ts=ISO

History JSON structured for trend analysis.

Cron installed 5min interval:
*/5 * * * * /var/www/html/api/scripts/fpm-saturation-guard.sh >> /var/log/fpm-saturation.log 2>&1

Orchestrator agent fpm_saturation added __orch_registry:
keywords: fpm saturation workers pool charge sature
Plan: 16 -> 17 agents

Live validation multi-agent:
Query multiagent fpm saturation check
-> Orchestrator/fpm_saturation output sat_pct=73 main=110/150 total=137/360 status=warn

Bug initial + fix documented:
Phase 1: grep head -1 took wrong pool legacy max=30 sat_pct=456 wrong
Phase 2: loop all pools sum correctly + focus main pool php8.5/www

L99 NonReg V124: 153/153 PASS 0 FAIL 100 pct 56.3s TS 20260421_123827

Chain V96-V124:
V96-V108 Orphans ZERO,
V110-V113 Monitoring suite,
V114 Auth HMAC E2E,
V115 wevia-master fix,
V116-V117 7 business intents,
V118 kpi-unified SINGLE SOURCE OF TRUTH,
V119 Playwright portfolio 7/7,
V120 META router,
V121 learnings,
V122 reaper investigation NO auto-reaper,
V123 4 tech domains recreated,
V124 FPM saturation guard 17 agents

Sync with other Claudes:
- a28480a5a wevia-em module
- V9.73 992871232 WIRE TOUT 8 dormants
- V9.72 ZERO BROKEN achieved

Doctrine 24 monitoring pattern applied
Doctrines 0+2+3+4+14+24+54+60+95+100 applied
Zero deletion, zero overwrite, zero fake, zero regression
2026-04-21 12:42:06 +02:00


V124 - FPM Saturation Guard - detection + alert (NO auto-restart) - 2026-04-21

Objective

Resolve the recurring V9.67 pattern (FPM false positive at 11:00 UTC) by adding a guard that detects and alerts without touching the pools (critical port = Yacine doctrine).

Gap identified

Existing watchdog (/opt/php-fpm-watchdog.sh, cron every 2 min):

#!/bin/bash
for ver in 8.4 8.5; do
  systemctl is-active --quiet php${ver}-fpm || systemctl restart php${ver}-fpm
done
systemctl is-active --quiet apache2 || systemctl restart apache2

The watchdog only performs a binary UP/DOWN check. It does NOT detect:

  • Worker saturation (active / max_children → 100%)
  • Recurring pressure patterns (e.g. 11:00 UTC, V9.67)
  • A 24-hour history for trend analysis

Multi-pool FPM reality on S204

Five active pools across three PHP versions:

| Pool | max_children | Usage |
|---|---|---|
| php7.4/www | 30 | legacy |
| php8.4/www-fast | 50 | fast endpoints |
| php8.4/www | 70 | standard 8.4 |
| php8.5/exec | 60 | exec intents |
| php8.5/www | 150 | main (primary traffic) |

Total capacity: 360 workers aggregated (30 + 50 + 70 + 60 + 150).

Solution V124 — fpm-saturation-guard.sh

File: /var/www/html/api/scripts/fpm-saturation-guard.sh (2462 bytes)

Logic

  1. Sum max_children across all active pools → TOTAL_MAX (=360)
  2. Count active workers via ps -ef → TOTAL_ACTIVE
  3. Focus main pool php8.5/www (principal traffic) → MAIN_ACTIVE/MAIN_MAX
  4. Calculate SAT_PCT = MAIN_ACTIVE * 100 / MAIN_MAX
  5. Classify status :
    • < 70% : healthy
    • 70-85% : warn
    • ≥ 85% : SATURATED (log syslog)
  6. Append entry to /tmp/fpm-saturation-history.json (rolling 24h = 288 entries at one run every 5 min)
  7. Exit 0 always (NO auto-restart)
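
Steps 4 and 5 can be sketched as two small shell functions. This is a minimal illustration, not the contents of fpm-saturation-guard.sh: the function names sat_pct and classify_status are hypothetical, and returning 0 when max is not positive is an assumption made to avoid a division by zero.

```bash
# Integer saturation percentage: active * 100 / max (truncating division).
sat_pct() {
  # $1 = active workers in the main pool, $2 = its pm.max_children
  if [ "$2" -le 0 ]; then
    echo 0   # assumption: report 0 rather than divide by zero
    return
  fi
  echo $(( $1 * 100 / $2 ))
}

# Map a percentage to the three statuses listed above.
classify_status() {
  if [ "$1" -ge 85 ]; then
    echo SATURATED
  elif [ "$1" -ge 70 ]; then
    echo warn
  else
    echo healthy
  fi
}

pct=$(sat_pct 110 150)
echo "sat_pct=$pct status=$(classify_status "$pct")"   # sat_pct=73 status=warn
```

Because 85 already counts as SATURATED, the warn band in this sketch effectively covers 70 to 84.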

Output compact single-line

sat_pct=73 main=110/150 total=137/360 load1=2.31 conn=78 status=warn ts=2026-04-21T12:36:33+02:00
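
A consumer of this line (a log reader or another agent) only needs to split on spaces and on the first "=". The helper below is a hypothetical sketch, not part of the guard; it assumes that values never contain spaces, which holds for the sample line above.

```bash
# Extract the value of one key from a space-separated key=value line.
get_field() {
  # $1 = the guard's output line, $2 = key name (e.g. sat_pct)
  for pair in $1; do
    case "$pair" in
      "$2"=*) echo "${pair#*=}"; return 0 ;;
    esac
  done
  return 1   # key not present
}

line='sat_pct=73 main=110/150 total=137/360 load1=2.31 conn=78 status=warn ts=2026-04-21T12:36:33+02:00'
get_field "$line" status   # prints: warn
get_field "$line" main     # prints: 110/150
```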

History JSON structure

[{
  "ts": 1776767793,
  "iso": "2026-04-21T12:36:33+02:00",
  "sat_pct": 73,
  "main_active": 110,
  "main_max": 150,
  "total_active": 137,
  "total_max": 360,
  "load1": 2.31,
  "conn": 78,
  "status": "warn"
}]
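
One way to produce an entry of this shape from shell is a single printf, sketched below. The function names make_entry and history_capacity are hypothetical, and the sketch assumes that no value contains a double quote or backslash (no JSON escaping is done). How the real script appends to and trims the array is not shown in this page.

```bash
# Build one history entry as a single-line JSON object.
make_entry() {
  # args: ts iso sat_pct main_active main_max total_active total_max load1 conn status
  printf '{"ts":%s,"iso":"%s","sat_pct":%s,"main_active":%s,"main_max":%s,"total_active":%s,"total_max":%s,"load1":%s,"conn":%s,"status":"%s"}\n' "$@"
}

# Number of entries a rolling window holds: hours * 60 / interval in minutes.
history_capacity() {
  echo $(( $1 * 60 / $2 ))
}

make_entry 1776767793 "2026-04-21T12:36:33+02:00" 73 110 150 137 360 2.31 78 warn
history_capacity 24 5   # prints: 288
```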

Setup Cron */5min

*/5 * * * * /var/www/html/api/scripts/fpm-saturation-guard.sh >> /var/log/fpm-saturation.log 2>&1

Orchestrator agent fpm_saturation

Added to /api/wevia-autonomous.php after kpi_unified (V118):

"fpm_saturation" => [
  "cmd" => "bash /var/www/html/api/scripts/fpm-saturation-guard.sh 2>/dev/null | head -1",
  "keywords" => ["fpm","saturation","workers","pool","charge","sature"],
  "timeout" => 10
]

Orchestrator plan: 16 → 17 agents (+ V118 kpi_unified + V124 fpm_saturation).

Live validation: multi-agent report

Query: "multiagent fpm saturation check"

### fpm_saturation
sat_pct=73 main=110/150 total=137/360 load1=1.48 conn=87 status=warn ts=2026-04-21T12:37:11+02:00

Engine: Orchestrator/fpm_saturation + Orchestrator/fpm_monitor + Orchestrator/token_health run in parallel.

Development: initial bug + fix

Initial phase: max_children = 30 (the wrong pool config was picked up; the legacy php7.4/www pool matched first).

sat_pct=456 active=137 max=30 status=SATURATED  ← WRONG

Root cause: grep | head -1 took the FIRST match in alphabetical order, not the main pool. The 137 active workers counted across all pools were then divided by the legacy pool's limit of 30, giving 456%.

Fix V124-v2:

  • Sum max_children over all pools with a loop over /etc/php/*/fpm/pool.d/*.conf
  • Use the main pool /etc/php/8.5/fpm/pool.d/www.conf for the primary SAT_PCT
  • Report the total for the global view and the main pool for the alert threshold
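
The loop described in the fix can be sketched as follows. This is an illustration under assumptions, not the script itself: the function names are hypothetical, each pool file is assumed to hold one uncommented pm.max_children line with a plain integer value, and lines starting with ";" (FPM comments) are ignored.

```bash
# Print pm.max_children from one pool file (empty if absent or commented out).
read_max_children() {
  awk -F= '/^[ \t]*pm\.max_children[ \t]*=/ {
    gsub(/[ \t\r]/, "", $2); print $2; exit
  }' "$1"
}

# Sum pm.max_children over every readable pool file passed as an argument.
sum_max_children() {
  total=0
  for conf in "$@"; do
    [ -r "$conf" ] || continue
    value=$(read_max_children "$conf")
    if [ -n "$value" ]; then
      total=$(( total + value ))
    fi
  done
  echo "$total"
}

# Prints 0 on a machine without PHP-FPM pool files.
echo "total_max=$(sum_max_children /etc/php/*/fpm/pool.d/*.conf)"
```

Reading the main pool's limit from its own file, for example read_max_children /etc/php/8.5/fpm/pool.d/www.conf, avoids the head -1 ordering problem because no match order is involved.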

Validation after the fix:

sat_pct=73 main=110/150 total=137/360 load1=2.31 status=warn  ← CORRECT

One buggy entry was removed from the history; two valid entries were kept.

L99 NonReg V124

153/153 PASS | 0 FAIL | 100% | 56.3s
TS: 20260421_123827

Yacine doctrine RESPECTED

"ON ECRASE CHANGE AUCUNE PAGE OU MODULE OU PORT SANS MON AUTORISATION" (no page, module, or port is overwritten or changed without my authorization)

V124 performs detection + alerting only. NO kill, NO systemctl restart, NO change to pool config. The guard observes and records. Corrective actions remain manual (Yacine decides).

If persistent saturation is observed, the alert appears in:

  • syslog (tag: fpm-saturation-guard)
  • /var/log/fpm-saturation.log
  • Trend history /tmp/fpm-saturation-history.json
  • Master multi-agent report (direct output)

Chain V96→V124

| Version | Subject |
|---|---|
| V96-V108 | Orphans Rescue ZERO ORPHANS |
| V110-V113 | Monitoring suite (fpm_monitor, token_health, infra_health, cache 5min) |
| V114 | V86 Auth HMAC E2E 7/7 |
| V115 | wevia-master providers fix |
| V116-V117 | 7 business intents batch |
| V118 | kpi-unified SINGLE SOURCE OF TRUTH |
| V119 | Playwright portfolio 7/7 + triggers enrich |
| V120 | dev_project_auto META ROUTER |
| V121 | Learnings from the disappearance of 4 stubs |
| V122 | Reaper investigation NO auto-reaper |
| V123 | 4 tech domains recreated and committed |
| V124 | FPM saturation guard: detection + alert (17 Orchestrator agents) |

Other Claudes synchronized during the V124 window

  • a28480a5a +wevia-em module 78→79 modules
  • V9.73 992871232 WIRE TOUT ANDON KANBAN - 8 dormants wired
  • V9.72 ZERO BROKEN achieved
  • HTMLGUARD doctrine wiki additions

Doctrines applied in V124

  • Doctrine 0: Root cause (max_children multi-pool bug identified)
  • Doctrine 2: Zero overwrite (purely additive, new script)
  • Doctrine 3: Zero deletion
  • Doctrine 4: Zero regression (L99 153/153)
  • Doctrine 14: Test-driven (output verified live before commit)
  • Doctrine 24: Monitoring pattern V9.67 (saturation tracking)
  • Doctrine 54: chattr unlock/lock wevia-autonomous.php
  • Doctrine 60: UX premium (compact one-line output + structured JSON history)
  • Doctrine 95: Traceability (wiki + vault + /var/log)
  • Doctrine 100: Train release

Full ecosystem state at V124

  • L99: 153/153 PASS continuously from V118 to V124
  • kpi-unified (V118): cache 60s live
  • fpm_monitor (V110): workers count
  • token_health (V111): providers status
  • infra_health_report (V112): aggregate
  • fpm_saturation (V124): % saturation + threshold alert
  • WEVIA Master: 17 agents plan + 12 business intents + 218 triggers
  • 22 wikis V96-V124 published

Potential next steps (V125+)

  • Interrogative pattern variants such as "comment faire Y" ("how do I do Y")
  • Dashboard widget for the saturation history trend (if chattr -i is cleared)
  • Alerting (email/webhook) when SATURATED persists for more than 15 min (with Yacine's authorization)
  • GitHub PAT renewal (Yacine action)
  • Memory pressure monitoring (to complement saturation)
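
For the persistence item, detection alone (no alert is sent, no action is taken) could be sketched as a check on the most recent statuses. The function name persistent_saturation is hypothetical. The sketch assumes the statuses are passed oldest first and that four consecutive samples at the 5-minute interval are taken to span 15 minutes; the exact count to use for "more than 15 min" is a judgment call.

```bash
# Succeed when the last N statuses (oldest first) are all SATURATED.
persistent_saturation() {
  needed=$1; shift
  # Not enough samples yet: cannot claim persistence.
  [ "$#" -ge "$needed" ] || return 1
  # Drop older samples so only the last N remain.
  while [ "$#" -gt "$needed" ]; do shift; done
  for status in "$@"; do
    [ "$status" = "SATURATED" ] || return 1
  done
  return 0
}

if persistent_saturation 4 warn SATURATED SATURATED SATURATED SATURATED; then
  echo "persistent saturation"
fi
```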