V124 FPM Saturation Guard - detection + alerte multi-pool NO auto-restart
Some checks failed
WEVAL NonReg / nonreg (push) Has been cancelled
Some checks failed
WEVAL NonReg / nonreg (push) Has been cancelled
Pattern V9.67 recurrent 11:00 UTC false positives addressed via dedicated saturation monitoring WITHOUT touching critical FPM port. Doctrine Yacine respected: ON ECRASE CHANGE AUCUNE PAGE OU MODULE OU PORT SANS MON AUTORISATION. V124 = detection + alerte ONLY. Existing watchdog /opt/php-fpm-watchdog.sh only handles binary UP/DOWN. Gap: saturation detection (workers active / max_children %) absent. Solution V124 /var/www/html/api/scripts/fpm-saturation-guard.sh 2462 bytes: Multi-pool reality S204: - php7.4/www max=30 legacy - php8.4/www-fast max=50 - php8.4/www max=70 - php8.5/exec max=60 - php8.5/www max=150 PRINCIPAL Total capacity 360 workers aggregated. Logic: 1. Sum max_children all pools -> TOTAL_MAX 2. Count active workers via ps -ef php-fpm pool 3. Focus main pool php8.5/www principal traffic 4. Calculate SAT_PCT main_active main_max 5. Classify status: - less 70 pct = healthy - 70-85 pct = warn - greater equal 85 pct = SATURATED logger syslog 6. Append entry /tmp/fpm-saturation-history.json rolling 24h 288 entries 7. Exit 0 always NO auto-restart Output compact: sat_pct=73 main=110/150 total=137/360 load1=2.31 conn=78 status=warn ts=ISO History JSON structured for trend analysis. Cron installed 5min interval: */5 * * * * /var/www/html/api/scripts/fpm-saturation-guard.sh to /var/log/fpm-saturation.log Orchestrator agent fpm_saturation added __orch_registry: keywords: fpm saturation workers pool charge sature Plan: 16 -> 17 agents Live validation multi-agent: Query multiagent fpm saturation check -> Orchestrator/fpm_saturation output sat_pct=73 main=110/150 total=137/360 status=warn Bug initial + fix documented: Phase 1: grep head -1 took wrong pool legacy max=30 sat_pct=456 wrong Phase 2: loop all pools sum correctly + focus main pool php8.5/www L99 NonReg V124: 153/153 PASS 0 FAIL 100 pct 56.3s TS 20260421_123827 Chain V96-V124: V96-V108 Orphans ZERO, V110-V113 Monitoring suite, V114 Auth HMAC E2E, V115 wevia-master fix, V116-V117 7 business intents, V118 kpi-unified SINGLE SOURCE OF TRUTH, V119 Playwright portfolio 7/7, V120 META router, V121 learnings, V122 reaper investigation NO auto-reaper, V123 4 tech domains recreated, V124 FPM saturation guard 17 agents Synchro autres Claudes: -a28480a5awevia-em module - V9.73992871232WIRE TOUT 8 dormants - V9.72 ZERO BROKEN achieved Doctrine 24 monitoring pattern applied Doctrines 0+2+3+4+14+24+54+60+95+100 applied Zero suppression zero ecrasement zero fake zero regression
This commit is contained in:
198
wiki/session-V124-fpm-saturation-guard.md
Normal file
198
wiki/session-V124-fpm-saturation-guard.md
Normal file
@@ -0,0 +1,198 @@
|
||||
# V124 - FPM Saturation Guard - detection + alerte (NO auto-restart) - 2026-04-21
|
||||
|
||||
## Objectif
|
||||
Résoudre le pattern récurrent V9.67 (false positive FPM 11:00 UTC) en
|
||||
ajoutant un guard qui **détecte et alerte** sans toucher aux pools
|
||||
(port critique = doctrine Yacine).
|
||||
|
||||
## Gap identifié
|
||||
|
||||
**Watchdog existant** (`/opt/php-fpm-watchdog.sh`, cron */2min) :
|
||||
```
|
||||
#!/bin/bash
|
||||
for ver in 8.4 8.5; do
|
||||
systemctl is-active --quiet php${ver}-fpm || systemctl restart php${ver}-fpm
|
||||
done
|
||||
systemctl is-active --quiet apache2 || systemctl restart apache2
|
||||
```
|
||||
|
||||
Watchdog = **binaire UP/DOWN** uniquement. Ne détecte PAS :
|
||||
- Saturation workers (active / max_children → 100%)
|
||||
- Pattern récurrent de pression (ex: 11:00 UTC V9.67)
|
||||
- Historique sur 24h pour analyse trend
|
||||
|
||||
## Multi-pool FPM réalité S204
|
||||
|
||||
5 pools actifs sur 3 versions PHP :
|
||||
|
||||
| Pool | max_children | Usage |
|
||||
|---|---|---|
|
||||
| php7.4/www | 30 | legacy |
|
||||
| php8.4/www-fast | 50 | fast endpoints |
|
||||
| php8.4/www | 70 | standard 8.4 |
|
||||
| php8.5/exec | 60 | exec intents |
|
||||
| **php8.5/www** | **150** | **principal** |
|
||||
|
||||
**Total capacity** : 360 workers agrégés.
|
||||
|
||||
## Solution V124 — `fpm-saturation-guard.sh`
|
||||
|
||||
Fichier : `/var/www/html/api/scripts/fpm-saturation-guard.sh` (2462 bytes)
|
||||
|
||||
### Logique
|
||||
1. Sum `max_children` across all active pools → `TOTAL_MAX` (=360)
|
||||
2. Count active workers via `ps -ef` → `TOTAL_ACTIVE`
|
||||
3. Focus main pool php8.5/www (principal traffic) → `MAIN_ACTIVE/MAIN_MAX`
|
||||
4. Calculate `SAT_PCT = MAIN_ACTIVE * 100 / MAIN_MAX`
|
||||
5. Classify status :
|
||||
- `< 70%` : healthy
|
||||
- `70-85%` : warn
|
||||
- `≥ 85%` : **SATURATED** (log syslog)
|
||||
6. Append entry to `/tmp/fpm-saturation-history.json` (rolling 24h = 288 entries à */5min)
|
||||
7. Exit 0 always (**NO auto-restart**)
|
||||
|
||||
### Output compact single-line
|
||||
```
|
||||
sat_pct=73 main=110/150 total=137/360 load1=2.31 conn=78 status=warn ts=2026-04-21T12:36:33+02:00
|
||||
```
|
||||
|
||||
### History JSON structure
|
||||
```json
|
||||
[{
|
||||
"ts": 1776767793,
|
||||
"iso": "2026-04-21T12:36:33+02:00",
|
||||
"sat_pct": 73,
|
||||
"main_active": 110,
|
||||
"main_max": 150,
|
||||
"total_active": 137,
|
||||
"total_max": 360,
|
||||
"load1": 2.31,
|
||||
"conn": 78,
|
||||
"status": "warn"
|
||||
}]
|
||||
```
|
||||
|
||||
## Setup Cron */5min
|
||||
```
|
||||
*/5 * * * * /var/www/html/api/scripts/fpm-saturation-guard.sh >> /var/log/fpm-saturation.log 2>&1
|
||||
```
|
||||
|
||||
## Orchestrator agent `fpm_saturation`
|
||||
|
||||
Ajouté dans `/api/wevia-autonomous.php` après `kpi_unified` (V118) :
|
||||
|
||||
```php
|
||||
"fpm_saturation" => [
|
||||
"cmd" => "bash /var/www/html/api/scripts/fpm-saturation-guard.sh 2>/dev/null | head -1",
|
||||
"keywords" => ["fpm","saturation","workers","pool","charge","sature"],
|
||||
"timeout" => 10
|
||||
]
|
||||
```
|
||||
|
||||
Plan Orchestrator : **16 → 17 agents** (+ V118 kpi_unified + V124 fpm_saturation).
|
||||
|
||||
## Validation live multi-agent bilan
|
||||
|
||||
Query : `"multiagent fpm saturation check"` →
|
||||
```
|
||||
### fpm_saturation
|
||||
sat_pct=73 main=110/150 total=137/360 load1=1.48 conn=87 status=warn ts=2026-04-21T12:37:11+02:00
|
||||
```
|
||||
|
||||
Engine : `Orchestrator/fpm_saturation` + `Orchestrator/fpm_monitor` + `Orchestrator/token_health` en parallèle.
|
||||
|
||||
## Développement — bug initial + fix
|
||||
|
||||
**Phase initial** : max_children = 30 (grab wrong pool conf, legacy php7.4/www matché en premier).
|
||||
|
||||
```
|
||||
sat_pct=456 active=137 max=30 status=SATURATED ← WRONG
|
||||
```
|
||||
|
||||
**Root cause** : `grep | head -1` prenait le PREMIER match alphabétique, pas le pool principal.
|
||||
|
||||
**Fix V124-v2** :
|
||||
- Sum total max_children via loop for `/etc/php/*/fpm/pool.d/*.conf`
|
||||
- Focus main pool `/etc/php/8.5/fpm/pool.d/www.conf` pour SAT_PCT principal
|
||||
- Affichage total pour vision globale + main pour seuil alerte
|
||||
|
||||
**Phase fix validation** :
|
||||
```
|
||||
sat_pct=73 main=110/150 total=137/360 load1=2.31 status=warn ← CORRECT
|
||||
```
|
||||
|
||||
1 entrée buggée supprimée de l'history, 2 valides conservées.
|
||||
|
||||
## L99 NonReg V124
|
||||
```
|
||||
153/153 PASS | 0 FAIL | 100% | 56.3s
|
||||
TS: 20260421_123827
|
||||
```
|
||||
|
||||
## Doctrine Yacine RESPECTÉE
|
||||
|
||||
> *"ON ECRASE CHANGE AUCUNE PAGE OOU MODULE OU PORT SANS MON AUTRORISATION"*
|
||||
|
||||
V124 fait **détection + alerte uniquement**. PAS de `kill`, PAS de
|
||||
`systemctl restart`, PAS de modification de pool config. Le guard
|
||||
observe et enregistre. Les actions correctives restent manuelles
|
||||
(Yacine décide).
|
||||
|
||||
Si saturation persistante observée, alerte apparaît dans :
|
||||
- syslog (`tag: fpm-saturation-guard`)
|
||||
- `/var/log/fpm-saturation.log`
|
||||
- Historique trend `/tmp/fpm-saturation-history.json`
|
||||
- Multi-agent bilan Master (output direct)
|
||||
|
||||
## Chain V96→V124
|
||||
|
||||
| Version | Sujet |
|
||||
|---|---|
|
||||
| V96-V108 | Orphans Rescue ZERO ORPHANS |
|
||||
| V110-V113 | Monitoring suite (fpm_monitor, token_health, infra_health, cache 5min) |
|
||||
| V114 | V86 Auth HMAC E2E 7/7 |
|
||||
| V115 | wevia-master providers fix |
|
||||
| V116-V117 | 7 business intents batch |
|
||||
| V118 | kpi-unified SINGLE SOURCE OF TRUTH |
|
||||
| V119 | Playwright portfolio 7/7 + triggers enrich |
|
||||
| V120 | dev_project_auto META ROUTER |
|
||||
| V121 | 4 stubs disparition learnings |
|
||||
| V122 | Reaper investigation NO auto-reaper |
|
||||
| V123 | 4 tech domains recreated committed |
|
||||
| **V124** | **FPM saturation guard detection + alerte (17 agents Orchestrator)** |
|
||||
|
||||
## Autres Claudes synchronisés V124 window
|
||||
- a28480a5a +wevia-em module 78→79 modules
|
||||
- V9.73 992871232 WIRE TOUT ANDON KANBAN - 8 dormants wired
|
||||
- V9.72 ZERO BROKEN achieved
|
||||
- HTMLGUARD doctrine wiki additions
|
||||
|
||||
## Doctrines appliquées V124
|
||||
- Doctrine 0: Root cause (max_children multi-pool bug identified)
|
||||
- Doctrine 2: Zero écrasement (additif pur, script nouveau)
|
||||
- Doctrine 3: Zero suppression
|
||||
- Doctrine 4: Zero régression (L99 153/153)
|
||||
- Doctrine 14: Test-driven (output verified live before commit)
|
||||
- Doctrine 24: Monitoring pattern V9.67 (saturation tracking)
|
||||
- Doctrine 54: chattr unlock/lock wevia-autonomous.php
|
||||
- Doctrine 60: UX premium (output compact one-line + JSON history structured)
|
||||
- Doctrine 95: Traçabilité wiki + vault + /var/log
|
||||
- Doctrine 100: Train release
|
||||
|
||||
## État ecosystem V124 complet
|
||||
|
||||
- **L99** : 153/153 PASS continu V118→V124
|
||||
- **kpi-unified** (V118) : cache 60s live
|
||||
- **fpm_monitor** (V110) : workers count
|
||||
- **token_health** (V111) : providers status
|
||||
- **infra_health_report** (V112) : agregate
|
||||
- **fpm_saturation** (V124) : % saturation + threshold alert
|
||||
- **WEVIA Master** : 17 agents plan + 12 business intents + 218 triggers
|
||||
- **22 wikis** V96-V124 publiés
|
||||
|
||||
## Next V125+ potentiel
|
||||
- [ ] Pattern variants interrogatifs "comment faire Y"
|
||||
- [ ] Dashboard widget saturation history trend (si chattr -i cleared)
|
||||
- [ ] Alerting (email/webhook) quand SATURATED persistant > 15min (avec Yacine auth)
|
||||
- [ ] GitHub PAT renewal (Yacine action)
|
||||
- [ ] Monitoring memory pressure (complément saturation)
|
||||
Reference in New Issue
Block a user