diff --git a/wiki-public/wevia-master-honesty-guard-b12.md b/wiki-public/wevia-master-honesty-guard-b12.md
index d41c0c85a..25c40f85b 100644
--- a/wiki-public/wevia-master-honesty-guard-b12.md
+++ b/wiki-public/wevia-master-honesty-guard-b12.md
@@ -167,3 +167,44 @@
 Zero-regression tests: FastPath + V84 smart_client_help + public pages OK.
 WEVIA also becomes more cautious about internal figures: if a tool returns an empty or incomplete result, it refuses to cite the known values from the systemPrompt (146K HCPs, 619 tools).
 Workaround: use the dedicated intents (ethica_count, tools_registry_count), which return live values via a real exec.
+---
+
+## FIX 6b + 6c — DETERMINISTIC EXTERNAL REFUSE (15:35)
+
+### Problem
+
+Fix 6a (prompt-only) is insufficient: the LLM cascade is non-deterministic.
+- Attempt 1 meteo: honest
+- Attempt 2 meteo: hallucinated (19 deg C, partly cloudy)
+- Attempt 3 meteo: hallucinated (20 deg C, north-west)
+Mistral / Qwen ignore buried prompt rules. Only Cerebras obeys.
+
+### Solution 6c: pre-intercept via opus-intents
+
+Same pattern as Fix 5 (smart_client_help), which works reliably.
+New INTENT in wevia-opus-intents.php, evaluated BEFORE the LLM cascade.
+Regex: meteo/temperature/humidite/cours/bitcoin/btc/eth/sp500/news/resultat/heure exacte.
+
+### Result
+
+3/3 meteo queries now deterministically return an honest refusal:
+> "Je n ai pas acces a cette information en temps reel. Veux-tu que je lance un outil dedie ?" ("I do not have access to this information in real time. Do you want me to run a dedicated tool?")
+
+Bitcoin, sports, and news queries are also handled by the same regex.
+
+### 6 honesty defense layers
+
+1. HONESTY_GUARD_V1 x 3 (enrichPrompts after exec - Fix 1)
+2. HONESTY_GUARD_MAIN_V6 (systemPrompt trailing - Fix 6a)
+3. HONESTY_GUARD_LEAD_V6B (systemPrompt leading - Fix 6b)
+4. EXTERNAL_INFO_REFUSE_V6C (opus-intents pre-intercept - Fix 6c)
+
+Deterministic opus-intents = strong defense.
+Prompt-level = soft defense for unplanned cases.
+
+### Files
+
+- /var/www/html/api/wevia-opus-intents.php: 67001 -> 68182 B (+1181)
+- /var/www/html/api/wevia-autonomous.php: 85050 -> 86228 B (+1178 via Fix 6a + Fix 6b)
+
+Both files relocked with chattr +i.
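
The pre-intercept described under Solution 6c can be illustrated with a minimal sketch. The real intent lives in wevia-opus-intents.php and its exact pattern is not shown in this note, so everything below is an assumption made for illustration: the Python port, the names `pre_intercept`, `answer` and `strip_accents`, the accent stripping, and the word-boundary anchors. Only the keyword list and the refusal string come from the text above.

```python
import re
import unicodedata
from typing import Callable, Optional

# Keywords copied from the regex list in the note. The \b anchors are an
# assumption, added so that "eth" does not match inside words like "method".
EXTERNAL_INFO_RE = re.compile(
    r"\b(meteo|temperature|humidite|cours|bitcoin|btc|eth|sp500|news"
    r"|resultat|heure exacte)\b",
    re.IGNORECASE,
)

# Refusal string quoted in the Result section.
REFUSAL = (
    "Je n ai pas acces a cette information en temps reel. "
    "Veux-tu que je lance un outil dedie ?"
)


def strip_accents(text: str) -> str:
    """Drop combining marks so that 'météo' is matched as 'meteo'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def pre_intercept(message: str) -> Optional[str]:
    """Return the fixed refusal for external real-time questions, else None."""
    if EXTERNAL_INFO_RE.search(strip_accents(message)):
        return REFUSAL
    return None


def answer(message: str, llm_cascade: Callable[[str], str]) -> str:
    """Run the deterministic check first; call the LLM cascade only if it passes."""
    refusal = pre_intercept(message)
    if refusal is not None:
        return refusal
    return llm_cascade(message)
```

Because the check runs before any model is called, the same input always produces the same refusal regardless of which provider the cascade would have picked. With strict word boundaries, plural forms such as "resultats" would not match; whether the production regex handles them is not stated in the note.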