The verbose 55-token system prompt was the cheapest TTFT win on the
kv-only path (52 ms per prefill token). Compacting it to 25 tokens while
keeping the three load-bearing constraints — Kazeia identity, French only,
short replies, /no_think — measurably improved end-to-end latency.
Validated 'Bonjour, comment vas-tu ?' on tablet:
Before: prompt_tokens=80, TTFT=4202ms, total=5716ms
After: prompt_tokens=53, TTFT=2865ms, total=4034ms (-1.3s, -32% TTFT)
Reply quality preserved: "Bonjour ! Je vais bien, merci. Comment vas-tu ?"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>