kazeia/kazeia-android
Kazeia Team 3d435f9cdd LLM: trim system prompt to drop ~27 prefill tokens (-1.3s TTFT)
Trimming the verbose 55-token system prompt was the cheapest TTFT win on the
kv-only path (52 ms per prefill token). Compacting it to 25 tokens while
keeping the load-bearing constraints (Kazeia identity, French only,
short replies, /no_think) measurably improved end-to-end latency.

Validated 'Bonjour, comment vas-tu ?' on tablet:
  Before: prompt_tokens=80, TTFT=4202ms, total=5716ms
  After:  prompt_tokens=53, TTFT=2865ms, total=4034ms (-1.3s, -32% TTFT)
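
As a sanity check, the measured gain can be compared with what the quoted
per-token prefill cost would predict. The sketch below is illustrative only:
the function name `summarize` and the dictionary keys are hypothetical, and
the numbers are copied from the Before/After lines above.

```python
def summarize(before: dict, after: dict, ms_per_prefill_token: float) -> dict:
    """Compare the measured TTFT saving with a linear per-token prefill estimate."""
    tokens_saved = before["prompt_tokens"] - after["prompt_tokens"]
    measured_saving_ms = before["ttft_ms"] - after["ttft_ms"]
    return {
        "tokens_saved": tokens_saved,
        # Estimate assuming prefill cost scales linearly with token count.
        "predicted_saving_ms": tokens_saved * ms_per_prefill_token,
        "measured_saving_ms": measured_saving_ms,
        "ttft_drop_pct": 100 * measured_saving_ms / before["ttft_ms"],
        # Per-token cost implied by this single measurement.
        "implied_ms_per_token": measured_saving_ms / tokens_saved,
    }


# Values taken from the validation run in this commit message.
before = {"prompt_tokens": 80, "ttft_ms": 4202, "total_ms": 5716}
after = {"prompt_tokens": 53, "ttft_ms": 2865, "total_ms": 4034}

result = summarize(before, after, ms_per_prefill_token=52)
for key, value in result.items():
    print(f"{key}: {value:.1f}")
```

With these inputs, 27 saved tokens predict about 1404 ms at 52 ms per token,
against a measured 1337 ms (roughly 49.5 ms per token), so this single run is
consistent with the quoted prefill cost.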

Reply quality preserved: "Bonjour ! Je vais bien, merci. Comment vas-tu ?"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:16:11 +02:00
Name                        Last commit                                                      Date
app                         LLM: trim system prompt to drop ~27 prefill tokens (-1.3s TTFT)  2026-04-14 12:16:11 +02:00
gradle/wrapper              Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch        2026-04-09 08:42:11 +02:00
COMPILE_WHISPER_NPU.md      Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch        2026-04-09 08:42:11 +02:00
RAPPORT_TTS_NPU.md          Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch        2026-04-09 08:42:11 +02:00
RAPPORT_TTS_QWEN3_TESTS.md  Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch        2026-04-09 08:42:11 +02:00
build.gradle.kts            Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch        2026-04-09 08:42:11 +02:00
gradle.properties           Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch        2026-04-09 08:42:11 +02:00
gradlew                     Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch        2026-04-09 08:42:11 +02:00
gradlew.bat                 Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch        2026-04-09 08:42:11 +02:00
settings.gradle.kts         Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch        2026-04-09 08:42:11 +02:00