KV window of 64 sufficient for ~70 token generation (10 prefill + 58 gen). 36% less KV memcpy per talker step (28L × 2 × 64×8×128 vs 100×8×128). Generation: 3795ms → 3647ms, total: 6438ms → 6093ms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

| Name |
|---|
| app |
| gradle/wrapper |
| COMPILE_WHISPER_NPU.md |
| RAPPORT_TTS_NPU.md |
| RAPPORT_TTS_QWEN3_TESTS.md |
| build.gradle.kts |
| gradle.properties |
| gradlew |
| gradlew.bat |
| settings.gradle.kts |
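
The figures in the commit message above can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes that `28L` means 28 layers, that the factor 2 covers the K and V caches, and that `8` and `128` are the KV head count and head dimension. None of these readings is confirmed by the source, and the constant and function names are hypothetical.

```python
# Assumed interpretation of the commit's factors (not confirmed by the source).
LAYERS = 28      # "28L"
KV_TENSORS = 2   # one K cache and one V cache per layer
KV_HEADS = 8
HEAD_DIM = 128


def kv_elements_per_step(window: int) -> int:
    """Number of KV cache elements copied per talker step for a given window."""
    return LAYERS * KV_TENSORS * window * KV_HEADS * HEAD_DIM


def percent_saved(before_ms: float, after_ms: float) -> float:
    """Relative time saved, as a percentage of the earlier timing."""
    return (before_ms - after_ms) / before_ms * 100


old_elements = kv_elements_per_step(100)  # previous window size from the commit
new_elements = kv_elements_per_step(64)   # new window size from the commit
memcpy_reduction = 1 - new_elements / old_elements

print(f"elements per step: {old_elements} -> {new_elements}")
print(f"memcpy reduction: {memcpy_reduction:.0%}")
print(f"generation time saved: {percent_saved(3795, 3647):.1f}%")
print(f"total time saved: {percent_saved(6438, 6093):.1f}%")
```

Because every factor except the window is shared, the reduction depends only on the ratio 64/100, which is where the 36% comes from. The element size in bytes is not stated in the commit, so the sketch counts elements rather than bytes.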