Root cause: embeds must come from SAME NPU model instance. Python fp32 embeds cause divergence on NPU fp16 after ~20 steps.

Solution: Java pipeline captures embeds on-device during generation. Captured embeds work perfectly with C++ pipeline (validated "bon").

- Added capture mode: touch /data/local/tmp/kazeia/capture_mode
- Embeds saved to captured_embeds.bin (same format as pipeline input)
- KV_LEN restored to 100 (KV=64 lost role tokens → quality loss)
- C++ uses pre-computed embeds as-is (no double codec_sum)

Production path: Java pipeline RTF 1.8 for new texts (good quality)
Replay path: C++ pipeline RTF 1.26 with captured embeds

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
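The capture step described in this commit message could look roughly like the sketch below. It is not the repository's code: the class `EmbedCapture`, its method names, and the raw little-endian float32 file layout are assumptions made for illustration only (the commit says only that the file uses the same format as the pipeline input). The example runs against a temporary directory rather than `/data/local/tmp/kazeia`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper; names and file layout are assumptions, not the project's API.
public class EmbedCapture {

    // Capture mode is considered enabled when the flag file exists
    // (the commit enables it with `touch .../capture_mode`).
    static boolean captureEnabled(Path flagFile) {
        return Files.exists(flagFile);
    }

    // Writes the embeddings as raw little-endian float32 values (ASSUMED layout).
    static void writeEmbeds(Path out, float[] embeds) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(embeds.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float v : embeds) {
            buf.putFloat(v);
        }
        Files.write(out, buf.array());
    }

    // Reads the embeddings back using the same assumed layout.
    static float[] readEmbeds(Path in) throws IOException {
        byte[] bytes = Files.readAllBytes(in);
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] out = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getFloat();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("kazeia");
        Path flag = dir.resolve("capture_mode");
        Path out = dir.resolve("captured_embeds.bin");

        Files.createFile(flag); // stands in for `touch capture_mode`
        float[] embeds = {0.5f, -1.25f, 3.0f};

        if (captureEnabled(flag)) {
            writeEmbeds(out, embeds);
        }
        System.out.println(Arrays.equals(embeds, readEmbeds(out)));
    }
}
```

A flag file keeps the toggle outside the app's configuration, so capture can be switched on from a shell without rebuilding. The replay side would then read the saved values unchanged, which matches the commit's note that the C++ pipeline uses the pre-computed embeds as-is.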
| | | |
|---|---|---|
| app | ||
| gradle/wrapper | ||
| COMPILE_WHISPER_NPU.md | ||
| RAPPORT_TTS_NPU.md | ||
| RAPPORT_TTS_QWEN3_TESTS.md | ||
| build.gradle.kts | ||
| gradle.properties | ||
| gradlew | ||
| gradlew.bat | ||
| settings.gradle.kts | ||