Root cause: embeds must come from the same NPU model instance.
Embeds computed in Python (fp32) make the NPU fp16 model diverge after ~20 steps.
Fix: the Java pipeline captures embeds on-device during generation.
Captured embeds work with the C++ pipeline (validated "bon").
- Added capture mode, enabled with: touch /data/local/tmp/kazeia/capture_mode
- Embeds saved to captured_embeds.bin (same format as pipeline input)
- KV_LEN restored to 100 (KV_LEN=64 dropped role tokens → quality loss)
- C++ pipeline uses pre-computed embeds as-is (codec_sum is no longer applied twice)
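
Illustration of the fp32/fp16 mismatch noted above: a minimal sketch, not
pipeline code. It round-trips a few values through IEEE 754 half precision
using only the Python standard library. The sample values and the helper
to_fp16 are hypothetical, and Python floats (doubles) stand in for fp32
here. Rounding alone is only part of the mismatch; it is shown to give a
sense of the per-value error that can accumulate over generation steps.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical embedding values, not taken from the real model.
embeds_fp32 = [0.1234567, -0.0009876, 3.1415927, 0.5]

# What the same values look like once stored as fp16.
embeds_fp16 = [to_fp16(v) for v in embeds_fp32]

# Largest relative error introduced by the fp16 rounding step alone.
max_rel_err = max(
    abs(a - b) / abs(a) for a, b in zip(embeds_fp32, embeds_fp16)
)
print(f"max relative rounding error: {max_rel_err:.2e}")
```

For values in the normal fp16 range the per-value relative error is bounded
by 2**-11, which is small per step but not zero.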
Production path: Java pipeline, RTF 1.8, for new texts (good quality)
Replay path: C++ pipeline, RTF 1.26, with captured embeds
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>