Commit Graph

3 Commits

Author SHA1 Message Date
Kazeia Team 3dcf73aa38 Restore KV=100 + fix as-is embeds + multi-segment support
- KV_LEN restored to 100 (KV=64 caused quality loss from evicted role tokens)
- C++ uses pre-computed embeds as-is (no double codec_sum)
- Multi-segment format support in Kotlin (detects n_segments header)
- prepare_tts_segments.py: splits text + generates per-segment embeds
- Quality issue: Python-captured embeds differ from original working file
  (original was likely captured on-device, not from Python model.forward)
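The eviction problem behind the first bullet can be reproduced with a small sketch. It assumes a plain FIFO sliding window, where the oldest position is dropped once the cache is full. The log does not describe the actual C++ cache policy, and the helper names below are hypothetical. The KV_LEN 100→64 commit quotes 10 prefill + 58 generated tokens, which is 68 positions. With those counts, a 64-entry window loses the first four prompt positions. That is where role tokens would sit if they open the prompt, which is also an assumption.

```python
from collections import deque


def surviving_positions(n_prefill: int, n_generated: int, kv_len: int) -> list[int]:
    """Token positions still held by a sliding KV window (hypothetical helper).

    Assumes FIFO eviction: once the cache holds kv_len entries, each new
    token pushes out the oldest one. This is not taken from the C++ code.
    """
    window = deque(maxlen=kv_len)  # deque discards from the left when full
    for pos in range(n_prefill + n_generated):
        window.append(pos)
    return list(window)


def evicted_prefill(n_prefill: int, n_generated: int, kv_len: int) -> list[int]:
    """Prefill (prompt/role) positions that have fallen out of the window."""
    kept = set(surviving_positions(n_prefill, n_generated, kv_len))
    return [p for p in range(n_prefill) if p not in kept]


if __name__ == "__main__":
    # Counts quoted in the KV_LEN 100->64 commit: 10 prefill + 58 generated.
    print(evicted_prefill(10, 58, 64))   # [0, 1, 2, 3]
    print(evicted_prefill(10, 58, 100))  # []
```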

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 22:26:20 +02:00
Kazeia Team a688edc9ec Reduce talker KV_LEN 100→64: saves 148ms (RTF 1.31)
A KV window of 64 is sufficient for ~70-token generation (10 prefill + 58 generated tokens).
36% less KV memcpy per talker step (28 layers × 2 (K, V) × 64×8×128 vs. 28 × 2 × 100×8×128 elements).
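The 36% figure follows directly from the cache shape. The sketch below reads 8×128 as KV heads × head dimension. It also assumes fp16 storage (2 bytes per element), based on the QNN HTP fp16 export. Both readings are ours, and `kv_elements` is a hypothetical helper.

```python
def kv_elements(n_layers: int, kv_len: int, n_kv_heads: int, head_dim: int) -> int:
    """Elements in the K and V caches across all layers (factor 2 = K + V)."""
    return n_layers * 2 * kv_len * n_kv_heads * head_dim


# Shape figures from the commit message: 28 layers, 8 x 128 per position.
# Assumption: fp16 storage, i.e. 2 bytes per element.
BYTES_PER_ELEM = 2

old = kv_elements(28, 100, 8, 128)
new = kv_elements(28, 64, 8, 128)

print(old * BYTES_PER_ELEM)    # 11468800 bytes (~10.9 MiB)
print(new * BYTES_PER_ELEM)    # 7340032 bytes (7 MiB)
print(f"{1 - new / old:.0%}")  # 36%
```

Every other factor is unchanged, so the saving is simply 1 - 64/100 whatever the dtype. The byte counts only matter for judging absolute memcpy cost.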

Generation: 3795ms → 3647ms, total: 6438ms → 6093ms

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 12:47:30 +02:00
Kazeia Team 389ffa7c61 Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch
Full Qwen3-TTS-0.6B pipeline running on Snapdragon 8 Elite NPU:
  - Talker (28L) and Code Predictor (5L) as .pte on QNN HTP fp16
  - JNI integration, no root required
  - Audio quality validated; RTF 3.9
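The log does not define RTF. A common convention divides synthesis time by the duration of the audio produced, so values above 1.0 are slower than real time. The drop from 3.9 here to 1.31 in a later commit is consistent with that reading, but it remains an assumption. A minimal sketch, in which `real_time_factor` is a hypothetical helper:

```python
def real_time_factor(processing_ms: float, audio_ms: float) -> float:
    """Processing time divided by the duration of the audio produced.

    Assumed convention: values below 1.0 mean faster than real time,
    so lower is better.
    """
    if audio_ms <= 0:
        raise ValueError("audio duration must be positive")
    return processing_ms / audio_ms


if __name__ == "__main__":
    # Hypothetical numbers for illustration only, not measurements from this log.
    print(real_time_factor(7800, 2000))  # 3.9
```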

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:42:11 +02:00