Kazeia Team
3dcf73aa38
Restore KV=100 + fix as-is embeds + multi-segment support
- KV_LEN restored to 100 (KV=64 caused quality loss from evicted role tokens)
- C++ uses pre-computed embeds as-is (no double codec_sum)
- Multi-segment format support in Kotlin (detects n_segments header)
- prepare_tts_segments.py: splits text + generates per-segment embeds
- Known quality issue: Python-captured embeds differ from the original working file
  (the original was likely captured on-device, not from Python model.forward)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 22:26:20 +02:00
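The text-splitting step of prepare_tts_segments.py mentioned above could look roughly like the sketch below. This is not the script's actual code: the function name `split_sentences`, the `max_chars` limit, and the punctuation-based regex are all assumptions made for illustration.

```python
import re


def split_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Split text at sentence boundaries and pack sentences into segments.

    Hypothetical helper: the real script may use a different rule or limit.
    """
    # Heuristic: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    segments: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        # Start a new segment when adding this sentence would exceed the limit.
        if current and len(candidate) > max_chars:
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments
```

Each returned segment would then get its own pre-computed embeds, so no segment has to fit more tokens than the KV window comfortably holds.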
Kazeia Team
10a3904d7d
Multi-segment TTS for long text: split → generate → concatenate
- prepare_tts_segments.py: splits text at sentence boundaries,
  generates Python pre-computed embeds per segment
- Kotlin: detects multi-segment file format, processes each segment
  independently (fresh KV cache), concatenates audio
- Long text tested: 3 segments, 335 tokens, 26.8s audio, RTF 1.67
File format: n_segments, then per segment: nPrefill, nTotal, embeds[]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:34:05 +02:00
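The multi-segment file layout described above (n_segments, then nPrefill, nTotal, embeds[] per segment) can be sketched as a writer/reader pair. The commit does not state field widths or byte order, so this sketch assumes little-endian int32 header fields, float32 embeds, and that nTotal counts embedding rows of `dim` floats each. The function names and the `dim` parameter are hypothetical.

```python
import struct


def write_segments(path, segments, dim):
    """Write segments as: n_segments, then (nPrefill, nTotal, embeds[]) each.

    segments: list of (n_prefill, n_total, embeds), where embeds is a flat
    list of n_total * dim floats. Field types are assumptions (int32/float32).
    """
    with open(path, "wb") as f:
        f.write(struct.pack("<i", len(segments)))
        for n_prefill, n_total, embeds in segments:
            if len(embeds) != n_total * dim:
                raise ValueError("embeds length must equal n_total * dim")
            f.write(struct.pack("<ii", n_prefill, n_total))
            f.write(struct.pack(f"<{len(embeds)}f", *embeds))


def read_segments(path, dim):
    """Read the layout written by write_segments (same assumptions)."""
    with open(path, "rb") as f:
        data = f.read()
    offset = 0
    (n_segments,) = struct.unpack_from("<i", data, offset)
    offset += 4
    segments = []
    for _ in range(n_segments):
        n_prefill, n_total = struct.unpack_from("<ii", data, offset)
        offset += 8
        count = n_total * dim
        embeds = list(struct.unpack_from(f"<{count}f", data, offset))
        offset += 4 * count
        segments.append((n_prefill, n_total, embeds))
    return segments
```

A leading segment count lets the reader tell this format apart from a single-segment file only if the two headers cannot be confused; how the Kotlin side actually detects the header is not stated in the commit.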
Kazeia Team
f6df1738c5
Add prepare_tts_embeds.py for any text + codec_sum fix
- prepare_tts_embeds.py: generates pre-computed embeddings from any text
via Python generate_voice_clone, capturing talker inputs
- C++ pipeline: always build codec_sum + trailing (not as-is)
- maxTokens: set to 4× the trailing token count (audio tokens far outnumber text tokens)
- Long text tested: 224 Python tokens → 125 NPU tokens (10s audio)
- Text-only embeds don't work (model needs Python pre-computed codec_sum)
Usage:
  python3 scripts/prepare_tts_embeds.py "Your text" output.bin
  adb push output.bin /data/local/tmp/.../full_pipeline_embeds.bin
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:05:42 +02:00
Kazeia Team
a688edc9ec
Reduce talker KV_LEN 100→64: saves 148ms (RTF 1.31)
A KV window of 64 is sufficient for ~70-token generation (10 prefill + 58 gen).
36% less KV memcpy per talker step: 28 layers × 2 (K and V) × 64×8×128 elements,
down from 28 × 2 × 100×8×128.
Generation: 3795ms → 3647ms, total: 6438ms → 6093ms
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 12:47:30 +02:00
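The 36% figure and the timing deltas above can be checked with a few lines of arithmetic. The element counts come straight from the commit message. The 2-bytes-per-element size is an assumption (fp16 cache), and the function and parameter names are illustrative.

```python
def kv_elements(kv_len, layers=28, kv_tensors=2, kv_heads=8, head_dim=128):
    """Elements copied per talker step: layers x (K, V) x kv_len x heads x head_dim."""
    return layers * kv_tensors * kv_len * kv_heads * head_dim


old_elems = kv_elements(100)            # 5,734,400 elements
new_elems = kv_elements(64)             # 3,670,016 elements
reduction = 1 - new_elems / old_elems   # 0.36, i.e. 36% less data copied

BYTES_PER_ELEM = 2  # assumption: fp16 KV cache
old_mib = old_elems * BYTES_PER_ELEM / 2**20   # about 10.9 MiB per step
new_mib = new_elems * BYTES_PER_ELEM / 2**20   # 7.0 MiB per step

# Timing deltas reported in the commit message (milliseconds).
generation_saved_ms = 3795 - 3647   # 148
total_saved_ms = 6438 - 6093        # 345
```

Because every other factor is unchanged, the reduction depends only on the window ratio 64/100.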
Kazeia Team
389ffa7c61
Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch
Full Qwen3-TTS-0.6B pipeline running on Snapdragon 8 Elite NPU:
- Talker (28L) and Code Predictor (5L) as .pte on QNN HTP fp16
- JNI integration, no root required
- Audio quality validated; RTF 3.9
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:42:11 +02:00