Kazeia Team
|
f6df1738c5
|
Add prepare_tts_embeds.py for any text + codec_sum fix
- prepare_tts_embeds.py: generates precomputed embeddings for arbitrary text
by running the Python generate_voice_clone and capturing the talker inputs
- C++ pipeline: always build codec_sum + trailing (not as-is)
- maxTokens: 4× the trailing count (audio tokens far outnumber text tokens)
- Long text tested: 224 Python tokens → 125 NPU tokens (10s audio)
- Text-only embeddings don't work (the model needs the codec_sum precomputed in Python)
Usage: python3 scripts/prepare_tts_embeds.py "Your text" output.bin
adb push output.bin /data/local/tmp/.../full_pipeline_embeds.bin
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-09 14:05:42 +02:00 |
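The maxTokens rule and the long-text measurement in the commit above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical and do not come from the repository, and the 12.5 tokens-per-second rate is an assumption inferred from the reported "125 NPU tokens (10s audio)", not read from the pipeline source.

```python
# Hypothetical helpers for illustration; names are not from the repository.

MAX_TOKENS_FACTOR = 4  # from the commit message: maxTokens = 4x trailing count


def max_tokens(trailing_count: int) -> int:
    """Generation budget as a fixed multiple of the trailing count."""
    return MAX_TOKENS_FACTOR * trailing_count


# Assumed codec rate, inferred from "125 NPU tokens (10s audio)".
TOKENS_PER_SECOND = 125 / 10


def audio_seconds(generated_tokens: int) -> float:
    """Approximate audio duration for a number of generated tokens."""
    return generated_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    # 20 is an arbitrary example value, not a measurement.
    print("budget for 20 trailing tokens:", max_tokens(20))
    print("seconds of audio for 125 tokens:", audio_seconds(125))
```

A fixed multiplier keeps the budget proportional to the input length, which fits the commit's note that audio tokens far outnumber text tokens.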
Kazeia Team
|
a688edc9ec
|
Reduce talker KV_LEN 100→64: saves 148ms (RTF 1.31)
A KV window of 64 is sufficient for ~70-token generation (10 prefill + 58 generated).
36% less KV memcpy per talker step (28L × 2 × 64×8×128 elements vs. 28L × 2 × 100×8×128).
Generation: 3795 ms → 3647 ms; total: 6438 ms → 6093 ms
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-09 12:47:30 +02:00 |
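The figures in the KV_LEN commit above can be checked with some quick arithmetic. The sketch below rests on two assumptions that the commit does not state: that 2 × 8 × 128 means (K and V) × KV heads × head dimension, and that RTF is total time divided by audio duration, with 58 generated tokens at 12.5 tokens per second (the rate implied by "125 NPU tokens (10s audio)" in the later commit). Under that reading the reported numbers are reproduced, but it remains a guess at how they were derived.

```python
# Dimensions copied from the commit message. Reading 2 as (K, V) and
# 8 x 128 as (KV heads x head dim) is an assumption.
LAYERS, KV_TENSORS, HEADS, HEAD_DIM = 28, 2, 8, 128


def kv_elements(kv_len: int) -> int:
    """Elements copied per talker step for a given KV window length."""
    return LAYERS * KV_TENSORS * kv_len * HEADS * HEAD_DIM


old_elems = kv_elements(100)
new_elems = kv_elements(64)
memcpy_reduction = 1 - new_elems / old_elems  # fraction of copy work removed

# Timings reported in the commit message, in milliseconds.
generation_saved_ms = 3795 - 3647
total_saved_ms = 6438 - 6093

# Assumption: RTF = total time / audio duration, with 58 generated
# tokens at an inferred 12.5 tokens per second.
TOKENS_PER_SECOND = 12.5
audio_ms = 58 / TOKENS_PER_SECOND * 1000
rtf = 6093 / audio_ms

if __name__ == "__main__":
    print(f"KV elements per step: {old_elems} -> {new_elems}")
    print(f"memcpy reduction: {memcpy_reduction:.0%}")
    print(f"generation saved: {generation_saved_ms} ms")
    print(f"total saved: {total_saved_ms} ms")
    print(f"RTF under the stated assumption: {rtf:.2f}")
```

The reduction depends only on the ratio 64/100, so it holds whatever the element size is.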
Kazeia Team
|
389ffa7c61
|
Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch
Full Qwen3-TTS-0.6B pipeline running on Snapdragon 8 Elite NPU:
- Talker (28L) and Code Predictor (5L) as .pte on QNN HTP fp16
- JNI integration, no root required
- Validated audio quality: RTF 3.9
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-09 08:42:11 +02:00 |