Commit Graph

55 Commits

Author SHA1 Message Date
Kazeia Team 393ce79eb5 Native C++ pipeline: RTF 1.4 (was 3.6 in Java)
Full talker+CP autoregressive loop in C++ via JNI.
Talker 20ms/step, CP 44ms/step, total 6.6s for 4.64s of audio.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 10:09:32 +02:00
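Assuming RTF here follows the usual definition (wall-clock synthesis time divided by the duration of the audio produced), the figures in this commit can be cross-checked with a one-line helper. `real_time_factor` is a hypothetical name used only for illustration.

```cpp
// Real-time factor: wall-clock synthesis time divided by the duration of
// the generated audio. RTF < 1.0 means faster than real time.
double real_time_factor(double synthesis_seconds, double audio_seconds) {
    return synthesis_seconds / audio_seconds;
}
```

With the numbers above, `real_time_factor(6.6, 4.64)` returns about 1.42, which rounds to the reported RTF 1.4.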
Kazeia Team fb6045a635 Pre-load CP heads + GPU decoder test (reverted) + headArgmaxOffset
- Pre-load all 15 CP heads at first CP call (eliminates lazy-load lag)
- Tested the BigVGAN decoder on the Adreno GPU: no gain (+300ms vs CPU), kept on CPU
- Added headArgmaxOffset for future batch optimization
- Cancel previous pipeline on new run_pipeline intent

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 09:57:01 +02:00
Kazeia Team 6e6c562d53 Add DSP warmup + fix pipeline thread contention
- Run a warmup forward() for talker+CP during init (avoids 7s of DSP
  compilation on the first pipeline run)
- Cancel previous pipeline job before starting new one
- Use Dispatchers.IO for pipeline intent

First run after warmup: talker 19ms/step, CP 59ms/step → RTF ~1.9

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 09:24:18 +02:00
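The warmup idea can be sketched as one throwaway forward pass per stage at init time, so one-time backend setup (here, the DSP graph compilation) is paid before the first real request. This assumes each stage can be invoked as a callable on a zero-filled input of known length; `ForwardFn`, `Stage` and `warmup` are hypothetical names, standing in for the talker and CP forward() calls in the app.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical: a model stage exposed as a callable taking an input buffer.
using ForwardFn = std::function<void(const std::vector<float>&)>;

struct Stage {
    ForwardFn forward;      // e.g. talker or code predictor forward()
    std::size_t input_len;  // length of a valid dummy input for this stage
};

// Call forward() once per stage with a zero-filled input and discard the
// result; only the side effect (backend initialization) matters.
void warmup(const std::vector<Stage>& stages) {
    for (const auto& stage : stages) {
        std::vector<float> dummy(stage.input_len, 0.0f);
        stage.forward(dummy);
    }
}
```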
Kazeia Team 8bfe6c7445 Add NEON SIMD heads argmax for CP — 2.3× speedup
CP head dot products (15 heads × 2048×1024) optimized with ARM NEON
vfmaq_f32 (4 accumulators, 16 floats/iteration).

CP/frame: 131ms → 58ms, total pipeline: 22.7s → 14.7s (RTF 3.2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:55:20 +02:00
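A minimal sketch of this kind of kernel, assuming each CP head is a row-major fp32 weight matrix (for example 2048 rows × 1024 columns) and the head's output token is the argmax of its rows' dot products with the hidden state. `dot` and `head_argmax` are hypothetical names, not the project's code. On AArch64 the inner loop uses vfmaq_f32 with four independent accumulators (4 × 4 lanes = 16 floats per iteration) so consecutive FMAs do not wait on each other; other builds fall back to the scalar loop.

```cpp
#include <cstddef>
#include <limits>
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

// Dot product of two float arrays of length n.
static float dot(const float* a, const float* b, std::size_t n) {
    std::size_t i = 0;
    float sum = 0.0f;
#if defined(__ARM_NEON) && defined(__aarch64__)
    // Four accumulators break the FMA dependency chain: 16 floats/iteration.
    float32x4_t acc0 = vdupq_n_f32(0.0f), acc1 = vdupq_n_f32(0.0f);
    float32x4_t acc2 = vdupq_n_f32(0.0f), acc3 = vdupq_n_f32(0.0f);
    for (; i + 16 <= n; i += 16) {
        acc0 = vfmaq_f32(acc0, vld1q_f32(a + i), vld1q_f32(b + i));
        acc1 = vfmaq_f32(acc1, vld1q_f32(a + i + 4), vld1q_f32(b + i + 4));
        acc2 = vfmaq_f32(acc2, vld1q_f32(a + i + 8), vld1q_f32(b + i + 8));
        acc3 = vfmaq_f32(acc3, vld1q_f32(a + i + 12), vld1q_f32(b + i + 12));
    }
    // Combine the four vectors, then add the four lanes horizontally.
    sum = vaddvq_f32(vaddq_f32(vaddq_f32(acc0, acc1), vaddq_f32(acc2, acc3)));
#endif
    // Scalar tail (or the whole product on non-NEON builds).
    for (; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

// Index of the row of a row-major [rows x dim] matrix whose dot product
// with `hidden` is largest.
std::size_t head_argmax(const float* weights, const float* hidden,
                        std::size_t rows, std::size_t dim) {
    std::size_t best = 0;
    float best_score = -std::numeric_limits<float>::infinity();
    for (std::size_t r = 0; r < rows; ++r) {
        float score = dot(weights + r * dim, hidden, dim);
        if (score > best_score) {
            best_score = score;
            best = r;
        }
    }
    return best;
}
```

Running the 15 heads then amounts to 15 calls to `head_argmax` with the same hidden state, one per head's weight matrix.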
Kazeia Team 389ffa7c61 Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch
Full Qwen3-TTS-0.6B pipeline running on Snapdragon 8 Elite NPU:
  - Talker (28L) and Code Predictor (5L) as .pte on QNN HTP fp16
  - JNI integration, no root required
  - Audio quality validated; RTF 3.9

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:42:11 +02:00