kazeia


Kazeia Team	10fd10fd90	TTS: overlap CP↔BigVGAN, first audio 14.5s → 10.9s per segment	2026-04-14 16:22:15 +02:00

Streaming variant of the per-segment decode pipeline. As soon as SEQ_LEN codes have accumulated from the talker/CP loop, BigVGAN is dispatched on a background coroutine while the producer keeps generating the rest of the segment. The BigVGAN consumer feeds a streaming crossfader that emits stable audio as it arrives and holds back overlapSamples for the next chunk's blend.

This mirrors decodeChunked's semantics exactly, so the final audio is bit-identical except for where fadeOut is applied: it now acts on the tail of the final emission instead of on the full buffer. The last 40 ms are still faded.

Validated A/B on prompt 3 from the recent benchmark:
  prompt: "Je me sens un peu triste aujourdhui…" ("I feel a little sad today…")
  seg 0 first audio: 14 485 ms → 10 936 ms (−3.5 s)
  end-to-end first audio (LLM trigger → audio): 16.2 s → 12.7 s
  Stream LLM total: 33 234 ms → 28 594 ms (−4.6 s)

Short segments (fewer than SEQ_LEN codes) and the legacy non-streaming callers (generateSegmentAudioVC, decodeChunked, multi-segment pipelines, etc.) are untouched. The new path is gated behind USE_STREAMING_DECODE, so it can be reverted by flipping a single const if a regression is found.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
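The streaming crossfade described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the Kazeia code: `StreamingCrossfader`, `crossfadeBatch`, `blend`, and `applyFadeOut` are hypothetical names, the linear ramps are assumed, and every decoded chunk is assumed to repeat the previous chunk's last `overlap` samples at its start. It shows why a streaming path can match a whole-buffer crossfade sample for sample: each chunk's tail is held back until the next chunk arrives, and the final fade-out only touches that held tail. The two agree whenever the fade length does not exceed the overlap.

```kotlin
// Hypothetical sketch; names, ramps, and chunk layout are assumptions, not the Kazeia implementation.
import kotlin.math.min

/** Assumed linear fade-in weight for sample i of an n-sample overlap region. */
fun fadeInWeight(i: Int, n: Int): Float = (i + 1).toFloat() / (n + 1)

/** Blends the held-back tail of the previous chunk with the head of the next chunk. */
fun blend(tail: FloatArray, head: FloatArray): FloatArray =
    FloatArray(tail.size) { i ->
        val w = fadeInWeight(i, tail.size)
        tail[i] * (1f - w) + head[i] * w
    }

/** Applies an assumed linear fade-out, in place, to the last n samples (reaches 0 on the final sample). */
fun applyFadeOut(buf: FloatArray, n: Int) {
    val len = min(n, buf.size)
    val start = buf.size - len
    for (j in 0 until len) {
        buf[start + j] *= 1f - (j + 1).toFloat() / len
    }
}

/** Reference: crossfade all chunks at once, then fade the end of the full buffer. */
fun crossfadeBatch(chunks: List<FloatArray>, overlap: Int, fadeOutSamples: Int): FloatArray {
    var out = chunks.first().copyOf()
    for (chunk in chunks.drop(1)) {
        val keep = out.size - overlap
        val mixed = blend(out.copyOfRange(keep, out.size), chunk.copyOfRange(0, overlap))
        out = out.copyOfRange(0, keep) + mixed + chunk.copyOfRange(overlap, chunk.size)
    }
    applyFadeOut(out, fadeOutSamples)
    return out
}

/** Streaming version: emits stable audio per chunk and withholds `overlap` samples for the next blend. */
class StreamingCrossfader(private val overlap: Int) {
    private var held: FloatArray? = null

    fun push(chunk: FloatArray): FloatArray {
        require(chunk.size >= 2 * overlap) { "chunk must hold at least 2 * overlap samples" }
        val prev = held
        // The first chunk has nothing to blend with; later chunks blend their head with the held tail.
        val head = if (prev == null) chunk.copyOfRange(0, overlap)
                   else blend(prev, chunk.copyOfRange(0, overlap))
        held = chunk.copyOfRange(chunk.size - overlap, chunk.size)
        return head + chunk.copyOfRange(overlap, chunk.size - overlap)
    }

    /** Flushes the held tail; the fade-out can only reach samples that have not been emitted yet. */
    fun finish(fadeOutSamples: Int): FloatArray {
        val tail = held ?: return FloatArray(0)
        held = null
        applyFadeOut(tail, fadeOutSamples)
        return tail
    }
}
```

Feeding the chunks one at a time through `push` and concatenating the results with `finish` yields the same array as `crossfadeBatch`. That equivalence is the property the commit relies on when it describes the streaming output as matching decodeChunked apart from where the fade-out lands.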
src/main	TTS: overlap CP↔BigVGAN — first audio 14.5s → 10.9s per segment	2026-04-14 16:22:15 +02:00
build.gradle.kts	TTS tremor investigation: identify cross-arch numerical floor, gate diag flags	2026-04-13 00:15:14 +02:00
proguard-rules.pro	Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch	2026-04-09 08:42:11 +02:00