kazeia

Kazeia Team  c25040a780  2026-04-13 11:32:33 +02:00

TTS: conditional tail-trim + export script accepts voice path arg

Two small changes:

* export_tts_text_embeddings.py now takes the voice wav as an optional second CLI arg (defaults to damien_15s_24k.wav). The same script can then capture voice-prefix+suffix for any speaker wav without editing the source; it was used today to test Elodie alongside Damien.

* synthesizeTextStreaming + generateSegmentAudioVC now run the trimTailLowEnergy trim only when n >= maxGen. The trim's 35%-of-peak threshold is tuned to catch the "page beg beg" filler produced after the talker fails to emit EOS, but it was also cutting valid speech when EOS fired early (observed on Elodie seg 1: 10.08 s → 2.92 s, a 4-second over-trim). With the guard, the trim is a no-op on converging generations and fires only on the ~15% of segments that hit maxGen.

Validation after the fix (Elodie, Baer monologue):
- seg 1: 126 tokens = maxGen → trimmed 10.08 s → 8.88 s (1.2 s cut, the filler tail)
- seg 2: 105 tokens < 138 maxGen → no trim, 8.4 s kept as-is
- seg 3: 69 tokens < 96 maxGen → no trim, 5.6 s kept as-is

Voice prefix/suffix content is speaker-invariant except at position 7 (the xvector). Confirmed by capturing both Damien and Elodie and diffing: positions 0-6 and 8 are identical within 1e-8, the suffix is identical within 1e-8, and only position 7 holds a different xvector embedding (norm 10.36 vs 10.12). Swapping speakers on-device is therefore a 45 KB file push: no app rebuild and no re-export of the 297 MB vocabulary table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
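The max-gen guard described in the commit message can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the app's code: the commit does not say how trimTailLowEnergy measures energy, so the per-frame RMS, the 480-sample frames (20 ms at an assumed 24 kHz), and the helper names `trim_tail_low_energy` and `finish_segment` are all hypothetical. Only the 35%-of-peak threshold and the `n >= maxGen` condition come from the commit.

```python
import math


def frame_rms(samples, frame_len):
    """RMS energy of each non-overlapping frame (a trailing partial frame is included)."""
    rms = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return rms


def trim_tail_low_energy(samples, frame_len=480, rel_threshold=0.35):
    """Hypothetical trim: cut everything after the last frame whose RMS
    reaches rel_threshold * peak frame RMS (frame size is an assumption)."""
    if not samples:
        return samples
    rms = frame_rms(samples, frame_len)
    cutoff = rel_threshold * max(rms)
    # The peak frame always satisfies r >= cutoff, so this max() is never empty.
    last = max(i for i, r in enumerate(rms) if r >= cutoff)
    return samples[:(last + 1) * frame_len]


def finish_segment(samples, n_tokens, max_gen):
    """Apply the trim only when generation ran to max_gen, i.e. the talker
    never emitted EOS; converging generations are returned untouched."""
    if n_tokens >= max_gen:
        return trim_tail_low_energy(samples)
    return samples
```

Gating on the token count rather than on the audio itself means a segment that stopped on EOS can never lose valid trailing speech to the energy heuristic, which matches the seg 2 and seg 3 validation lines.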
cp_et_runner.cpp	Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch	2026-04-09 08:42:11 +02:00
export_cp_pte.py	Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch	2026-04-09 08:42:11 +02:00
export_talker_pte.py	Restore KV=100 + fix as-is embeds + multi-segment support	2026-04-09 22:26:20 +02:00
export_tts_text_embeddings.py	TTS: conditional tail-trim + export script accepts voice path arg	2026-04-13 11:32:33 +02:00
prepare_tts_embeds.py	Add prepare_tts_embeds.py for any text + codec_sum fix	2026-04-09 14:05:42 +02:00
prepare_tts_native.py	TTS tremor investigation: identify cross-arch numerical floor, gate diag flags	2026-04-13 00:15:14 +02:00
prepare_tts_segments.py	TTS Stage 1 streaming: play each segment the moment it's decoded	2026-04-13 08:43:30 +02:00
prepare_tts_voiceclone.py	TTS tremor investigation: identify cross-arch numerical floor, gate diag flags	2026-04-13 00:15:14 +02:00
qc_schema_serialize_patched.py	Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch	2026-04-09 08:42:11 +02:00
test_cp_et_quality.py	Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch	2026-04-09 08:42:11 +02:00
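The speaker-invariance check from commit c25040a780 (capture Damien and Elodie, diff them, and confirm that only position 7 differs) can be sketched in Python as below. The on-disk format of the captured prefix/suffix is not described here, so this assumes each capture has already been loaded as a list of per-position float rows. The helper names are hypothetical, and the rows in the usage lines are toy data, not the real embeddings.

```python
import math


def max_abs_diff(row_a, row_b):
    """Largest element-wise absolute difference between two equal-length rows."""
    if len(row_a) != len(row_b):
        raise ValueError("row length mismatch")
    return max((abs(x - y) for x, y in zip(row_a, row_b)), default=0.0)


def differing_positions(capture_a, capture_b, tol=1e-8):
    """Indices of positions whose embedding rows differ by more than tol."""
    if len(capture_a) != len(capture_b):
        raise ValueError("position count mismatch")
    return [i for i, (ra, rb) in enumerate(zip(capture_a, capture_b))
            if max_abs_diff(ra, rb) > tol]


def l2_norm(row):
    """Euclidean norm of one embedding row."""
    return math.sqrt(sum(x * x for x in row))


# Toy 9-position prefix with 4-dim rows; only position 7 (the speaker slot) differs.
speaker_a = [[0.01 * i] * 4 for i in range(9)]
speaker_b = [row[:] for row in speaker_a]
speaker_b[7] = [5.0] * 4
print(differing_positions(speaker_a, speaker_b))  # [7]
print(l2_norm(speaker_b[7]))  # 10.0
```

Reporting differing positions, instead of a single pass/fail flag, makes it easy to confirm that the speaker-specific slot is the only row that needs to be pushed when switching voices.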