kazeia/scripts
Kazeia Team c25040a780 TTS: conditional tail-trim + export script accepts voice path arg
Two small changes:

  * export_tts_text_embeddings.py now takes the voice wav as an optional
    second CLI arg (defaults to damien_15s_24k.wav). Lets the same script
    capture voice-prefix+suffix for any speaker wav without editing the
    source — used today to test Elodie alongside Damien.

  * synthesizeTextStreaming + generateSegmentAudioVC only run the
    trimTailLowEnergy trim when n >= maxGen, i.e. when the generated
    token count n reached the cap. The trim's 35%-of-peak threshold is
    tuned to catch "page beg beg" filler after the talker fails to emit
    EOS, but it was cutting valid speech when EOS fired early (observed
    on Elodie seg 1: 10.08 s → 2.92 s, a 4-second over-trim). With the
    guard it's a no-op on converging generations and only fires on the
    ~15% of segments that hit maxGen.
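
For the first change, the optional voice argument could be handled roughly
as below. This is a sketch, not the script's actual code: the first
positional argument (`output` here) and the function name `parse_args` are
hypothetical, and only the default file name comes from this commit.

```python
import argparse

# Default speaker wav named in this commit message.
DEFAULT_VOICE = "damien_15s_24k.wav"


def parse_args(argv=None):
    """Parse CLI arguments; the voice wav is an optional second positional."""
    parser = argparse.ArgumentParser(
        description="Export TTS text embeddings (illustrative sketch)"
    )
    # Hypothetical first positional: the real script's first argument
    # is not shown in this commit.
    parser.add_argument("output", help="hypothetical first positional argument")
    # nargs="?" makes the positional optional and falls back to the default.
    parser.add_argument(
        "voice_wav",
        nargs="?",
        default=DEFAULT_VOICE,
        help="speaker reference wav (defaults to %(default)s)",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"voice wav: {args.voice_wav}")
```

With `nargs="?"` an existing invocation that passes only the first argument
keeps working unchanged, which matches the "optional" wording above.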

Validation after the fix (Elodie, Baer monologue):
  - seg 1: 126 tokens = maxGen → trimmed 10.08 s → 8.88 s (1.2 s cut,
           the filler tail)
  - seg 2: 105 tokens < 138 maxGen → no trim, 8.4 s kept as-is
  - seg 3: 69 tokens < 96 maxGen → no trim, 5.6 s kept as-is
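
The guard and the three validation cases above can be sketched as follows.
This is a simplified illustration under stated assumptions: the real
trimTailLowEnergy is not shown in this commit, so the per-sample magnitude
test below is a stand-in for whatever energy measure it uses, and
`maybe_trim` is a hypothetical name. Only the 35%-of-peak ratio, the
n >= maxGen condition, and the token counts come from the message.

```python
def trim_tail_low_energy(samples, threshold_ratio=0.35):
    """Drop trailing samples whose magnitude is below threshold_ratio * peak.

    Stand-in for the real trim: assumes a per-sample magnitude check.
    """
    if not samples:
        return samples
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return samples
    cutoff = threshold_ratio * peak
    end = len(samples)
    # Walk back from the end until a sample reaches the cutoff.
    while end > 0 and abs(samples[end - 1]) < cutoff:
        end -= 1
    return samples[:end]


def maybe_trim(samples, n_tokens, max_gen):
    """Trim only when generation hit the cap, i.e. EOS was never emitted."""
    if n_tokens >= max_gen:
        return trim_tail_low_energy(samples)
    return samples  # converged on its own: keep the audio as-is


if __name__ == "__main__":
    toy_audio = [0.0, 1.0, 0.9, 0.2, 0.1]  # toy values, not real audio
    # (n_tokens, max_gen) pairs from the validation list above.
    for seg, (n, cap) in enumerate([(126, 126), (105, 138), (69, 96)], start=1):
        kept = maybe_trim(toy_audio, n, cap)
        print(f"seg {seg}: trimmed={len(kept) != len(toy_audio)}")
```

Only the first case satisfies n >= maxGen, so it is the only one where the
trim runs; the other two return their input untouched.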

The voice prefix/suffix is speaker-invariant except at prefix position 7
(the xvector). Confirmed by capturing both Damien and Elodie and diffing:
positions 0-6 and 8 are identical within 1e-8, the suffix is identical
within 1e-8, and only pos 7 has a different xvector embedding (norm 10.36
vs 10.12). That means swapping speakers on-device is a 45 KB file push,
with no app rebuild and no re-export of the 297 MB vocabulary table.
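
The per-position diff described above can be sketched like this. The
helper names and the toy data are hypothetical: the real captures are
embedding tensors whose file format and dimension are not shown in this
commit, so plain nested lists stand in for them. Only the 1e-8 tolerance
and the position indices come from the message.

```python
import math


def differing_positions(a, b, tol=1e-8):
    """Return indices of rows whose max absolute difference exceeds tol."""
    if len(a) != len(b):
        raise ValueError("captures must have the same number of positions")
    return [
        i
        for i, (row_a, row_b) in enumerate(zip(a, b))
        if max(abs(x - y) for x, y in zip(row_a, row_b)) > tol
    ]


def l2_norm(row):
    """Euclidean norm of one embedding row."""
    return math.sqrt(sum(x * x for x in row))


if __name__ == "__main__":
    # Toy stand-ins: 9 positions (0-8) of dimension 4, not real embeddings.
    prefix_a = [[float(i)] * 4 for i in range(9)]
    prefix_b = [row[:] for row in prefix_a]
    prefix_b[7] = [7.0, 7.0, 7.0, 8.0]  # only the xvector slot differs
    changed = differing_positions(prefix_a, prefix_b)
    print("differing positions:", changed)
    for i in changed:
        print(f"pos {i}: norm {l2_norm(prefix_a[i]):.2f} vs {l2_norm(prefix_b[i]):.2f}")
```

A result containing only index 7 is what would justify shipping just the
xvector slot when switching speakers.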

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 11:32:33 +02:00
cp_et_runner.cpp Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch 2026-04-09 08:42:11 +02:00
export_cp_pte.py Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch 2026-04-09 08:42:11 +02:00
export_talker_pte.py Restore KV=100 + fix as-is embeds + multi-segment support 2026-04-09 22:26:20 +02:00
export_tts_text_embeddings.py TTS: conditional tail-trim + export script accepts voice path arg 2026-04-13 11:32:33 +02:00
prepare_tts_embeds.py Add prepare_tts_embeds.py for any text + codec_sum fix 2026-04-09 14:05:42 +02:00
prepare_tts_native.py TTS tremor investigation: identify cross-arch numerical floor, gate diag flags 2026-04-13 00:15:14 +02:00
prepare_tts_segments.py TTS Stage 1 streaming: play each segment the moment it's decoded 2026-04-13 08:43:30 +02:00
prepare_tts_voiceclone.py TTS tremor investigation: identify cross-arch numerical floor, gate diag flags 2026-04-13 00:15:14 +02:00
qc_schema_serialize_patched.py Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch 2026-04-09 08:42:11 +02:00
test_cp_et_quality.py Initial commit: Kazeia TTS pipeline on NPU via ExecuTorch 2026-04-09 08:42:11 +02:00