Commit Graph

1 Commit

Author SHA1 Message Date
Kazeia Team db281002d9 scripts: export per-voice prefix/suffix embeddings
New tool + generated artefacts so the on-device voice spinner can now
hot-swap between all 8 voices. Previously only Damien's prefix/suffix
files were present in the model dir, so the tablet fell back to him
regardless of selection.

scripts/export_voice_prefix_suffix.py runs Qwen3TTS's voice-clone
path under a forward hook, captures the first prefill call's 1024-dim
talker input embeddings, aborts the rest of the (very slow on CPU)
decode via a sentinel exception, and slices out the first 9 vectors
as <name>_voice_prefix.bin and the last 2 as <name>_voice_suffix.bin.
Validated against the shipped damien_voice_prefix.bin: using
damien_15s_24k.wav as the reference audio, max|diff| = 0, so the
extraction matches the original tooling bit-for-bit.
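
The capture-and-abort flow described above can be sketched without
PyTorch as follows. This is a minimal, dependency-free illustration and
not the contents of export_voice_prefix_suffix.py: ToyTalker,
fake_generate and PrefillCaptured are hypothetical stand-ins (the real
script presumably registers its hook on the Qwen3TTS talker module),
and the raw little-endian float32 layout of the .bin files is an
assumption.

```python
import struct

EMBED_DIM = 1024  # talker embedding width, per the commit message
PREFIX_LEN = 9    # first 9 vectors -> <name>_voice_prefix.bin
SUFFIX_LEN = 2    # last 2 vectors  -> <name>_voice_suffix.bin


class PrefillCaptured(Exception):
    """Sentinel raised from the hook to skip the slow decode loop."""


class ToyTalker:
    """Hypothetical stand-in for the talker module: runs hooks, counts calls."""

    def __init__(self):
        self.hooks = []
        self.calls = 0

    def forward(self, inputs_embeds):
        for hook in self.hooks:
            hook(inputs_embeds)
        self.calls += 1  # reached only if no hook aborted the call
        return inputs_embeds


def fake_generate(talker, prefill, decode_steps=1000):
    """Hypothetical voice-clone path: one prefill call, then many decode calls."""
    talker.forward(prefill)
    for _ in range(decode_steps):
        talker.forward([prefill[-1]])


def capture_first_prefill(talker, run):
    """Run `run()` up to the first forward call, keep its input, abort the rest."""
    captured = []

    def hook(inputs_embeds):
        captured.append([list(vec) for vec in inputs_embeds])
        raise PrefillCaptured

    talker.hooks.append(hook)
    try:
        run()
    except PrefillCaptured:
        pass  # expected: the embeddings are already in `captured`
    finally:
        talker.hooks.remove(hook)
    if not captured:
        raise RuntimeError("forward was never called")
    return captured[0]


def to_bin(vectors):
    """Flatten vectors to raw little-endian float32 bytes (assumed file layout)."""
    flat = [x for vec in vectors for x in vec]
    return struct.pack(f"<{len(flat)}f", *flat)


talker = ToyTalker()
# Toy prefill of 20 positions; position i is filled with the value i.
prefill = [[float(i)] * EMBED_DIM for i in range(20)]
embeds = capture_first_prefill(talker, lambda: fake_generate(talker, prefill))

prefix_bytes = to_bin(embeds[:PREFIX_LEN])    # first 9 vectors
suffix_bytes = to_bin(embeds[-SUFFIX_LEN:])   # last 2 vectors
```

Raising from inside the hook means the decode loop never starts, which
is the point of the sentinel: only the prefill input is needed, and the
finally clause removes the hook even if something else goes wrong.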

Generated and adb-pushed to
/data/local/tmp/kazeia/models/qwen3-tts-npu/:
  amir / didier / elodie / jerome / richard / sid / zelda
  (+ re-generated damien from the canonical 15s_24k reference)
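
One way to confirm the set is complete is to check, for each of the 8
voices, that both files exist and have the expected size. The sketch
below assumes raw float32 storage (so 9 x 1024 x 4 = 36864 bytes per
prefix and 2 x 1024 x 4 = 8192 bytes per suffix) and a local copy of
the model directory; check_voice_files is a hypothetical helper, not
part of this commit.

```python
import os
import tempfile

VOICES = ["amir", "damien", "didier", "elodie", "jerome", "richard", "sid", "zelda"]
EMBED_DIM = 1024
BYTES_PER_FLOAT = 4  # assumption: raw float32
EXPECTED_SIZE = {
    "prefix": 9 * EMBED_DIM * BYTES_PER_FLOAT,  # 36864
    "suffix": 2 * EMBED_DIM * BYTES_PER_FLOAT,  # 8192
}


def check_voice_files(model_dir, voices=VOICES):
    """Return a list of problems; an empty list means every file looks right."""
    problems = []
    for voice in voices:
        for kind, expected in EXPECTED_SIZE.items():
            path = os.path.join(model_dir, f"{voice}_voice_{kind}.bin")
            if not os.path.isfile(path):
                problems.append(f"missing: {path}")
            elif os.path.getsize(path) != expected:
                problems.append(f"bad size: {path}")
    return problems


# Demo against a throwaway directory holding only Damien's files,
# which mirrors the state of the model dir before this commit.
with tempfile.TemporaryDirectory() as tmp:
    for kind, size in EXPECTED_SIZE.items():
        with open(os.path.join(tmp, f"damien_voice_{kind}.bin"), "wb") as fh:
            fh.write(b"\x00" * size)
    problems = check_voice_files(tmp)
```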

Qwen3TtsEngine.setVoice (already wired) reads <voice>_voice_prefix.bin
/ <voice>_voice_suffix.bin by basename, so voice changes now take
effect from the next synthesized segment with no app restart.
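
The basename lookup, and the fallback that caused the old behaviour,
can be illustrated as below. This is a Python sketch of the logic only,
not Qwen3TtsEngine code: voice_paths and its fallback parameter are
hypothetical, and falling back to Damien when a file is missing is
inferred from the behaviour described at the top of this message.

```python
import os
import tempfile


def voice_paths(model_dir, voice, fallback="damien"):
    """Return (prefix, suffix) paths for `voice`, or for `fallback` if a file is missing."""

    def pair(name):
        return (
            os.path.join(model_dir, f"{name}_voice_prefix.bin"),
            os.path.join(model_dir, f"{name}_voice_suffix.bin"),
        )

    prefix, suffix = pair(voice)
    if os.path.isfile(prefix) and os.path.isfile(suffix):
        return prefix, suffix
    return pair(fallback)


# Toy model dir with only Damien's files: any other voice resolves to him.
with tempfile.TemporaryDirectory() as tmp:
    for kind in ("prefix", "suffix"):
        with open(os.path.join(tmp, f"damien_voice_{kind}.bin"), "wb"):
            pass
    damien = voice_paths(tmp, "damien")
    zelda = voice_paths(tmp, "zelda")
    fell_back = zelda == damien
```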

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:09:23 +02:00