kazeia

Commit Graph

Author	SHA1	Message	Date

Kazeia Team

db281002d9

scripts: export per-voice prefix/suffix embeddings

New tool + generated artefacts so the on-device voice spinner can now
hot-swap between all 8 voices — previously only Damien's prefix/suffix
were present in the model dir, and the tablet fell back to him
regardless of selection.

scripts/export_voice_prefix_suffix.py runs Qwen3TTS's voice-clone
path under a forward hook, captures the first prefill call's 1024-dim
talker input embeddings, aborts the rest of the (very slow on CPU)
decode via a sentinel exception, and slices out the first 9 vectors
as <name>_voice_prefix.bin and the last 2 as <name>_voice_suffix.bin.
Validated against the shipped damien_voice_prefix.bin: using
damien_15s_24k.wav as the reference audio, max|diff| = 0, so the
extraction matches the original tooling bit-for-bit.
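
The capture, abort and slice steps described above can be sketched in plain Python. This is an illustration only, not the contents of scripts/export_voice_prefix_suffix.py: ToyTalker is a hypothetical stand-in for the talker module (the real script hooks the PyTorch model), the hook list imitates a forward pre-hook, and the raw little-endian float32 layout of the .bin files is an assumption.

```python
import struct

HIDDEN = 1024      # embedding width stated in the commit message
PREFIX_LEN = 9     # first 9 vectors become the voice prefix
SUFFIX_LEN = 2     # last 2 vectors become the voice suffix


class StopAfterPrefill(Exception):
    """Sentinel raised by the hook so the slow decode loop never runs."""


class ToyTalker:
    """Hypothetical stand-in for the talker module; not the Qwen3TTS API."""

    def __init__(self):
        self.pre_hooks = []
        self.calls_completed = 0

    def __call__(self, inputs_embeds):
        for hook in self.pre_hooks:
            hook(inputs_embeds)
        self.calls_completed += 1  # only reached when no hook aborts
        return inputs_embeds


def capture_first_prefill(talker, run):
    """Call run() and return the embeddings seen by the first talker call."""
    captured = []

    def hook(inputs_embeds):
        captured.append([list(vec) for vec in inputs_embeds])
        raise StopAfterPrefill

    talker.pre_hooks.append(hook)
    try:
        run()
    except StopAfterPrefill:
        pass  # expected: the capture is complete, skip the decode
    finally:
        talker.pre_hooks.remove(hook)
    if not captured:
        raise RuntimeError("talker was never called")
    return captured[0]


def split_prefix_suffix(embeds):
    """Slice the first PREFIX_LEN and last SUFFIX_LEN vectors."""
    if len(embeds) < PREFIX_LEN + SUFFIX_LEN:
        raise ValueError("sequence too short to split")
    return embeds[:PREFIX_LEN], embeds[-SUFFIX_LEN:]


def to_bytes(vectors):
    """Flatten to raw little-endian float32 (assumed on-disk layout)."""
    flat = [x for vec in vectors for x in vec]
    return struct.pack(f"<{len(flat)}f", *flat)


def max_abs_diff(a, b):
    """max|diff| between two float32 blobs of equal length."""
    if len(a) != len(b):
        raise ValueError("blobs differ in length")
    fa = struct.unpack(f"<{len(a) // 4}f", a)
    fb = struct.unpack(f"<{len(b) // 4}f", b)
    return max(abs(x - y) for x, y in zip(fa, fb))


# Demo with 20 fake vectors; a real run would use the voice-clone path.
talker = ToyTalker()
seq = [[float(i)] * HIDDEN for i in range(20)]


def generate():
    talker(seq)  # prefill call: captured, then aborted by the hook
    talker(seq)  # stands in for the decode steps; never reached


embeds = capture_first_prefill(talker, generate)
prefix, suffix = split_prefix_suffix(embeds)
prefix_bytes = to_bytes(prefix)
suffix_bytes = to_bytes(suffix)
reference_diff = max_abs_diff(prefix_bytes, to_bytes(seq[:PREFIX_LEN]))
```

Raising the sentinel inside the hook keeps the capture logic in one place and leaves the model code untouched; the finally clause removes the hook so a later normal run is not aborted.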

Generated and adb-pushed to
/data/local/tmp/kazeia/models/qwen3-tts-npu/:
  amir / didier / elodie / jerome / richard / sid / zelda
  (+ re-generated damien from the canonical 15s_24k reference)

Qwen3TtsEngine.setVoice (already wired) reads <voice>_voice_prefix.bin
/ <voice>_voice_suffix.bin by basename, so voice changes now take
effect from the next synthesized segment with no app restart.
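
The basename convention can be checked on the host before pushing files. The sketch below is a hypothetical Python helper, not the Qwen3TtsEngine code (which is not shown in this commit); it only assumes the two file names given above and reports which voices have both files present.

```python
import tempfile
from pathlib import Path

PREFIX_TAIL = "_voice_prefix.bin"
SUFFIX_TAIL = "_voice_suffix.bin"


def voice_files(model_dir, voice):
    """Return the (prefix, suffix) paths expected for a voice basename."""
    base = Path(model_dir)
    return base / f"{voice}{PREFIX_TAIL}", base / f"{voice}{SUFFIX_TAIL}"


def available_voices(model_dir):
    """List voices that have both a prefix and a suffix file."""
    base = Path(model_dir)
    names = []
    for prefix_path in sorted(base.glob(f"*{PREFIX_TAIL}")):
        voice = prefix_path.name[: -len(PREFIX_TAIL)]
        if (base / f"{voice}{SUFFIX_TAIL}").is_file():
            names.append(voice)
    return names


# Demo in a throwaway directory: amir lacks a suffix file, so only
# damien counts as selectable.
with tempfile.TemporaryDirectory() as d:
    for name in ("damien_voice_prefix.bin",
                 "damien_voice_suffix.bin",
                 "amir_voice_prefix.bin"):
        (Path(d) / name).write_bytes(b"")
    found = available_voices(d)
```

A voice with only one of the two files is skipped here; how the on-device engine handles a missing file is not stated in the commit message.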

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-15 00:09:23 +02:00

1 Commit