Three tightly-coupled UX fixes the user flagged during live testing.
**Voice hot-swap (Qwen3TtsEngine.setVoice)**: previously a no-op
stub — the spinner callback updated the orb color but the actual
audio kept using Damien's cached prefix/suffix embeddings. Now we
derive the voice id from the WAV basename (elodie.wav → 'elodie'),
look up `<id>_voice_prefix.bin` + `<id>_voice_suffix.bin` in the
model dir, parse their headers, and atomically replace the embedding
arrays so the NEXT synthesized segment uses the new voice. If the
files aren't present, we log a clear warning pointing at
prepare_tts_native.py: the hot-swap is wired, but the per-voice
prefix/suffix files still need to be generated offline and adb-pushed.
KazeiaService.setVoice now forwards to Qwen3TtsEngine in addition to
the Chatterbox branch.
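For reference, a minimal sketch of the id derivation, file lookup, and
atomic swap described above. The names `VoiceEmbeddings`,
`voiceIdFromWav`, `findVoiceFiles`, and `VoiceSlot` are hypothetical
and not taken from the codebase. Header parsing is omitted because the
.bin layout is not described here, and `AtomicReference` is only one
possible way to make the replacement atomic.

```kotlin
import java.io.File
import java.util.concurrent.atomic.AtomicReference

// Hypothetical container for one voice's embeddings.
class VoiceEmbeddings(val prefix: FloatArray, val suffix: FloatArray)

// "elodie.wav" -> "elodie": the voice id is the WAV basename without extension.
fun voiceIdFromWav(wavPath: String): String = File(wavPath).nameWithoutExtension

// Returns the prefix/suffix files for a voice, or null when either is missing
// (the caller would log the warning pointing at prepare_tts_native.py).
fun findVoiceFiles(modelDir: File, voiceId: String): Pair<File, File>? {
    val prefix = File(modelDir, "${voiceId}_voice_prefix.bin")
    val suffix = File(modelDir, "${voiceId}_voice_suffix.bin")
    return if (prefix.isFile && suffix.isFile) prefix to suffix else null
}

// Both arrays are swapped together as one reference, so a reader never sees
// the prefix of one voice paired with the suffix of another.
class VoiceSlot(initial: VoiceEmbeddings) {
    private val ref = AtomicReference(initial)
    fun current(): VoiceEmbeddings = ref.get()
    fun swap(next: VoiceEmbeddings) = ref.set(next)
}
```

In this sketch the synthesis loop would call `current()` once at the
start of each segment, which is what makes the swap take effect on the
next segment rather than mid-segment.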
**Emoji stripping**: the model loves closing on "😊" and it was
reaching TTS as a standalone segment that synthesized a fraction of
a second of junk. KazeiaPipeline.speakText now runs each sentence
through stripNonSpeakable before enqueueing — drops Unicode emoji /
dingbat / pictograph / flag blocks plus variation selectors and
zero-width joiners, then trims. Empty-after-strip sentences are
skipped entirely. The chat bubble still shows the original text
(with emojis) — only the audio path drops them.
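A rough sketch of what a filter like stripNonSpeakable could look
like. The code point ranges below are an approximation chosen for
illustration, not the exact set used in this commit, and
`isNonSpeakable` is a hypothetical helper name.

```kotlin
// Assumed ranges; the real implementation may cover more or fewer blocks.
fun isNonSpeakable(cp: Int): Boolean =
    cp in 0x1F000..0x1FAFF ||  // emoji and pictograph blocks, incl. flag indicators
    cp in 0x2600..0x27BF ||    // Miscellaneous Symbols and Dingbats
    cp in 0xFE00..0xFE0F ||    // variation selectors
    cp == 0x200D               // zero-width joiner

fun stripNonSpeakable(text: String): String {
    val out = StringBuilder(text.length)
    var i = 0
    // Walk by code point, not by char, so surrogate pairs are handled whole.
    while (i < text.length) {
        val cp = text.codePointAt(i)
        i += Character.charCount(cp)
        if (!isNonSpeakable(cp)) out.appendCodePoint(cp)
    }
    return out.toString().trim()
}

// Only the audio path is filtered; sentences that end up empty are dropped.
fun speakableSentences(sentences: List<String>): List<String> =
    sentences.map(::stripNonSpeakable).filter { it.isNotEmpty() }
```

Note that the trim only affects the ends of the string, so an emoji
removed from the middle of a sentence leaves its surrounding spaces in
place.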
**Typing dots indicator**: while the LLM is done but TTS synthesis is
still running (~3–5 s for the first segment), the Kazeia bubble now
shows an animated ". / .. / ..." cycle at 400 ms cadence instead of
sitting empty. The moment the first segment actually starts playing,
the cycle cancels, the bubble resets to empty, and the existing
word-by-word reveal takes over. A defensive finally block also
cancels the job when no segment ever fires (e.g. an all-emoji reply).
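To illustrate the cycle and the defensive cleanup, here is a
stdlib-only sketch. It swaps in a `ScheduledExecutorService` for
whatever job mechanism the app actually uses, and `dotsFrame`,
`withTypingDots`, and the `render` callback are hypothetical names.
Unlike the real flow, it does not model the early cancel when the
first segment starts playing; it only shows the finally-based
fallback.

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Frame for a given tick of the cycle: ".", "..", "...", then back to ".".
fun dotsFrame(tick: Int): String = ".".repeat(tick % 3 + 1)

// Shows the dots while `work` runs, then always stops the cycle and clears
// the bubble, even when `work` throws or never produces a segment.
fun <T> withTypingDots(
    render: (String) -> Unit,
    periodMs: Long = 400,
    work: () -> T
): T {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    var tick = 0
    val job = scheduler.scheduleAtFixedRate(
        { render(dotsFrame(tick++)) }, 0, periodMs, TimeUnit.MILLISECONDS
    )
    try {
        return work()
    } finally {
        job.cancel(false)
        scheduler.shutdown()
        // Let any in-flight frame finish before resetting the bubble.
        scheduler.awaitTermination(1, TimeUnit.SECONDS)
        render("")
    }
}
```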
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>