Matches the 'conversation' feel the user asked for. Previously the
full LLM response appeared in the chat as soon as generation
finished, and the audio only started 5–10 s later, so text and sound
felt decoupled. Now:
- The KAZEIA bubble is created empty and only starts filling when
the first TTS segment actually starts playing through the speaker
(we already split the response by sentence for the chained-
MediaPlayer pipeline; that split drives the reveal too).
- Inside each sentence, words are appended one by one at a cadence
of (audio duration / word count), so a sentence that takes longer to
speak per word also reveals more slowly, matching speech pacing. The
first word of each sentence appears immediately so audio and text
stay aligned at the start.
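As a rough illustration of the cadence rule above, here is a minimal
Kotlin sketch that turns one sentence and its audio duration into a
reveal schedule. `RevealStep` and `revealSchedule` are hypothetical
names used only for this example; the real reveal runs inside a
coroutine in KazeiaService and may tokenize or round differently.

```kotlin
// Hypothetical helper for illustration; not the actual KazeiaService code.
data class RevealStep(val offsetMs: Long, val visibleText: String)

// Splits the sentence on whitespace and spaces the words evenly over the
// audio duration. The first word is scheduled at offset 0 so text and
// audio start together.
fun revealSchedule(sentence: String, durationMs: Long): List<RevealStep> {
    val words = sentence.trim().split(Regex("\\s+")).filter { it.isNotEmpty() }
    if (words.isEmpty()) return emptyList()

    // Cadence = audio duration / word count (integer milliseconds).
    val perWordMs = durationMs.coerceAtLeast(0L) / words.size

    return words.indices.map { i ->
        RevealStep(
            offsetMs = i * perWordMs,
            visibleText = words.subList(0, i + 1).joinToString(" ")
        )
    }
}
```

For a five-word sentence with 2000 ms of audio, the steps land at 0,
400, 800, 1200 and 1600 ms; a caller would wait until each offset and
then push `visibleText` into the bubble.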
Implementation:
- Qwen3TtsEngine: added `onSegmentPlaying(sentence, durationMs)`
listener, invoked from the chained-MediaPlayer worker the moment
each segment's MediaPlayer.start() lands. Sentence + duration are
carried end-to-end via a new SegmentReady data class.
- KazeiaPipeline.speakText: forwards an optional listener down to
the TTS engine, same signature.
- KazeiaService: new updateMessageText(id, text) helper. In
processLlmResponse, the bubble is added empty before speakText and
grown by a reveal coroutine per sentence; after speakText returns
we snap to the full text as a safety net.
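The wiring described in the three items above might look roughly like
the sketch below. It uses simplified stand-ins (`FakeTtsEngine`,
`FakePipeline`, `FakeService`, all hypothetical) rather than the real
classes, assumes `SegmentReady` carries only a sentence and a
duration, and reveals a whole sentence at once instead of launching
the per-word reveal coroutine.

```kotlin
// Sketch only: simplified stand-ins, not the real Kazeia classes.

// Assumed shape; the real SegmentReady may carry more fields.
data class SegmentReady(val sentence: String, val durationMs: Long)

class FakeTtsEngine {
    // Stand-in for the chained-MediaPlayer worker: walks the segments in
    // order and fires the listener where MediaPlayer.start() would land.
    fun play(
        segments: List<SegmentReady>,
        onSegmentPlaying: ((sentence: String, durationMs: Long) -> Unit)? = null
    ) {
        for (segment in segments) {
            onSegmentPlaying?.invoke(segment.sentence, segment.durationMs)
            // Real code would wait here until the segment finishes playing.
        }
    }
}

class FakePipeline(private val engine: FakeTtsEngine) {
    // Forwards the optional listener unchanged down to the engine.
    fun speakText(
        segments: List<SegmentReady>,
        onSegmentPlaying: ((sentence: String, durationMs: Long) -> Unit)? = null
    ) = engine.play(segments, onSegmentPlaying)
}

class FakeService(private val pipeline: FakePipeline) {
    val messages = mutableMapOf<Int, String>()

    fun updateMessageText(id: Int, text: String) {
        messages[id] = text
    }

    fun processLlmResponse(id: Int, segments: List<SegmentReady>) {
        val fullText = segments.joinToString(" ") { it.sentence }
        updateMessageText(id, "") // bubble is added empty before speaking

        val revealed = StringBuilder()
        pipeline.speakText(segments) { sentence, _ ->
            // Simplification: append the whole sentence when it starts playing.
            if (revealed.isNotEmpty()) revealed.append(' ')
            revealed.append(sentence)
            updateMessageText(id, revealed.toString())
        }

        updateMessageText(id, fullText) // safety-net snap to the full text
    }
}
```

The final `updateMessageText` call mirrors the safety net: even if a
listener callback is missed, the bubble ends up with the full text
once speaking returns.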
No change to the stream_llm debug intent path — it still uses the
old enqueueSentence flow directly and doesn't need the reveal (no
UI bubble there).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>