Adds a breathing lavender orb centred above the chat list that tracks
the actual audio state of the app:
- **Idle**: slow breathing pulse (~4 s cycle) at 20 fps. The
chatbot looks "awake" without drawing attention to itself.
- **Listening**: halo swells with live mic RMS from the VAD loop, so
the user sees Kazeia hearing them even before Whisper has produced
any transcription. Mic RMS is normalised with the same sqrt squashing
that the TTS envelope uses, so quiet speech is still clearly visible.
- **Speaking**: amplitude + halo driven by a pre-computed RMS envelope
(50 ms windows, sqrt-normalised) produced at synthesis time. Ripples
fire on local peaks above 0.35, which follows the rhythm of speech
without overwhelming the view. The playback timer is internal to the
view and synced to the segment's durationMs; no MediaPlayer position
polling.
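For reviewers, a minimal sketch of the envelope and ripple logic described above. It is not the code in this commit: the function names, the 16-bit PCM input, the full-scale divisor, and the handling of a partial last window are all assumptions made for illustration.

```kotlin
import kotlin.math.sqrt

// Hypothetical helper: one RMS value per window, sqrt-squashed into [0, 1].
// Assumes 16-bit signed mono PCM; the real engine may use another format.
fun rmsEnvelope(pcm: ShortArray, sampleRate: Int, windowMs: Int = 50): FloatArray {
    val window = sampleRate * windowMs / 1000
    if (window <= 0 || pcm.isEmpty()) return FloatArray(0)
    val count = (pcm.size + window - 1) / window // last window may be partial
    return FloatArray(count) { i ->
        val start = i * window
        val end = minOf(start + window, pcm.size)
        var sumSq = 0.0
        for (j in start until end) {
            val s = pcm[j] / 32768.0 // scale to roughly [-1, 1]
            sumSq += s * s
        }
        val rms = sqrt(sumSq / (end - start))
        // sqrt squashing lifts quiet passages so they stay visible
        sqrt(rms).coerceIn(0.0, 1.0).toFloat()
    }
}

// Hypothetical helper: indices of local maxima above the ripple threshold.
fun ripplePeaks(env: FloatArray, threshold: Float = 0.35f): List<Int> =
    (1 until env.size - 1).filter { i ->
        env[i] > threshold && env[i] > env[i - 1] && env[i] >= env[i + 1]
    }
```

The same squashing could be applied to a single mic frame in the Listening state, so both states share one visual scale.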
Architecture:
- Sidecar RMS envelope. Computed in Qwen3TtsEngine.generateSegmentAudioVC
right after PCM is available, packed into SegmentReady, and handed to
onSegmentPlaying(sentence, durationMs, rmsEnvelope) when each MediaPlayer
starts. Zero extra IO: it runs on the same PCM we already write to WAV.
- KazeiaService exposes VisualizerSignal (Idle | Listening(rms) |
Speaking(env, dur)) as a StateFlow. The VAD loop pushes Listening,
processLlmResponse pushes Speaking from the per-segment TTS callback,
and finally clears to Idle when no mic is open.
- AudioVisualizerView renders via Choreographer.FrameCallback,
self-throttled to 20 fps at Idle and running at full refresh rate
during Listening/Speaking. Uses a hardware layer. Pure Kotlin + Canvas,
no deps. ~280 LOC.
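A rough sketch of how the signal type and the idle throttle could fit together, kept free of Android and coroutine dependencies so it runs standalone. The field names, the FrameThrottle class, and envelopeAt are hypothetical; the actual declarations in KazeiaService and AudioVisualizerView may differ.

```kotlin
// Hypothetical shape of the signal exposed by the service.
sealed interface VisualizerSignal {
    object Idle : VisualizerSignal
    data class Listening(val rms: Float) : VisualizerSignal
    class Speaking(val envelope: FloatArray, val durationMs: Long) : VisualizerSignal
}

// Decides, per display frame, whether the view should redraw.
// In the app this would be fed the frameTimeNanos passed to
// Choreographer.FrameCallback.doFrame; here it is a plain function.
class FrameThrottle(idleFps: Int = 20) {
    private val idleIntervalNanos = 1_000_000_000L / idleFps
    private var lastDrawNanos: Long? = null

    fun shouldDraw(signal: VisualizerSignal, frameTimeNanos: Long): Boolean {
        val last = lastDrawNanos
        val due = when (signal) {
            // Idle: skip frames until the idle interval has elapsed
            is VisualizerSignal.Idle ->
                last == null || frameTimeNanos - last >= idleIntervalNanos
            // Active states draw on every frame
            is VisualizerSignal.Listening, is VisualizerSignal.Speaking -> true
        }
        if (due) lastDrawNanos = frameTimeNanos
        return due
    }
}

// Maps the view's internal timer onto the envelope, assuming the envelope
// spans the whole segment duration evenly.
fun envelopeAt(signal: VisualizerSignal.Speaking, elapsedMs: Long): Float {
    if (signal.envelope.isEmpty() || signal.durationMs <= 0) return 0f
    val progress = (elapsedMs.toDouble() / signal.durationMs).coerceIn(0.0, 1.0)
    val index = (progress * signal.envelope.size).toInt()
        .coerceAtMost(signal.envelope.size - 1)
    return signal.envelope[index]
}
```

Keeping the throttle decision separate from drawing makes the 20 fps idle behaviour easy to unit test without a display.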
Layout: 140 dp strip between voiceBar and rvMessages in activity_chat.xml.

No 3D engine, no Unity, no splash extension. The avatar design work
remains on disk for a later phase when the TTS+streaming pipeline
stabilises enough to spend time on DECA/FLAME integration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>