Extensive investigation of the audible "tremor" in the generated voice-cloned
audio. Conclusion is architectural, not a bug:
* Hexagon HMX fp16 talker logits correlate with PyTorch fp32 at 0.999998
* ONNX Runtime CP V2 is bit-identical to PyTorch greedy CP (0.24% residual
divergence measured by injecting Python's captured cb0 at each step —
14/16 codebooks match 100%, cb14/cb15 miss 1 token out of 53)
* BigVGAN decoder is bit-identical to PyTorch (validated earlier)
* Therefore the tremor is caused entirely by the ~28% of cb0 argmax flips
where the tiny fp16 logits drift crosses the top-1/top-2 margin. This
cascades through the autoregressive chain into a trajectory the model
never saw at training time → incoherent artifacts.
Cross-architecture test (x86 AVX-512 / ARM64 NEON+HMX) cannot be zeroed by
any runtime swap — LibTorch Android would use NEON kernels with a different
reduction order than PyTorch x86, same class of error, smaller but non-zero
residual. Temperature tweaking (0.3 → 0.9) and greedy-vs-sample gave no
perceptual difference: the floor is numeric, not in the sampling layer.
Accepted for MVP. Documented in project_tts_cross_arch_limit.md — this is a
thesis-relevant finding about on-device TTS deployment limits.
Cleanup:
* All diagnostic flags (force_inject_pycb0, force_greedy_cb0, cb0_temp,
force_python_codes, force_cpu_talker, force_cpu_talker_gguf) now gated
behind BuildConfig.DEBUG via diagFlag()/diagFile() helpers. Release
builds JIT-eliminate the file checks; debug builds keep the whole
experimental toolchain for re-running the analysis for demos/thesis.
* force_hexagon + force_cp_v2 stay unconditional — production routing.
* Prefill cb0 now respects force_greedy_cb0 (was always sampleTopK 0.9).
* Native TTS pipeline (executorch-custom/jni_layer_tts.cpp,
app/src/main/jni/tts_pipeline.cpp): pad-zone sampling switched to
greedy argmax so EOS gets a fair chance (temp 0.9 top-k kept producing
audio past EOS where Python's seeded sampler terminated naturally).
* scripts/prepare_tts_voiceclone.py: new script that captures Python
greedy-CP reference (stochastic talker for EOS, deterministic CP) for
token-by-token comparison.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>