Key breakthrough: C++ pipeline loop using the SAME Method* instances
that Java loaded (via Module::method("forward")). This gives:
- Same QNN compiled graph → identical numerical results → no trembling
- C++ loop → no Java Tensor/EValue allocation overhead
- prepare_input_tensors + memcpy + Method::execute (like cp_et_runner)
Pipeline: talker ~20ms/step + CP ~44ms/step + decoder 2.8s = 7.3s for 4.64s
Added to executorch JNI:
- Module.nativeSetCpModule() — registers CP module for pipeline
- Module.nativeRunTtsPipeline(...) — runs full talker+CP loop in C++
- Updated executorch.jar with new native method declarations
From RTF 4.9 (start of session) to RTF 1.6 with impeccable audio quality.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause found: QNN HTP level=1 compilation is not bitwise
deterministic. Two loads of the same .pte produce slightly different
hidden states → audible trembling in decoded speech.
Java pipeline uses single QNN instance → no trembling, validated quality.
C++ pipeline code preserved for future use when QNN context caching
is fixed (would make both loads use same compiled graph).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The native pipeline was adding zeros after trailing text tokens
instead of tts_eos_embed then tts_pad_embed. This caused the model
to mispronounce final words (e.g. "développement" → "devopment").
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full talker+CP autoregressive loop in C++ via JNI.
Talker 20ms/step, CP 44ms/step, total 6.6s for 4.64s audio.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>