- BigVGAN: 8 threads (2757→1872ms), pre_conv/pre_transformer: 4 threads
- Restored pre-computed embeds format (codec_sum+text from Python)
- Text-only trailing embeds don't work: model needs codec_sum for EOS
For long phrases, pre-computed embeds must be generated from Python.
RTF 1.26 on short phrase.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cache input tensor pointers after first prepare_input_tensors call,
then memcpy directly into them for all subsequent steps.
Eliminates ~14000 mallocs per pipeline run (986 CP + 58 talker calls).
Generation: 4640ms → 4007ms (-633ms), total RTF: 1.6 → 1.51
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Key breakthrough: C++ pipeline loop using the SAME Method* instances
that Java loaded (via Module::method("forward")). This gives:
- Same QNN compiled graph → identical numerical results → no trembling
- C++ loop → no Java Tensor/EValue allocation overhead
- prepare_input_tensors + memcpy + Method::execute (like cp_et_runner)
Pipeline: talker ~20ms/step + CP ~44ms/step + decoder 2.8s = 7.3s for 4.64s
Added to executorch JNI:
- Module.nativeSetCpModule() — registers CP module for pipeline
- Module.nativeRunTtsPipeline(...) — runs full talker+CP loop in C++
- Updated executorch.jar with new native method declarations
From RTF 4.9 (start of session) to RTF 1.6 with impeccable audio quality.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>