Cache the input tensor pointers after the first prepare_input_tensors call, then memcpy directly into them on all subsequent steps. This eliminates ~14000 mallocs per pipeline run (986 CP + 58 talker calls). Generation: 4640 ms → 4007 ms (-633 ms); total RTF: 1.6 → 1.51. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|---|---|---|
| Module.java | ||
| cp_et_runner.cpp | ||
| cp_et_test_client.cpp | ||
| jni_layer_tts.cpp | ||
| tts_pipeline_jni.cpp | ||
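
The caching approach described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the files listed above: `CachedInputRunner`, `step`, `input`, and `allocation_count` are hypothetical names, plain `std::vector<float>` buffers stand in for the runtime's tensors, and the private `prepare()` only plays the role that `prepare_input_tensors` has in the commit. It assumes input shapes stay fixed across steps, which is what makes pointer reuse safe.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical runner: allocates its input buffers once, then reuses them.
class CachedInputRunner {
public:
    // `sizes` holds the element count of each input; assumed constant per run.
    explicit CachedInputRunner(std::vector<std::size_t> sizes)
        : sizes_(std::move(sizes)) {}

    // Writes one step's inputs. Allocation happens on the first call only;
    // every later call is a plain memcpy into the cached pointers.
    bool step(const std::vector<const float*>& srcs) {
        if (srcs.size() != sizes_.size()) return false;
        if (!prepared_) prepare();
        for (std::size_t i = 0; i < sizes_.size(); ++i) {
            if (sizes_[i] == 0) continue;  // nothing to copy for empty inputs
            std::memcpy(cached_[i], srcs[i], sizes_[i] * sizeof(float));
        }
        return true;
    }

    // Pointer to the cached buffer for input `i` (valid after the first step).
    const float* input(std::size_t i) const { return cached_[i]; }

    // Number of buffers allocated so far; stays constant after the first step.
    std::size_t allocation_count() const { return allocations_; }

private:
    // Stand-in for the one-time preparation call: allocate and cache pointers.
    void prepare() {
        storage_.reserve(sizes_.size());
        cached_.reserve(sizes_.size());
        for (std::size_t n : sizes_) {
            storage_.emplace_back(n);
            cached_.push_back(storage_.back().data());
            ++allocations_;
        }
        prepared_ = true;
    }

    std::vector<std::size_t> sizes_;
    std::vector<std::vector<float>> storage_;  // owns the buffers
    std::vector<float*> cached_;               // cached raw pointers
    std::size_t allocations_ = 0;
    bool prepared_ = false;
};
```

Calling `step` repeatedly with same-shaped inputs leaves `allocation_count()` unchanged after the first call, so the per-step cost reduces to the copies themselves. If input shapes could change between steps, the cache would have to be invalidated and rebuilt, which this sketch does not handle.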