kazeia/executorch-custom
Kazeia Team 985fd9cff9 Direct output→input KV copy: RTF 1.51 → 1.31
Skip intermediate KV buffer: copy output tensors directly into
next step's input pointers. Saves ~1.5GB/run of memcpy for talker
(28L × 2 × 100×8×128 floats × 58 steps) and CP similarly.

Generation: 4007ms → 3713ms, total: 7180ms → 6078ms

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 12:23:45 +02:00
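
The direct output→input KV copy described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code in jni_layer_tts.cpp: `KvTensor`, `copy_kv_direct`, and `talker_kv_floats_per_step` are hypothetical names, and reading "100×8×128" as sequence length × KV heads × head dimension is an assumption. The point is that each output tensor of one step is copied once, straight into the buffer the next step reads from, instead of passing through a staging buffer first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one KV tensor (one layer's K or V cache).
// The memory is owned elsewhere; this struct only points at it.
struct KvTensor {
    float* data;        // start of the tensor's float buffer
    std::size_t count;  // number of floats in the tensor
};

// Copy every output KV tensor of step t directly into the matching input
// tensor of step t+1, with no intermediate buffer. Returns bytes copied.
inline std::size_t copy_kv_direct(const std::vector<KvTensor>& outputs,
                                  const std::vector<KvTensor>& next_inputs) {
    assert(outputs.size() == next_inputs.size());
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < outputs.size(); ++i) {
        assert(outputs[i].count == next_inputs[i].count);
        const std::size_t n = outputs[i].count * sizeof(float);
        std::memcpy(next_inputs[i].data, outputs[i].data, n);
        bytes += n;
    }
    return bytes;
}

// Floats in one full pass over the talker KV cache, using the shape quoted
// in the commit message: 28 layers x 2 (K and V) x 100 x 8 x 128.
// The meaning of the last three dimensions is assumed, not confirmed.
constexpr std::size_t talker_kv_floats_per_step() {
    return std::size_t{28} * 2 * 100 * 8 * 128;
}

// The same pass repeated over the 58 steps quoted in the commit message.
constexpr std::size_t talker_kv_floats_per_run() {
    return talker_kv_floats_per_step() * 58;
}
```

With 4-byte floats, `talker_kv_floats_per_run()` corresponds to roughly 1.3 GB for one pass over the cache per step, the same order as the ~1.5GB figure quoted in the commit message. Removing the staging buffer removes one such pass per step.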
Module.java Shared Module C++ pipeline: RTF 1.6 with perfect quality 2026-04-09 12:05:58 +02:00
cp_et_runner.cpp Native C++ pipeline: RTF 1.4 (was 3.6 in Java) 2026-04-09 10:09:32 +02:00
cp_et_test_client.cpp Native C++ pipeline: RTF 1.4 (was 3.6 in Java) 2026-04-09 10:09:32 +02:00
jni_layer_tts.cpp Direct output→input KV copy: RTF 1.51 → 1.31 2026-04-09 12:23:45 +02:00
tts_pipeline_jni.cpp Disable C++ pipeline (QNN non-deterministic), keep Java RTF 1.8 2026-04-09 11:42:49 +02:00