Kazeia Team
8bfe6c7445
Add NEON SIMD heads argmax for CP (2.3× speedup)
Optimize the CP head dot products (15 × 2048×1024) with ARM NEON
vfmaq_f32, using 4 accumulators (16 floats per iteration).
CP per frame: 131 ms → 58 ms; total pipeline: 22.7 s → 14.7 s (RTF 3.2).
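A minimal sketch of the technique described above, not the committed code. It assumes each CP head is a row-major float32 matrix of `rows × dim` (2048 × 1024 here) scored against one hidden vector, and that the argmax is taken over the rows; `dot_f32` and `argmax_rows` are hypothetical names. The NEON path (AArch64 only, because of `vaddvq_f32`) keeps 4 independent `vfmaq_f32` accumulators and consumes 16 floats per iteration; other targets fall back to the scalar loop.

```c
#include <stddef.h>
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dot product of two float vectors of length n. */
static float dot_f32(const float *a, const float *b, size_t n) {
    size_t i = 0;
    float sum = 0.0f;
#if defined(__aarch64__) && defined(__ARM_NEON)
    /* 4 independent accumulators, so consecutive FMAs do not wait on each other. */
    float32x4_t acc0 = vdupq_n_f32(0.0f);
    float32x4_t acc1 = vdupq_n_f32(0.0f);
    float32x4_t acc2 = vdupq_n_f32(0.0f);
    float32x4_t acc3 = vdupq_n_f32(0.0f);
    for (; i + 16 <= n; i += 16) { /* 16 floats per iteration */
        acc0 = vfmaq_f32(acc0, vld1q_f32(a + i),      vld1q_f32(b + i));
        acc1 = vfmaq_f32(acc1, vld1q_f32(a + i + 4),  vld1q_f32(b + i + 4));
        acc2 = vfmaq_f32(acc2, vld1q_f32(a + i + 8),  vld1q_f32(b + i + 8));
        acc3 = vfmaq_f32(acc3, vld1q_f32(a + i + 12), vld1q_f32(b + i + 12));
    }
    /* Combine the accumulators, then horizontal-add the 4 lanes. */
    sum = vaddvq_f32(vaddq_f32(vaddq_f32(acc0, acc1), vaddq_f32(acc2, acc3)));
#endif
    /* Scalar tail on NEON builds; the whole loop on other targets. */
    for (; i < n; i++) sum += a[i] * b[i];
    return sum;
}

/* Hypothetical helper: index of the row of W (rows x dim, row-major) with the
 * largest dot product against h. Requires rows >= 1; ties keep the first row. */
static size_t argmax_rows(const float *W, const float *h, size_t rows, size_t dim) {
    size_t best = 0;
    float best_val = dot_f32(W, h, dim);
    for (size_t r = 1; r < rows; r++) {
        float v = dot_f32(W + r * dim, h, dim);
        if (v > best_val) { best_val = v; best = r; }
    }
    return best;
}
```

Under these assumptions `argmax_rows` would be called once per head (15 times per frame). Because the NEON path sums in a different order than the scalar loop, results can differ in the last bits, so near-ties may resolve differently.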
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 08:55:20 +02:00