World’s first SIMD ray-triangle intersection with ARM/NEON
I’ve written what is possibly the world’s first SIMD ray-triangle intersection code using ARM/NEON instructions.
The code runs fine on the iPod touch 3G (ARM Cortex) and the iPhone 3GS.
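For readers who want a picture of what a 4-wide intersection routine computes, here is a minimal portable C sketch. It is not the NEON code from this post. It assumes the Möller-Trumbore test, one triangle against a packet of four rays, and a structure-of-arrays layout; the names ray4_t and triangle_t and this isect4() signature are made up for illustration. The plain per-lane loop stands in for the SIMD lanes: in a NEON version, each arithmetic statement in the loop body would roughly correspond to one operation on a float32x4_t register.

```c
#include <assert.h>

#define LANES 4

/* Hypothetical structure-of-arrays packet of four rays. */
typedef struct {
    float ox[LANES], oy[LANES], oz[LANES]; /* ray origins */
    float dx[LANES], dy[LANES], dz[LANES]; /* ray directions */
} ray4_t;

/* Hypothetical triangle type: three vertices, xyz each. */
typedef struct {
    float v0[3], v1[3], v2[3];
} triangle_t;

/* Intersects four rays with one triangle (Moeller-Trumbore, assumed here).
 * Returns a 4-bit mask; bit i is set when ray i hits the triangle.
 * t[i], u[i], v[i] are only meaningful for lanes whose bit is set. */
unsigned isect4(const ray4_t *r, const triangle_t *tri,
                float t[LANES], float u[LANES], float v[LANES])
{
    const float eps = 1.0e-6f;
    unsigned mask = 0;

    assert(r && tri && t && u && v);

    /* Triangle edges are shared by all four lanes. */
    const float e1x = tri->v1[0] - tri->v0[0];
    const float e1y = tri->v1[1] - tri->v0[1];
    const float e1z = tri->v1[2] - tri->v0[2];
    const float e2x = tri->v2[0] - tri->v0[0];
    const float e2y = tri->v2[1] - tri->v0[1];
    const float e2z = tri->v2[2] - tri->v0[2];

    for (int i = 0; i < LANES; i++) {
        /* p = d x e2 */
        float px = r->dy[i] * e2z - r->dz[i] * e2y;
        float py = r->dz[i] * e2x - r->dx[i] * e2z;
        float pz = r->dx[i] * e2y - r->dy[i] * e2x;

        /* det = e1 . p; near zero means the ray is parallel to the triangle */
        float det = e1x * px + e1y * py + e1z * pz;
        int ok = (det > eps) || (det < -eps);
        float inv = ok ? 1.0f / det : 0.0f;

        /* s = o - v0 */
        float sx = r->ox[i] - tri->v0[0];
        float sy = r->oy[i] - tri->v0[1];
        float sz = r->oz[i] - tri->v0[2];

        /* q = s x e1 */
        float qx = sy * e1z - sz * e1y;
        float qy = sz * e1x - sx * e1z;
        float qz = sx * e1y - sy * e1x;

        /* Barycentric coordinates and ray distance. */
        float uu = (sx * px + sy * py + sz * pz) * inv;
        float vv = (r->dx[i] * qx + r->dy[i] * qy + r->dz[i] * qz) * inv;
        float tt = (e2x * qx + e2y * qy + e2z * qz) * inv;

        u[i] = uu;
        v[i] = vv;
        t[i] = tt;

        if (ok && uu >= 0.0f && vv >= 0.0f && uu + vv <= 1.0f && tt > 0.0f)
            mask |= 1u << i;
    }
    return mask;
}
```

A real SIMD version would replace the final if with a vector compare that builds the mask without branching, so that all four lanes always execute the same instruction stream.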
The performance on iPod touch 3G is around 348 cycles per isect4().
isect4() is around 140 assembly instructions, so the CPI is 348/140 ≈ 2.49, which is not efficient enough for me (the CPI should be close to 1.0).
Note that there is still room to optimize the code,
e.g. by using an fmad (multiply-add) instruction instead of an fadd/fmul combination.
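To make the multiply-add idea concrete, here is a small C sketch of a 3-component dot product written both ways. The helper madd() is hypothetical: it just computes a*b + c in plain C and stands in for a single hardware multiply-add instruction (on NEON, the multiply-accumulate form is vmla). Whether a compiler actually emits one instruction for it depends on the target and flags, so treat the instruction counts in the comments as the intent, not a guarantee.

```c
#include <assert.h>

/* Stand-in for one multiply-add instruction: returns a * b + c.
 * Hypothetical helper for illustration only. */
static float madd(float a, float b, float c)
{
    return a * b + c;
}

/* Separate multiplies and adds: 3 fmul + 2 fadd = 5 operations. */
float dot3_mul_add(float ax, float ay, float az,
                   float bx, float by, float bz)
{
    float x = ax * bx;
    float y = ay * by;
    float z = az * bz;
    return (x + y) + z;
}

/* Multiply-add form: 1 fmul + 2 multiply-adds = 3 operations. */
float dot3_madd(float ax, float ay, float az,
                float bx, float by, float bz)
{
    float acc = az * bz;
    acc = madd(ay, by, acc);
    acc = madd(ax, bx, acc);
    return acc;
}

/* Quick self-check: both forms agree on small integer inputs. */
void dot3_self_check(void)
{
    assert(dot3_mul_add(1, 2, 3, 4, 5, 6) == dot3_madd(1, 2, 3, 4, 5, 6));
}
```

The intersection test is mostly dot and cross products, so this kind of rewrite applies to a large share of its arithmetic. Note that the two forms add the terms in a different order, so results can differ in the last bits for general inputs.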
– SIMD ray-triangle intersection in Intel/AVX
– SIMD ray-triangle intersection in OpenCL