World’s first SIMD ray-triangle intersection with ARM/NEON

by syoyo

I’ve written what is possibly the world’s first SIMD ray-triangle intersection code using ARM/NEON instructions.

The code runs fine on the iPod touch 3G (ARM Cortex) and the iPhone 3GS.
On the iPod touch 3G, it takes around 348 cycles per isect4() call.
isect4() consists of around 140 assembly instructions, so the CPI is 348/140 ≈ 2.49, which is less efficient than I would like (CPI should be close to 1.0).
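
For readers who have not seen a four-wide intersection routine, here is a rough sketch in portable C. It is not the NEON code measured above: it assumes the Möller-Trumbore test, one ray against four triangles, and a structure-of-arrays layout, none of which is stated in this post. The names Tri4 and isect4_sketch are made up for the illustration. The loop over i stands in for the four SIMD lanes; a real SIMD version would compute all lanes at once and use masks instead of branches.

```c
/* Four triangles stored lane by lane (structure of arrays).
   Hypothetical layout, chosen for this sketch only. */
typedef struct {
    float v0[3][4]; /* first vertex, indexed [axis][lane] */
    float e1[3][4]; /* edge v1 - v0, indexed [axis][lane] */
    float e2[3][4]; /* edge v2 - v0, indexed [axis][lane] */
} Tri4;

/* Tests one ray (org, dir) against four triangles with Moller-Trumbore.
   Returns a bitmask with bit i set when lane i is hit; t, u, v are
   written only for the lanes that are hit. No backface culling. */
int isect4_sketch(const float org[3], const float dir[3], const Tri4 *tri,
                  float t[4], float u[4], float v[4])
{
    const float eps = 1.0e-8f;
    int mask = 0;

    for (int i = 0; i < 4; i++) { /* each iteration is one SIMD lane */
        float e1x = tri->e1[0][i], e1y = tri->e1[1][i], e1z = tri->e1[2][i];
        float e2x = tri->e2[0][i], e2y = tri->e2[1][i], e2z = tri->e2[2][i];

        /* p = dir x e2 */
        float px = dir[1] * e2z - dir[2] * e2y;
        float py = dir[2] * e2x - dir[0] * e2z;
        float pz = dir[0] * e2y - dir[1] * e2x;

        /* det = e1 . p; near zero means the ray is parallel to the
           triangle or the triangle is degenerate */
        float det = e1x * px + e1y * py + e1z * pz;
        if (det > -eps && det < eps) continue;
        float inv = 1.0f / det;

        /* s = org - v0 */
        float sx = org[0] - tri->v0[0][i];
        float sy = org[1] - tri->v0[1][i];
        float sz = org[2] - tri->v0[2][i];

        /* first barycentric coordinate */
        float uu = (sx * px + sy * py + sz * pz) * inv;
        if (uu < 0.0f || uu > 1.0f) continue;

        /* q = s x e1 */
        float qx = sy * e1z - sz * e1y;
        float qy = sz * e1x - sx * e1z;
        float qz = sx * e1y - sy * e1x;

        /* second barycentric coordinate */
        float vv = (dir[0] * qx + dir[1] * qy + dir[2] * qz) * inv;
        if (vv < 0.0f || uu + vv > 1.0f) continue;

        /* distance along the ray; reject hits behind the origin */
        float tt = (e2x * qx + e2y * qy + e2z * qz) * inv;
        if (tt <= 0.0f) continue;

        t[i] = tt;
        u[i] = uu;
        v[i] = vv;
        mask |= 1 << i;
    }
    return mask;
}
```

In a vectorized version each scalar line above becomes one four-wide instruction, which is how a routine of this shape ends up at roughly a hundred or so instructions.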

Note that there is still room to optimize the code,
e.g. by using the fmad instruction instead of an fadd/fmul combination.
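
To show why a multiply-add helps, here is a small sketch in portable C of a four-lane dot product written both ways. The helpers mul4, add4 and madd4 are hypothetical; each one stands in for a single four-wide vector instruction, and this is not the actual NEON code. The separate multiply/add version needs five vector operations per dot product, while the multiply-add version needs three.

```c
/* Hypothetical 4-lane helpers. Each loop stands in for ONE vector
   instruction on a SIMD unit. */
static void mul4(float out[4], const float a[4], const float b[4])
{
    for (int i = 0; i < 4; i++) out[i] = a[i] * b[i];
}

static void add4(float out[4], const float a[4], const float b[4])
{
    for (int i = 0; i < 4; i++) out[i] = a[i] + b[i];
}

/* acc += a * b, the shape of a multiply-add instruction */
static void madd4(float acc[4], const float a[4], const float b[4])
{
    for (int i = 0; i < 4; i++) acc[i] += a[i] * b[i];
}

/* Dot product per lane with separate multiplies and adds:
   3 multiplies + 2 adds = 5 vector operations. */
void dot3x4_mul_add(float out[4],
                    const float ax[4], const float ay[4], const float az[4],
                    const float bx[4], const float by[4], const float bz[4])
{
    float xx[4], yy[4], zz[4], xy[4];
    mul4(xx, ax, bx);
    mul4(yy, ay, by);
    mul4(zz, az, bz);
    add4(xy, xx, yy);
    add4(out, xy, zz);
}

/* Same dot product with multiply-add:
   1 multiply + 2 multiply-adds = 3 vector operations. */
void dot3x4_madd(float out[4],
                 const float ax[4], const float ay[4], const float az[4],
                 const float bx[4], const float by[4], const float bz[4])
{
    mul4(out, ax, bx);
    madd4(out, ay, by);
    madd4(out, az, bz);
}
```

Since cross products and dot products make up most of a ray-triangle test, the saving applies to a large share of the instructions, though the actual gain depends on how the target core schedules multiply-add.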

Related articles

– SIMD ray-triangle intersection in Intel/AVX
http://lucille.atso-net.jp/blog/?p=649

– SIMD ray-triangle intersection in OpenCL
http://lucille.atso-net.jp/blog/?p=910
