AMD OpenCL (x86) vs. MUDA

by syoyo

Recently, AMD released OpenCL drivers that run on x86 CPUs.
I thought it would be a good idea to compare the performance of a Black-Scholes computation on AMD’s OpenCL (x86) and on MUDA (SSE x86).

Both were run on a 1.86 GHz Core2 machine with 32-bit Linux.
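
For readers unfamiliar with the workload, here is a minimal scalar sketch of the Black-Scholes closed-form price for a European call and put. It is not the AMD or MUDA kernel (both of those are vectorized and use their own approximations); the function names norm_cdf and black_scholes are my own, and Python is used only for illustration.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function, via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes(s: float, k: float, t: float, r: float, sigma: float):
    """Return (call, put) prices of a European option.

    s: spot price, k: strike, t: years to expiry,
    r: risk-free rate, sigma: volatility.
    """
    sqrt_t = math.sqrt(t)
    d1 = (math.log(s / k) + (r + 0.5 * sigma * sigma) * t) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    discount = math.exp(-r * t)
    call = s * norm_cdf(d1) - k * discount * norm_cdf(d2)
    put = k * discount * norm_cdf(-d2) - s * norm_cdf(-d1)
    return call, put

if __name__ == "__main__":
    call, put = black_scholes(100.0, 100.0, 1.0, 0.05, 0.2)
    print(f"call = {call:.4f}, put = {put:.4f}")
```

Each option needs a log, an exp, a square root and two normal CDF evaluations, and every option is independent of the others, which is why this formula is a popular benchmark for SIMD and GPU code.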

AMD’s OpenCL(x86)

Option samples           Time taken(sec)          Options / sec
10240000                 4.497                    2.27707e+06

AMD’s OpenCL achieves about 2.3 M options/sec.

MUDA version

The MUDA version of BlackScholes is taken from MUDA’s source code tree.

[Setup] Generating input data ...
[Setup] Generating input data ... DONE
[Measure] Computing reference ...
[Measure] Computing reference ... DONE
[Measure] Computing with MUDA ...
[Measure] Computing with MUDA ... DONE
[Perf] CPU  = 4781.213000 (msec)  4.283432 MOps/sec
[Perf] MUDA = 781.397300 (msec)  26.209458 MOps/sec
L1 norm: 1.853851E-07
Max absolute error: 1.335144E-05
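
The last two lines of the log compare MUDA's output against the CPU reference. The post does not say how the benchmark defines them, so the following is only a sketch under an assumption: that "L1 norm" means the relative L1 error (sum of absolute differences divided by the sum of absolute reference values) and that "Max absolute error" is the largest per-element difference. The function names are hypothetical.

```python
def l1_relative_error(reference, result):
    """Sum of |ref - res| divided by sum of |ref| (assumed definition)."""
    sum_delta = sum(abs(a - b) for a, b in zip(reference, result))
    sum_ref = sum(abs(a) for a in reference)
    return sum_delta / sum_ref

def max_abs_error(reference, result):
    """Largest per-element absolute difference."""
    return max(abs(a - b) for a, b in zip(reference, result))

if __name__ == "__main__":
    ref = [1.0, 2.0, 3.0]
    res = [1.0, 2.5, 2.0]
    print(l1_relative_error(ref, res), max_abs_error(ref, res))
```

Under that reading, a relative L1 error around 1e-7 is close to single-precision rounding, which would suggest the SSE version is not buying its speed with a much coarser approximation.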

MUDA computes Black-Scholes at 26 M ops/sec, which is more than 10x faster than the OpenCL (x86) version!
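
The arithmetic behind these figures can be checked with a short sketch that uses only the numbers printed in the two logs above. The helper name mops_per_sec is my own. One thing the check brings out: MUDA's reported time and rate together imply about 20.48 million options, twice 10,240,000. That would be consistent with the benchmark counting the call and the put of each sample separately, but the logs do not confirm it, so treat that reading as an assumption.

```python
def mops_per_sec(num_options: float, seconds: float) -> float:
    """Throughput in millions of options per second."""
    return num_options / seconds / 1e6

# AMD OpenCL (x86): 10,240,000 options in 4.497 s, from its table.
opencl_mops = mops_per_sec(10_240_000, 4.497)

# MUDA: rate and elapsed time as printed in the [Perf] line.
muda_mops = 26.209458
muda_seconds = 781.397300 / 1000.0

# Ratio of the two reported rates.
speedup = muda_mops / opencl_mops

# Number of options implied by MUDA's own time and rate.
implied_muda_options = muda_mops * 1e6 * muda_seconds

if __name__ == "__main__":
    print(f"OpenCL: {opencl_mops:.3f} MOps/sec")
    print(f"Speedup from reported rates: {speedup:.2f}x")
    print(f"Options implied by MUDA log: {implied_muda_options:,.0f}")
```

If the two benchmarks do count options differently, the per-sample gap would be smaller than the ratio of the reported rates, though MUDA would still be several times faster.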

Because MUDA is open source, you can look into why it is faster than AMD’s OpenCL code, hehe…


AMD has released a beta implementation of OpenCL whose kernels also run on x86,
so I compared it with MUDA (SSE x86 output).
The benchmark is the Black-Scholes formula.


AMD’s OpenCL(x86) 2.3 M ops/sec
MUDA 26 M ops/sec

… Well, isn’t it overwhelming, this MUDA of mine 😉
Hmm, I wonder why AMD’s OpenCL is this slow.
I used 10240000 samples so that data transfer and API calls would not become the bottleneck.

By the way, I looked through the files distributed in AMD’s OpenCL SDK,
and it uses LLVM for its OpenCL language compiler.