AMD OpenCL (x86) vs. MUDA

by syoyo

AMD recently released OpenCL drivers that run on x86 CPUs.
I thought it would be a good idea to measure Black-Scholes performance on AMD’s OpenCL (x86) and compare it with MUDA (SSE x86).

Both were run on a 1.86 GHz Core 2 machine running 32-bit Linux.

AMD’s OpenCL(x86)


Option samples           Time taken(sec)          Options / sec
10240000                 4.497                    2.27707e+06

AMD’s OpenCL processes about 2.3 M options/sec (10,240,000 options in 4.497 s).

MUDA version

The MUDA version of Black-Scholes is taken from the MUDA source tree:
https://lucille.svn.sourceforge.net/svnroot/lucille/angelina/haskellmuda/examples/BlackScholes/


[Setup] Generating input data ...
[Setup] Generating input data ... DONE
[Measure] Computing reference ...
[Measure] Computing reference ... DONE
[Measure] Computing with MUDA ...
[Measure] Computing with MUDA ... DONE
[Perf] CPU  = 4781.213000 (msec)  4.283432 MOps/sec
[Perf] MUDA = 781.397300 (msec)  26.209458 MOps/sec
L1 norm: 1.853851E-07
Max absolute error: 1.335144E-05
TEST PASSED

MUDA computes Black-Scholes at 26 M ops/sec, which is over 10x faster than the OpenCL (x86) version!

Since MUDA is open source, you can dig into its code to find out why it is faster than AMD’s OpenCL, hehe…

[Ja]

AMD has released a beta implementation of OpenCL whose kernels also run on x86, so I compared it with MUDA (SSE x86 output).
The benchmark is the Black-Scholes formula.

The results…

AMD’s OpenCL(x86) 2.3 M ops/sec
MUDA 26 M ops/sec

… Why, our MUDA is overwhelming, isn’t it? 😉
Hmm, I wonder why AMD’s OpenCL is so slow.
I even set the sample count to 10,240,000 so that data transfer and API calls would not become the bottleneck.
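The reasoning behind a large sample count can be written as a simple cost model. The symbols are illustrative, not measured: assume $t_0$ is a fixed cost per run (API calls, setup) and $c$ is the time per option for everything that scales with the sample count $N$ (the computation plus any per-element copying). The measured throughput is then

```latex
R(N) = \frac{N}{t_0 + N\,c}
\qquad\Longrightarrow\qquad
\lim_{N \to \infty} R(N) = \frac{1}{c}
```

so a large $N$ hides the fixed cost $t_0$, while any per-element transfer cost remains part of $c$ and is not amortized away by increasing $N$.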

By the way, looking through the files in AMD’s OpenCL SDK distribution, I noticed that they use LLVM for the OpenCL C compiler.
