March 2009 – Syoyo Fujita's Blog

Thank you for attending the 2nd LLVM study meeting.

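It is a little late, but thank you to everyone who attended the 2nd LLVM study meeting. miura came a long way to give the world's first talk on yarv2llvm. I think it was a very valuable study meeting. The materials from the day are available at the link below.

The 2nd LLVM study meeting

By the way, MacRuby has recently been trying to speed up Ruby with LLVM, and Python is also trying to speed up Python execution with LLVM through unladen-swallow, so there is a gradually growing movement to make dynamic languages faster with LLVM.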

lucille v.s. PRMan: Updated

I misinterpreted the value of ShadingRate. ShadingRate 0.5 means that 2 shading points are generated per pixel, not 4 (2×2) points per pixel. So I matched the conditions to lucille's by setting ShadingRate 1.0 and PixelSamples 1 1 (1×1 subsamples/pixel). Here are the updated conditions for rendering the lighthouse scene:
- 512×512 pixels
- 2 threads (2…
Continue reading “lucille v.s. PRMan: Updated”
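
The ShadingRate correction above comes down to simple arithmetic, which can be sketched as follows. This is a minimal illustration, assuming ShadingRate is read as the pixel area covered by one shading sample; the function names are hypothetical and are not part of PRMan or lucille.

```python
def shading_samples_per_pixel(shading_rate):
    """Approximate shading samples per pixel for a given ShadingRate.

    Assumption: ShadingRate is the area (in pixels) covered by one
    shading sample, so the sample count is its reciprocal.
    """
    if shading_rate <= 0:
        raise ValueError("ShadingRate must be positive")
    return 1.0 / shading_rate


def subsamples_per_pixel(nx, ny):
    """Subsamples per pixel for 'PixelSamples nx ny'."""
    return nx * ny


# ShadingRate 0.5 gives 2 shading points per pixel, not 2x2 = 4.
half_rate = shading_samples_per_pixel(0.5)
# ShadingRate 1.0 with PixelSamples 1 1 gives one sample of each kind.
matched = (shading_samples_per_pixel(1.0), subsamples_per_pixel(1, 1))
```

With these definitions, `half_rate` is 2.0 and `matched` is `(1.0, 1)`, which is the one-sample-per-pixel condition used for the comparison.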

lucille v.s. PRMan: Performance comparison

(updated) OK, the next challenge is fighting the giant: PRMan. soichi_h kindly measured the rendering time of the lighthouse scene and the Mandelbrot shader in PRMan (version 14.2) and reported the numbers and rendered images to me.

Lighthouse scene with AO

The settings are the same as in the previous article, but the number of gather samples was raised from 32 to 64 in… Continue reading “lucille v.s. PRMan: Performance comparison”

Larrabee New Instruction in C intrinsic function

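(via ompf.org) http://software.intel.com/en-us/articles/prototype-primitives-guide/

C intrinsic function definitions for LNI (Larrabee New Instruction), along with a software implementation of them, have been published. They appear to be intended for use as a C-source-level simulator of LNI. The _M512 type, _mm512_add_ps(), and so on are almost the same as the C intrinsic functions for SSE and AVX. The actual assembler instruction names also seem to follow the AVX style (vaddps and so on), so it looks like LNI will simply feel like a 512-bit version of AVX. Details on LNI will be published at GDC 2009 this weekend, so that is worth watching too.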
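
To give a feel for what a source-level software implementation of a 512-bit packed add does, here is a rough sketch in Python. It is not Intel's prototype code: `add_ps_512`, `to_f32`, and `LANES` are hypothetical names, and the only assumption taken from the text is that a 512-bit register holds 16 single-precision lanes that are added independently.

```python
import struct

LANES = 16  # 512 bits divided into 32-bit single-precision floats


def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def add_ps_512(a, b):
    """Hypothetical emulation of a 16-lane packed single-precision add.

    Each lane is added independently and the result is rounded back to
    single precision, mimicking what a vaddps-style instruction would do.
    """
    if len(a) != LANES or len(b) != LANES:
        raise ValueError("expected 16 lanes per operand")
    return [to_f32(to_f32(x) + to_f32(y)) for x, y in zip(a, b)]


ones = [1.0] * LANES
twos = [2.0] * LANES
result = add_ps_512(ones, twos)  # every lane holds 3.0
```

A scalar loop like this is far slower than real hardware, but it lets code written against a vector-style interface be checked for correctness before the hardware exists.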

[Solved] Differences on gradation in mandelbrot shader.

The rendering result of the mandelbrot shader differed because lucille’s result was GAMMA CORRECTED.

(gamma 2.2, rendered by JIT shader)

(gamma 1.0, rendered by JIT shader. Same result as 3delight)

Wow, how different the images are!

[Ja] The reason why the mandelbrot shader results differ between lucille (JIT shader) and 3delight has been solved. The reason is that the JIT shader was correcting its result with gamma 2.2! (I have forgotten why I did that.) gamma 1.0… Continue reading “[Solved] Differences on gradation in mandelbrot shader.”
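
The effect of the gamma setting on a gradation can be illustrated with a small sketch. This assumes the common display-encoding form, out = in^(1/gamma), applied to values in [0, 1]; whether lucille's JIT shader path uses exactly this form is an assumption, and `gamma_correct` is a hypothetical helper name.

```python
def gamma_correct(value, gamma=2.2):
    """Apply gamma correction to a linear intensity in [0, 1].

    Assumption: correction is value ** (1 / gamma), the usual display
    encoding. With gamma 1.0 the value is left unchanged.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    clamped = min(max(value, 0.0), 1.0)
    return clamped ** (1.0 / gamma)


mid_gray_linear = gamma_correct(0.5, gamma=1.0)  # unchanged: 0.5
mid_gray_gamma = gamma_correct(0.5, gamma=2.2)   # roughly 0.73, visibly brighter
```

Mid-tones are lifted noticeably under gamma 2.2 while black and white stay fixed, which is why the same shader can produce such different-looking gradations.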

[Paper, ICFP09] Dataflow Optimization Made Simple

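Dataflow Optimization Made Simple
John Dias, Norman Ramsey, Simon Peyton Jones, and Satnam Singh, ICFP 09.

The paper apparently shows that the dataflow optimizations a compiler performs can in fact be written with much less difficulty than compiler textbooks suggest. The dataflow optimization framework proposed in the paper has reportedly been adopted as part of a new backend for GHC (the Haskell compiler). Accordingly, the framework is implemented in Haskell, and the target language is C-- (cminusminus). C-- is a language used as an intermediate language in GHC (probably one of the lower-level ones?). It looks similar to C but, like LLVM IR, it is in a form that is easy for a compiler to handle.

The paper shows that the following optimizations and analyses can be implemented easily with the proposed framework.
- Liveness analysis
- Constant folding
- Dead-assignment elimination

And here is what is probably the key point of the paper: for compiler writers, using this framework apparently means that you can implement whatever dataflow optimization you like by providing the following three things.
- assertion (unlike C's assert(),…
Continue reading “[Paper, ICFP09] Dataflow Optimization Made Simple”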
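
As an illustration of the kind of analysis listed above, here is a minimal sketch of textbook liveness analysis as a backward fixed-point iteration over basic blocks. It is not the paper's Haskell framework or its API: the `liveness` function, the block representation, and the example program are all hypothetical and chosen only for illustration.

```python
def liveness(blocks, successors):
    """Classic backward liveness analysis over basic blocks.

    blocks:     dict mapping block name -> (use, defs), where `use` holds the
                variables read before any write in the block and `defs`
                holds the variables written in the block.
    successors: dict mapping block name -> list of successor block names.
    Returns (live_in, live_out), each a dict of block name -> set of names.
    """
    live_in = {name: set() for name in blocks}
    live_out = {name: set() for name in blocks}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for name, (use, defs) in blocks.items():
            out = set()
            for succ in successors.get(name, []):
                out |= live_in[succ]
            new_in = use | (out - defs)
            if out != live_out[name] or new_in != live_in[name]:
                live_out[name], live_in[name] = out, new_in
                changed = True
    return live_in, live_out


# Hypothetical program:
#   entry: x = 1; y = 2
#   loop:  x = x + y; if x < 10 goto loop
#   exit:  return x
example_blocks = {
    "entry": (set(), {"x", "y"}),
    "loop": ({"x", "y"}, {"x"}),
    "exit": ({"x"}, set()),
}
example_successors = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
live_in, live_out = liveness(example_blocks, example_successors)
```

Here `y` stays live around the loop because each iteration reads it, while nothing is live on entry to the program. A dead-assignment elimination pass could then remove any assignment whose target is not in the live-out set at that point.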

lucille v.s. 3delight: Shader performance

OK, next is the comparison of (programmable) shading performance.

Condition
- Mandelbrot shader
- 1 shading sample per pixel
- Rendering one 512×512 quad polygon
- Rendering with 1 thread (1 core)

For lucille, the JIT shader was used. Here is the mandelbrot shader code.

Result
(0.43 secs, lucille JIT shader engine)
(5 secs, 3delight 8.0.1)

There’s… Continue reading “lucille v.s. 3delight: Shader performance”

lucille v.s. 3delight, continued.

I measured the raytracing performance of lucille and 3delight again, this time with a moderately complex scene (200K polys). The settings are the same as in the previous article, except that the number of samples in the gather shader was reduced from 128 to 32. (lucille, 38.5 secs) (3delight, 213 secs) In this case, lucille is about 5.5x faster than 3delight,… Continue reading “lucille v.s. 3delight, continued.”

lucille v.s. 3delight: raytracing performance.

lucille is nearing its next release, and I am freezing features and cleaning up the code. Since the lucille codebase has become stable for the next release, I ran a small benchmark comparing lucille and 3delight, focusing on raytracing performance. The test case is an ambient occlusion scene (using naive but true raytracing in both renderers). (lucille, 57 secs) (3delight… Continue reading “lucille v.s. 3delight: raytracing performance.”

[Urgent notice] Yatterman + Dragonball viewing party, 3/14 (Sat)

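Go, go-go, go-go-go-go... go-go, go-go-go... go... go-go... (rumbling)... Out of the blue, a collaboration with Philo 式 has come together. On 3/14 (Saturday) we will hold a Yatterman + Dragonball viewing meetup. We will gather in Shibuya at 14:30. CG folks who can make it, please contact syoyofujita @ Gmail address by around 10:00 on 3/14. # Note that Yatterman seems to be extremely popular, so we may not be able to get tickets.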