Syoyo Fujita's Blog

raytracing monte carlo

Month: January, 2012

OpenSM on Illumos + Hermon(ConnectX) … works!

In my previous post, I reported that OpenSM doesn’t work.

But I eventually got OpenSM running by updating its version. I grabbed the patched OpenSM source code, version 3.3.9, from Solaris 11’s GPL source (the open-fabrics package), then adapted it to the ofusr package.

That’s all. Yay! No modification to the hermon driver was required.

Solaris InfiniBand tools repo

I created a repository that is useful for setting up InfiniBand software on OpenIndiana/Illumos:

https://github.com/syoyo/solaris-infiniband-tools

It’s something like a Debian PPA.

I don’t know whether the Illumos or OI community provides a PPA-like infrastructure, so I put this on GitHub.

Enjoy, and feedback is welcome!

OpenSM on Illumos + Hermon(ConnectX)

(Updated: OpenSM is now confirmed to work. See my new post.)

I am trying to run opensm on illumos (OpenIndiana 151a) + hermon (ConnectX).
I found that opensm stops at umad_recv(), looping forever in dev_poll().

# opensm
...

ibwarn: [2196] umad_recv: fd 5 umad 8147b88 timeout 4294967295

Breakpoint 1, umad_recv (fd=8, umad=0x8124c68, length=0xfe260f58, timeout_ms=-1) at ../src/umad.c:923
923 struct ib_user_mad *mad = umad;
(gdb) n
926 errno = 0;
(gdb)
927 TRACE("fd %d umad %p timeout %u", fd, umad, timeout_ms);
(gdb)
ibwarn: [2103] umad_recv: fd 8 umad 8124c68 timeout 4294967295
929 if (!umad || !length) {
(gdb)
934 if (timeout_ms && (n = dev_poll(fd, timeout_ms)) < 0) {
(gdb)
^C
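A side note on the trace above: the `timeout 4294967295` in the ibwarn line is just `timeout_ms=-1` printed with `%u`, i.e. "wait forever". Here is a minimal sketch of that behaviour, assuming dev_poll() is a thin wrapper around POSIX poll() (I have not checked the Solaris port). The names `timeout_as_unsigned`, `wait_readable` and `demo_wait_forever_with_data` are hypothetical helpers of mine, not libibumad functions.

```c
#include <poll.h>
#include <unistd.h>

/* What "%u" shows for a signed timeout: -1 becomes 4294967295
 * when unsigned int is 32 bits wide. */
unsigned int timeout_as_unsigned(int timeout_ms)
{
    return (unsigned int)timeout_ms;
}

/* Assumed shape of a dev_poll()-like helper: wait until fd is readable.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 * A negative timeout_ms makes poll() block until an event arrives. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    return poll(&pfd, 1, timeout_ms);
}

/* Demo with a pipe standing in for the umad device fd: when data is
 * already queued, even an infinite wait returns immediately. */
int demo_wait_forever_with_data(void)
{
    int p[2];
    int n;

    if (pipe(p) != 0)
        return -1;
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    n = wait_readable(p[0], -1);
    close(p[0]);
    close(p[1]);
    return n;
}
```

So a call with timeout -1 that never returns, as in the gdb session, is consistent with no MAD ever becoming readable on the fd, rather than with a wrong timeout value.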

Debugging the kernel IB driver with dtrace shows that the hermon (or sol_umad) kernel side is also entering an infinite loop:

...
3 71205 umad_prop_op:entry umad_enter:
3 71206 umad_prop_op:return 0x00000000, 0
3 71205 umad_prop_op:entry umad_enter:
3 71206 umad_prop_op:return 0x00000000, 0
...

So my humble guess is that there’s a race condition between opensm (the user-level umad layer) and the IB kernel driver (the kernel-level umad layer).

Soft RoCE

Soft RoCE (http://www.systemfabricworks.com/downloads/roce) is a software implementation of IBoE (InfiniBand over Ethernet).

You can write your InfiniBand program and run it on any Ethernet device (e.g. 1 GbE) without a RoCE-capable HBA.

I succeeded in installing and running Soft RoCE on Ubuntu 10.04 LTS running on VMware Fusion on my MacBook Pro. (CentOS 5.7 and CentOS 6.2 didn’t work well; kernel patching failed.)

This means you don’t need InfiniBand hardware to write and test your IB Verbs program.

Happy IB verbs coding with Soft RoCE!