OpenSM on Illumos + Hermon(ConnectX)

by syoyo

(Update: I have since confirmed that OpenSM works. See my new post.)

I am trying to run opensm on illumos (OpenIndiana 151a) with the hermon (ConnectX) driver.
I found that opensm stops in umad_recv(): the call to dev_poll() never returns, so it looks like an infinite loop.

# opensm
...

ibwarn: [2196] umad_recv: fd 5 umad 8147b88 timeout 4294967295

Breakpoint 1, umad_recv (fd=8, umad=0x8124c68, length=0xfe260f58, timeout_ms=-1) at ../src/umad.c:923
923 struct ib_user_mad *mad = umad;
(gdb) n
926 errno = 0;
(gdb)
927 TRACE("fd %d umad %p timeout %u", fd, umad, timeout_ms);
(gdb)
ibwarn: [2103] umad_recv: fd 8 umad 8124c68 timeout 4294967295
929 if (!umad || !length) {
(gdb)
934 if (timeout_ms && (n = dev_poll(fd, timeout_ms)) < 0) {
(gdb)
^C
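
A note on reading the trace above: the "timeout 4294967295" in the ibwarn line is the same value as timeout_ms=-1 in the gdb frame, just printed through "%u". If dev_poll() is a thin wrapper around poll(2) (an assumption here; the illumos port may differ), a negative timeout means "wait until an event arrives", so the hang may be a wait for a MAD that never shows up rather than a busy loop in userland. The sketch below illustrates both points. format_timeout() and wait_readable() are hypothetical helpers written for this post, not libibumad functions.

```c
#include <errno.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* Format a timeout the way the TRACE line does: an int printed with "%u".
 * The cast makes the conversion explicit; with a 32-bit unsigned int,
 * -1 comes out as 4294967295. */
static int format_timeout(char *buf, size_t len, int timeout_ms)
{
    return snprintf(buf, len, "timeout %u", (unsigned int)timeout_ms);
}

/* Hypothetical stand-in for dev_poll(): wait until fd becomes readable.
 * Returns 0 when poll() reports an event on fd, -1 on timeout
 * (errno = ETIMEDOUT) or on error. A negative timeout_ms makes
 * poll() block until an event arrives. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    n = poll(&pfd, 1, timeout_ms);
    if (n == 1)
        return 0;
    if (n == 0)
        errno = ETIMEDOUT;
    return -1;
}
```

Calling wait_readable() with a finite timeout (say 1000) on the umad fd instead of -1 would be one way to tell the two cases apart: if it comes back with ETIMEDOUT, userland is simply blocked and the missing event is on the kernel side.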

Tracing the kernel IB driver with dtrace shows that the kernel side (hermon or sol_umad) is also stuck in an infinite loop:

...
3 71205 umad_prop_op:entry umad_enter:
3 71206 umad_prop_op:return 0x00000000, 0
3 71205 umad_prop_op:entry umad_enter:
3 71206 umad_prop_op:return 0x00000000, 0
...

So my humble guess is that there is a race condition between opensm (the userland umad layer) and the kernel IB driver (the kernel umad layer).
