OFED 1.5.4.1 userland port to OpenIndiana/Illumos

I’ve ported the OFED 1.5.4.1 userland to OpenIndiana 151a/Illumos.

https://github.com/syoyo/oi-build/tree/ofed-1.5.4.1/components/open-fabrics

This port is based on the earlier open-fabrics package from Solaris 11, which is released under GPL/CDDL.

So far I have only managed to port and build the source code.
I need people who can evaluate it in an actual IB environment.

Challengers wanted!

57 thoughts on “OFED 1.5.4.1 userland port to OpenIndiana/Illumos”

    1. NFS/RDMA is not part of userland. I didn’t port the kernel side of OFED 1.5.4.1 to Illumos. Anyway, I guess NFS/RDMA works with a Linux client (using the current IB kernel module in Illumos).

  1. Which driver do you use for your IB card? Could you give me more information about the driver installation?

  2. OK, I built this under OpenIndiana 151a4.

    But it seems some Linux-specific code is still in it. I verified that the patches were applied to the source.

    -------------------------------------------------
    OpenSM 3.3.9
    Command Line Arguments:
    Log File: /var/log/opensm.log
    -------------------------------------------------
    OpenSM 3.3.9

    ibwarn: [15310] umad_init: can't read ABI version from /sys/class/infiniband_mad/abi_version (m): is ib_umad module loaded?
    Entering DISCOVERING state

    No local ports detected!
    Exiting SM

    1. I think you are using an old OpenSM.

      If I remember correctly, the build script of the open-fabrics package might not build and install the opensm/ directory (which is version 3.3.13). You might need to build a fresh opensm binary by hand.

      1. Odd. I downloaded it from your repo above, and that compiles opensm fine, using version 3.3.9. I modified it for 3.3.13, but the patches fail. I could probably adjust the patches easily enough.

  3. Thanks for your answer.
    I tried another card, which is:
    Mellanox Technologies MT25418 [ConnectX VPI PCIe 2.0 2.5GT/s - IB DDR / 10GigE]

    This is what it shows from prtconf:
    pci15b3,634a (driver not attached)

    from modinfo:
    102 fffffffff7da1000 beb0 - 1 ibdm (InfiniBand Device Manager)
    103 fffffffff7dad000 143e8 - 1 ibtl (IB Transport Layer)
    104 fffffffff7dc1000 67bf0 - 1 ibmf (IB Agent Interfaces 2.0)
    130 fffffffff7e72000 2de38 - 1 ibcm (IB Communication Manager)

    1. Either something is broken on your system, or you are using an old version. At least on OpenIndiana that should work out of the box.

      hermon "pciex15b3,6340"
      hermon "pciex15b3,634a"
      hermon "pciex15b3,6732"
      hermon "pciex15b3,673c"
      hermon "pciex15b3,6746"

  4. Thank you for the information, guys.

    For the cards marked “○ = confirmed working” (“○は動作確認済”) in the list, do I need to install the driver manually on OpenIndiana?

    1. As you may know, only two IB drivers exist for OpenIndiana: Tavor (InfiniHost) and Hermon (ConnectX). Both are provided by default through pkg:/driver/network/ib.

      So if you use ConnectX, you basically don’t need to install a driver manually.

      1. I used the latest build of OpenIndiana and needed to install the hermon driver manually. This is because I used the desktop build. Make sure you haven’t made the same mistake.

  5. this is the version that I am using:

    root@openindiana:~# cat /etc/release
    OpenIndiana Development oi_151a X86
    Copyright 2010 Oracle and/or its affiliates. All rights reserved.
    Use is subject to license terms.
    Assembled 01 September 2011

    root@openindiana:~# uname -a
    SunOS openindiana 5.11 oi_151a3 i86pc i386 i86pc Solaris

    1. “pci15b3” seems ok, since my Tavor(InfiniHost) card is detected as,

      $ ls -R /devices | grep 15b3
      pci15b3,6278@0
      pci15b3,6278@0:devctl
      /devices/pci@0,0/pci1022,9603@2/pci15b3,6278@0:
      /devices/pci@0,0/pci1022,9603@2/pci15b3,6278@0/ibport@1,0,ipib:
      /devices/pci@0,0/pci1022,9603@2/pci15b3,6278@0/ibport@2,0,ipib:

    2. Odd, it seems you have a PCI version of that card, but the driver is only set up for the PCI Express version. Just adding pci15b3,634a would bind the driver to that card.
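
The mismatch described above can be checked mechanically. Below is a minimal Python sketch, assuming alias entries have the `driver "alias"` form quoted in this thread. The function names are hypothetical, and the code only parses text; it does not modify /etc/driver_aliases or bind anything.

```python
import re

# Alias entries quoted earlier in this thread (hermon, PCI Express IDs only).
SAMPLE_ALIASES = '''
hermon "pciex15b3,6340"
hermon "pciex15b3,634a"
hermon "pciex15b3,6732"
hermon "pciex15b3,673c"
hermon "pciex15b3,6746"
'''

# One entry per line: driver name, whitespace, alias in double quotes.
ALIAS_LINE = re.compile(r'^\s*(\S+)\s+"([^"]+)"\s*$')


def parse_aliases(text):
    """Return a dict mapping alias -> driver name."""
    aliases = {}
    for line in text.splitlines():
        match = ALIAS_LINE.match(line)
        if match:
            driver, alias = match.groups()
            aliases[alias] = driver
    return aliases


def driver_for(device_name, aliases):
    """Return the driver whose alias equals device_name, or None."""
    return aliases.get(device_name)


if __name__ == "__main__":
    aliases = parse_aliases(SAMPLE_ALIASES)
    # prtconf reported the card with the "pci" prefix, not "pciex".
    print(driver_for("pci15b3,634a", aliases))    # None
    print(driver_for("pciex15b3,634a", aliases))  # hermon
```

The lookup is an exact string match, which is why a card reported as pci15b3,634a finds no driver even though pciex15b3,634a is listed.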

  6. Is this the correct way to bind the driver? I added this line to /etc/driver_aliases:

    hermon "pci15b3,634a"

  7. I got the same error when I used add_drv to bind the driver, same for ib and hermon

    devfsadm: driver failed to attach: ib
    Warning: Driver (ib) successfully added to system but failed to attach

  8. Hello Syoyo,

    I have another problem. I would like to use OI with ZFS to provide storage for a glusterfs server with nfs.
    My test environment:
    Node 1: CentOS 6.2 with OFED 1.5.4.1
    Node 2: OI 151a4 with native IB

    If I run a test with iperf:
    From CentOS to OI throughput around 4.90Gbit/s

    > [root@dev-cos62 ~]# iperf -c 1.1.1.2
    > ------------------------------------------------------------
    > Client connecting to 1.1.1.2, TCP port 5001
    > TCP window size: 193 KByte (default)
    > ------------------------------------------------------------
    > [ 3] local 1.1.1.1 port 36173 connected with 1.1.1.2 port 5001
    > [ ID] Interval Transfer Bandwidth
    > [ 3] 0.0-10.0 sec 5.66 GBytes 4.86 Gbits/sec

    From OI to CentOS throughput only 900Mbit/s

    > root@dev-oi:~# iperf -c 1.1.1.1
    > ------------------------------------------------------------
    > Client connecting to 1.1.1.1, TCP port 5001
    > TCP window size: 256 KByte (default)
    > ------------------------------------------------------------
    > [ 3] local 1.1.1.2 port 35841 connected with 1.1.1.1 port 5001
    > [ ID] Interval Transfer Bandwidth
    > [ 3] 0.0-10.0 sec 1.13 GBytes 968 Mbits/sec

    My IB Hardware is a new Mellanox InfiniScale switch and some older 10Gbit Mellanox HCAs.

    Have you any idea?

    thanks and greetings from Germany

    1. CentOS -> OI seems good, given that the connection runs over IPoIB and so carries the CPU overhead of processing IP packets.

      OI -> CentOS is strange… Are you using the same HW configuration (especially CPU) on both sides? If so, my humble guess is that OI spends more CPU on IP processing than Linux (for security, packet filtering or something else).

      Also, iperf might not be a good measurement tool.

      rdma_bw and rdma_lat are the best tools for measuring RDMA performance.

    2. Did you try both sides in connected and datagram mode?

      My speed is horrible in datagram mode with a 2k MTU, and much better when it can offload full 64k frames all at once in connected mode. But Linux defaults to datagram mode, and OI defaults to connected mode.
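
To see why the frame size matters, here is some rough arithmetic in Python. The 2048-byte and 65536-byte sizes are round numbers taken from the "2k" and "64k" figures above, not exact IPoIB MTU values. The 4.86 Gbit/s rate is the CentOS-to-OI iperf result from this thread, and per-packet headers are ignored.

```python
def packets_per_second(rate_bits_per_s, frame_bytes):
    """Frames per second needed to carry rate_bits_per_s in frames of frame_bytes."""
    return rate_bits_per_s / (frame_bytes * 8)


RATE = 4.86e9            # bit/s, from the iperf run quoted in this thread
DATAGRAM_FRAME = 2048    # bytes, assumed from "2k mtu"
CONNECTED_FRAME = 65536  # bytes, assumed from "64k frames"

if __name__ == "__main__":
    dg = packets_per_second(RATE, DATAGRAM_FRAME)
    cm = packets_per_second(RATE, CONNECTED_FRAME)
    print(f"datagram : {dg:,.0f} frames/s")
    print(f"connected: {cm:,.0f} frames/s")
    print(f"ratio    : {dg / cm:.0f}x")  # 32x
```

At the same throughput, datagram mode has to push 32 times as many frames through the IP stack, so per-packet CPU cost dominates much sooner.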

  9. Many thanks for the fast reply.

    @Patrick, I have tested with both datagram and connected mode. At the moment I am using connected mode. Same results.

    @Syoyo, my test environment: the CentOS machine is a DL585 with 4x dual Opteron CPUs, 32GB RAM and SAS hard drives. The OI machines are two 100% identical FSC workstations with 2x dual-core Xeon CPUs and 4GB RAM.
    Over the last three weeks I have tested several different installations and connections, including OI to OI.

    I have narrowed the problem down. From CentOS with OFED 1.5.4.1 to CentOS with OFED 1.5.4.1, the throughput is good at around 5Gbit/s.

    I think it is the firmware compatibility with the native kernel driver on OI (Solaris):
    1. OI outgoing throughput with fw version 3.5 on the MTLP23108 adapter: max 980Mbit/s
    2. OI outgoing throughput with fw version 3.1 on the MTLP23108 adapter: max 3.5Gbit/s
    That is much better!

    But fw 3.1 does not work with the switch. The adapter port does not come up.
    It only works when I connect the adapter ports directly, adapter to adapter, without the switch.

    I want to use your OFED 1.5.4.1 for OI to test the results, but when compiling I get an error in libibverbs.

    Many thanks!

    1. What is the error message when you compile libibverbs? I might be able to fix it.

      Note that my 1.5.4.1 port covers only userland code; it does not include any update to the kernel drivers.

      Thus, 1.5.4.1 on OI might not improve the performance with fw 3.5.

      1. Hi Syoyo, I have now bought some other HCAs: four SFS-HCA-E2T7-A1 (equivalent to MHEA28-1TC, 10Gbit) and four HP 592520R-B21 (equivalent to MHQH29C-XTR, 40Gbit). I am testing these adapters and will post the results here as a hardware compatibility list. I think that is a good idea 🙂

        When I’m at the office later I will try to compile OFED again and tell you step by step what I do.

  10. Yes, I have some news.
    I have tested some scenarios:
    CentOS to CentOS, CentOS to OI, OI to CentOS and OI to OI.
    The test hardware was two IBM X3655 servers with 16 GB RAM and
    one Mellanox InfiniScale IV 8-port ADR switch.

    1. The HP NC570C is equivalent to the Mellanox InfiniHost MHXL-CF128 with chipset MTLP23108-C; it is a PCI-X card. The MTLP23108 chipset works with firmware 3.1 on all systems, but not with the switch. With firmware 3.5 the cards work with the switch but not with OI: the outgoing throughput is limited to 1Gbit. Incoming throughput is OK.
    Cards with chipset MT50521D01 have the same problem.
    All cards in OI with firmware 3.1 have a high latency (0.140, 0.170 ms).
    All cards in CentOS with fw 3.1 and 3.5 have a low latency.
    On the positive side, the card is very cheap: you can buy it on eBay for 20 euros.
    I think all older InfiniHost adapters (Cougar Cub) have these problems. Buy with caution if you want to use them with OI.

    2. InfiniHost III Ex (Lion Cub) (chipset MT25208).
    The Cisco SFS-HCA-E2T7-A1 dual-port card is equivalent to the Mellanox MHEA28-1TC; it is a PCI-E x8 card. The HCA works fine with new firmware on all systems, with low latency and full bandwidth.
    So this HCA works out of the box. Price: 100 euros on eBay.

    3. The ConnectX-2 VPI adapter works with no problems.
    I bought the HP 592520-B21 HCA, which is equivalent to the MHQH29C-XTR. It is a 40Gb/s / 10GbE, PCIe 2.0 x8 5.0GT/s adapter. The adapter works with all systems out of the box.
    Caution: if you put this adapter in a PCI-E 1.0 slot, the HCA’s bandwidth drops to 10Gbit! It took me a while to discover this. Apart from that it works fine. My IBM servers only have PCI-E 1.0, so I cannot test the full bandwidth, but I think it works.

  11. Syoyo

    I have been reading your blog entries for a week trying to get opensm running on OpenIndiana 151a5. I am not a developer and have had to learn a few things about compiling on OI and git.

    I have installed the updated binary tavor driver and compiled your 1.5.4.1 ofed userland.

    My hardware is two Infinihost PCI-X adapters connected back to back which have been working when one was running Linux. Firmware is 3.5.0

    I get the following:
    root@openindiana:~# /usr/sbin/amd64/opensm
    -------------------------------------------------
    OpenSM 3.3.13
    Command Line Arguments:
    Log File: /var/log/opensm.log
    -------------------------------------------------
    OpenSM 3.3.13

    Entering DISCOVERING state

    Using default GUID 0x5ad00000363aa

    Error from osm_opensm_bind (0x2A)
    Perhaps another instance of OpenSM is already running
    Exiting SM

    Log file shows:
    Aug 14 19:31:47 887819 [0001] 0x03 -> OpenSM 3.3.13
    Aug 14 19:31:47 888175 [0001] 0x80 -> OpenSM 3.3.13
    Aug 14 19:31:47 907994 [0001] 0x02 -> osm_vendor_init: 1000 pending umads specified
    Aug 14 19:31:47 908487 [0001] 0x80 -> Entering DISCOVERING state
    Aug 14 19:31:48 062860 [0001] 0x02 -> osm_vendor_bind: Binding to port 0x5ad00000363aa
    Aug 14 19:31:48 447321 [0001] 0x01 -> osm_vendor_open_port: ERR 542C: umad_open_port() failed
    Aug 14 19:31:48 447455 [0001] 0x01 -> osm_vendor_bind: ERR 5424: Unable to open port 0x5ad00000363aa
    Aug 14 19:31:48 447511 [0001] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
    Aug 14 19:31:48 447537 [0001] 0x01 -> osm_sm_bind: ERR 2E10: SM MAD Controller bind failed (IB_ERROR)
    Aug 14 19:31:48 447634 [0001] 0x01 -> perfmgr_mad_unbind: ERR 4C05: No previous bind
    Aug 14 19:31:48 447657 [0001] 0x01 -> osm_sa_mad_ctrl_unbind: ERR 1A11: No previous bind
    Aug 14 19:31:48 449072 [0001] 0x80 -> Exiting SM

    Any pointers as to how I can get opensm running would be appreciated.

    Thanks

    Peter
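
In a log like the one above, the later errors look like consequences of the first one, so it helps to list the ERR entries in order. Here is a small Python sketch that extracts them from opensm.log text. The regular expression is inferred only from the lines quoted here, so treat the log format as an assumption.

```python
import re

# A few lines copied from the log quoted above.
SAMPLE_LOG = """\
Aug 14 19:31:48 062860 [0001] 0x02 -> osm_vendor_bind: Binding to port 0x5ad00000363aa
Aug 14 19:31:48 447321 [0001] 0x01 -> osm_vendor_open_port: ERR 542C: umad_open_port() failed
Aug 14 19:31:48 447455 [0001] 0x01 -> osm_vendor_bind: ERR 5424: Unable to open port 0x5ad00000363aa
Aug 14 19:31:48 447511 [0001] 0x01 -> osm_sm_mad_ctrl_bind: ERR 3118: Vendor specific bind failed
"""

# "-> function: ERR code: message"
ERR_LINE = re.compile(r"-> (\w+): ERR ([0-9A-Fa-f]+): (.*)$")


def extract_errors(log_text):
    """Return a list of (function, code, message) tuples in log order."""
    errors = []
    for line in log_text.splitlines():
        match = ERR_LINE.search(line)
        if match:
            errors.append(match.groups())
    return errors


if __name__ == "__main__":
    for function, code, message in extract_errors(SAMPLE_LOG):
        print(f"{code}  {function}: {message}")
```

For this log the first entry is the umad_open_port() failure, which points at the umad device rather than at OpenSM itself.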

      1. Yes I have confirmed it connected directly to a Linux box using opensm 3.3.13 under CentOS. It works with both the standard tavor driver and your modified tavor binary.

        When you got opensm working which sources did you use?

      2. At least, OpenSM 3.3.9(+patch) will work with ConnectX. 3.3.13(+patch) will be much better, but might not be good for InfiniHost.

      3. Sorry to take so long to reply but I have some more information. My IB HCA is not mem free and is confirmed working.

        I have compiled OpenSM 3.3.13 again from scratch and can see why it is having issues.

        When opensm starts up it is trying to open /devices/ib/umad0 which does not exist. It then exits.

        /devices/ib looks like this:

        drwxr-xr-x 2 root sys 2 … daplt@0
        crw-r--r-- 1 root sys 69, 0 … daplt@0:daplt
        drwxr-xr-x 2 root sys 2 … eibnx@0
        crw-rw-rw- 1 root sys 75, 0 … eibnx@0:devctl
        drwxr-xr-x 2 root sys 2 … iser@0
        crw------- 1 root sys 42, 0 … iser@0:iser
        drwxr-xr-x 2 root sys 2 … rdsib@0
        crw-r--r-- 1 root sys 71, 0 … rdsib@0:rdsib
        drwxr-xr-x 2 root sys 2 … rdsv3@0
        crw-r--r-- 1 root sys 59, 0 … rdsv3@0:rdsv3
        drwxr-xr-x 2 root sys 2 … rpcib@0
        crw-r--r-- 1 root sys 169, 0 … rpcib@0:rpcib
        drwxr-xr-x 2 root sys 2 … sdpib@0
        crw-r--r-- 1 root sys 46, 0 … sdpib@0:sdpib
        drwxr-xr-x 2 root sys 2 … sol_umad@0
        crw-rw-rw- 1 root sys 31, 32768 … sol_umad@0:issm0
        crw-rw-rw- 1 root sys 31, 32769 … sol_umad@0:issm1
        crw-rw-rw- 1 root sys 31, 0 … sol_umad@0:umad0
        crw-rw-rw- 1 root sys 31, 1 … sol_umad@0:umad1
        drwxr-xr-x 2 root sys 2 … sol_uverbs@0
        crw-rw-rw- 1 root sys 32, 17 … sol_uverbs@0:event
        crw-rw-rw- 1 root sys 32, 16 … sol_uverbs@0:ucma
        crw-rw-rw- 1 root sys 32, 0 … sol_uverbs@0:uverbs0

        Am I missing something?
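
The listing suggests the umad nodes do exist, just under a different name from the one opensm tries to open. This Python sketch picks the umad nodes out of a list of /devices/ib entry names. The `sol_umad@N:umadM` pattern is taken from the listing above and the helper name is hypothetical. It does not touch libibumad, which is where the path would actually have to change.

```python
import re

# Entry names as they appear in the listing above (subset).
ENTRIES = [
    "daplt@0:daplt",
    "sol_umad@0:issm0",
    "sol_umad@0:issm1",
    "sol_umad@0:umad0",
    "sol_umad@0:umad1",
    "sol_uverbs@0:uverbs0",
]

# Matches the umad minor nodes only, not the issm nodes.
UMAD_NODE = re.compile(r"^sol_umad@\d+:umad(\d+)$")


def umad_nodes(entries, base="/devices/ib"):
    """Map umad index -> full path for entries that look like umad nodes."""
    nodes = {}
    for name in entries:
        match = UMAD_NODE.match(name)
        if match:
            nodes[int(match.group(1))] = f"{base}/{name}"
    return nodes


if __name__ == "__main__":
    for index, path in sorted(umad_nodes(ENTRIES).items()):
        print(index, path)
```

On a live system the entry list could come from `os.listdir("/devices/ib")` instead of the hard-coded sample.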

  12. I downloaded the newer oi-build (opensm 3.3.13), but a ‘gmake install’ can’t patch and fails:

    /usr/gnu/bin/patch -d /scratch/oi-build/components/open-fabrics/libibumad/libibumad-1.3.7 -p1 --backup --version-control=numbered < patches/base.patch
    abort: There is no Mercurial repository here (.hg not found)!
    patching file Makefile.in
    Hunk #1 FAILED at 367.

    The earlier build with opensm 3.3.9 compiles, but opensm exits:

    OpenSM 3.3.9

    ibwarn: [26075] umad_init: can't read ABI version from /sys/class/infiniband_mad/abi_version (m): is ib_umad module loaded?
    Entering DISCOVERING state

    No local ports detected!
    Exiting SM

    The card is a Mellanox MHQH29-XTC and works fine when connected to win7 with openfabric's subnet manager.

    This is on oi_151a5.

    regards
    Claus

    1. I have had similar issues. You just need to install mercurial (pkg install mercurial) and you should be good to go.

  13. Tried to run OFED 1.5.4.1 on OI 151a5. Got the same issues as in https://syoyo.wordpress.com/2012/02/13/ofed-1-5-4-1-userland-port-to-openindianaillumos/#comment-365

    umad tries to open /devices/ib/umad0, which does not exist. I assumed it meant /devices/ib/sol_umad@0:umad0, which is there, so I changed the sources and rebuilt everything. The subnet really seems to come UP. For a second 🙂 After that the kernel crashes and a crash dump is saved.

    So the question is: what else can I try to make it work? Perhaps I should switch to the same OI build that you used? Or would an IB switch with a built-in SM from eBay be the most effective solution?

    I’m using a ConnectX card. It works perfectly if the subnet manager is running on the other side.

    And I really am using the ofed-1.5.4.1 branch, with opensm 3.3.13 and the libraries built from the same branch.

    1. I used OI 151a, but it is too old, so you’d be better off sticking with a recent OI build, i.e. 151a5.
      Unfortunately I have no idea why the kernel crashes… As far as I know, the IB kernel code in illumos has not been maintained for two or more years, so the problem might lie in some other area, not in the IB kernel/OFED stack.

  14. Hi,

    I’m trying to run OpenSM on OpenIndiana 151a7 but it does not work.
    The InfiniBand card is an MHQH29C-XTR and works with SRP and IPoIB when the subnet manager is running on the other side.

    # ./opensm/build/i86/opensm/opensm
    -------------------------------------------------
    OpenSM 3.3.13
    Command Line Arguments:
    Log File: /var/log/opensm.log
    -------------------------------------------------
    OpenSM 3.3.13

    ibwarn: [11438] umad_init: can't read ABI version from /sys/class/infiniband_mad/abi_version (m): is ib_umad module loaded?
    Entering DISCOVERING state

    No local ports detected!
    Exiting SM

    ###############################################

    On OpenIndiana there is no /sys/class. This error seems to block the port detection, but I’m not sure.

    Do you have any idea?

    Thanks,
    Julien Durand.
