Syoyo Fujita's Blog

raytracing monte carlo

Category: infiniband


rspreload is a preloadable DLL replacement for the socket() functions that takes advantage of the RDMA transport layer without any application modification.
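As a rough illustration of the preload mechanism, here is a minimal Python sketch that builds an environment with LD_PRELOAD set and launches an unmodified program under it. The helper names (preload_env, run_preloaded) and the library path in the usage note are hypothetical and are not part of rspreload itself.

```python
import os
import subprocess


def preload_env(library_path, base_env=None):
    """Return a copy of the environment with library_path added to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The dynamic linker accepts a colon-separated list; put our library first.
    env["LD_PRELOAD"] = f"{library_path}:{existing}" if existing else library_path
    return env


def run_preloaded(command, library_path):
    """Run command (a list of arguments) with the given library preloaded."""
    return subprocess.run(command, env=preload_env(library_path), check=False)
```

Calling run_preloaded(["iperf", "-c", server], "/path/to/preload-library.so") would then behave like the LD_PRELOAD command line used in this post, with server and the library path filled in for your own setup.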

rspreload is built on top of the rsockets feature. See here for details on rsockets:

Recent advances in rspreload/rsockets finally make it possible to accelerate existing TCP/IP socket applications such as iperf.

(At least, iperf with rspreload didn’t work a year ago.)

$ LD_PRELOAD=/usr/local/lib/rsocket/ iperf -c

Client connecting to, TCP port 5001
TCP window size: 128 KByte (default)
[ 3] local port 51626 connected with port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 22.9 GBytes 19.7 Gbits/sec

It can achieve 19.7 Gbits/s in my IB QDR configuration.

This is a little over 60% (close to 2/3) of the theoretical peak of 32 Gbits/s! Simply super.
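The numbers above can be cross-checked with a small sketch. It assumes that the 32 Gbits/s peak is the 40 Gbit/s QDR signalling rate after 8b/10b encoding, and that iperf reports GBytes in units of 2**30 bytes and Gbits in units of 10**9 bits. The function names are made up for illustration.

```python
# Assumption: IB QDR 4x signals at 40 Gbit/s and uses 8b/10b line encoding.
QDR_SIGNAL_GBPS = 40.0
ENCODING_EFFICIENCY = 8 / 10


def usable_rate_gbps(signal_gbps, encoding_efficiency):
    """Data rate left after line-encoding overhead."""
    return signal_gbps * encoding_efficiency


def iperf_gbits_per_sec(gbytes, seconds):
    """Convert an iperf transfer (GBytes of 2**30 bytes) into Gbits/s (10**9 bits)."""
    return gbytes * 2**30 * 8 / 1e9 / seconds


peak = usable_rate_gbps(QDR_SIGNAL_GBPS, ENCODING_EFFICIENCY)
measured = iperf_gbits_per_sec(22.9, 10.0)  # values from the iperf report above

print(f"peak     : {peak:.1f} Gbits/s")
print(f"measured : {measured:.1f} Gbits/s")
print(f"fraction : {measured / peak:.0%} of peak")
```

Under those assumptions the transfer size and the reported bandwidth agree with each other, and the run reaches a bit over 60% of the peak.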

More details here:

Silent InfiniBand QDR switch!

InfiniBand switches are known to be very loud because of their high-power cooling fans.

So we made an InfiniBand QDR switch… silent!

Now it’s good to use IB for a faster and silent home/SOHO network 🙂

OpenSM on Illumos + Hermon(ConnectX) … works!

In my previous post, I reported that OpenSM doesn’t work.

But eventually I succeeded in running OpenSM by updating its version. I grabbed the patched OpenSM source code, version 3.3.9, from Solaris 11’s GPL source (the open-fabrics package), then adapted it to the ofusr package.

That’s all. Yay! It didn’t require modifying the hermon driver.

Solaris InfiniBand tools repo

I created a repository that is useful for building InfiniBand stuff on OpenIndiana/Illumos.

It’s something like a Debian PPA.

I don’t know whether the Illumos or OI community provides such a PPA infrastructure, so I put this on GitHub.

Enjoy, and feedback is welcome!

Soft RoCE

Soft RoCE is a software implementation of IBoE (InfiniBand over Ethernet).

You can write your InfiniBand program and run it on any Ethernet device (e.g. 1 GbE) without a RoCE-capable HBA.

I succeeded in installing and running Soft RoCE on Ubuntu 10.04 LTS running in VMware Fusion on my MacBook Pro (CentOS 5.7 and CentOS 6.2 didn’t work well; kernel patching failed).

This means you don’t need InfiniBand hardware to program and test your IB Verbs programs.

Happy IB verbs coding with Soft RoCE!

OpenSM on OpenIndiana151a(Illumos kernel) works!

Finally, I succeeded in running OpenSM on OpenIndiana 151a (Illumos).

OpenSM is compiled from source with the Solaris patch (provided by the ofusr source package from the OpenSolaris era, or the open-fabrics source package from Solaris 11, which is licensed under CDDL).

This time, I used OpenSM 3.3.9 plus the Solaris patch (shipped with the open-fabrics package).

At first OpenSM didn’t work on OpenIndiana 151a (Illumos), so I had to debug the IB kernel modules (i.e., sol_umad and the tavor InfiniHost driver) to find where the error was.

*DTrace* greatly helped me track down the cause of the error, and I found that tavor doesn’t behave well.

In the end I needed to tweak the tavor driver source, but the modification is very simple: just change one line of code. Recompile the kernel module with a recent illumos-gate, replace the tavor kernel module with the new one, and things go well!

openindiana$ sudo opensm
OpenSM 3.3.9
Command Line Arguments:
Log File: /var/log/opensm.log
OpenSM 3.3.9

Entering DISCOVERING state

Using default GUID 0x2c90200201e29

SM Port is down

Entering MASTER state



OpenSM on Solaris11… works!

I succeeded in compiling opensm on Solaris 11, and it seems to work well!
(At least, it works well on my IB facilities.)

root@solaris11:~# opensm
OpenSM 3.3.9
Command Line Arguments:
Log File: /var/log/opensm.log
OpenSM 3.3.9

Entering DISCOVERING state

Using default GUID 0x2c903000736b9
Entering MASTER state


opensm on Solaris is what many people have been waiting for.

opensm is a subnet manager, and at least one subnet manager must be running somewhere on the InfiniBand fabric to discover each IB node.
Without a subnet manager such as opensm, you can’t discover or communicate with other IB nodes.
If there’s no opensm on Solaris, you need another Linux or Windows machine, or an IB switch, running a subnet manager. But if you have opensm on Solaris, you don’t need an extra IB node!
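If you want a script to confirm that the subnet manager actually came up, one simple approach is to look for the state line in opensm's console output. The sketch below assumes the output looks like the excerpt above. The function name reached_master is made up for illustration and is not part of opensm.

```python
def reached_master(opensm_output):
    """Return True if the output contains the 'Entering MASTER state' line."""
    return any(
        line.strip() == "Entering MASTER state"
        for line in opensm_output.splitlines()
    )


# Sample text copied from the console excerpt above.
sample = """\
OpenSM 3.3.9

Entering DISCOVERING state

Using default GUID 0x2c903000736b9
Entering MASTER state
"""

print(reached_master(sample))
```

This only checks the text that opensm printed; it says nothing about whether other nodes on the fabric were discovered.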

I don’t know why the Solaris 11 open-fabrics package doesn’t include an opensm binary…. opensm from the open-fabrics source (the OFED 1.5.3 Solaris port) is ready to be compiled on Solaris 11.

Unfortunately, running opensm on OpenIndiana (Illumos) is not possible at this time, since it has no corresponding kernel driver (the kernel component of the OFED 1.5.3 Solaris port).

InfiniBand status on Solaris 11

Solaris 11 has an InfiniBand HCA driver and SW stack.
The SW stack is based on OFED 1.5.3, and I’ve confirmed that some of its components work well.

Here’s a summary of the Solaris 11 InfiniBand status at our facility.

HW: Mellanox ConnectX QDR 1-port, Mellanox 8-port QDR switch.
SW: OFUV (Solaris 11, installed by default), OFED (SL6.0), WinOF 3.0 RC4 (Windows)


Works well with Linux client, Windows7 client.

SRP target

SRP target on Solaris11.
Works well with Linux SRP initiator.
Not tested with Windows7 SRP initiator.


ibstat doesn’t seem to work with a 1-port ConnectX HCA.
It fails with the following report:

rdma_bw, rdma_lat, …

Works well.

ib_read_bw, ib_read_lat, …

Works well with linux client, but not with Windows7.


Not tested.


Not tested(How can I test it?)


Some of the tools (e.g. ibstat) don’t work well on Solaris 11 (at least at our facility).
But some vital features work well (RDMA CM, IPoIB, SRP).

Testers wanted

Our testing (and investigation) was done with a very limited facility and few OS configurations.
If you are also interested in (or investigating) InfiniBand + Solaris 11, I’d like to hear reports from you.

Solaris InfiniBand SW stack short summary

I’ve been investigating InfiniBand (RDMA) things on Solaris 10/11.

My ultimate goal is to realize fast and reliable InfiniBand + ZFS storage on top of (Oracle) Solaris 11 or OpenIndiana.

The following is a memo from my survey of the InfiniBand stack status on Solaris 10/11.

At this time, OpenIndiana and Nexenta are not based on the Solaris 11 kernel/kernel modules (and never will be), so many things fall back to the Solaris 10 case.


– The OFED port for Solaris 11 is based on OFED 1.5.3.
– The OFED port for OpenIndiana seems to be based on OFED 1.3. OFED 1.5.3 grabbed from Oracle Solaris 11 doesn’t work on OpenIndiana 151a.

Kernel/kernel module components(10/11)

– uDAPL?
– umad, uverbs, ucma

All of these are kernel components, so you don’t need to install the open-fabrics package (the OFED upper-layer library ported to Solaris).

IPoIB performance on OpenIndiana 151a

Measured with netperf on an AMD Athlon II Neo + IB SDR.

1 GbE : 110 MB/s
IB SDR : 620 MB/s

The theoretical peak of IB SDR is around 900 MB/s, so the IB SDR number should increase with a much better CPU.

SRP performance on OpenIndiana 151a

Measured with hdparm against a file created on the /tmp filesystem (ramdisk), on an AMD Athlon II Neo + IB SDR.

IB SDR + SRP : 558.94 MB/s

IB SRP seems slower than IPoIB, although the measurement conditions are not the same.
This needs further investigation.
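To put the IPoIB and SRP numbers side by side, here is a small sketch that expresses each measurement as a fraction of the IB SDR peak. It takes the "around 900 MB/s" figure quoted above as the peak, which is itself approximate. The labels and the function name are only for illustration.

```python
PEAK_SDR_MBS = 900.0  # approximate theoretical peak quoted in this post

# Measured throughput in MB/s, taken from the two posts above.
measurements = {
    "1 GbE": 110.0,
    "IB SDR (IPoIB)": 620.0,
    "IB SDR + SRP": 558.94,
}


def fraction_of_peak(mb_per_s, peak=PEAK_SDR_MBS):
    """Throughput as a fraction of the assumed IB SDR peak."""
    return mb_per_s / peak


for name, rate in measurements.items():
    print(f"{name:16s} {rate:7.2f} MB/s  {fraction_of_peak(rate):.0%} of SDR peak")
```

Since IPoIB was measured with netperf and SRP with hdparm, the two fractions are not a like-for-like comparison.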

Solaris as InfiniBand-ready storage

On current OpenIndiana 151a, you can’t use the OFED upper-layer tools, e.g. ibstat and ib_read_bw.
Also, you can’t do RDMA programming using RDMA-CM with the same programming API as on Linux.
But you can use IPoIB and SRP.
SDP might also work, but I haven’t confirmed it yet.

Thus, to use OpenIndiana as InfiniBand + ZFS storage, the current solution is to deploy a storage system with IPoIB or SRP.

You might not be able to use IPoIB-CM to get better network performance.

Towards InfiniBand-connected render farm

Recently I’ve been investigating InfiniBand networking for use in render farms.

Many render guys might never have heard of it, so let me briefly explain what it is.

InfiniBand is a low latency, high bandwidth network interface.

For example, InfiniBand QDR (40 Gbps; the fastest publicly available InfiniBand configuration as of Jun 2011) can achieve 3.2 GB/s peak bandwidth. Transferring 10 GB of data in about 3 seconds, awesome! This is roughly 25x faster than 1 GbE at its 125 MB/s peak.
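The transfer-time arithmetic can be checked with a short sketch. It uses the 3.2 GB/s QDR figure from this post and assumes 1 GbE peaks at 1 Gbit/s divided by 8, i.e. 0.125 GB/s with no protocol overhead. With a lower sustained 1 GbE rate, the speedup would come out larger.

```python
def transfer_seconds(size_gb, rate_gb_per_s):
    """Time to move size_gb gigabytes at a constant rate."""
    return size_gb / rate_gb_per_s


QDR_PEAK_GBS = 3.2      # peak bandwidth quoted in this post
GBE_PEAK_GBS = 1.0 / 8  # assumption: 1 Gbit/s = 0.125 GB/s, ignoring overhead

size_gb = 10
qdr_time = transfer_seconds(size_gb, QDR_PEAK_GBS)
gbe_time = transfer_seconds(size_gb, GBE_PEAK_GBS)

print(f"IB QDR : {qdr_time:.2f} s")
print(f"1 GbE  : {gbe_time:.2f} s")
print(f"speedup: {gbe_time / qdr_time:.1f}x")
```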

InfiniBand has been widely used in the HPC field, but now it seems to be moving into the enterprise market.

In the future, I can easily imagine network and disk I/O becoming the major bottleneck in large-scale rendering. This is why I am interested in InfiniBand for network I/O.

Here are slides showing my InfiniBand experience.

I am quite confident in InfiniBand right now. It’s fast and cost-effective.

In the next phase, I am interested in ioDrive, the fastest SSS (Solid State Storage) disk from Fusion-io, since it might improve the performance of reading massive textures and geometry from disk.