Monday, February 21, 2011

Building Mellanox OFED 1.5.2 for Rocks 5.4

Here are my notes on building Mellanox OFED 1.5.2 for Rocks 5.4.

Perform the build steps on a compute node. That way, if the build process (which runs as root) has a bug, we don't risk having to rebuild the head node.

MLNX_OFED 1.5.2 ships with kernel modules for kernel 2.6.18-194.el5. We are running 2.6.18-194.17.1.el5, so we need to build new kernel modules.

1. Download the ISO file MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5.iso from the Mellanox website and save it in /root

2. Ensure that the build system is running the kernel you want to build modules for

# uname -r

2.6.18-194.17.1.el5
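If you script the build, the same check can be made to fail loudly instead of relying on eyeballing the `uname -r` output. A minimal sketch; `check_kernel` is a hypothetical name, and the optional second argument exists only so the comparison can be tested without depending on the machine it runs on.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the kernel matches the build target.
# $1 = kernel we want to build modules for
# $2 = kernel to compare against (defaults to the running kernel)
check_kernel() {
    local want="$1"
    local have="${2:-$(uname -r)}"
    if [ "$have" != "$want" ]; then
        echo "running kernel $have, expected $want" >&2
        return 1
    fi
    return 0
}

# Usage on the build node:
# check_kernel 2.6.18-194.17.1.el5 || exit 1
```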

3. Mount the ISO and copy its contents to a scratch work area. Keep the original ISO, because mlnx_add_kernel_support.sh reads it in step 6.

# mkdir -p /mnt/cdrom
# mount -t iso9660 -o loop /root/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5.iso /mnt/cdrom
# mkdir /root/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5-2.6.18-194.17.1.el5
# cp -r /mnt/cdrom/* /root/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5-2.6.18-194.17.1.el5/
# umount /mnt/cdrom
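The work directory here and the share directory later are both named after the ISO plus the target kernel. A small sketch that derives that name, so the kernel string only has to be typed once; `ofed_dir_name` is a hypothetical helper, not part of MLNX_OFED.

```shell
#!/bin/bash
# Hypothetical helper reproducing the naming convention used in these notes:
#   <ISO basename without .iso>-<kernel version>
ofed_dir_name() {
    local iso="$1"      # e.g. /root/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5.iso
    local kernel="$2"   # e.g. 2.6.18-194.17.1.el5
    local base
    base="$(basename "$iso" .iso)"
    printf '%s-%s\n' "$base" "$kernel"
}

# Usage:
# WORKDIR=/root/$(ofed_dir_name /root/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5.iso "$(uname -r)")
# mkdir -p "$WORKDIR"
```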

4. Install the build dependencies

# yum -y install libtool tcl-devel libstdc++-devel mkisofs gcc-c++ rpm-build
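Before a long build, it can be worth confirming that the command-line tools from these packages are actually on the PATH. A rough sketch: the package-to-command mapping in the usage line (gcc-c++ provides g++, rpm-build provides rpmbuild, and so on) is my assumption, and the -devel packages only ship headers and libraries, so this does not cover them. `missing_tools` is a hypothetical name.

```shell
#!/bin/bash
# Print every command from the argument list that is not on the PATH.
# Returns non-zero if anything is missing.
missing_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (assumed commands provided by the yum packages above):
# missing_tools libtool g++ mkisofs rpmbuild || echo "install the packages above first"
```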

5. Remove the Open MPI RPMs, which otherwise fail to uninstall during the ISO build

# yum remove \*openmpi\*

6. Build the new ISO file

# cd /root/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5-2.6.18-194.17.1.el5

# ./docs/mlnx_add_kernel_support.sh -i /root/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5.iso
Note: This program will create MLNX_OFED_LINUX ISO for rhel5.5 under /tmp directory.
      All Mellanox, OEM, OFED, or Distribution IB packages will be removed.
Do you want to continue?[y/N]:y
Building OFED RPMs...
Removing OFED RPMs...
Running mkisofs...
Created /tmp/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5.iso

# mkdir /share/apps/mellanox/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5-2.6.18-194.17.1.el5
# mv /tmp/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5.iso /share/apps/mellanox/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5-2.6.18-194.17.1.el5/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5-2.6.18-194.17.1.el5.iso

7. Copy the new files from the ISO to the NFS share

# mount -t iso9660 -o loop /share/apps/mellanox/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5-2.6.18-194.17.1.el5/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5-2.6.18-194.17.1.el5.iso /mnt/cdrom
# rsync -a /mnt/cdrom/ /share/apps/mellanox/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5-2.6.18-194.17.1.el5/

# umount /mnt/cdrom

8. List the new kernel modules

# cd /share/apps/mellanox/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5-2.6.18-194.17.1.el5
# find . -name 'kernel-*' | grep 194.17
./x86_64/kernel-ib-1.5.2-2.6.18_194.17.1.el5.x86_64.rpm
./x86_64/kernel-mft-2.6.2-2.6.18_194.17.1.el5.x86_64.rpm
./x86_64/kernel-ib-devel-1.5.2-2.6.18_194.17.1.el5.x86_64.rpm
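The RPM names encode the kernel version with the dash turned into an underscore (2.6.18-194.17.1.el5 becomes 2.6.18_194.17.1.el5). A sketch, assuming that pattern holds, that checks whether the kernel-ib package for a given kernel made it into the share; both function names are hypothetical.

```shell
#!/bin/bash
# Convert a `uname -r` string to the form used in the kernel-ib RPM names.
kernel_rpm_tag() {
    local k="$1"
    echo "${k//-/_}"
}

# Succeed if <dir>/x86_64/kernel-ib-<version>-<tag>.x86_64.rpm exists.
# The [0-9] after "kernel-ib-" keeps kernel-ib-devel from matching.
# $1 = OFED directory, $2 = kernel version (defaults to the running kernel)
has_kernel_ib() {
    local dir="$1"
    local tag f
    tag="$(kernel_rpm_tag "${2:-$(uname -r)}")"
    for f in "$dir"/x86_64/kernel-ib-[0-9]*-"$tag".x86_64.rpm; do
        # An unmatched glob stays literal, so -e fails and we fall through.
        [ -e "$f" ] && return 0
    done
    return 1
}

# Usage:
# has_kernel_ib /share/apps/mellanox/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5-2.6.18-194.17.1.el5 \
#     || echo "no kernel-ib RPM for $(uname -r)"
```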

9. Test the installer on one of the compute nodes

# cd /share/apps/mellanox/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5-2.6.18-194.17.1.el5
# ./mlnxofedinstall --force --hpc

This will automatically update the firmware on the HCA.

10. To install this OFED on the compute nodes, add the following section to extend-compute.xml (I normally put other driver updates into this same 'post-98-installdrivers' script). Note the yum install near the end: the MLNX OFED installer removes every package containing 'openmpi' in its name, so that line reinstalls those packages.


<file name="/etc/rc.d/rocksconfig.d/post-98-installdrivers" perms="0755">
#!/bin/sh

# Install Mellanox
if [ "$(/sbin/lspci | grep -i connectx)" != "" ] ; then
  /usr/bin/yum -y remove openmpi\* rocks-openmpi\*
  /share/apps/mellanox/MLNX_OFED_LINUX-1.5.2-2.0.0-rhel5.5-2.6.18-194.17.1.el5/mlnxofedinstall --hpc --force

  /sbin/chkconfig --add openibd
  /sbin/chkconfig openibd on
  /sbin/service openibd start
fi

/usr/bin/yum -y install my-custom-openmpi my-custom-application-openmpi

/bin/mv /etc/rc.d/rocksconfig.d/post-98-installdrivers /root/post-98-installdrivers

# Reboot one final time
/sbin/shutdown -r now

</file>
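The post-98 script only runs the installer when `lspci` reports a ConnectX device. The same test can be pulled out into a function that reads lspci-style text on standard input, which makes it easy to try against saved lspci output. A sketch; `has_connectx` is a hypothetical name.

```shell
#!/bin/bash
# Succeed if any line of lspci output on stdin mentions ConnectX
# (case-insensitive, like the `grep -i connectx` in post-98-installdrivers).
has_connectx() {
    grep -qi connectx
}

# Usage on a node:
# if /sbin/lspci | has_connectx; then echo "ConnectX HCA present"; fi
```

`grep -q` exits with success on the first match, which replaces the empty-string comparison in the original script without changing which nodes get the installer.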

Adding IP over InfiniBand to Rocks

20120611 - Based on a question on the Rocks mailing list, I'm adding this section to explain how to enable TCP/IP over InfiniBand via Rocks. This process should add the IB IP addresses to the Rocks-managed DNS and hosts. The IP addresses of my compute-0-x nodes start at 254 and count down, so I used the same scheme for the IB IP addresses. First, add the new network, calling it 'infiniband' or whatever name you'd like (if you pick another name, use it for subnet= and network= in the commands below):
# rocks add network infiniband subnet=192.168.3.0 netmask=255.255.255.0
# ip=254 && for node in {1..16}; do
   rocks add host interface compute-0-${node} ib0 \
     ip=192.168.3.${ip} subnet=infiniband ;
   let ip=${ip}-1 ;
done
Repeat for the next set of nodes:
# ip=238 && for node in {1..16}; do
   rocks add host interface compute-1-${node} ib0 \
     ip=192.168.3.${ip} subnet=infiniband ;
   let ip=${ip}-1 ;
done
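Since each rack of 16 nodes simply continues the countdown (compute-0-x starts at .254, compute-1-x at .238), the address for any node can be computed rather than tracked in a loop variable. A sketch that prints the commands for one rack instead of running them, so they can be reviewed first; it assumes 16 nodes per rack and the network name infiniband from the rocks add network command above, and `ib_interface_cmds` is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) the `rocks add host interface`
# commands for one rack, using the countdown scheme:
#   compute-<rack>-<node> gets 192.168.3.(254 - 16*rack - (node - 1))
# Assumes 16 nodes per rack and the network name "infiniband".
ib_interface_cmds() {
    local rack="$1"
    local node ip
    local lowest=$(( 254 - 16 * rack - 15 ))
    if [ "$lowest" -lt 1 ]; then
        echo "rack $rack does not fit in 192.168.3.0/24" >&2
        return 1
    fi
    for node in {1..16}; do
        ip=$(( 254 - 16 * rack - (node - 1) ))
        echo "rocks add host interface compute-${rack}-${node} ib0 ip=192.168.3.${ip} subnet=infiniband"
    done
}

# Review the commands, then run them as root:
# ib_interface_cmds 1
# ib_interface_cmds 1 | sh
```

Printing first and piping to sh afterwards makes it easy to spot a wrong starting address before anything is written to the Rocks database.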
And so on for the remaining racks.

Next, change sshd_config on the compute nodes so that sshd does not use DNS. I have found that ssh to the compute nodes takes close to a minute when this is set to true:
# rocks set attr ssh_use_dns false
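After the sync, you can check what sshd on a compute node will actually use. This sketch assumes the ssh_use_dns attribute ends up as the UseDNS keyword in /etc/ssh/sshd_config; sshd takes the first value it finds for a keyword, so that is what the function reports. If the attribute took effect, it should report no. `sshd_usedns` is a hypothetical name.

```shell
#!/bin/bash
# Print the first uncommented UseDNS value in an sshd_config file.
# Keywords are matched case-insensitively; commented lines ("#UseDNS ...")
# do not match because their first field starts with "#".
sshd_usedns() {
    local conf="${1:-/etc/ssh/sshd_config}"
    awk 'tolower($1) == "usedns" { print $2; exit }' "$conf"
}

# Usage on a compute node:
# sshd_usedns
```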
Synchronize the configuration

# rocks sync config
Now open the firewall on ib0 for all ports and protocols:
# rocks open appliance firewall compute \
   network=infiniband service="all" protocol="all"

# rocks sync host firewall compute

# rocks list host firewall compute-0-1
SERVICE PROTOCOL CHAIN ACTION NETWORK   OUTPUT-NETWORK FLAGS                                COMMENT SOURCE
ssh     tcp      INPUT ACCEPT public     -------------- -m state --state NEW                 ------- G     
all     all      INPUT ACCEPT public     -------------- -m state --state RELATED,ESTABLISHED ------- G     
all     all      INPUT ACCEPT infiniband -------------- ------------------------------------ ------- A     
all     all      INPUT ACCEPT private    -------------- ------------------------------------ ------- G     
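With many nodes, checking each table by eye gets tedious. A sketch that looks for an `all all INPUT ACCEPT <network>` rule with no extra match flags in `rocks list host firewall` output; it assumes the column layout shown above, and `has_open_rule` and the node names in the usage loop are only examples.

```shell
#!/bin/bash
# Hypothetical check: succeed if `rocks list host firewall` output on stdin
# contains an "all all INPUT ACCEPT <network>" rule with no extra match flags.
# Column positions assume the SERVICE PROTOCOL CHAIN ACTION NETWORK
# OUTPUT-NETWORK FLAGS ... layout shown above; a rule with flags such as
# "-m state --state ..." shifts the columns and is not counted.
has_open_rule() {
    local network="${1:-infiniband}"
    awk -v net="$network" '
        $1 == "all" && $2 == "all" && $3 == "INPUT" && $4 == "ACCEPT" &&
        $5 == net && $7 ~ /^-+$/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage:
# for n in compute-0-1 compute-0-2 compute-0-3; do
#     rocks list host firewall "$n" | has_open_rule infiniband || echo "$n: ib0 not open"
# done
```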
Hope this helps
