Tuesday, March 2, 2010

Build and Install Latest Grid Engine Roll for Rocks 5.3

This FAQ discusses the process of building the Sun Grid Engine roll for a Rocks 5.3 HPC cluster.

Why would you want to rebuild the roll? Possible reasons include adding a later version of Grid Engine or customizing the roll beyond what Rocks provides.

The latest version of Rocks at this time is 5.3, which includes Grid Engine 6.2u4.

The Grid Engine team released 6.2u5 in January. This version contains many new features and bug fixes over 6.2u4.

I built and tested this roll on a Rocks 5.3 x86_64 virtual machine prior to deploying it to the production cluster.

The overview:
  • Download the Rocks source code
  • Download the Grid Engine source code
  • Build the Roll
  • Add the Roll to the head node
  • Update the RPM on the head node
  • Rebuild the compute nodes

Obtain the Rocks Source Code

  1. Download the Rocks source code tree (http://www.rocksclusters.org/roll-documentation/base/5.3/source-access.html) and make a backup of the original sge roll source in case you need to restore or compare files
  2. $ mkdir -p ~/software/rocks-cluster
    $ cd ~/software/rocks-cluster
    $ hg clone http://fyp.rocksclusters.org/hg/rocks-5.3
    
    destination directory: rocks-5.3
    real URL is http://fyp.rocksclusters.org/hg/rocks-5.3/
    requesting all changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 2815 changes to 2815 files
    2815 files updated, 0 files merged, 0 files removed, 0 files unresolved
  3. Create a new sge roll build area by copying the existing sge roll directory
  4. $ cd ~/software/rocks-cluster/rocks-5.3/src/roll
    $ cp -a sge sge-v62u5
  5. Download the Grid Engine source code (all versions of the source are available at the Documents and Files page http://gridengine.sunsource.net/servlets/ProjectDocumentList ). Extract the source so we can later copy the aimk build script.
  6. $ mkdir -p ~/software/ge6.2u5-source
    $ cd ~/software/ge6.2u5-source
    $ wget http://gridengine.sunsource.net/files/documents/7/215/ge-V62u5_TAG-src.tar.gz
    $ tar -zxf ge-V62u5_TAG-src.tar.gz
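
Before moving on, it may be worth confirming that the tarball unpacked where the later steps expect it. The following is a minimal sketch, assuming the archive extracts to a gridengine/source directory (the path used by the aimk copy step in the next section). The function name check_ge_source is made up for illustration.

```shell
# Hypothetical helper: verify that the extracted Grid Engine tree
# contains the aimk script that the build steps copy later.
# $1 is the directory where the tarball was extracted.
check_ge_source() {
    src_root="$1"
    aimk="$src_root/gridengine/source/aimk"
    if [ -f "$aimk" ]; then
        echo "found: $aimk"
        return 0
    fi
    echo "missing: $aimk" >&2
    return 1
}

# Example call (path taken from the steps above):
# check_ge_source ~/software/ge6.2u5-source
```

A non-zero return here usually means the download was incomplete or the archive was extracted somewhere else.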

Build the SGE Roll

  1. Change to the sge roll directory and copy the 6.2u5 source tarball to the roll src directory
  2. $ cd ~/software/rocks-cluster/rocks-5.3/src/roll/sge-v62u5/
    $ cp ~/software/ge6.2u5-source/ge-V62u5_TAG-src.tar.gz ~/software/rocks-cluster/rocks-5.3/src/roll/sge-v62u5/src/sge/
  3. Edit the roll version info
  4. vi ~/software/rocks-cluster/rocks-5.3/src/roll/sge-v62u5/version.mk
    
    ROLLNAME = sge
    RELEASE = 62u5
    COLOR   = plum
    
    REDHAT.ROOT = $(PWD)
  5. Edit the SGE version info
  6. vi ~/software/rocks-cluster/rocks-5.3/src/roll/sge-v62u5/src/sge/version.mk
    
    NAME            = sge
    VERSION         = V62u5
    RELEASE         = 1
  7. Create a new patch-files directory for V62u5 and copy the aimk build script from the Grid Engine source into it
  8. $ mkdir -p src/sge/patch-files/V62u5/gridengine/source
    $ cp ~/software/ge6.2u5-source/gridengine/source/aimk src/sge/patch-files/V62u5/gridengine/source/
  9. Edit the copied aimk script and set KRBLIB as follows
  10. $ vim src/sge/patch-files/V62u5/gridengine/source/aimk
    
    set KRBLIB            = "-lkrb5 -lz"
  11. Edit the sge-client.xml file if you want to customize the clients; leave it alone to accept the Rocks customizations
  12. Edit the sge-server.xml file if you want to customize the head node settings (create custom parallel environments, etc.)
  13. Build the Roll (note that you must be root to run this command)
  14. # cd ~me/software/rocks-cluster/rocks-5.3/src/roll/sge-v62u5
    # make roll > make-roll.log 2>&1

Once this command completes, you should have a new ISO file in the build directory named "sge-5.3-62u5.x86_64.disk1.iso".
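
Because the build output is redirected to make-roll.log, a failed build can go unnoticed. Here is one rough way to check. It relies on GNU make marking failed targets with "***" in its messages, and it assumes the ISO lands in the directory you pass in. The function name check_roll_build is hypothetical.

```shell
# Hypothetical helper: report GNU make failures recorded in a build log
# and list any ISO images found in the build directory.
# $1 is the roll build directory, $2 the log file name
# (defaults to make-roll.log, the name used in the build step above).
check_roll_build() {
    build_dir="$1"
    log="$build_dir/${2:-make-roll.log}"
    if [ ! -f "$log" ]; then
        echo "no build log at $log" >&2
        return 1
    fi
    # GNU make prints "***" when a target fails.
    if grep -q '\*\*\*' "$log"; then
        echo "build log contains make errors:" >&2
        grep '\*\*\*' "$log" >&2
        return 1
    fi
    # ls exits non-zero when no ISO matches the pattern.
    ls "$build_dir"/*.iso
}

# Example call (run from anywhere, as the user who owns the build tree):
# check_roll_build ~/software/rocks-cluster/rocks-5.3/src/roll/sge-v62u5
```

A clean log does not guarantee a working roll, so this is only a first filter before testing the ISO on a virtual cluster.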

If you are installing a new Rocks 5.3 cluster, simply burn this ISO to CD and use it when selecting the SGE roll.

If you are upgrading your existing Rocks 5.3 cluster from 6.2u4 to 6.2u5, proceed to the next section.

Update GE on Existing Rocks 5.3 Cluster

Rocks 5.3 ships with GE 6.2u4. Following the procedure above, we have built a roll that installs GE 6.2u5 on a new cluster. This roll can also be used to upgrade GE on an existing 5.3 cluster. I haven't tested, or even thought about, using this roll on a Rocks 5.2 cluster, so I do not know whether it would work.

The downside to this approach is that it doesn't allow you to install the newer version 'side by side' with the previous version. I therefore recommend testing the upgrade on a virtual machine Rocks 5.3 cluster before performing it on the production cluster.

  1. Copy the new roll to the head node if it isn't already there
  2. Add the roll using the rocks command. This will add the updated files from the roll that differ from the already installed sge roll
  3. # rocks add roll sge-5.3-1.x86_64.disk1.iso
    # cd /export/rocks/install
    # rocks create distro
  4. Make sure that all running jobs have drained from the system before proceeding by first disabling the queues.
  5. $ qmod -d '*'
  6. Verify via qstat that running jobs have finished
  7. Stop the SGE services on the compute nodes. Again, be sure that the jobs have finished; otherwise this step will terminate them (repeat for other compute appliance types)
  8. $ sudo rocks run host compute command="/sbin/service sgeexecd.$(hostname -s) stop"
    $ sudo rocks run host verari-compute command="/sbin/service sgeexecd.$(hostname -s) stop"
  9. Stop the SGE master process on the head node
  10. $ sudo /sbin/service sgemaster.$(hostname -s) stop
  11. Back up your /opt/gridengine directory
  12. $ cd /opt
    $ sudo tar -cjf gridengine-backup-62u4.tar.bz2 gridengine
  13. Run the roll to generate an install script. Under normal "adding a roll" circumstances you would proceed by running the script, but since we are upgrading an existing roll it isn't necessary to do so. In fact, running the script could overwrite customizations made to Grid Engine since install time.

    We will simply examine the script to find out what is necessary to perform the upgrade.
  14. # rocks run roll sge > /tmp/install-sge-6.2u5.sh
  15. DON'T run the script!
  16. Examination of the script reveals that the only RPM we really need to upgrade is sge-V62u5-1.x86_64.rpm. The rest of the script covers work that was already done when we installed the head node (creation of parallel environments, the sge user account, etc.)
  17. Use the loop below to update the RPMs. Again, the only one that will actually update is sge-V62u5-1.x86_64.rpm, but try them all just in case
  18. # for n in $(grep ^rpm /tmp/install-sge-6.2u5.sh |awk '{print $5}'); do echo $n; rpm -Uvh $n; done
    
    /export/rocks/install/rocks-dist/x86_64/RedHat/RPMS/db4-utils-4.3.29-10.el5.x86_64.rpm
    Preparing...                ########################################### [100%]
    package db4-utils-4.3.29-10.el5.x86_64 is already installed
    
    /export/rocks/install/rocks-dist/x86_64/RedHat/RPMS/javacc-4.0-3jpp.3.x86_64.rpm
    Preparing...                ########################################### [100%]
    package javacc-4.0-3jpp.3.x86_64 is already installed
    
    /export/rocks/install/rocks-dist/x86_64/RedHat/RPMS/junit-3.8.2-3jpp.1.x86_64.rpm
    Preparing...                ########################################### [100%]
    package junit-3.8.2-3jpp.1.x86_64 is already installed
    
    /export/rocks/install/rocks-dist/x86_64/RedHat/RPMS/libXp-devel-1.0.0-8.1.el5.x86_64.rpm
    Preparing...                ########################################### [100%]
    package libXp-devel-1.0.0-8.1.el5.x86_64 is already installed
    
    /export/rocks/install/rocks-dist/x86_64/RedHat/RPMS/openmotif-2.3.1-2.el5.i386.rpm
    Preparing...                ########################################### [100%]
    package openmotif-2.3.1-2.el5.i386 is already installed
    
    /export/rocks/install/rocks-dist/x86_64/RedHat/RPMS/openmotif-devel-2.3.1-2.el5.i386.rpm
    Preparing...                ########################################### [100%]
    package openmotif-devel-2.3.1-2.el5.i386 is already installed
    
    /export/rocks/install/rocks-dist/x86_64/RedHat/RPMS/pam-devel-0.99.6.2-6.el5.i386.rpm
    Preparing...                ########################################### [100%]
    package pam-devel-0.99.6.2-6.el5.i386 is already installed
    
    /export/rocks/install/rocks-dist/x86_64/RedHat/RPMS/rocks-command-sge-5.3-0.x86_64.rpm
    Preparing...                ########################################### [100%]
    package rocks-command-sge-5.3-0.x86_64 is already installed
    file /opt/rocks/lib/python2.4/site-packages/rocks/commands/remove/host/plugin_sge.pyc from install of rocks-command-sge-5.3-0.x86_64 conflicts with file from package rocks-command-sge-5.3-0.x86_64
    file /opt/rocks/lib/python2.4/site-packages/rocks/commands/remove/host/plugin_sge.pyo from install of rocks-command-sge-5.3-0.x86_64 conflicts with file from package rocks-command-sge-5.3-0.x86_64
    file /opt/rocks/lib/python2.4/site-packages/rocks/commands/report/sge/__init__.pyc from install of rocks-command-sge-5.3-0.x86_64 conflicts with file from package rocks-command-sge-5.3-0.x86_64
    file /opt/rocks/lib/python2.4/site-packages/rocks/commands/report/sge/__init__.pyo from install of rocks-command-sge-5.3-0.x86_64 conflicts with file from package rocks-command-sge-5.3-0.x86_64
    file /opt/rocks/lib/python2.4/site-packages/rocks/commands/report/sge/machines/__init__.pyc from install of rocks-command-sge-5.3-0.x86_64 conflicts with file from package rocks-command-sge-5.3-0.x86_64
    file /opt/rocks/lib/python2.4/site-packages/rocks/commands/report/sge/machines/__init__.pyo from install of rocks-command-sge-5.3-0.x86_64 conflicts with file from package rocks-command-sge-5.3-0.x86_64
    
    /export/rocks/install/rocks-dist/x86_64/RedHat/RPMS/rocks-sge-5.3-2.x86_64.rpm
    Preparing...                ########################################### [100%]
    package rocks-sge-5.3-2.x86_64 is already installed
    
    /export/rocks/install/rocks-dist/x86_64/RedHat/RPMS/rocks-sge-5.3-2.x86_64.rpm
    Preparing...                ########################################### [100%]
    package roll-sge-usersguide-5.3-0.x86_64 is already installed
    file /var/www/html/roll-documentation/sge/5.3/index.html from install of roll-sge-usersguide-5.3-0.x86_64 conflicts with file from package roll-sge-usersguide-5.3-0.x86_64
    file /var/www/html/roll-documentation/sge/5.3/rocks-copyright.html from install of roll-sge-usersguide-5.3-0.x86_64 conflicts with file from package roll-sge-usersguide-5.3-0.x86_64
    file /var/www/html/roll-documentation/sge/5.3/roll-sge-usersguide.pdf from install of roll-sge-usersguide-5.3-0.x86_64 conflicts with file from package roll-sge-usersguide-5.3-0.x86_64
    
    /export/rocks/install/rocks-dist/x86_64/RedHat/RPMS/sge-V62u5-1.x86_64.rpm
    Preparing...                ########################################### [100%]
    1:sge                    ########################################### [100%]
    
    /export/rocks/install/rocks-dist/x86_64/RedHat/RPMS/sge-drmaa-5.3-0.noarch.rpm
    Preparing...                ########################################### [100%]
    package sge-drmaa-5.3-0.noarch is already installed
    
    /export/rocks/install/rocks-dist/x86_64/RedHat/RPMS/sge-ganglia-5.3-2.x86_64.rpm
    Preparing...                ########################################### [100%]
    package sge-ganglia-5.3-2.x86_64 is already installed
    file /opt/ganglia/lib64/ganglia/python_modules/sge.pyc from install of sge-ganglia-5.3-2.x86_64 conflicts with file from package sge-ganglia-5.3-2.x86_64
    file /opt/ganglia/lib64/ganglia/python_modules/sge.pyo from install of sge-ganglia-5.3-2.x86_64 conflicts with file from package sge-ganglia-5.3-2.x86_64
    
    /export/rocks/install/rocks-dist/x86_64/RedHat/RPMS/sge-ganglia-5.3-2.x86_64.rpm
    Preparing...                ########################################### [100%]
    package sge-ganglia-5.3-2.x86_64 is already installed
    file /opt/ganglia/lib64/ganglia/python_modules/sge.pyc from install of sge-ganglia-5.3-2.x86_64 conflicts with file from package sge-ganglia-5.3-2.x86_64
    file /opt/ganglia/lib64/ganglia/python_modules/sge.pyo from install of sge-ganglia-5.3-2.x86_64 conflicts with file from package sge-ganglia-5.3-2.x86_64
    
    /export/rocks/install/rocks-dist/x86_64/RedHat/RPMS/sge-insert-ethers-5.3-0.x86_64.rpm
    Preparing...                ########################################### [100%]
    package sge-insert-ethers-5.3-0.x86_64 is already installed
    file /opt/rocks/var/plugins/insertethers/sge.pyc from install of sge-insert-ethers-5.3-0.x86_64 conflicts with file from package sge-insert-ethers-5.3-0.x86_64
    file /opt/rocks/var/plugins/insertethers/sge.pyo from install of sge-insert-ethers-5.3-0.x86_64 conflicts with file from package sge-insert-ethers-5.3-0.x86_64
  19. Start the SGE master process on the head node
  20. $ sudo /sbin/service sgemaster.$(hostname -s) start
  21. Rebuild the compute nodes
  22. # rocks run host compute '/boot/kickstart/cluster-kickstart'
    # rocks run host verari-compute '/boot/kickstart/cluster-kickstart'
  23. Once all of the compute nodes are rebuilt (and have finished any additional reboots, e.g. for the OFED or Lustre installs) and you are ready to release the jobs, enable the queues
  24. $ qmod -e '*'
Grid Engine 6.2u5 is now installed and running on the head node and compute nodes.
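
The "verify via qstat" step in the procedure above can be scripted rather than checked by eye. This is a small sketch, assuming the default qstat listing (a column header line, a line of dashes, then one line per job, and no output at all when nothing matches). The helper name count_jobs is made up for illustration.

```shell
# Hypothetical helper: count job lines in qstat output read from stdin.
# Skips the two header lines of the default listing and ignores blank
# lines; prints 0 when the input is empty.
count_jobs() {
    awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}

# Example use on the head node after disabling the queues
# (polls once a minute until no running jobs remain):
# while [ "$(qstat -s r -u '*' | count_jobs)" -gt 0 ]; do sleep 60; done
```

Only once the count reaches zero is it safe to stop the sgeexecd services without killing user jobs.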
