Thursday, December 1, 2016

HowTo: Restore a GridScaler GPFS Client Node after Reinstalling the Node

I ran into this issue after reinstalling several compute nodes on our cluster shortly after bringing our new DDN GridScaler GPFS storage cluster online.
$ sudo mmstartup -N c0040
Fri Dec  2 03:36:03 UTC 2016: mmstartup: Starting GPFS ...
c0040:  mmremote: determineMode: Missing file /var/mmfs/gen/mmsdrfs.
c0040:  mmremote: This node does not belong to a GPFS cluster.
mmstartup: Command failed. Examine previous error messages to determine cause.
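The error points at the local GPFS configuration file, which the OS reinstall evidently wiped along with the rest of /var/mmfs; a quick check on the reinstalled node (c0040 itself) should confirm the file is gone:
$ ls -l /var/mmfs/gen/mmsdrfs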

One method I found online was to take the affected node off the network (or reboot it), remove it from the GPFS cluster while it is unreachable, then, once it's back on the network (or fully rebooted), add it back, license it, and start it.
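For reference, that sequence looks roughly like the following (a sketch run from one of the NSD servers, not the exact commands, and not the method I ended up using):
$ sudo mmdelnode -N c0040
$ sudo mmaddnode -N c0040
$ sudo mmchlicense client --accept -N c0040
$ sudo mmstartup -N c0040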

Later I was introduced to the mmsdrrestore command (a portion of the man page is below):
mmsdrrestore command

Restores the latest GPFS system files on the specified nodes.

Synopsis

mmsdrrestore [-p NodeName] [-F mmsdrfsFile] [-R remoteFileCopyCommand]
             [-a | -N {Node[,Node...] | NodeFile | NodeClass}]

Availability

Available on all IBM Spectrum Scale editions.

Description

The mmsdrrestore command is intended for use by experienced
system administrators.

Use the mmsdrrestore command to restore the latest GPFS
system files on the specified nodes. If no nodes are specified,
the command restores the configuration information only on the
node on which it is run. If the local GPFS configuration file is
missing, the file that is specified with the -F option from
the node that is specified with the -p option is used
instead. This command works best when used with the
mmsdrbackup user exit. See the following IBM Spectrum
Scale: Administration and Programming Reference topic:
mmsdrbackup user exit.

...

Here's an example of using the command to restore the configuration to node c0040 using primary server gs0 (i.e. one of the NSD servers):
$ sudo mmsdrrestore -p gs0 -N c0040
Fri Dec  2 03:47:06 UTC 2016: mmsdrrestore: Processing node gs0
Fri Dec  2 03:47:08 UTC 2016: mmsdrrestore: Processing node c0040
mmsdrrestore: Command successfully completed

Finally, start GPFS on the client (which also mounts the file system(s), if configured to do so):
$ sudo mmstartup -N c0040
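To confirm the node came back healthy, check the daemon state and that it has the file system mounted (a quick sanity check, not from my original notes):
$ sudo mmgetstate -N c0040
$ sudo mmlsmount all -L | grep c0040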

Monday, August 22, 2016

Dell OMSA 8.3 on CentOS 7.2 "Error! Chassis info setting unavailable on this system."

After installing Dell OMSA 8.3 on a new PowerEdge R730xd running CentOS 7.2 x86_64, the omreport chassis info command reports the following (after starting the services):

# omreport chassis info

Error! Chassis info setting unavailable on this system.
First, the solution (Zurd on the mailing list pointed me to it: http://lists.us.dell.com/pipermail/linux-poweredge/2016-August/050692.html), followed by the full ticket I sent to the Dell linux-poweredge mailing list.

The solution for CentOS users (and possibly other unsupported distros) is to stop the services, make the following change, then restart the services:
--- /opt/dell/srvadmin/etc/srvadmin-storage/stsvc.ini.orig 2016-08-22 21:28:32.079580254 -0500
+++ /opt/dell/srvadmin/etc/srvadmin-storage/stsvc.ini 2016-08-22 21:20:32.374317823 -0500
@@ -116,7 +116,7 @@
 vil4=dsm_sm_sasvil
 vil5=dsm_sm_sasenclvil
 vil6=dsm_sm_swrvil
-vil7=dsm_sm_psrvil
+; vil7=dsm_sm_psrvil
 vil8=dsm_sm_rnavil

 [SSDSmartInterval]
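For reference, the change can be applied from the shell along these lines (a sketch; it just comments out the vil7 line shown in the diff, after saving a copy of the original):
# srvadmin-services.sh stop
# cp -p /opt/dell/srvadmin/etc/srvadmin-storage/stsvc.ini /opt/dell/srvadmin/etc/srvadmin-storage/stsvc.ini.orig
# sed -i 's/^vil7=dsm_sm_psrvil/; vil7=dsm_sm_psrvil/' /opt/dell/srvadmin/etc/srvadmin-storage/stsvc.ini
# srvadmin-services.sh start
# omreport chassis info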
Now on to the full details of the issue:
  1. Install Dell OMSA 8.3
  2. # wget -q -O - http://linux.dell.com/repo/hardware/dsu/bootstrap.cgi | bash
    # yum clean all
    # yum -y install kernel-devel kernel-headers gcc dell-system-update
    # yum -y install srvadmin-all
  3. Next check the status of the services (not started)
    # srvadmin-services.sh status
    dell_rbu (module) is stopped
    ipmi driver is running
    dsm_sa_datamgrd is stopped
    dsm_sa_eventmgrd is stopped
    dsm_sa_snmpd is stopped
    ● dsm_om_shrsvc.service - LSB: DSM OM Shared Services
       Loaded: loaded (/etc/rc.d/init.d/dsm_om_shrsvc)
       Active: inactive (dead)
         Docs: man:systemd-sysv-generator(8)
     
    Aug 22 20:43:08 r730xd-srv01.local systemd[1]: Starting LSB: DSM OM Shared Services...
    Aug 22 20:43:08 r730xd-srv01.local dsm_om_shrsvc[5144]: [47B blob data]
    Aug 22 20:43:08 r730xd-srv01.local systemd[1]: Started LSB: DSM OM Shared Services.
    Aug 22 20:43:08 r730xd-srv01.local dsm_om_shrsvc[5144]: tput: No value for $TERM and no -T specified
    Aug 22 20:45:55 r730xd-srv01.local systemd[1]: Stopping LSB: DSM OM Shared Services...
    Aug 22 20:45:55 r730xd-srv01.local dsm_om_shrsvc[8804]: [52B blob data]
    Aug 22 20:45:55 r730xd-srv01.local systemd[1]: Stopped LSB: DSM OM Shared Services.
    Aug 22 20:46:28 r730xd-srv01.local systemd[1]: Stopped LSB: DSM OM Shared Services.
    ● dsm_om_connsvc.service - LSB: DSM OM Connection Service
       Loaded: loaded (/etc/rc.d/init.d/dsm_om_connsvc)
       Active: inactive (dead)
         Docs: man:systemd-sysv-generator(8)
     
    Aug 22 20:43:08 r730xd-srv01.local systemd[1]: Starting LSB: DSM OM Connection Service...
    Aug 22 20:43:08 r730xd-srv01.local dsm_om_connsvc[5145]: [50B blob data]
    Aug 22 20:43:08 r730xd-srv01.local systemd[1]: Started LSB: DSM OM Connection Service.
    Aug 22 20:45:55 r730xd-srv01.local systemd[1]: Stopping LSB: DSM OM Connection Service...
    Aug 22 20:46:02 r730xd-srv01.local dsm_om_connsvc[8844]: [55B blob data]
    Aug 22 20:46:02 r730xd-srv01.local systemd[1]: Stopped LSB: DSM OM Connection Service.
    Aug 22 20:46:29 r730xd-srv01.local systemd[1]: Stopped LSB: DSM OM Connection Service.
  4. Start the services
    # srvadmin-services.sh start
    Starting instsvcdrv (via systemctl):                       [  OK  ]
    Starting dataeng (via systemctl):                          [  OK  ]
    Starting dsm_om_shrsvc (via systemctl):                    [  OK  ]
    Starting dsm_om_connsvc (via systemctl):                   [  OK  ]
  5. Try running the chassis info command
    # omreport chassis info
    Error! Chassis info setting unavailable on this system.
     
    # omreport about
    Product name : Dell OpenManage Server Administrator
    Version      : 8.3.0
    Copyright    : Copyright (C) Dell Inc. 1995-2015 All rights reserved.
    Company      : Dell Inc.
  6. The following are the rpms installed via yum
    # rpm -qa | grep srvadmin
    srvadmin-xmlsup-8.3.0-1908.9058.el7.x86_64
    srvadmin-omacore-8.3.0-1908.9058.el7.x86_64
    srvadmin-server-snmp-8.3.0-1908.9058.el7.x86_64
    srvadmin-oslog-8.3.0-1908.9058.el7.x86_64
    srvadmin-idrac-vmcli-8.3.0-1908.9058.el7.x86_64
    srvadmin-storageservices-snmp-8.3.0-1908.9058.el7.x86_64
    srvadmin-smcommon-8.3.0-1908.9058.el7.x86_64
    srvadmin-omcommon-8.3.0-1908.9058.el7.x86_64
    srvadmin-smweb-8.3.0-1908.9058.el7.x86_64
    srvadmin-racsvc-8.3.0-1908.9058.el7.x86_64
    srvadmin-nvme-8.3.0-1908.9058.el7.x86_64
    srvadmin-storage-cli-8.3.0-1908.9058.el7.x86_64
    srvadmin-storageservices-8.3.0-1908.9058.el7.x86_64
    srvadmin-omilcore-8.3.0-1908.9058.el7.x86_64
    srvadmin-racadm4-8.3.0-1908.9058.el7.x86_64
    srvadmin-isvc-8.3.0-1908.9058.el7.x86_64
    srvadmin-argtable2-8.3.0-1908.9058.el7.x86_64
    srvadmin-racadm5-8.3.0-1908.9058.el7.x86_64
    srvadmin-cm-8.3.0-1908.9058.el7.x86_64
    srvadmin-isvc-snmp-8.3.0-1908.9058.el7.x86_64
    srvadmin-rac4-populator-8.3.0-1908.9058.el7.x86_64
    srvadmin-tomcat-8.3.0-1908.9058.el7.x86_64
    srvadmin-itunnelprovider-8.3.0-1908.9058.el7.x86_64
    srvadmin-storelib-sysfs-8.3.0-1908.9058.el7.x86_64
    srvadmin-storageservices-cli-8.3.0-1908.9058.el7.x86_64
    srvadmin-deng-8.3.0-1908.9058.el7.x86_64
    srvadmin-rac-components-8.3.0-1908.9058.el7.x86_64
    srvadmin-ominst-8.3.0-1908.9058.el7.x86_64
    srvadmin-sysfsutils-8.3.0-1908.9058.el7.x86_64
    srvadmin-rac5-8.3.0-1908.9058.el7.x86_64
    srvadmin-base-8.3.0-1908.9058.el7.x86_64
    srvadmin-idrac-ivmcli-8.3.0-1908.9058.el7.x86_64
    srvadmin-rac4-8.3.0-1908.9058.el7.x86_64
    srvadmin-webserver-8.3.0-1908.9058.el7.x86_64
    srvadmin-standardAgent-8.3.0-1908.9058.el7.x86_64
    srvadmin-storelib-8.3.0-1908.9058.el7.x86_64
    srvadmin-storage-snmp-8.3.0-1908.9058.el7.x86_64
    srvadmin-omacs-8.3.0-1908.9058.el7.x86_64
    srvadmin-racdrsc-8.3.0-1908.9058.el7.x86_64
    srvadmin-idracadm-8.3.0-1908.9058.el7.x86_64
    srvadmin-idrac-snmp-8.3.0-1908.9058.el7.x86_64
    srvadmin-realssd-8.3.0-1908.9058.el7.x86_64
    srvadmin-storage-8.3.0-1908.9058.el7.x86_64
    srvadmin-all-8.3.0-1908.9058.el7.x86_64
    srvadmin-hapi-8.3.0-1908.9058.el7.x86_64
    srvadmin-deng-snmp-8.3.0-1908.9058.el7.x86_64
    srvadmin-server-cli-8.3.0-1908.9058.el7.x86_64
    srvadmin-jre-8.3.0-1908.9058.el7.x86_64
    srvadmin-idrac-8.3.0-1908.9058.el7.x86_64

Wednesday, June 22, 2016

How To: Enable PXE and Configure Boot Order Via Dell RACADM Command

Our HPC cluster was lucky enough to double in compute capacity recently. Whoop! The new hardware brought with it some significant changes in rack layout and networking fabric. The compute nodes are a combination of Dell R630, DSS1500 and R730 (for GPU K80 and Intel Phi nodes).

The existing core 10GbE CAT6 based fabric (made up of Dell Force10 S4820T) was replaced by a Dell Force10 Z9500 and fiber (Z9500 has 132 QSFP+ 40GbE ports that can in turn be broken out into 528 10GbE SFP+ ports).

Physical changes aside (wiring, top-of-rack 40GbE to 10GbE breakout panels, etc.), the above meant we had to change the primary boot device from the add-on 10GbE CAT6-based NIC to the onboard fiber 10GbE NIC (CentOS 7 sees this interface as eno1).

This required two changes at the system BIOS / NIC hardware configuration level:
  • Enable PXE boot on the NIC
  • Modify the BIOS boot order
One way to make these two changes in bulk is the Dell OpenManage command line tool racadm, which is what we decided to use.

The following are notes I took while working on a subset of the compute nodes.

Enable PXE on the fiber interface

The first step is to identify the names of the network interfaces. I queried a single node to get the full list of interfaces, then queried the specific interface (.1) just for grins to see what settings were available. In this case the first integrated port is referenced as NIC.nicconfig.1 and NIC.Integrated.1-1-1.
# Get list of Nics
racadm -r 172.16.3.48 -u root -p xxxxxxx get nic.nicconfig

NIC.nicconfig.1 [Key=NIC.Integrated.1-1-1#nicconfig]
NIC.nicconfig.2 [Key=NIC.Integrated.1-2-1#nicconfig]
NIC.nicconfig.3 [Key=NIC.Integrated.1-3-1#nicconfig]
NIC.nicconfig.4 [Key=NIC.Integrated.1-4-1#nicconfig]
NIC.nicconfig.5 [Key=NIC.Slot.3-1-1#nicconfig]
NIC.nicconfig.6 [Key=NIC.Slot.3-2-1#nicconfig]

racadm -r 172.16.3.48 -u root -p xxxxxxx get nic.nicconfig.1

[Key=NIC.Integrated.1-1-1#nicconfig]
LegacyBootProto=NONE
#LnkSpeed=AutoNeg
NumberVFAdvertised=64
VLanId=0
WakeOnLan=Disabled

 
Next we can enable PXE boot on NIC.Integrated.1-1-1 for the set of nodes. In order for the change to take effect, you have to create a job and then reboot the node.
for n in {48..10} ; do
  ip=172.16.3.${n}
  echo "IP: $ip - configuring nic.nicconfig.1.legacybootproto PXE"
  # Get Nic config for integrated port 1
  racadm -r $ip -u root -p xxxxxxx get nic.nicconfig.1 | grep Legacy
  # Set to PXE
  racadm -r $ip -u root -p xxxxxxx set nic.nicconfig.1.legacybootproto PXE
  # Verify it's set to PXE (pending)
  racadm -r $ip -u root -p xxxxxxx get nic.nicconfig.1 | grep Legacy
  # Create a job to enable the changes following the reboot
  racadm -r $ip -u root -p xxxxxxx jobqueue create NIC.Integrated.1-1-1
  # Reboot so that the config job will execute
  ipmitool -I lanplus -H $ip -U root -P xxxxxxx chassis power reset
done 
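Once the nodes come back up, it's worth spot-checking that the pending change actually applied. Something like the following works (same credentials; look for the NIC.Integrated.1-1-1 job in the queue and confirm its status, and that the setting no longer shows as pending):
for n in {48..10} ; do
  ip=172.16.3.${n}
  echo "IP: $ip"
  # List config jobs and their status on this node
  racadm -r $ip -u root -p xxxxxxx jobqueue view
  # Should now report LegacyBootProto=PXE
  racadm -r $ip -u root -p xxxxxxx get nic.nicconfig.1 | grep Legacy
done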

Configure the BIOS boot order

Now that the NIC has PXE enabled and the changes have been applied, the boot order can be modified. If this fails for a node, it most likely means the job from the previous step failed to run; start debugging there.
for n in {48..10} ; do
  ip=172.16.3.${n}
  echo "IP: $ip - configuring BIOS.biosbootsettings.BootSeq NIC.Integrated.1-1-1,...."
  # Get Bios Boot sequence
  racadm -r $ip -u root -p xxxxxxx get BIOS.biosbootsettings.BootSeq | grep BootSeq
  # Set Bios boot sequence
  racadm -r $ip -u root -p xxxxxxx set BIOS.biosbootsettings.BootSeq NIC.Integrated.1-1-1,NIC.Integrated.1-3-1,NIC.Slot.3-1-1,Optical.SATAEmbedded.J-1,HardDisk.List.1-1
  # Create a BIOS reboot job so that the boot order changes are applied
  racadm -r $ip -u root -p xxxxxxx jobqueue create BIOS.Setup.1-1 -r pwrcycle -s TIME_NOW -e TIME_NA
done
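After the pwrcycle jobs have run, re-running the same get from above shows whether the new order stuck (NIC.Integrated.1-1-1 should now be listed first):
for n in {48..10} ; do
  ip=172.16.3.${n}
  racadm -r $ip -u root -p xxxxxxx get BIOS.biosbootsettings.BootSeq | grep BootSeq
done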
 

Modify the switch and port locations in BrightCM

We use Bright Computing Cluster Manager for HPC to manage our HPC nodes (this recently replaced Rocks in our environment). Within BrightCM we had to modify the boot interface for the set of compute nodes. BrightCM provides excellent CLI support, hooray!
cmsh -c "device foreach -n node0001..node0039 (interfaces; use bootif; set networkdevicename eno1); commit"
 

Update the switch port locations in BrightCM

BrightCM keeps track of the switch port to node NIC mapping. One reason for this is to prevent accidentally imprinting the wrong image on nodes that got swapped (i.e. you remove two nodes for service and insert them back into the rack in the wrong location).
First I had to identify the new port number for a node; I chose the node that would be last in sequence on the switch, which happened to show up in BrightCM as port 171. I found this by PXE booting the compute node: once it comes up, BrightCM notices the discrepancy and displays an interface that allows you to manually address the issue, something akin to "node0039 should be on switch XXX port ## but it's showing up on switch z9500-r05-03-38 port 171" blah blah blah.
Instead of addressing the issue manually, it can be done in bulk via the CLI (assuming there's a sequence). Each of our nodes has two NICs wired to the Z9500 (node0039 would be on ports 171 and 172), so in the code below I decrement the port number by 2 for each node's boot device.
port=171
for n in {39..26} ; do
  # Set the boot NIC's switch port for this node (node0039 -> 171, node0038 -> 169, ...)
  cmsh -c "device; set node00${n} ethernetswitch z9500-r05-03-38:${port}; commit ;"
  # Each node occupies a pair of ports, so step down by 2 for the next node
  let port=${port}-2
done
unset port