Friday, August 7, 2015

How To: Clear Dell iDRAC Job Queue

I'm in the process of deploying 41 new Dell PowerEdge R630 servers in our HPC environment. To help manage the hardware I'm using a tool that's new to us: Dell OpenManage Essentials (OME).

OME requires a Microsoft Windows OS. Luckily (since we are a Linux shop), it's a snap to install Windows Server 2012 in KVM.

Some of the functionality provided by OME:

  • Reporting and alerting
  • Firmware upgrades
  • Configuration deployment (BIOS settings, iDRAC, RAID, etc.)
  • Bare metal provisioning

While OME is free, some of its features require a license. I've only been using OME for a couple of days, so I haven't had a chance to test everything, but I have found that configuration deployment requires a license (e.g. the ability to push a configuration template out to one or more nodes). Firmware upgrades and reporting do not require a license.

The first task for OME: firmware upgrades on all 41 nodes. My initial attempts failed. Reading through the logs revealed that the remote clients couldn't reach TCP port 1278 on the OME server. Firmware upgrades started deploying once I opened TCP port 1278 in the Windows firewall.
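The post doesn't say how the port was opened. One way to do it from an elevated PowerShell prompt on the OME server is sketched below, assuming the built-in NetSecurity cmdlets that ship with Windows Server 2012. The rule name is made up for illustration.

```powershell
# Allow inbound connections to TCP 1278 on the OME server.
# The display name is arbitrary (an assumption, not an OME requirement).
New-NetFirewallRule -DisplayName "OME package download (TCP 1278)" `
    -Direction Inbound -Protocol TCP -LocalPort 1278 -Action Allow

# Confirm the rule exists and is enabled.
Get-NetFirewallRule -DisplayName "OME package download (TCP 1278)"
```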

Each server had a long list of upgrades, including BIOS, iDRAC, and the six network cards (a mix of 10Gbit and 1Gbit). All of the firmware deployed successfully, with the exception of the Ethernet cards. Grrr, back to scanning the logs.


Results:  
 Downloading Packages.
 Calling InstallFromUri method to Download packages to the iDRAC 
 There are some pending reboot jobs on the iDRAC that maybe block updating the system. It is recommended that you clear all the jobs before updating
 Downloading Package: Network_Firmware_6FD9P_WN64_16.5.20_A00.EXE onto the iDRAC 
 Package download has successfully started and the Job ID is JID_388846411941
 The URI given to the iDRAC to download from: http://192.168.2.69:1278/install_packages/Packages/Network_Firmware_6FD9P_WN64_16.5.20_A00.EXE

OK, but how do you clear the jobs? I didn't see any native way to do it from within OME, so on to Google.

Thanks to this post on Jon Munday's blog, I was able to clear the pending jobs with a little PowerShell for loop action to hit all nodes.

The following command displays the job queue for the range of compute nodes (192.168.2.10 through 192.168.2.50):

For ($i=10; $i -lt 51; $i++) { winrm e cimv2/root/dcim/DCIM_LifecycleJob -u:$USER -p:$PASSWORD -SkipCNcheck -SkipCAcheck -r:https://192.168.2.$i/wsman -auth:basic -encoding:utf-8 }
 
The next command clears the queue. Sorry for the long single line; I don't know whether PowerShell supports spanning a command across multiple lines the way Bash does:

For ($i=10; $i -lt 51; $i++) { winrm invoke DeleteJobQueue "cimv2/root/dcim/DCIM_JobService?CreationClassName=DCIM_JobService+Name=JobService+SystemName=Idrac+SystemCreationClassName=DCIM_ComputerSystem" '@{JobID="JID_CLEARALL"}' -r:https://192.168.2.$i/wsman -u:$USER -p:$PASSWORD -SkipCNCheck -SkipCACheck -auth:basic -encoding:utf-8 -format:pretty }
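For what it's worth, PowerShell does allow this: a backtick as the very last character on a line continues the command onto the next line, much like a trailing backslash in Bash. Below is a sketch of the same clear-queue loop laid out that way. It assumes $USER and $PASSWORD are already set, exactly as in the one-liner, and the $nodes variable is just a name introduced here to hold the last-octet range.

```powershell
# Last octets of the compute nodes: 192.168.2.10 through 192.168.2.50
$nodes = 10..50

foreach ($i in $nodes) {
    # Each backtick must be the final character on its line
    # (no trailing spaces or comments after it).
    winrm invoke DeleteJobQueue `
        "cimv2/root/dcim/DCIM_JobService?CreationClassName=DCIM_JobService+Name=JobService+SystemName=Idrac+SystemCreationClassName=DCIM_ComputerSystem" `
        '@{JobID="JID_CLEARALL"}' `
        -r:https://192.168.2.$i/wsman `
        -u:$USER -p:$PASSWORD `
        -SkipCNCheck -SkipCACheck `
        -auth:basic -encoding:utf-8 -format:pretty
}
```

The arguments are the same as in the one-liner above; only the layout and the loop syntax differ (10..50 covers the same hosts as the $i -lt 51 loop).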