Friday, November 18, 2011

Dell Optiplex 790 Workstations hang while rebooting with CentOS 6

I'm working on deploying a large number of Dell Optiplex 790 workstations using kickstart and CentOS 6.

During initial testing I found that the 790s wouldn't complete a reboot, whether running the installed CentOS 6 system or booted from the install media. They'd get as far as "Restarting" and then hang.

The solution is to pass an option to the kernel:

reboot=pci


This can be added manually to the grub configuration file for systems already installed. For kickstarting:

1. Add the option to the bootloader line in your kickstart file, so the installed system boots with it

bootloader --location=mbr --driveorder=sda --append="crashkernel=auto rhgb quiet reboot=pci" --md5pass=$1$.xxxxx


2. During the initial boot from the CD/DVD, press TAB to edit the boot options and append reboot=pci, so the installer itself can reboot cleanly (this is one continuous line, broken up here for readability)

> vmlinuz initrd=initrd.img ks=http://192.168.1.5/ks/el6/wks1.cfg
    ip=192.168.1.100 netmask=255.255.255.0 gateway=192.168.1.1 nameserver=192.168.1.1
    ksdevice=eth0 reboot=pci
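
For systems that are already installed, the manual grub edit mentioned above can be scripted. This is only a minimal sketch, assuming the GRUB legacy config that CentOS 6 uses (normally /boot/grub/grub.conf). The function name and the sample config are made up for illustration, and you'd want a backup of the real file before writing anything back.

```python
def add_kernel_option(grub_conf, option="reboot=pci"):
    """Append `option` to every kernel line of a GRUB legacy config.

    Kernel lines that already carry the option are left alone, so
    running this twice gives the same result as running it once.
    """
    result = []
    for line in grub_conf.splitlines():
        stripped = line.strip()
        # GRUB legacy kernel lines look like: "kernel /vmlinuz-... ro root=..."
        if stripped.startswith("kernel ") and option not in stripped.split():
            line = line.rstrip() + " " + option
        result.append(line)
    return "\n".join(result) + "\n"


# Hypothetical sample config, not taken from a real system
sample = (
    "default=0\n"
    "title CentOS (2.6.32-71.el6.x86_64)\n"
    "\troot (hd0,0)\n"
    "\tkernel /vmlinuz-2.6.32-71.el6.x86_64 ro root=/dev/sda3 rhgb quiet\n"
    "\tinitrd /initramfs-2.6.32-71.el6.x86_64.img\n"
)
updated = add_kernel_option(sample)
print(updated)
```

Only the kernel line changes; the title, root and initrd lines pass through untouched.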

Thursday, November 10, 2011

Fedora 16 does not Boot if /boot is on Software RAID

In previous versions of Fedora you could put /boot on a software RAID device (say, a software mirror). In Fedora 16 this results in a failure to boot. It was never a supported configuration, but it used to work.

This is a known "issue" and is explained as follows:

Cannot boot with /boot partition on a software RAID array
Bugzilla: #750794

Attempting to boot after installing Fedora 16 with the /boot partition on a software RAID array will fail, as the software RAID modules for the grub2 bootloader are not installed. Having the /boot partition on a RAID array has never been a recommended configuration for Fedora, but up until Fedora 16 it has usually worked.

To work around this issue, do not put the /boot partition on the RAID array. Create a simple BIOS boot partition and a /boot partition on one of the disks, and place the other system partitions on the RAID array. Alternatively, you can install the appropriate grub2 modules manually from anaconda's console before rebooting from the installer, or from rescue mode. Edit the file /mnt/sysimage/boot/grub2/grub.cfg and add the lines:

insmod raid
insmod mdraid09
insmod mdraid1x

Now run these commands:

chroot /mnt/sysimage
grub2-install /dev/sda
grub2-install /dev/sdb

Adjust the device names as appropriate to the disks used in your system.

I had a system with a mirrored /boot that had been reinstalled through Fedora 13, 14, 15 and now 16. As reported, it failed to boot after the F16 install.

Destroying the mirror and creating a simple /dev/sda2 partition for /boot got it booting.
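
Before upgrading a box to F16 it may be worth checking whether /boot currently sits on an md device. Here's a rough sketch that works on the text of /proc/mounts. The function names and the sample mount table are my own, and the device-name test is just a heuristic for the usual /dev/mdN and /dev/md/NAME naming.

```python
import re


def boot_device(mounts_text):
    """Return the device backing /boot, or the one backing / if
    /boot is not a separate mount. Empty string if neither is found."""
    devices = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # Later entries override earlier ones (e.g. the initial "rootfs" line)
            devices[fields[1]] = fields[0]
    return devices.get("/boot", devices.get("/", ""))


def is_md_device(device):
    """Heuristic: software RAID arrays show up as /dev/mdN or /dev/md/NAME."""
    return re.match(r"^/dev/md(\d|/)", device) is not None


# Hypothetical mount table with a mirrored /boot
sample = (
    "rootfs / rootfs rw 0 0\n"
    "/dev/md1 / ext4 rw,relatime 0 0\n"
    "/dev/md0 /boot ext4 rw,relatime 0 0\n"
)
dev = boot_device(sample)
print(dev, is_md_device(dev))
```

On a real system you'd pass in open("/proc/mounts").read() instead of the sample. Note this won't see through LVM or other layers stacked on top of the array.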

Friday, November 4, 2011

Using check_dell_bladechassis with check_mk

This post builds on a previous post that documented getting check_openmanage working with check_mk.

In this post we'll add check_dell_bladechassis to the mix to allow for monitoring of Dell M1000e blade chassis (via the CMC, the chassis management controller).

This was done on the following system:
Unless otherwise specified, all paths are relative to the site owner's home (ex: /opt/omd/sites/mysite).

The check_openmanage configuration in this post is not needed to get check_dell_bladechassis working; I'm including it only to tie this entry to the previous post.
  1. Change users on your OMD server to the site user: $ su - mysite
  2. Download the latest check_dell_bladechassis from http://folk.uio.no/trondham/software/check_dell_bladechassis.html to ~/tmp and extract
  3. Copy the check_dell_bladechassis script to local/lib/nagios/plugins (this defaults to $USER2$ in your commands)
    
    $ cp tmp/check_dell_bladechassis-1.0.0/check_dell_bladechassis local/lib/nagios/plugins/
    $ chmod +x local/lib/nagios/plugins/check_dell_bladechassis
    
  4. Copy the PNP4Nagios template
    
    $ cp tmp/check_dell_bladechassis-1.0.0/check_dell_bladechassis.php etc/pnp4nagios/templates/
    
  5. Test check_dell_bladechassis to see that it can successfully query an M1000e CMC (I've inserted line breaks in the output to make it more readable)
    
    $ local/lib/nagios/plugins/check_dell_bladechassis -H dell-m1000e-01 -p -C MySecretCommunity
    
    OK - System: 'PowerEdge M1000e', SN: 'XXXXXX', Firmware: '3.03', hardware working fine|
    'total_watt'=1500.000W;0;7928.000 'total_amp'=6.750A;0;0 'volt_ps1'=239.500V;0;0 
    'volt_ps2'=242.750V;0;0 'volt_ps3'=242.750V;0;0 'volt_ps4'=241.750V;0;0 'volt_ps5'=241.750V;0;0 
    'volt_ps6'=242.750V;0;0 'amp_ps1'=1.688A;0;0 'amp_ps2'=1.641A;0;0 'amp_ps3'=0.188A;0;0 
    'amp_ps4'=1.516A;0;0 'amp_ps5'=1.500A;0;0 'amp_ps6'=0.219A;0;0
    
    
  6. Edit the main.mk file to define the commands and checks (I took the perfdata_format and monitoring_host settings from an earlier post to the mailing list; I'm not sure they are needed)
    
    all_hosts = [
     'dell-m1000e-01|snmp|m1000e|nonpub',
     'dell-r710-01|linsrv|kvm|omsa|nonpub',
     'dell-2950-01|linsrv|omsa|nonpub',
     'hp-srv-01|winsrv|smb', ]
    
    # Are you using PNP4Nagios and MRPE checks? This will make PNP
    # choose the correct template for standard Nagios checks:
    perfdata_format = "pnp"
    #set the monitoring host
    monitoring_host = "nagios"
    
    # SNMP Community
    snmp_default_community = "someCommunityRO"
    
    snmp_communities = [
      ( "MySecretCommunity", ["nonpub"], ALL_HOSTS ),
    ]
    
    extra_nagios_conf += r"""
    
    # ARG1: community string
    define command {
        command_name    check_openmanage
        command_line    $USER2$/check_openmanage -H $HOSTADDRESS$ -p -C $ARG1$
    }
    
    define command {
        command_name    check_dell_bladechassis
        command_line    $USER2$/check_dell_bladechassis -H $HOSTADDRESS$ -p -C $ARG1$
    }
    
    """
    
    legacy_checks = [
      # On all hosts with the tag 'omsa' check Dell OpenManage for status 
      # service description "Dell OMSA", process performance data
      ( ( "check_openmanage!MySecretCommunity", "Dell OMSA", True), [ "omsa" ], ALL_HOSTS ),
      # similar for m1000e
      ( ( "check_dell_bladechassis!MySecretCommunity", "Dell Blade Chassis", True), [ "m1000e" ], ALL_HOSTS ),
    
    ]
    
    
  7. That should be it. Reinventory your M1000e and reload
    
    $ check_mk -II dell-m1000e-01
    $ check_mk -O
    
    
  8. The PNP4Nagios template (check_dell_bladechassis.php) has a bug that can be fixed with the patch below (see the first comment for details)
    
    --- a/check_dell_bladechassis.php 2009-08-04 07:00:15.000000000 -0500
    +++ b/check_dell_bladechassis.php 2011-12-21 14:44:25.488132187 -0600
    @@ -41,7 +41,7 @@
      
      $opt[$count] = "--slope-mode --vertical-label \"$vlabel\" --title \"$def_title: $title\" ";
      
    -        $def[$count] .= "DEF:var$i=$rrdfile:$DS[$i]:AVERAGE " ;
    +        $def[$count] = "DEF:var$i=$rrdfile:$DS[$i]:AVERAGE " ;
             $def[$count] .= "AREA:var$i#$PWRcolor:\"$NAME[$i]\" " ;
             $def[$count] .= "LINE:var$i#000000: " ;
     
    @@ -62,7 +62,7 @@
      
      $opt[$count] = "-X0 --lower-limit 0 --slope-mode --vertical-label \"$vlabel\" --title \"$def_title: $title\" ";
      
    -        $def[$count] .= "DEF:var$i=$rrdfile:$DS[$i]:AVERAGE " ;
    +        $def[$count] = "DEF:var$i=$rrdfile:$DS[$i]:AVERAGE " ;
             $def[$count] .= "AREA:var$i#$AMPcolor:\"$NAME[$i]\" " ;
             $def[$count] .= "LINE:var$i#000000: " ;
     
    @@ -75,6 +75,7 @@
         if(preg_match('/^volt_/',$NAME[$i])){
      if ($visited_volt == 0) {
          ++$count;
    +     $def[$count] = '';
          $visited_volt = 1;
      }
      
    @@ -87,6 +88,7 @@
      
      $def[$count] .= "DEF:var$i=$rrdfile:$DS[$i]:AVERAGE " ;
      $def[$count] .= "LINE:var$i#".$colors[$v++].":\"$NAME[$i]\" " ;
    +
      $def[$count] .= "GPRINT:var$i:LAST:\"%3.2lf $UNIT[$i] last \" ";
      $def[$count] .= "GPRINT:var$i:MAX:\"%3.2lf $UNIT[$i] max \" ";
      $def[$count] .= "GPRINT:var$i:AVERAGE:\"%3.2lf $UNIT[$i] avg \\n\" ";
    @@ -96,6 +98,7 @@
         if(preg_match('/^amp_/',$NAME[$i])){
      if ($visited_amp == 0) {
          ++$count;
    +     $def[$count] = '';
          $visited_amp = 1;
      }
     
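The plugin output in step 5 follows the usual Nagios layout: status text, a pipe, then space-separated performance data of the form 'label'=value[unit];warn;crit. If you want to sanity-check what PNP4Nagios will be fed, something like the following sketch can pull the values apart. The function name is mine, and it only looks at the value and unit of each item, ignoring the thresholds.

```python
import re
import shlex

# Leading number, then whatever unit suffix follows (W, A, V, ...)
VALUE_RE = re.compile(r"^(-?[0-9.]+)(.*)$")


def parse_plugin_output(output):
    """Split plugin output into (status text, {label: (value, unit)})."""
    text, _, perfdata = output.partition("|")
    metrics = {}
    # shlex strips the single quotes around labels and splits on whitespace
    for token in shlex.split(perfdata):
        label, sep, rest = token.partition("=")
        match = VALUE_RE.match(rest.split(";")[0])
        if sep and match:
            metrics[label] = (float(match.group(1)), match.group(2))
    return text.strip(), metrics


# First few items of the output shown in step 5
sample = (
    "OK - System: 'PowerEdge M1000e', SN: 'XXXXXX', Firmware: '3.03', "
    "hardware working fine|"
    "'total_watt'=1500.000W;0;7928.000 'total_amp'=6.750A;0;0 "
    "'volt_ps1'=239.500V;0;0"
)
text, metrics = parse_plugin_output(sample)
print(metrics["total_watt"])  # (1500.0, 'W')
```

Each label ends up as one data source in the RRD, which is what the $DS[$i] and $NAME[$i] variables in the template refer to.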
    
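To see which hosts a legacy_checks entry from step 6 will land on, it helps to picture how the tags in all_hosts are matched against the tag list of the rule. The sketch below is my own illustration of that selection, not check_mk's code: it assumes a host matches when it carries every tag in the list, and it leaves out anything else the real matching supports.

```python
def parse_host(entry):
    """Split an all_hosts entry such as 'name|tag1|tag2' into (name, set of tags)."""
    name, *tags = entry.split("|")
    return name, set(tags)


def hosts_with_tags(all_hosts, required_tags):
    """Return the names of hosts that carry every tag in required_tags."""
    required = set(required_tags)
    return [name for name, tags in map(parse_host, all_hosts) if required <= tags]


# Same host list as in main.mk above
all_hosts = [
    'dell-m1000e-01|snmp|m1000e|nonpub',
    'dell-r710-01|linsrv|kvm|omsa|nonpub',
    'dell-2950-01|linsrv|omsa|nonpub',
    'hp-srv-01|winsrv|smb',
]

print(hosts_with_tags(all_hosts, ["omsa"]))    # ['dell-r710-01', 'dell-2950-01']
print(hosts_with_tags(all_hosts, ["m1000e"]))  # ['dell-m1000e-01']
```

So the "Dell OMSA" check goes to the two servers tagged omsa, and the "Dell Blade Chassis" check only to the chassis tagged m1000e.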

Hope this helps, and comments are welcome.