Tuesday, January 31, 2012

CentOS 6.2 Installation issue with Dell Precision Workstation T7500

Attempting to kickstart a new Dell Precision Workstation T7500 with an nVidia Quadro 2000 from the CentOS 6.2 DVD media resulted in a kernel panic. All signs pointed to the community-built nouveau module ("nouveau_probe_i2c_addr panic").

I'm not so worried about the nouveau module working, since the kickstart process installs the official nVidia binary release.

To work around the issue and allow the system to boot fully into the installer, simply add "rdblacklist=nouveau" to the kernel argument list:


> vmlinuz initrd=initrd.img ks=http://srv1/ks/wks-el6.cfg rdblacklist=nouveau reboot=pci
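
Once a system has booted, it can be worth confirming that the argument really reached the kernel. The snippet below is only a minimal Python sketch and not part of the kickstart setup: the helper name `blacklisted_modules` is made up for illustration, and it simply splits a kernel command line (such as the contents of /proc/cmdline) into arguments.

```python
def blacklisted_modules(cmdline):
    """Return the values of all rdblacklist= arguments on a kernel command line."""
    modules = []
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        # Only keep arguments of the form rdblacklist=<module>
        if key == "rdblacklist" and sep and value:
            modules.append(value)
    return modules


# The boot line used above, fed in as a plain string
boot_line = "vmlinuz initrd=initrd.img ks=http://srv1/ks/wks-el6.cfg rdblacklist=nouveau reboot=pci"
print(blacklisted_modules(boot_line))
```

On a running machine the same function could be given the text read from /proc/cmdline instead of the hard-coded string.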

Wednesday, January 18, 2012

How to Configure Nagios Check_MK to Report Number of Package Updates Needed on Clients

The Check_mk Updates plugin was posted to the Check_mk mailing list by Jonathan Mills. This entry covers the steps I took to integrate it into my environment. I opted to distribute the two plugin files to the clients via Puppet rather than RPM.
  • OMD version: 0.51.20111117
  • Puppet version: 2.6.12
  • Server is CentOS 5.7 x86_64
  • Clients are CentOS 5.7, CentOS 6.2, RHEL 5.7 and Fedora 16 (ix86 and x86_64)
This check currently only works with Yum-based clients (tested on CentOS 5 and 6, RHEL 5 and Fedora 16) and requires the yum-security package (EL5) or the yum-plugin-security package (EL6).

The plugin attempts to identify the security and non-security packages that are pending installation. On RHEL this information is simple to get via "yum --security check-update":

$ sudo yum --security check-update
Loaded plugins: dellsysid, rhnplugin, security
Limiting package lists to security relevant ones
Needed 6 of 17 packages, for security

kernel.x86_64                        2.6.18-274.17.1.el5             rhel-x86_64-client-5
kernel-devel.x86_64                  2.6.18-274.17.1.el5             rhel-x86_64-client-5
kernel-headers.x86_64                2.6.18-274.17.1.el5             rhel-x86_64-client-5
libxml2.i386                         2.6.26-2.1.12.el5_7.2             rhel-x86_64-client-5
libxml2.x86_64                       2.6.26-2.1.12.el5_7.2             rhel-x86_64-client-5
libxml2-python.x86_64                2.6.26-2.1.12.el5_7.2             rhel-x86_64-client-5
For CentOS and most likely Scientific Linux, the security errata are not provided with the repos, so the above command will always report 0 security updates. This is solved in the plugin by parsing the results of the -v (verbose) output.
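
To make the parsing idea concrete, here is a rough Python rendering of the same filtering that the plugin's shell pipeline performs. It is a sketch under assumptions, not the plugin itself: the function name `classify` is invented for this example, and the sample lines are hypothetical, shaped only after what the script's filters imply rather than copied from real yum output.

```python
import re

# Same patterns as the egrep filters in mk_check_updates
ARCH = re.compile(r"(i.86|x86_64|noarch)")
NOISE = re.compile(r"^(Keeping|Removing|Nothing|Excluding|Looking)")


def classify(lines):
    """Split verbose yum output lines into (security, non_security) package lists."""
    security, non_security = [], []
    for line in lines:
        # Keep only lines naming a package architecture, minus known noise
        if not ARCH.search(line) or "(priority)" in line or NOISE.search(line):
            continue
        # Drop everything up to a leading "--> " marker, as the sed expression does
        line = re.sub(r"^.*--> ", "", line)
        fields = line.split()
        if not fields:
            continue
        # Lines flagged "non-security" count as ordinary updates, the rest as security
        target = non_security if "non-security" in line else security
        target.append(fields[0])
    return security, non_security


# Hypothetical sample lines, for illustration only
sample = [
    "Loaded plugins: security",
    " --> libxml2-2.6.26-2.1.12.el5_7.2.x86_64 from updates excluded (non-security)",
    "kernel.x86_64    2.6.18-274.17.1.el5    updates",
]
print(classify(sample))
```

The plugin then only needs the two counts, which correspond to the lengths of the two lists.
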
  1. Add the client-side scripts to the Puppet server (Puppet isn't required; alternatively you can install the RPM provided in the tar file attached to the check_mk mailing list post)
    • Create the file directories under site
      
      $ mkdir -p var/lib/puppet/files/site/etc/check_mk
      $ mkdir -p var/lib/puppet/files/site/usr/lib/check_mk_agent/plugins 
      
    • Create the check_updates.cfg etc file
      
      $ vim var/lib/puppet/files/site/etc/check_mk/check_updates.cfg 
      
      
      
      # +------------------------------------------------------------------+
      # |             ____ _               _        __  __ _  __           |
      # |            / ___| |__   ___  ___| | __   |  \/  | |/ /           |
      # |           | |   | '_ \ / _ \/ __| |/ /   | |\/| | ' /            |
      # |           | |___| | | |  __/ (__|   <    | |  | | . \            |
      # |            \____|_| |_|\___|\___|_|\_\___|_|  |_|_|\_\           |
      # |                                                                  |
      # | Copyright Mathias Kettner 2010             mk@mathias-kettner.de |
      # +------------------------------------------------------------------+
      #
      # This file is part of Check_MK.
      # The official homepage is at http://mathias-kettner.de/check_mk.
      #
      # check_mk is free software;  you can redistribute it and/or modify it
      # under the  terms of the  GNU General Public License  as published by
      # the Free Software Foundation in version 2.  check_mk is  distributed
      # in the hope that it will be useful, but WITHOUT ANY WARRANTY;  with-
      # out even the implied warranty of  MERCHANTABILITY  or  FITNESS FOR A
      # PARTICULAR PURPOSE. See the  GNU General Public License for more de-
      # tails.  You should have  received  a copy of the  GNU  General Public
      # License along with GNU Make; see the file  COPYING.  If  not,  write
      # to the Free Software Foundation, Inc., 51 Franklin St,  Fifth Floor,
      # Boston, MA 02110-1301 USA.
      
      # check_updates.cfg
      # This file configures mk_check_updates.
      
      # interval (seconds) between runs of 'yum check-update'
      INTERVAL=7200
      
      # path to log file
      LOG="/var/log/check_updates.log"
      
    • Create the mk_check_updates script. This version differs slightly from the original source: I changed it to resolve issues with the yum priorities plugin and with yum output lines beginning with "Keeping" or "Removing".
      
      $ vim var/lib/puppet/files/site/usr/lib/check_mk_agent/plugins/mk_check_updates 
      
      
      
      #!/bin/bash
      #
      # OUTPUT:
      # (security) (non-security) (runtime) (check age)
      # <<<updates>>>
      # 7 40 7 209
      
      # Unix time (seconds since Unix epoch)
      START=$(date +%s)
      
      TIME=
      AGE=
      
      INTERVAL=86400                          # default interval once a day
      LOG="/var/log/check_updates.log"        # default path to log file
      
      # Source config file if it exists
      if [ -e "/etc/check_mk/check_updates.cfg" ]; then
          . /etc/check_mk/check_updates.cfg
      fi
      
      # function run_check_update
      run_check_update () {
      if which yum >/dev/null; then
      
        if [ ! -e "/var/run/yum.pid" ]; then
      
          cat /dev/null > $LOG
      
          # Check for security RPMS
          yum -v --security check-update | egrep '(i.86|x86_64|noarch)' | egrep -v '\(priority\)' |\
       egrep -v '(^Keeping|^Removing|^Nothing|^Excluding|^Looking)' | sed 's/^.*--> //g' | while read L
          do
      
            RPM=$(echo $L | awk '{print $1}')
            Q=$(echo ${L} | grep 'non-security' > /dev/null; echo $?)
            if [ $Q -eq 0 ]; then
              echo "non-security $RPM" >> $LOG
            else
              echo "security $RPM" >> $LOG
            fi
      
          done
      
        fi
      fi
      }
      
      # function timeyet
      timeyet () {
      LAST=$(stat -c '%Y' $LOG)
      NOW=$(date +%s)
      AGE=$((NOW - LAST))
      [ $AGE -gt $INTERVAL ] && TIME=1 || TIME=0
      }
      
      # See if it's time to run 'yum check-update' yet
      if [ ! -e $LOG ]; then
        touch $LOG
        run_check_update
        timeyet
      else
        timeyet
        if [ $TIME = 1 ]; then
          run_check_update
          timeyet
        fi
      fi
      
      # Gather results from log file
      SEC=$(grep '^security' $LOG | wc -l)
      NON=$(grep '^non-security' $LOG | wc -l)
      
      # Unix time (seconds since epoch)
      END=$(date +%s)
      
      RUNTIME=$((END - START))
      
      echo '<<<updates>>>'
      echo $SEC" "$NON" "$RUNTIME" "$AGE
      exit 0
      
      
    • Add the scripts to git
      
      $ git add var/lib/puppet/files/site/usr/lib/check_mk_agent/plugins/mk_check_updates
      $ git add var/lib/puppet/files/site/etc/check_mk/check_updates.cfg 
      $ git commit -a -m "Adding check_mk client side scripts to report yum updates"
      $ git push
      
    • Add the scripts to the check_mk class to ensure that the clients get the code
      
      $ vim etc/puppet/manifests/classes/check_mk.pp
      
      
      
      # etc/puppet/manifests/classes/check_mk.pp
      
      class check_mk {
         case $operatingsystem {
            "centos",
            "fedora",
            "redhat": {
               package {["check_mk-agent", "check_mk-agent-logwatch"]:
                  ensure   => latest,
                  notify   => Service["xinetd"],
               }
               service { "xinetd":
                  ensure     => running,
                  enable     => true,
               }
               file { "/etc/check_mk/check_updates.cfg":
                  owner => "root",
                  group => "root",
                  mode => 755,
                  source => "puppet:///site/etc/check_mk/check_updates.cfg",
               }
               file { "/usr/lib/check_mk_agent/plugins/mk_check_updates":
                  owner => "root",
                  group => "root",
                  mode => 755,
                  source => "puppet:///site/usr/lib/check_mk_agent/plugins/mk_check_updates",
               }
           }
            default: { }
         }
      }
      
    • Ensure that the check_mk class is included in the node definitions (currently included in the baseclass template)
    • Git commit the changes to check_mk.pp class and push to the git server
  2. Install the Python check script on the Nagios server. Note that user-defined checks go in local/share/check_mk/checks; if you put them into $SITE/share.... they won't survive the next OMD upgrade.
    
    $ su - sitename
    $ vim local/share/check_mk/checks/updates
    
    
    
    #!/usr/bin/python
    # -*- encoding: utf-8; py-indent-offset: 4 -*-
    
    # Jonathan Mills 10/2011
    
    # Example output from agent:
    # [security] [non-security] [runtime (seconds)] [age of results (seconds)]
    # <<<updates>>>
    # 7 40 0 13
    #
    
    updates_default_values = (5, 20)
    
    # inventory
    def inventory_updates(checktype, info):
        #if len(info) >= 1 and len(info[0]) >= 1:
        #    return [ (None, None) ]
        inventory = []
        inventory.append( (None, "updates_default_values") )
        return inventory
    
    
    # check
    def check_updates(_no_item, params, info):
        # unpack check parameters
        min_num_sec, min_num_nonsec = params
    
        for line in info:
            perfdata = []
            sec = int(line[0])
            nonsec = int(line[1])
            age = int(line[3])
            infotext = "%s Security Updates, %s Non-Critical Updates  (Last Checked %s seconds ago)" % (sec, nonsec, age)
            perfdata.append( ( "Runtime (sec)", int(line[2]) ) )
            if sec > min_num_sec:
                return (2, "CRITICAL - " + infotext, perfdata)
            elif nonsec > min_num_nonsec:
                return (1, "WARNING - " + infotext, perfdata)
            else:
                return (0, "OK - " + infotext, perfdata)
    
    # declare the check to Check_MK
    check_info['updates'] = (check_updates, "Updates", 1, inventory_updates)
    
  3. Add a new time period 'nightly' to Nagios that can be used to limit this check to running daily between 3AM and 4AM
    
    $ vim etc/nagios/conf.d/timeperiods.cfg 
    
    
    ###############################################################################
    # TIMEPERIODS.CFG - SAMPLE TIMEPERIOD DEFINITIONS
    #
    # NOTES: This config file provides you with some example timeperiod definitions
    #        that you can reference in host, service, contact, and dependency
    #        definitions.
    #
    #        You don't need to keep timeperiods in a separate file from your other
    #        object definitions.  This has been done just to make things easier to
    #        understand.
    #
    ###############################################################################
    
    # This defines a timeperiod where all times are valid for checks,
    # notifications, etc.  The classic "24x7" support nightmare. :-)
    define timeperiod{
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
    }
    
    # 'workhours' timeperiod definition
    define timeperiod{
           timeperiod_name workhours
           alias           Normal Work Hours
           monday          08:00-17:00
           tuesday         08:00-17:00
           wednesday       08:00-17:00
           thursday        08:00-17:00
           friday          08:00-17:00
    }
    
    # 'none' timeperiod definition
    define timeperiod{
        timeperiod_name  none
        alias            No Time Is A Good Time
    }
    
    # 'nightly' timeperiod definition
    define timeperiod{
             timeperiod_name         nightly
             alias                   Nightly Check
             sunday                  03:00-04:00  ; Every Sunday of every week
             monday                  03:00-04:00  ; Every Monday of every week
             tuesday                 03:00-04:00  ; Every Tuesday of every week
             wednesday               03:00-04:00  ; Every Wednesday of every week
             thursday                03:00-04:00  ; Every Thursday of every week
             friday                  03:00-04:00  ; Every Friday of every week
             saturday                03:00-04:00  ; Every Saturday of every week
    }
    
    
  4. Add the new check to the check_mk main.mk file
    
    $ vim etc/check_mk/main.mk
    
    
    
    # check-updates (OMD 0.52 requires user-defined variables to be prefixed with an underscore)
    _updates_default_values = (6, 20) # check-updates: critical when more than 6 security updates, warning when more than 20 non-security updates
    
    extra_service_conf["check_period"] = [
      ( "nightly", ALL_HOSTS, [ "Updates" ] ), # check-updates: Only check for updates from 3 to 4AM as set in timeperiods.cfg
    ]
    
    extra_service_conf["max_check_attempts"] = [
      ( "1", ALL_HOSTS, [ "Updates" ] ), # check-updates: Only check for updates once
    ]
    
    # Enable notifications for specific services
    extra_service_conf["notifications_enabled"] = [
      ( "1", ALL_HOSTS, ["Check_MK"]),
      ( "0", ALL_HOSTS, ["Updates"]), # check-updates: Don't notify for security OS updates
      ( "1", ALL_HOSTS, ["Memory used"]),
      ( "1", ALL_HOSTS, ["IPMI Sensor Summary","fs_*"]),
      ( "1", ["linsrv"], ["IPMI Sensor Summary","ambient_temp"]),
      ( "1", ALL_HOSTS, ["Multipath *"]),
      ( "1", ["kvm"], ALL_HOSTS, ["CPU load"]),
      ( "1", ["kvm"], ALL_HOSTS, ["CPU utilization"]),
      ( "1", ["mailsrv"], ["Postfix Queue"]),
      ( "1", ["linsrv"], ALL_HOSTS, ["Dell OMSA"]),
      ( "0", ALL_HOSTS, ALL_SERVICES), # and disable notifications for everything else
    ]
    
    service_groups = [
      ( "updates", ALL_HOSTS, [ "Updates" ] ), # check-updates: Create updates service group to make viewing in web interface easier
    ]
    
    define_servicegroups = {
       "updates" : "RHEL/CentOS Yum Updates", # check-updates: Can now statically link to a service group web view: http://nagios.server/sitename/check_mk/view.py?view_name=servicegroup&servicegroup=updates
    }
    
    
  5. Rerun the inventory for the nodes
    
    $ check_mk -II node-01
    ...
    
    or for all nodes
    $ check_mk -II
    
  6. Reload the services
    
    $ check_mk -O
    
    
  7. Check the web page for the nodes. Alternatively, you can go straight to the Updates overview page:
    https://nagios.server/sitename/check_mk/view.py?view_name=servicegroup&servicegroup=updates
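
As a quick sanity check on the thresholds from step 4, the following standalone Python sketch mirrors the comparisons made in the check function from step 2. It is not the Check_MK plugin: `evaluate` is a made-up name, it runs outside Check_MK, and it omits the performance data. It shows that the comparison is strictly "greater than", so with (6, 20) a host goes critical at 7 security updates, not 6.

```python
def evaluate(agent_line, params):
    """Return (state, text) for one '<<<updates>>>' data line."""
    max_sec, max_nonsec = params
    fields = agent_line.split()
    # Field order from the agent: security, non-security, runtime, age
    sec, nonsec, age = int(fields[0]), int(fields[1]), int(fields[3])
    text = "%d Security Updates, %d Non-Critical Updates (Last Checked %d seconds ago)" % (sec, nonsec, age)
    if sec > max_sec:
        return 2, "CRITICAL - " + text
    if nonsec > max_nonsec:
        return 1, "WARNING - " + text
    return 0, "OK - " + text


# Three agent lines evaluated against the thresholds (6, 20)
for line in ("7 40 0 13", "6 40 0 13", "6 20 0 13"):
    print(evaluate(line, (6, 20)))
```

The three sample lines come out as CRITICAL, WARNING and OK respectively.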