Wednesday, January 18, 2012

How to Configure Nagios Check_MK to Report the Number of Package Updates Needed on Clients

The Check_MK Updates plugin was posted to the Check_MK mailing list by Jonathan Mills. This blog entry covers the steps I took to integrate it into my environment. I opted to distribute the two client-side plugin files via puppet rather than as an RPM.
  • OMD version: 0.51.20111117
  • Puppet version: 2.6.12
  • Server is CentOS 5.7 x86_64
  • Clients are CentOS 5.7, CentOS 6.2, RHEL 5.7 and Fedora 16 (ix86 and x86_64)
This check currently works only with yum-based clients (tested on CentOS 5 and 6, RHEL 5 and Fedora 16) and requires the yum-security package (EL5) or yum-plugin-security (EL6). The plugin attempts to identify the security and non-security packages that are pending installation. On RHEL this information is easy to get via "yum --security check-update":

$ sudo yum --security check-update
Loaded plugins: dellsysid, rhnplugin, security
Limiting package lists to security relevant ones
Needed 6 of 17 packages, for security

kernel.x86_64                        2.6.18-274.17.1.el5             rhel-x86_64-client-5
kernel-devel.x86_64                  2.6.18-274.17.1.el5             rhel-x86_64-client-5
kernel-headers.x86_64                2.6.18-274.17.1.el5             rhel-x86_64-client-5
libxml2.i386                         2.6.26-2.1.12.el5_7.2             rhel-x86_64-client-5
libxml2.x86_64                       2.6.26-2.1.12.el5_7.2             rhel-x86_64-client-5
libxml2-python.x86_64                2.6.26-2.1.12.el5_7.2             rhel-x86_64-client-5
On CentOS (and most likely Scientific Linux), security errata are not provided with the repos, so the above command always reports 0 security updates. The plugin works around this by parsing the verbose (-v) output instead.
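
To see which lines of the verbose output end up counted, the egrep/sed pipeline used in mk_check_updates can be mirrored in Python. This is a minimal sketch that assumes, as the plugin does, that any remaining line containing the text "non-security" is a non-security update and every other line that survives the filters is a security update. The classify function and the sample lines are hypothetical and made up for illustration; they are not captured yum output.

```python
import re

# Keep only lines mentioning an architecture (i386..i686, x86_64, noarch),
# like the plugin's first egrep.
ARCH_RE = re.compile(r"(i.86|x86_64|noarch)")
# Informational lines the plugin discards (egrep -v on the line start).
SKIP_RE = re.compile(r"^(Keeping|Removing|Nothing|Excluding|Looking)")


def classify(lines):
    """Split yum -v output lines into (security, non_security) package names."""
    security, non_security = [], []
    for line in lines:
        # Same filters, same order as the shell pipeline.
        if not ARCH_RE.search(line) or "(priority)" in line or SKIP_RE.match(line):
            continue
        # sed 's/^.*--> //g' keeps only the text after the last "--> ".
        line = re.sub(r"^.*--> ", "", line).strip()
        fields = line.split()
        rpm = fields[0] if fields else ""  # awk '{print $1}'
        if "non-security" in line:
            non_security.append(rpm)
        else:
            security.append(rpm)
    return security, non_security


# Hypothetical sample lines, for illustration only.
sample = [
    "kernel.x86_64    2.6.18-274.17.1.el5    updates",
    "--> bash.x86_64 from updates excluded (non-security)",
    "Looking for updates on x86_64",
    "foo.noarch 1.0-1 epel (priority)",
]
sec, nonsec = classify(sample)
print("<<<updates>>>")
print(len(sec), len(nonsec))
```

The design mirrors the plugin rather than improving on it: a package is counted as a security update by default, which is why CentOS hosts without errata can still report non-zero security counts.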
  1. Add the client-side scripts to the puppet server (puppet isn't required; you can instead install the RPM provided in the tar file attached to the Check_MK mailing list post)
    • Create the file directories under the puppet "site" file share
      
      $ mkdir -p var/lib/puppet/files/site/etc/check_mk
      $ mkdir -p var/lib/puppet/files/site/usr/lib/check_mk_agent/plugins 
      
    • Create the check_updates.cfg file (deployed to /etc/check_mk/check_updates.cfg on the clients)
      
      $ vim var/lib/puppet/files/site/etc/check_mk/check_updates.cfg 
      
      
      
      # +------------------------------------------------------------------+
      # |             ____ _               _        __  __ _  __           |
      # |            / ___| |__   ___  ___| | __   |  \/  | |/ /           |
      # |           | |   | '_ \ / _ \/ __| |/ /   | |\/| | ' /            |
      # |           | |___| | | |  __/ (__|   <    | |  | | . \            |
      # |            \____|_| |_|\___|\___|_|\_\___|_|  |_|_|\_\           |
      # |                                                                  |
      # | Copyright Mathias Kettner 2010             mk@mathias-kettner.de |
      # +------------------------------------------------------------------+
      #
      # This file is part of Check_MK.
      # The official homepage is at http://mathias-kettner.de/check_mk.
      #
      # check_mk is free software;  you can redistribute it and/or modify it
      # under the  terms of the  GNU General Public License  as published by
      # the Free Software Foundation in version 2.  check_mk is  distributed
      # in the hope that it will be useful, but WITHOUT ANY WARRANTY;  with-
      # out even the implied warranty of  MERCHANTABILITY  or  FITNESS FOR A
      # PARTICULAR PURPOSE. See the  GNU General Public License for more de-
      # tails.  You should have  received  a copy of the  GNU  General Public
      # License along with GNU Make; see the file  COPYING.  If  not,  write
      # to the Free Software Foundation, Inc., 51 Franklin St,  Fifth Floor,
      # Boston, MA 02110-1301 USA.
      
      # check_updates.cfg
      # This file configures mk_check_updates.
      
      # interval (seconds) between runs of 'yum check-update'
      INTERVAL=7200
      
      # path to log file
      LOG="/var/log/check_updates.log"
      
    • Create the mk_check_updates script (I modified it to resolve issues caused by the yum priorities plugin and by yum output lines beginning with "Keeping" or "Removing", so it differs slightly from the original source)
      
      $ vim var/lib/puppet/files/site/usr/lib/check_mk_agent/plugins/mk_check_updates 
      
      
      
      #!/bin/bash
      #
      # OUTPUT:
      # (security) (non-security) (runtime) (check age)
      # <<<updates>>>
      # 7 40 7 209
      
      # Unix time (seconds since Unix epoch)
      START=$(date +%s)
      
      TIME=
      AGE=
      
      INTERVAL=86400                          # default interval once a day
      LOG="/var/log/check_updates.log"        # default path to log file
      
      # Source config file if it exists
      if [ -e "/etc/check_mk/check_updates.cfg" ]; then
          . /etc/check_mk/check_updates.cfg
      fi
      
      # function run_check_update
      run_check_update () {
      if which yum >/dev/null 2>&1; then
      
        if [ ! -e "/var/run/yum.pid" ]; then
      
          # Truncate the log before recording fresh results
          cat /dev/null > "$LOG"
      
          # Check for security RPMS
          yum -v --security check-update | egrep '(i.86|x86_64|noarch)' | egrep -v '\(priority\)' |\
       egrep -v '(^Keeping|^Removing|^Nothing|^Excluding|^Looking)' | sed 's/^.*--> //g' | while read -r L
          do
      
            RPM=$(echo "$L" | awk '{print $1}')
            # Lines mentioning "non-security" are non-security updates; everything else counts as security
            if echo "$L" | grep -q 'non-security'; then
              echo "non-security $RPM" >> "$LOG"
            else
              echo "security $RPM" >> "$LOG"
            fi
      
          done
      
        fi
      fi
      }
      
      # function timeyet
      timeyet () {
      LAST=$(stat -c '%Y' $LOG)
      NOW=$(date +%s)
      AGE=$((NOW - LAST))
      [ $AGE -gt $INTERVAL ] && TIME=1 || TIME=0
      }
      
      # See if it's time to run 'yum check-updates' yet
      if [ ! -e $LOG ]; then
        touch $LOG
        run_check_update
        timeyet
      else
        timeyet
        if [ $TIME = 1 ]; then
          run_check_update
          timeyet
        fi
      fi
      
      # Gather results from log file
      SEC=$(grep '^security' $LOG | wc -l)
      NON=$(grep '^non-security' $LOG | wc -l)
      
      # Unix time (seconds since epoch)
      END=$(date +%s)
      
      RUNTIME=$((END - START))
      
      echo '<<<updates>>>'
      echo "$SEC $NON $RUNTIME $AGE"
      exit 0
      
      
    • Add the scripts to git
      
      $ git add var/lib/puppet/files/site/usr/lib/check_mk_agent/plugins/mk_check_updates
      $ git add var/lib/puppet/files/site/etc/check_mk/check_updates.cfg 
      $ git commit -a -m "Adding check_mk client side scripts to report yum updates"
      $ git push
      
    • Add the scripts to the check_mk class to ensure that the clients get the code
      
      $ vim etc/puppet/manifests/classes/check_mk.pp
      
      
      
      # etc/puppet/manifests/classes/check_mk.pp
      
      class check_mk {
         case $operatingsystem {
            "centos",
            "fedora",
            "redhat": {
               package {["check_mk-agent", "check_mk-agent-logwatch"]:
                  ensure   => latest,
                  notify   => Service["xinetd"],
               }
               service { "xinetd":
                  ensure     => running,
                  enable     => true,
               }
               file { "/etc/check_mk/check_updates.cfg":
                  owner   => "root",
                  group   => "root",
                  mode    => "0644",
                  source  => "puppet:///site/etc/check_mk/check_updates.cfg",
                  require => Package["check_mk-agent"],
               }
               file { "/usr/lib/check_mk_agent/plugins/mk_check_updates":
                  owner   => "root",
                  group   => "root",
                  mode    => "0755",
                  source  => "puppet:///site/usr/lib/check_mk_agent/plugins/mk_check_updates",
                  require => Package["check_mk-agent"],
               }
            }
            default: { }
         }
      }
      
    • Ensure that the check_mk class is included in the node definitions (in my setup it is included via the baseclass template)
    • Git commit the changes to check_mk.pp class and push to the git server
  2. Install the Python check on the nagios server (note: user-defined checks go in local/share/check_mk/checks; if you put them into $SITE/share.... they won't survive the next OMD upgrade)
    
    $ su - sitename
    $ vim local/share/check_mk/checks/updates
    
    
    
    #!/usr/bin/python
    # -*- encoding: utf-8; py-indent-offset: 4 -*-
    
    # Jonathan Mills 10/2011
    
    # Example output from agent:
    # [security] [non-security] [runtime (seconds)] [age of results (seconds)]
    # <<<updates>>>
    # 7 40 0 13
    #
    
    updates_default_values = (5, 20)
    
    # inventory
    def inventory_updates(checktype, info):
        #if len(info) >= 1 and len(info[0]) >= 1:
        #    return [ (None, None) ]
        inventory = []
        inventory.append( (None, "updates_default_values") )
        return inventory
    
    
    # check
    def check_updates(_no_item, params, info):
        # unpack check parameters; a state is raised only when a count is strictly greater than its threshold
        min_num_sec, min_num_nonsec = params
    
        for line in info:
            perfdata = []
            sec = int(line[0])
            nonsec = int(line[1])
            age = int(line[3])
            infotext = "%s Security Updates, %s Non-Critical Updates  (Last Checked %s seconds ago)" % (sec, nonsec, age)
            perfdata.append( ( "Runtime (sec)", int(line[2]) ) )
            if sec > min_num_sec:
                return (2, "CRITICAL - " + infotext, perfdata)
            elif nonsec > min_num_nonsec:
                return (1, "WARNING - " + infotext, perfdata)
            else:
                return (0, "OK - " + infotext, perfdata)
    
    # declare the check to Check_MK
    check_info['updates'] = (check_updates, "Updates", 1, inventory_updates)
    
  3. Add a new time period 'nightly' to nagios that can be used to limit this check to running daily from 3AM to 4AM
    
    $ vim etc/nagios/conf.d/timeperiods.cfg 
    
    
    ###############################################################################
    # TIMEPERIODS.CFG - SAMPLE TIMEPERIOD DEFINITIONS
    #
    # NOTES: This config file provides you with some example timeperiod definitions
    #        that you can reference in host, service, contact, and dependency
    #        definitions.
    #
    #        You don't need to keep timeperiods in a separate file from your other
    #        object definitions.  This has been done just to make things easier to
    #        understand.
    #
    ###############################################################################
    
    # This defines a timeperiod where all times are valid for checks,
    # notifications, etc.  The classic "24x7" support nightmare. :-)
    define timeperiod{
        timeperiod_name 24x7
        alias           24 Hours A Day, 7 Days A Week
        sunday          00:00-24:00
        monday          00:00-24:00
        tuesday         00:00-24:00
        wednesday       00:00-24:00
        thursday        00:00-24:00
        friday          00:00-24:00
        saturday        00:00-24:00
    }
    
    # 'workhours' timeperiod definition
    define timeperiod{
           timeperiod_name workhours
           alias           Normal Work Hours
           monday          08:00-17:00
           tuesday         08:00-17:00
           wednesday       08:00-17:00
           thursday        08:00-17:00
           friday          08:00-17:00
    }
    
    # 'none' timeperiod definition
    define timeperiod{
        timeperiod_name  none
        alias            No Time Is A Good Time
    }
    
    # 'nightly' timeperiod definition
    define timeperiod{
             timeperiod_name         nightly
             alias                   Nightly Check
             sunday                  03:00-04:00  ; Every Sunday of every week
             monday                  03:00-04:00  ; Every Monday of every week
             tuesday                 03:00-04:00  ; Every Tuesday of every week
             wednesday               03:00-04:00  ; Every Wednesday of every week
             thursday                03:00-04:00  ; Every Thursday of every week
             friday                  03:00-04:00  ; Every Friday of every week
             saturday                03:00-04:00  ; Every Saturday of every week
    }
    
    
  4. Add the settings for the new check to the check_mk main.mk file
    
    $ vim etc/check_mk/main.mk
    
    
    
    # check-updates (OMD 0.52 requires user-defined vars to be prefixed with an underscore)
    _updates_default_values = (6, 20) # check-updates: critical when more than 6 sec updates, warning when more than 20 non-sec updates
    
    extra_service_conf["check_period"] = [
      ( "nightly", ALL_HOSTS, [ "Updates" ] ), # check-updates: Only check for updates from 3 to 4AM as set in timeperiods.cfg
    ]
    
    extra_service_conf["max_check_attempts"] = [
      ( "1", ALL_HOSTS, [ "Updates" ] ), # check-updates: no retries, a single check result is final
    ]
    
    # Enable notifications for specific services
    extra_service_conf["notifications_enabled"] = [
      ( "1", ALL_HOSTS, ["Check_MK"]),
      ( "0", ALL_HOSTS, ["Updates"]), # check-updates: Don't notify for security OS updates
      ( "1", ALL_HOSTS, ["Memory used"]),
      ( "1", ALL_HOSTS, ["IPMI Sensor Summary","fs_*"]),
      ( "1", ["linsrv"], ["IPMI Sensor Summary","ambient_temp"]),
      ( "1", ALL_HOSTS, ["Multipath *"]),
      ( "1", ["kvm"], ALL_HOSTS, ["CPU load"]),
      ( "1", ["kvm"], ALL_HOSTS, ["CPU utilization"]),
      ( "1", ["mailsrv"], ["Postfix Queue"]),
      ( "1", ["linsrv"], ALL_HOSTS, ["Dell OMSA"]),
      ( "0", ALL_HOSTS, ALL_SERVICES), # and disable notifications for everything else
    ]
    
    service_groups = [
      ( "updates", ALL_HOSTS, [ "Updates" ] ), # check-updates: Create updates service group to make viewing in web interface easier
    ]
    
    define_servicegroups = {
       "updates" : "RHEL/CentOS Yum Updates", # check-updates: Can now statically link to a service group web view: http://nagios.server/sitename/check_mk/view.py?view_name=servicegroup&servicegroup=updates
    }
    
    
  5. Rerun the inventory for the nodes
    
    $ check_mk -II node-01
    ...
    
    or for all nodes
    $ check_mk -II
    
  6. Regenerate the Nagios configuration and reload Nagios
    
    $ check_mk -O
    
    
  7. Check the web page for the nodes; alternatively, you can go straight to the Updates overview page:
    https://nagios.server/sitename/check_mk/view.py?view_name=servicegroup&servicegroup=updates
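
Before waiting for the nightly window, the server-side decision can be exercised offline. This is a minimal sketch that re-implements the comparisons from check_updates outside Check_MK, assuming the thresholds (6, 20) from the main.mk example; parse_section and evaluate are hypothetical helpers, not Check_MK APIs. Because the comparisons are strict, (6, 20) means CRITICAL from 7 security updates and WARNING from 21 non-security updates.

```python
def parse_section(agent_output, section="updates"):
    """Collect the whitespace-split lines that follow <<<section>>>."""
    rows, inside = [], False
    for raw in agent_output.splitlines():
        line = raw.strip()
        if line.startswith("<<<") and line.endswith(">>>"):
            # A new section header starts; only ours is collected.
            inside = line == "<<<%s>>>" % section
            continue
        if inside and line:
            rows.append(line.split())
    return rows


def evaluate(rows, params=(6, 20)):
    """Same decision order as check_updates: security first, then non-security."""
    max_sec, max_nonsec = params
    if not rows:
        return 3, "UNKNOWN - no <<<updates>>> data"
    sec, nonsec, _runtime, age = (int(x) for x in rows[0][:4])
    text = "%d Security Updates, %d Non-Critical Updates  (Last Checked %d seconds ago)" % (
        sec, nonsec, age)
    if sec > max_sec:
        return 2, "CRITICAL - " + text
    if nonsec > max_nonsec:
        return 1, "WARNING - " + text
    return 0, "OK - " + text


# Example agent output, using the sample line from the plugin's header comment.
rows = parse_section("<<<check_mk>>>\nVersion: 1.1\n<<<updates>>>\n7 40 7 209\n")
print(evaluate(rows))
```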

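Keep in mind that the agent runs every plugin on each poll; mk_check_updates only limits how often it calls yum, by comparing the age of its log file against INTERVAL. The decision made by timeyet and run_check_update can be sketched as follows. This is a hypothetical Python restatement of the shell logic, not part of the plugin; needs_refresh and its parameters are illustrative names.

```python
def needs_refresh(log_mtime, now, interval, yum_pid_exists=False):
    """Decide whether mk_check_updates would rerun 'yum check-update'.

    log_mtime is None when the log file does not exist yet.
    Returns (run_yum, age), where age is the age in seconds of the
    existing results before any refresh.
    """
    if log_mtime is None:
        # First run: the script touches the log and refreshes immediately.
        stale, age = True, 0
    else:
        age = now - log_mtime
        # Strict comparison, like '[ $AGE -gt $INTERVAL ]'.
        stale = age > interval
    # run_check_update does nothing while /var/run/yum.pid exists, so the
    # log keeps its old mtime and the next poll tries again.
    return stale and not yum_pid_exists, age


print(needs_refresh(0, 7201, 7200))
```

Deleting the log file, as described in the comments below, simply puts the script back into the first-run branch.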
7 comments:

Unknown said...

I upgraded to 0.52 the other day, the upgrade finished with the following error:

Updating precompiled host checks for Check_MK...Invalid configuration variable 'updates_default_values'
--> Found 1 invalid variables
If you use own helper variables, please prefix them with _.
Failed

If you are already running 0.52, the instructions above will work, simply use _updates_default_values in main.mk instead of updates_default_values

If upgrading to 0.52 and you already have that variable in main.mk, add the underscore to the variable name prior to running the upgrade.

Unknown said...

I've updated the original post to include the updated variable name

Steve said...

Got this running on a few systems here. Is there an easy way to force the script to run regardless of the time interval?

Right now if I update a system I remove the /var/log/check_updates.log file and re-run the mk_check_updates script to get the new status in check_mk.

Unknown said...

I haven't found a way to do that, other than the same method you are already using, deleting the log file.

Would be nice to be able to force an update after patching a large number of systems.

darkfader said...

Instead of deleting the log, etc.
How about resetting the 7200s timer if yum.log has gotten an update in < 7200s and yum isn't running any more?

Anonymous said...

Hi,

How do you get the errata if you only have the CentOS repos? yum -v --security check-update gives me verbosity, but even though I know that I have security updates it shows 0.

Best Tony

Unknown said...

You do realize this runs every time your check_agent is run, no matter what you set the check_interval and time periods to, right?