Wednesday, March 23, 2011

Install Lustre Monitoring Tool (LMT) on CentOS 5.5

In this article I document the steps to build and install LMT and its dependency, Cerebro. Cerebro's configuration can get pretty complex; in this example we keep it simple to focus on LMT.

I don't cover MySQL configuration yet but plan to do so in the near future.

Cerebro and LMT Build Instructions

The build and install systems are CentOS 5.5 x86_64.
  1. Download and build cerebro (http://sourceforge.net/projects/cerebro/files/cerebro/) on your favorite build machine (make sure to set up your ~/rpmbuild directory structure and your ~/.rpmmacros file).
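    • If you haven't built RPMs on this machine before, a minimal setup might look like this (the %_topdir value below is the usual convention; adjust to taste):
      
      $ mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
      $ echo '%_topdir %(echo $HOME)/rpmbuild' >> ~/.rpmmacros
      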
    • Download the latest source code (1.12 at this time) to ~/rpmbuild/SOURCES/
    • Download the src.rpm (I found it under the version 1.10 tree) and extract it
      
      $ mkdir ~/sources/cerebro
      $ cd ~/sources/cerebro
      $ rpm2cpio cerebro-1.10-1.src.rpm | cpio -idvm
      $ mv cerebro.spec ~/rpmbuild/SPECS/
      
      
    • Modify the cerebro.spec file as follows for version 1.12 (unified diff format)
      
      --- cerebro.spec 2010-04-07 16:17:35.000000000 -0500
      +++ cerebro.spec.new 2011-03-23 14:25:02.654373643 -0500
      @@ -1,12 +1,12 @@
       Name:    cerebro 
      -Version: 1.10
      +Version: 1.12
       Release: 1
       
       Summary: Cerebro cluster monitoring tools and libraries
       Group: System Environment/Base
       License: GPL
      -Source: cerebro-1.10.tar.gz
      -BuildRoot: %{_tmppath}/cerebro-1.10
      +Source: cerebro-1.12.tar.gz
      +BuildRoot: %{_tmppath}/cerebro-1.12
       
       %description
       Cerebro is a collection of cluster monitoring tools and libraries.
      @@ -90,7 +90,7 @@
       Event module to monitor node up/down.
       
       %prep
      -%setup  -q -n cerebro-1.10
      +%setup  -q -n cerebro-1.12
       
       %build
       %configure --program-prefix=%{?_program_prefix:%{_program_prefix}} \
      @@ -157,6 +157,7 @@
       %defattr(-,root,root)
       %doc README NEWS ChangeLog DISCLAIMER DISCLAIMER.UC COPYING
       %config(noreplace) %{_sysconfdir}/init.d/cerebrod
      +%config(noreplace) %{_sysconfdir}/cerebro.conf
       %{_includedir}/*
       %dir %{_libdir}/cerebro
       %{_libdir}/libcerebro*
      
    • Before building the rpm I had to comment out the %_vendor string in my .rpmmacros file; otherwise configure kept appending the vendor to the --target switch. For example:
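      
      # in ~/.rpmmacros -- disable the vendor macro before building
      # (the value shown here is illustrative)
      #%_vendor myrepo
      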
    • Build the rpm. This produces several rpms; for the Lustre Monitoring Tool all we need is the cerebro package
      
      $ rpmbuild -ba --sign ~/rpmbuild/SPECS/cerebro.spec
      
      
    • Look at the package info
      
      $ rpm -qpi ~/rpmbuild/RPMS/x86_64/cerebro-1.12-1.x86_64.rpm 
      Name        : cerebro                      Relocations: (not relocatable)
      Version     : 1.12                              Vendor: (none)
      Release     : 1                             Build Date: Wed 23 Mar 2011 02:12:09 PM CDT
      Install Date: (not installed)               Build Host: buildhost01
      Group       : System Environment/Base       Source RPM: cerebro-1.12-1.src.rpm
      Size        : 1039859                          License: GPL
      Signature   : DSA/SHA1, Wed 23 Mar 2011 02:12:09 PM CDT, Key ID xxxx
      Summary     : Cerebro cluster monitoring tools and libraries
      Description :
      Cerebro is a collection of cluster monitoring tools and libraries.
      
      
    • Take a look at the contents of the rpm
      
      $ rpm -qpl ~/rpmbuild/RPMS/x86_64/cerebro-1.12-1.x86_64.rpm 
      /etc/cerebro.conf
      /etc/init.d/cerebrod
      /usr/include/cerebro
      /usr/include/cerebro.h
      ...
      
  2. LMT RPM build
    • Temporarily install cerebro to satisfy the build requirement
      
      $ sudo rpm -Uvh ~/rpmbuild/RPMS/x86_64/cerebro-1.12-1.x86_64.rpm
      
      
    • Install the lua-devel package from EPEL
      
      $ sudo yum install lua-devel
      
      =============================================================================================
       Package                Arch                Version                  Repository         Size
      =============================================================================================
      Installing:
       lua-devel              i386                5.1.4-4.el5              epel               18 k
       lua-devel              x86_64              5.1.4-4.el5              epel               18 k
      Installing for dependencies:
       lua                    i386                5.1.4-4.el5              epel              228 k
       lua                    x86_64              5.1.4-4.el5              epel              229 k
      
      
    • Download the lmt src rpm and rebuild
      
      $ mkdir ~/sources/lmt
      $ cd ~/sources/lmt
      $ wget http://lmt.googlecode.com/files/lmt-3.1.2-1.src.rpm
      
      $ rpmbuild --rebuild --sign lmt-3.1.2-1.src.rpm
      
      
      $ ls -l ~/rpmbuild/RPMS/x86_64/lmt-*
      lmt-3.1.2-1.el5.myrepo.x86_64.rpm
      lmt-server-3.1.2-1.el5.myrepo.x86_64.rpm
      lmt-server-agent-3.1.2-1.el5.myrepo.x86_64.rpm
      
      
  3. LMT-GUI RPM build
    • Install the prerequisite java-devel
      
      $ sudo yum install java-devel
      
      =======================================================================================================
       Package                        Arch         Version                       Repository             Size
      =======================================================================================================
      Installing:
       java-1.6.0-openjdk-devel       x86_64       1:1.6.0.0-1.16.b17.el5        centos5-updates        12 M
      
      Transaction Summary
      =======================================================================================================
      
    • Download the lmt-gui src rpm and build
      
      $ mkdir ~/sources/lmt-gui
      $ cd ~/sources/lmt-gui
      $ wget http://lmt.googlecode.com/files/lmt-gui-3.0.0-1.src.rpm
      
      $ rpmbuild --rebuild --sign lmt-gui-3.0.0-1.src.rpm 
      
      
      
      $ rpm -qpi ~/rpmbuild/RPMS/x86_64/lmt-gui-3.0.0-1.el5.myrepo.x86_64.rpm 
      Name        : lmt-gui                      Relocations: (not relocatable)
      Version     : 3.0.0                             Vendor: (none)
      Release     : 1.el5.myrepo                 Build Date: Wed 23 Mar 2011 02:44:25 PM CDT
      Install Date: (not installed)               Build Host: build01
      Group       : Applications/System           Source RPM: lmt-gui-3.0.0-1.el5.myrepo.src.rpm
      Size        : 2347300                          License: GPL
      Signature   : DSA/SHA1, Wed 23 Mar 2011 02:44:25 PM CDT, Key ID xxxx
      Packager    : Jim Garlick 
      URL         : http://code.google.com/p/lmt
      Summary     : Lustre Montitoring Tools Client
      Description :
      Lustre Monitoring Tools (LMT) GUI Client
      
      
    • Next I copy the RPMs to our local repository
      
      $ cd ~/rpmbuild/RPMS/x86_64/
      $ cp -a lmt-* cerebro-1.12-1.x86_64.rpm /share/repo/mirror/myrepo/el5/x86_64/RPMS/
      
      $ cd ../../SRPMS
      $ cp -a cerebro-* /share/repo/mirror/myrepo/el5/SRPMS/
      $ cd ~/sources
      $ cp -a lmt/lmt-3.1.2-1.src.rpm lmt-gui/lmt-gui-3.0.0-1.src.rpm /share/repo/mirror/myrepo/el5/SRPMS/
      
    • Rebuild the repodata for the repository
      
      $ createrepo /share/repo/mirror/myrepo/el5/x86_64/
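      
    • If a client host has already cached the repo metadata, flush it so the new packages show up (standard yum behavior):
      
      $ sudo yum clean metadata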
      

Cerebro and LMT Install Instructions

  1. Install cerebro and lmt-server-agent on the MDSs and OSSs
    
    $ for n in mds-{0..1} oss-{0..2}; do ssh root@lustre-$n yum install -y cerebro lmt-server-agent ; done
    
  2. Install cerebro and lmt-server on the management server
    
    $ ssh root@management-server yum -y install cerebro lmt-server
    
  3. Modify the /etc/cerebro.conf file to look like this (by default the entire file is commented out; append the following to the end)
    • On the Lustre servers
      
      cerebro_metric_server 192.168.0.10
      cerebro_event_server 192.168.0.10
      cerebrod_heartbeat_frequency 10 20
      cerebrod_speak on
      cerebrod_speak_message_config 192.168.0.10
      cerebrod_listen off
      
    • On the management server
      
      cerebrod_heartbeat_frequency 10 20
      cerebrod_speak on
      cerebrod_speak_message_config 192.168.0.10
      cerebrod_listen on
      cerebrod_listen_message_config 192.168.0.10
      cerebrod_metric_controller on
      cerebro_metric_server 192.168.0.10
      cerebrod_event_server on
      cerebro_event_server 192.168.0.10
      
  4. Configure the daemon to start on the servers and management server
    
    $ for n in mds-{0..1} oss-{0..2}; do ssh root@lustre-$n "/sbin/chkconfig cerebrod on && /sbin/service cerebrod start" ; done
    
     $ ssh root@management-server "/sbin/chkconfig cerebrod on && /sbin/service cerebrod start"
    
    
  5. Log in to the management server and verify that it sees all of the servers (this can be run from any of the servers, not just the management server)
    
    $ /usr/sbin/cerebro-stat -m updown_state
    
    MODULE DIR = /usr/lib64/cerebro
    mgmt-srv: 1
    lustre-mds-0: 1
    lustre-mds-1: 1
    lustre-oss-0: 1
    lustre-oss-1: 1
    lustre-oss-2: 1
    
  6. Now run cerebro-stat with the -l switch to see the available metrics (lmt_mdt, lmt_ost, and lmt_osc are added by the lmt-server package)
    
    $ /usr/sbin/cerebro-stat -l
    
    MODULE DIR = /usr/lib64/cerebro
    metric_names
    cluster_nodes
    lmt_mdt
    updown_state
    lmt_ost
    lmt_osc
    
  7. Run the ltop command on the management node to view a top-like display of the OSTs (it defaults to the first Lustre file system found unless one is specified)
    
    $ ltop
    
    Filesystem: lustre
        Inodes:    209.344m total,     77.286m used ( 37%),    132.057m free
         Space:     42.978t total,     15.931t used ( 37%),     27.047t free
       Bytes/s:      0.000g read,       0.000g write,           1 IOPS
       MDops/s:           4 open,            2 close,         285 getattr,      0 setattr
                          0 link,            0 unlink,          0 mkdir,        0 rmdir
                          1 statfs,          5 rename,          0 getxattr
    >OST S        OSS   Exp   CR rMB/s wMB/s  IOPS   LOCKS  LGR  LCR %cpu %mem %spc
    0000 F stre-oss-0   131    0     0     0     0  515290   87    0    0  100   41
    0001 F stre-oss-0   131    0     0     0     0  528633  106    0    0  100   41
    0002 F stre-oss-1   131    0     0     0     0  509573   16    0    0  100   35
    0003 F stre-oss-1   131    0     0     0     0  518495   21    0    0  100   36
    0004 F stre-oss-2   131    0     0     0     0  533299   49    0    0  100   34
    0005 F stre-oss-2   131    0     0     0     0  527621   61    0    0  100   35
    

Friday, March 18, 2011

Using check_openmanage with check_mk

Here's my guide to installing check_openmanage in an OMD site, in case it helps anyone:

This was done on an OMD (Open Monitoring Distribution) site. Unless otherwise specified, all paths are relative to the site owner's home directory (e.g. /opt/omd/sites/mysite).
  1. Make sure your Dell servers have the following SNMP packages installed prior to installing OMSA: net-snmp, net-snmp-libs, net-snmp-utils (if not, it's easy to 'yum remove srvadmin-\*' and then 'yum install srvadmin-all')
    • Start the OMSA services with 'srvadmin-services.sh start', then check 'srvadmin-services.sh status' to verify that the snmpd component is running
    • Ensure that snmpd itself is running and configured
    • Configure the firewall to allow access from your OMD server to UDP port 161; a prep sketch covering these steps follows
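    • A prep sketch for one Dell server (illustrative throughout: iptables is assumed, 192.168.0.5 stands in for your OMD server's address, and the community and hostname match the examples below):
      
      # on the Dell server: SNMP stack first, then OMSA
      $ yum install -y net-snmp net-snmp-libs net-snmp-utils
      $ chkconfig snmpd on && service snmpd start
      # allow SNMP queries from the OMD server only (192.168.0.5 is a placeholder)
      $ iptables -I INPUT -p udp -s 192.168.0.5 --dport 161 -j ACCEPT
      $ service iptables save
      # from the OMD server: quick sanity check (1.3.6.1.2.1.1.1 is sysDescr)
      $ snmpwalk -v2c -c MySecretCommunity dell-r710-01 1.3.6.1.2.1.1.1
      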
  2. Change to the site user on your OMD server: $ su - mysite
  3. Download the latest check_openmanage from http://folk.uio.no/trondham/software/check_openmanage.html to ~/tmp and extract it, for example:
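    
    The exact tarball URL below is my assumption (3.6.5 matches the version used in the next steps); grab the real link from the page above.
    
    $ mkdir -p ~/tmp && cd ~/tmp
    $ wget http://folk.uio.no/trondham/software/files/check_openmanage-3.6.5.tar.gz
    $ tar xzf check_openmanage-3.6.5.tar.gz
    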
  4. Copy the check_openmanage script to local/lib/nagios/plugins (this is the default $USER2$ path in your commands)
    
    $ cp tmp/check_openmanage-3.6.5/check_openmanage local/lib/nagios/plugins/
    $ chmod +x local/lib/nagios/plugins/check_openmanage
    
  5. Copy the PNP4Nagios template
    
    $ cp tmp/check_openmanage-3.6.5/check_openmanage.php etc/pnp4nagios/templates/
    
  6. If you are running CentOS 5.5/RHEL 5.5 or earlier (it's unclear whether this will still be an issue in EL5.6) and you want performance graphs, you'll need to edit the check_openmanage.php template; its regexes use named capture groups, which EL5's older PHP doesn't support (see this bug: https://bugs.op5.com/bug_view_advanced_page.php?bug_id=4008). Comment out the original condition and replace it:
    
    $ vi etc/pnp4nagios/templates/check_openmanage.php
    
     ##    if(preg_match('/^enclosure_(?<id>.+?)_temp_\d+$/', $NAME[$i], $matches)
     ##       || preg_match('/^e(?<id>.+?)t\d+$/', $NAME[$i], $matches)){
     # This is the fixed line for CentOS 5.5 and prior
          if(preg_match('/^enclosure_(.+?)_temp_\d+$/', $NAME[$i], $matches)){
    
  7. Test check_openmanage to see that it can successfully query a node (ack, I need to update my driver)
    
    local/lib/nagios/plugins/check_openmanage -H dell-r710-01 -p -C MySecretCommunity
    
    Controller 1 [SAS 5/E Adapter]: Driver '3.04.13rh' is out of date|fan_0_system_board_fan_1_rpm=3600;0;0 fan_1_system_board_fan_3_rpm=3600;0;0 fan_2_system_board_fan_4_rpm=3600;0;0 fan_3_system_board_fan_5_rpm=3600;0;0 fan_4_system_board_fan_2_rpm=3600;0;0 pwr_mon_0_ps_1_current=0.6;0;0 pwr_mon_1_ps_2_current=0.4;0;0 pwr_mon_2_system_board_system_level=182;0;0 temp_0_system_board_ambient=21;42;47
    
    
  8. Edit the main.mk file to add tags to the OMSA hosts and define the check command (I picked up perfdata_format and monitoring_host from a previous poster to the list; I'm not sure whether they're needed)
    
    all_hosts = [ 'dell-r710-01|linsrv|kvm|omsa|nonpub', 'dell-2950-01|linsrv|omsa|nonpub', 'hp-srv-01|winsrv|smb', ]
    
    # Are you using PNP4Nagios and MRPE checks? This will make PNP
    # choose the correct template for standard Nagios checks:
    perfdata_format = "pnp"
    #set the monitoring host
    monitoring_host = "nagios"
    
    # SNMP Community
    snmp_default_community = "someCommunityRO"
    
    snmp_communities = [
      ( "MySecretCommunity", ["nonpub"], ALL_HOSTS ),
    ]
    
    # other main.mk stuff
    
    extra_nagios_conf += r"""
    
    # ARG1: community string
    define command {
        command_name    check_openmanage
        command_line    $USER2$/check_openmanage -H $HOSTADDRESS$ -p -C $ARG1$
    }
    
    """
    
    legacy_checks = [
      # On all hosts with the tag 'omsa' check Dell OpenManage for status 
      # service description "Dell OMSA", process performance data
      ( ( "check_openmanage!MySecretCommunity", "Dell OMSA", True), [ "omsa" ], ALL_HOSTS ),
    ]
    
  9. That should be it; simply reload, and your new check should start working for all hosts tagged with 'omsa'
    
    $ check_mk -O
    
    
To make it cleaner, the legacy check should be able to determine the community string from the snmp_default_community and snmp_communities settings.

I've only been testing check_mk for a few days now and am not sure how to do that (suggestions?).
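
A rough, untested sketch of the idea (the helper name is mine, and it relies on main.mk being evaluated as plain Python with the rule format shown above):

    # hypothetical helper: return the community whose tag list matches
    def community_for_tags(tags):
        for community, taglist, _hosts in snmp_communities:
            if set(taglist) <= set(tags):
                return community
        return snmp_default_community

    # this would replace the legacy_checks definition above
    legacy_checks = [
      ( ( "check_openmanage!" + community_for_tags(["omsa", "nonpub"]),
          "Dell OMSA", True), [ "omsa" ], ALL_HOSTS ),
    ]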

Hope this helps, and comments are welcome.