{"title": "Setting Up Nagios Monitoring on CentOS", "update_time": "2013-04-14 20:32:18", "tags": "centos nagios", "pid": "236", "icon": "linux.png"}
## Nagios server

1. Install the packages

```
yum install -y httpd
```

2. Download Nagios

```
wget http://syslab.comsenz.com/downloads/linux/nagios-3.0.5.tar.gz
wget http://syslab.comsenz.com/downloads/linux/nagios-plugins-1.4.13.tar.gz
wget http://syslab.comsenz.com/downloads/linux/nrpe-2.12.tar.gz
```

Each `tar` command in the following steps is meant to be run from the directory that holds these tarballs.

3. Add the nagios account

```
useradd nagios
```

4. Compile and install Nagios

```
mkdir -p /opt/hadoop/
tar -xzvf nagios-3.0.5.tar.gz
cd nagios-3.0.5
./configure --prefix=/opt/hadoop/nagios
make all
make fullinstall
# Copy the sample configuration into place
mkdir -p /opt/hadoop/nagios/etc
mkdir -p /opt/hadoop/nagios/etc/objects
cp ./sample-config/cgi.cfg /opt/hadoop/nagios/etc/
cp ./sample-config/nagios.cfg /opt/hadoop/nagios/etc/
cp ./sample-config/resource.cfg /opt/hadoop/nagios/etc/
cp ./sample-config/template-object/commands.cfg /opt/hadoop/nagios/etc/objects/
cp ./sample-config/template-object/contacts.cfg /opt/hadoop/nagios/etc/objects/
cp ./sample-config/template-object/timeperiods.cfg /opt/hadoop/nagios/etc/objects/
cp ./sample-config/template-object/templates.cfg /opt/hadoop/nagios/etc/objects/
cp ./sample-config/template-object/localhost.cfg /opt/hadoop/nagios/etc/objects/
touch /opt/hadoop/nagios/var/nagios.log
chmod -R 755 /opt/hadoop/nagios/etc/
chown -R nagios:nagios /opt/hadoop/nagios
```

5. Compile and install nagios-plugins

```
tar zxvf nagios-plugins-1.4.13.tar.gz
cd nagios-plugins-1.4.13
./configure --prefix=/opt/hadoop/nagios --with-nagios-user=nagios --with-nagios-group=nagios
make && make install
```

To verify the installation, check that this directory contains the plugin files:

```
ls /opt/hadoop/nagios/libexec/
```

6. Install nrpe

```
tar zxvf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure --prefix=/opt/hadoop/nagios --enable-ssl --enable-command-args
make all
make install-plugin
make install-daemon
make install-daemon-config
```

7. Configure httpd

Add a web account:

```
htpasswd -c /opt/hadoop/nagios/etc/htpasswd.users nagiosadmin
```

## Nagios client

1. Download the packages

```
wget http://syslab.comsenz.com/downloads/linux/nagios-plugins-1.4.13.tar.gz
wget http://syslab.comsenz.com/downloads/linux/nrpe-2.12.tar.gz
```

2. Add the nagios account and create the installation directory

```
mkdir -p /opt/hadoop/nagios
useradd nagios
```

3. Compile and install nrpe

```
tar -xzvf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure --prefix=/opt/hadoop/nagios --enable-ssl --enable-command-args
make all
make install-plugin
make install-daemon
make install-daemon-config
```

4. Install nagios-plugins

```
tar -xzvf nagios-plugins-1.4.13.tar.gz
cd nagios-plugins-1.4.13
./configure --prefix=/opt/hadoop/nagios --with-nagios-user=nagios --with-nagios-group=nagios
make && make install
```

To verify the installation, check that this directory contains the plugin files:

```
ls /opt/hadoop/nagios/libexec/
```

5. Configure nrpe

```
vim /opt/hadoop/nagios/etc/nrpe.cfg
# Change "allowed_hosts=127.0.0.1" to "allowed_hosts=127.0.0.1,10.130.2.72"
# (the second IP is the Nagios server's IP)
# Change "dont_blame_nrpe=0" to "dont_blame_nrpe=1"
```

6. An nrpe start/stop script. Save it as /etc/init.d/nrpe and make it executable.

```
#!/bin/bash
#
# chkconfig: 2345 55 25
# description: NRPE Daemon
#

# source function library
. /etc/rc.d/init.d/functions

RETVAL=0
prog='nrpe'
NRPE_CFG='/opt/hadoop/nagios/etc/nrpe.cfg'
NRPE_PRG='/opt/hadoop/nagios/bin/nrpe'
NRPE_OPT='-d'
PID_FILE='/var/run/nrpe.pid'

start() {
    echo -n $"Starting $prog: "
    # Remove any stale PID file before starting the daemon
    [ -f $PID_FILE ] && rm -f $PID_FILE
    $NRPE_PRG -c $NRPE_CFG $NRPE_OPT
    # Record the PID of the running nrpe process
    pid=`ps aux | grep -v grep | grep $NRPE_PRG | awk '{print $2}'`
    echo $pid > $PID_FILE
    if ps aux | grep -v grep | grep -q $NRPE_PRG ; then
        RETVAL=0
        success
    else
        RETVAL=1
        failure
    fi
    echo
}

stop() {
    echo -n $"Stopping $prog: "
    # Kill the daemon only if a PID file exists and that PID is still alive
    if [ -f $PID_FILE ] && ps --pid=`cat $PID_FILE` &>/dev/null ; then
        kill -9 `cat $PID_FILE`
    fi
    success
    echo
    RETVAL=0
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    status)
        status -p $PID_FILE $prog
        RETVAL=$?
        ;;
    *)
        echo $"Usage: $0 {start|stop|restart|status}"
        RETVAL=1
esac

exit $RETVAL
```

7. Start nrpe

```
/etc/init.d/nrpe start
```

## Adding a monitored host on the Nagios server

1. Configure the directory for monitored hosts

```
mkdir /opt/hadoop/nagios/etc/servers
vim /opt/hadoop/nagios/etc/nagios.cfg
# Append this line: cfg_dir=/opt/hadoop/nagios/etc/servers
```

2. Add the configuration for the host

```
vim /opt/hadoop/nagios/etc/servers/10.130.2.22.cfg
define host{
    use                     linux-server
    host_name               10.130.2.22
    alias                   10.130.2.22
    address                 10.130.2.22
}
define service{
    use                     generic-service
    host_name               10.130.2.22
    service_description     check_ping
    check_command           check_ping!100.0,20%!200.0,50%
    max_check_attempts      5
    normal_check_interval   1
}
define service{
    use                     generic-service
    host_name               10.130.2.22
    service_description     check_ssh
    check_command           check_ssh
    max_check_attempts      5
    normal_check_interval   1
}
```

3. Reload the Nagios server to apply the configuration

```
service nagios reload
```

After the reload, the newly monitored host appears in the Nagios web interface.

4. Add a check that uses nrpe

```
# Add the following to /opt/hadoop/nagios/etc/objects/commands.cfg
define command{
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
```

Add the following to the host's monitoring configuration file on the server, and make sure the nrpe service is running on the monitored host:

```
define service{
    use                     generic-service
    host_name               10.130.2.22
    service_description     check_load
    check_command           check_nrpe!check_load
    max_check_attempts      5
    normal_check_interval   1
}
```

Reload Nagios to apply the configuration.

```
service nagios reload
```

5. Custom monitoring script

On the monitored host, write the script check_diskmount.sh:

```
vim /opt/hadoop/nagios/libexec/check_diskmount.sh
#!/bin/bash
# Count the mounted filesystems whose entry contains '/disk'
num=`cat /proc/mounts | grep '/disk' | wc -l`
if [ $num -eq 12 ] ; then
    echo "OK - mount disk is $num"
    exit 0
else
    echo "Critical - mount disk is $num"
    # Nagios plugin convention: exit code 2 means CRITICAL (1 would be reported as WARNING)
    exit 2
fi
```

Make it executable:

```
chmod +x /opt/hadoop/nagios/libexec/check_diskmount.sh
```

Register the custom script in the monitored host's nrpe configuration:

```
vim /opt/hadoop/nagios/etc/nrpe.cfg
command[check_diskmount]=/opt/hadoop/nagios/libexec/check_diskmount.sh
```

Restart nrpe:

```
/etc/init.d/nrpe restart
```

Add the service to the configuration on the Nagios server. The host_name must match the one used in the host definition:

```
vim /opt/hadoop/nagios/etc/servers/10.130.2.22.cfg
define service{
    use                     generic-service
    host_name               10.130.2.22
    service_description     check_diskmount
    check_command           check_nrpe!check_diskmount
    max_check_attempts      3
    normal_check_interval   1
}
```

Reload Nagios to apply the configuration:

```
service nagios reload
```
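
The custom check above hard-codes both the mount pattern and the expected count. As a rough sketch only (the function names `count_mounts` and `check_mounts` are hypothetical, not part of Nagios or nrpe), the same idea can be parameterized while following the plugin exit-code convention of 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. Unlike the `grep` on the whole line, this version matches only the mount-point field, which is an assumption about what the check is meant to count.

```shell
#!/bin/bash
# Sketch of a parameterized mount-count check.
# Exit codes follow the Nagios plugin convention:
#   0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN

# count_mounts PATTERN FILE
# Print how many lines in FILE have a mount point (second field)
# that contains PATTERN. FILE is expected to look like /proc/mounts.
count_mounts() {
    awk -v pat="$1" 'index($2, pat) > 0 { n++ } END { print n + 0 }' "$2"
}

# check_mounts EXPECTED PATTERN FILE
# Print a one-line status message and return the matching exit code.
check_mounts() {
    local expected="$1" pattern="$2" file="$3" num
    if [ ! -r "$file" ]; then
        echo "UNKNOWN - cannot read $file"
        return 3
    fi
    num=$(count_mounts "$pattern" "$file")
    if [ "$num" -eq "$expected" ]; then
        echo "OK - $num mounts matching $pattern"
        return 0
    fi
    echo "CRITICAL - $num mounts matching $pattern, expected $expected"
    return 2
}
```

To use this as an nrpe plugin, the file could end with `check_mounts 12 /disk /proc/mounts; exit $?`, so that the status line goes to standard output and the return code becomes the plugin's exit code.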