Cyberithub

Solved: nrpe.service: main process exited, code=exited, status=1/FAILURE


Last night, while working on my systems, I suddenly saw a lot of Nagios alerts on the dashboard showing a "Socket Timeout" error. I logged in to the server that was raising the alerts and checked the status of the nrpe service, which turned out not to be running. I started troubleshooting the issue and decided to put the steps in this article, in case you are facing the same problem.


nrpe.service: main process exited, code=exited, status=1/FAILURE

Also Read: Solved: nrpe.service: main process exited, code=exited, status=2/INVALIDARGUMENT

You might have seen the "main process exited, code=exited, status=1/FAILURE" error while trying to start or restart the nrpe service on a Linux-based system. When you run the systemctl status nrpe command to check the status, the output looks something like the one below.

[root@localhost ~]# systemctl status nrpe
● nrpe.service - Nagios Remote Program Executor
Loaded: loaded (/usr/lib/systemd/system/nrpe.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2020-09-21 21:45:35 UTC; 2s ago
Docs: http://www.nagios.org/documentation
Process: 27745 ExecStopPost=/bin/rm -f /var/run/nrpe/nrpe.pid (code=exited, status=0/SUCCESS)
Process: 27743 ExecStart=/usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -f $NRPE_SSL_OPT (code=exited, status=1/FAILURE)
Main PID: 27743 (code=exited, status=1/FAILURE)

Sep 21 21:45:35 localhost systemd[1]: Started Nagios Remote Program Executor.
Sep 21 21:45:35 localhost systemd[1]: Starting Nagios Remote Program Executor...
Sep 21 21:45:35 localhost systemd[1]: nrpe.service: main process exited, code=exited, status=1/FAILURE
Sep 21 21:45:35 localhost systemd[1]: Unit nrpe.service entered failed state.
Sep 21 21:45:35 localhost systemd[1]: nrpe.service failed.

NOTE:

Please note that I am running all the commands below as the root user. You can use any user with sudo access instead. For more information, please check Step by Step: How to Add User to Sudoers to provide sudo access to the User.

This error can have several possible causes, so it is worth going through the likely scenarios one by one. A common one is a permission problem that stops nrpe from creating its nrpe.pid file. You can check the path of the nrpe.pid file configured in /etc/nagios/nrpe.cfg using the command below.

[root@localhost ~]# cat /etc/nagios/nrpe.cfg | grep nrpe.pid
pid_file=/etc/nagios/nrpe.pid

To check for a permission problem, go to the /etc/nagios directory and look at the permissions of nrpe.pid using the ls -lrt nrpe.pid command. Make sure the user that nrpe runs as (the nrpe_user setting in nrpe.cfg) is able to create and write this file.

[root@localhost ~]# cd /etc/nagios/
[root@localhost nagios]# ls -lrt nrpe.pid
-rw-r--r--. 1 root root 5 Sep 21 21:51 nrpe.pid
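The two checks above can be sketched as a pair of small shell functions. This is only a rough sketch: the function names pid_file_from_cfg and check_pid_dir are made up for illustration, and the -w test reflects the user running the script, so it would need to be run as the nrpe service user to be meaningful (root can write almost anywhere).

```shell
#!/bin/sh
# Hypothetical helpers, not part of NRPE.

# Print the pid_file value from an NRPE config file.
# Commented-out lines are ignored; if the key appears twice, the last one is printed.
pid_file_from_cfg() {
    sed -n 's/^[[:space:]]*pid_file[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Report whether the directory that should hold the PID file
# exists and is writable by the current user.
check_pid_dir() {
    dir=$(dirname "$1")
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
    elif [ ! -w "$dir" ]; then
        echo "not writable: $dir"
    else
        echo "ok: $dir"
    fi
}

# Usage, with the config path used in this article:
# check_pid_dir "$(pid_file_from_cfg /etc/nagios/nrpe.cfg)"
```

A "missing" or "not writable" result would point to the PID file location as a likely suspect.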

If the file has the correct permissions, you can try changing the pid_file path to a file under /var/run/nagios (make sure that directory exists) and then restart the service to see if this helps. Open the file in the vi editor by running vi /etc/nagios/nrpe.cfg as shown below, edit the pid_file line, then save and exit by pressing Esc and typing :wq!

[root@localhost ~]# vi /etc/nagios/nrpe.cfg
pid_file=/var/run/nagios/nrpe.pid
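If you prefer not to edit the file by hand, the same change could be scripted. The function below is a minimal sketch under a few assumptions: set_pid_file is a made-up name, the new path must not contain the characters | or &, and a copy of the original file is kept next to it with a .bak suffix.

```shell
#!/bin/sh
# Hypothetical helper: point pid_file at a new path in an NRPE config file.
# A backup copy is saved as <config>.bak before anything is changed.
set_pid_file() {
    cfg="$1"
    new_path="$2"   # assumed not to contain '|' or '&' (special to sed here)
    cp -p "$cfg" "$cfg.bak" || return 1
    # Rewrite only the uncommented pid_file line; other lines pass through unchanged.
    sed "s|^[[:space:]]*pid_file[[:space:]]*=.*|pid_file=$new_path|" "$cfg.bak" > "$cfg"
}

# Usage, with the paths from this article:
# set_pid_file /etc/nagios/nrpe.cfg /var/run/nagios/nrpe.pid
```

Writing back into the existing file, rather than replacing it, keeps its owner and permissions as they were.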

Then restart the nrpe service by using systemctl restart nrpe command as shown below.

[root@localhost ~]# systemctl restart nrpe

Then check the status again by using systemctl status nrpe command as shown below.

[root@localhost ~]# systemctl status nrpe
● nrpe.service - Nagios Remote Program Executor
Loaded: loaded (/usr/lib/systemd/system/nrpe.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2020-09-21 21:45:35 UTC; 2s ago
Docs: http://www.nagios.org/documentation
Process: 27745 ExecStopPost=/bin/rm -f /var/run/nrpe/nrpe.pid (code=exited, status=0/SUCCESS)
Process: 27743 ExecStart=/usr/sbin/nrpe -c /etc/nagios/nrpe.cfg -f $NRPE_SSL_OPT (code=exited, status=1/FAILURE)
Main PID: 27743 (code=exited, status=1/FAILURE)

Sep 21 21:45:35 localhost systemd[1]: Started Nagios Remote Program Executor.
Sep 21 21:45:35 localhost systemd[1]: Starting Nagios Remote Program Executor...
Sep 21 21:45:35 localhost systemd[1]: nrpe.service: main process exited, code=exited, status=1/FAILURE
Sep 21 21:45:35 localhost systemd[1]: Unit nrpe.service entered failed state.
Sep 21 21:45:35 localhost systemd[1]: nrpe.service failed.
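The restart and status check can also be combined into one step. Below is a small sketch; restart_and_check is a hypothetical name, and the short pause is my own assumption, added because the log above shows the service being reported as "Started" and then failing within the same second.

```shell
#!/bin/sh
# Hypothetical helper: restart a unit, wait briefly, then report whether it stayed up.
# $1 = unit name, $2 = seconds to wait before checking (defaults to 2).
restart_and_check() {
    unit="$1"
    delay="${2:-2}"
    systemctl restart "$unit" || echo "restart of $unit returned an error" >&2
    sleep "$delay"
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is active"
    else
        echo "$unit failed to start"
        return 1
    fi
}

# Usage:
# restart_and_check nrpe
```

The non-zero return value makes it easy to call from another script and stop when the service does not come up.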

If that still does not help, check the journal for the nrpe unit to find the root cause, using the journalctl -xfeu nrpe command as shown below. You can check the journalctl man page to learn more about the available options.

[root@localhost ~]# journalctl -xfeu nrpe
-- Logs begin at Fri 2020-03-13 04:24:07 UTC, end at Mon 2020-09-21 21:22:01 UTC. --
Sep 21 21:20:48 localhost systemd[1]: Started Nagios Remote Program Executor.
-- Subject: Unit nrpe.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nrpe.service has finished starting up.
--
-- The start-up result is done.
Sep 21 21:20:48 localhost systemd[1]: Starting Nagios Remote Program Executor...
-- Subject: Unit nrpe.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nrpe.service has begun starting up.
Sep 21 21:20:48 localhost systemd[1]: nrpe.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Sep 21 21:20:48 localhost systemd[1]: Unit nrpe.service entered failed state.
Sep 21 21:20:48 localhost systemd[1]: nrpe.service failed.
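When the journal is long, it can help to pull out just the exit status that systemd recorded. The filter below is a rough sketch: last_exit_status is a made-up name, and the pattern is based only on the log lines shown in this article, so other systemd versions may word the message differently.

```shell
#!/bin/sh
# Hypothetical filter: read journal lines on stdin and print the most recent
# "status=N/NAME" value from a "main process exited" message.
last_exit_status() {
    sed -n 's/.*[Mm]ain process exited, code=exited, status=\([0-9][0-9]*\/[A-Z_]*\).*/\1/p' | tail -n 1
}

# Usage: look at the last 50 journal lines of the nrpe unit without a pager.
# journalctl -u nrpe --no-pager -n 50 | last_exit_status
```

In the output above this would print 2/INVALIDARGUMENT, which is a different exit status from the 1/FAILURE seen in the systemctl status output.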

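You can also look at the systemd-analyze output using the systemd-analyze blame | grep -i nrpe command as shown below. Note that this lists how long each unit took to start rather than any error message, so it mainly helps to spot an unusually slow start.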

[root@localhost ~]# systemd-analyze blame | grep -i nrpe
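One more check, which is my own addition and not a step I ran above, is to start nrpe by hand with the same binary and flags that appear in the ExecStart line of the status output. The sketch below assumes the coreutils timeout command is available; try_foreground and the NRPE_BIN variable are made-up names, and nrpe may still send its error messages to syslog rather than the terminal.

```shell
#!/bin/sh
# Path taken from the ExecStart line in the status output; override if yours differs.
NRPE_BIN="${NRPE_BIN:-/usr/sbin/nrpe}"

# Hypothetical helper: run nrpe in the foreground for a few seconds and
# report how it ended. $1 = path to the config file.
try_foreground() {
    cfg="$1"
    rc=0
    # timeout stops the process after 5 seconds and then returns 124.
    timeout 5 "$NRPE_BIN" -c "$cfg" -f || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "nrpe stayed up for 5s and was stopped by timeout"
    else
        echo "nrpe exited with status $rc"
    fi
}

# Usage (stop the service first so the port is free):
# try_foreground /etc/nagios/nrpe.cfg
```

If nrpe stays up when started this way but fails under systemd, the difference is more likely in the unit file or its environment than in nrpe.cfg.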

If you are running a Kubernetes cluster on the system, also check the node status by running the kubectl get nodes command as shown below.

[root@localhost ~]# kubectl get nodes
NAME          STATUS   ROLES  AGE VERSION
192.168.0.103 NotReady master 29d v1.14.5

In my case the node was showing a "NotReady" state, so I checked the status of the kubelet service and found some errors there. I restarted the service using the systemctl restart kubelet command, checked the node status again with kubectl get nodes, and found that the node had come back to the "Ready" state.

[root@localhost ~]# kubectl get nodes
NAME          STATUS ROLES  AGE VERSION
192.168.0.103 Ready  master 29d v1.14.5
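On a cluster with more than one node, the same check can be reduced to a short filter. This is only a sketch: not_ready_nodes is a hypothetical name, and it assumes the default column layout shown above, with STATUS as the second column. A node whose status is anything other than exactly "Ready" (for example "Ready,SchedulingDisabled") is also listed.

```shell
#!/bin/sh
# Hypothetical filter: read `kubectl get nodes` output on stdin and print
# the names of nodes whose STATUS column is not exactly "Ready".
not_ready_nodes() {
    # NR > 1 skips the header line; $1 is NAME, $2 is STATUS.
    awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage:
# kubectl get nodes | not_ready_nodes
```

Empty output means every node reported "Ready".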

Once the node was back in the Ready state, I checked my Nagios server again and saw that all the alerts had cleared. I then tried starting the nrpe service again, and this time it started successfully.

In your case, if you still cannot find the root cause after checking all these logs, my recommendation is to work out what changed on the system since nrpe was last running fine. Finally, if everything else fails, the simplest option is to reboot the system once and see if that helps.

I hope you enjoyed this debugging session on the "nrpe.service: main process exited, code=exited, status=1/FAILURE" error. Please share your feedback in the comment box.

