In today’s IT infrastructure, ensuring that servers operate within their capacity limits is crucial for maintaining system health and performance. One of the key aspects of server management is memory utilization monitoring. High memory usage can lead to slower application response times, system instability, or even crashes. To address this, we introduce an efficient memory monitoring solution: a Bash script designed to trigger threshold alerts, seamlessly integrating with Nagios, a widely used monitoring tool.
The script is open source and published on GitHub (https://github.com/tecrahul/nagios-plugins/blob/main/check_memory.sh). This article walks through how it works and explains how to implement and customize it for your Nagios monitoring environment.
Why Monitor Memory Usage?
Memory is a finite resource shared by every application and process on a host. Without proper monitoring, a system can run short of memory and start paging or swapping, which severely degrades performance. If swap is also exhausted, the Linux out-of-memory (OOM) killer terminates processes to reclaim memory. Monitoring memory usage helps identify these problems before they escalate, so applications keep running smoothly and reliably.
The Script Explained
The Bash script compares the system’s memory usage, expressed as a percentage of total memory, against configurable warning and critical thresholds. When usage reaches a threshold, the script prints a one-line status message and exits with the matching Nagios plugin return code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN), so administrators are alerted in time to act before problems develop. Memory figures in the output can be shown in bytes, kilobytes, megabytes, or gigabytes to suit different system sizes.
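Before the full listing, it helps to see the decision at the heart of the script on its own. This is a minimal sketch, not code taken from the script: `classify_usage` is a hypothetical name, and it assumes the usage percentage has already been computed as a whole number. Critical is checked first, so a value that meets both thresholds is reported as CRITICAL.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of check_memory.sh): map a whole-number
# usage percentage to a Nagios state name and the matching plugin exit code.
classify_usage() {
    local usage=$1 warn=$2 crit=$3
    if (( usage >= crit )); then
        echo CRITICAL
        return 2
    elif (( usage >= warn )); then
        echo WARNING
        return 1
    fi
    echo OK
    return 0
}

# Prints "WARNING", then "exit code: 1"
classify_usage 85 80 90 || echo "exit code: $?"
```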
#!/bin/bash
# ==============================================================================
# SCRIPT: check_memory.sh
# AUTHOR: Rahul Kumar
# COPYRIGHT: tecadmin.net
# DESCRIPTION:
# This script is designed to monitor and report on system memory usage. It
# allows for warning and critical thresholds to be set for memory usage
# percentages, providing alerts based on the specified criteria. The script
# supports output in different units (Bytes, Kilobytes, Megabytes, Gigabytes)
# for flexible monitoring requirements. This utility is particularly useful
# for system administrators and monitoring tools like Nagios to keep an eye
# on system health and perform proactive maintenance.
#
# USAGE:
# ./check_memory.sh [-w WARNING] [-c CRITICAL] [-u UNIT]
# -w INTEGER[%]  Warning threshold as a percentage of used memory. Default: 80
# -c INTEGER[%]  Critical threshold as a percentage of used memory. Default: 90
# -u UNIT        Unit to use for output (b, K, M, G). Default: M
#
# EXAMPLES:
# ./check_memory.sh -w 80 -c 90 -u M
# This command sets a warning threshold at 80% memory usage and a critical
# threshold at 90%, with output in Megabytes.
# ==============================================================================
PROGNAME="check_memory"
VERSION='1.0'
FREECMD='/usr/bin/free'
UNIT='M' # Default unit
WARNING_THRESHOLD=80
CRITICAL_THRESHOLD=90
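# Function to show usage (exit code 3 = UNKNOWN in Nagios)
usage() {
    echo "Usage: $0 [-w WARNING] [-c CRITICAL] [-u UNIT]"
    echo "  -w INTEGER[%]  Warning threshold as a percentage of used memory. Default: 80"
    echo "  -c INTEGER[%]  Critical threshold as a percentage of used memory. Default: 90"
    echo "  -u UNIT        Unit to use for output (b, K, M, G). Default: M"
    exit 3
}
# Parse command line options (short options only: getopts has no long-option support)
while getopts ":w:c:u:" opt; do
    case $opt in
        w) WARNING_THRESHOLD="${OPTARG%\%}" ;;   # accept "80" or "80%"
        c) CRITICAL_THRESHOLD="${OPTARG%\%}" ;;
        u) UNIT="$OPTARG" ;;
        :) echo "Option -$OPTARG requires an argument."; usage ;;
        \?) echo "Unknown option: -$OPTARG"; usage ;;
    esac
done
# Validate input up front: convert_memory runs inside $( ), where "exit"
# only leaves the subshell, so a bad unit must be rejected here
case $UNIT in
    b|K|M|G) ;;
    *) echo "UNKNOWN: invalid unit '$UNIT'. Must be one of b, K, M, G."; exit 3 ;;
esac
for threshold in "$WARNING_THRESHOLD" "$CRITICAL_THRESHOLD"; do
    if ! [[ $threshold =~ ^[0-9]+$ ]]; then
        echo "UNKNOWN: threshold '$threshold' is not a whole-number percentage."
        exit 3
    fi
done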
# Function to convert memory to the specified unit
convert_memory() {
    local memory=$1
    case $UNIT in
        b) echo "$memory" ;;
        K) echo $((memory / 1024)) ;;
        M) echo $((memory / 1024 / 1024)) ;;
        G) echo $((memory / 1024 / 1024 / 1024)) ;;
        *) echo "Error: Unknown unit $UNIT. Must be one of 'b', 'K', 'M', 'G'."; exit 3 ;;
    esac
}
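# Extract memory data from /proc/meminfo (the kernel reports values in kB)
meminfo_bytes() {
    local kb
    kb=$(awk -v key="$1:" '$1 == key { print $2; exit }' /proc/meminfo)
    [ -n "$kb" ] && echo $((kb * 1024))
}
total_bytes=$(meminfo_bytes MemTotal)
if [ -z "$total_bytes" ] || [ "$total_bytes" -le 0 ]; then
    echo "UNKNOWN: could not read MemTotal from /proc/meminfo."
    exit 3
fi
# Prefer the kernel's MemAvailable estimate (Linux 3.14+); fall back to
# MemFree + Buffers + Cached on older kernels that do not report it
available_bytes=$(meminfo_bytes MemAvailable)
if [ -z "$available_bytes" ]; then
    free_bytes=$(meminfo_bytes MemFree)
    buffers_bytes=$(meminfo_bytes Buffers)
    cached_bytes=$(meminfo_bytes Cached)
    available_bytes=$((free_bytes + buffers_bytes + cached_bytes))
fi
used_bytes=$((total_bytes - available_bytes))
# Convert to specified unit (for display only)
total=$(convert_memory "$total_bytes")
used=$(convert_memory "$used_bytes")
# Calculate usage percentage from the raw byte counts, before any unit
# conversion, so integer truncation (e.g. with -u G) cannot skew it
usage_percentage=$((used_bytes * 100 / total_bytes))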
# Compare usage against thresholds (Nagios exit codes: 0 OK, 1 WARNING, 2 CRITICAL)
if [ "$usage_percentage" -ge "$CRITICAL_THRESHOLD" ]; then
    echo "CRITICAL: Memory usage is at or above the critical threshold ($CRITICAL_THRESHOLD%). $used$UNIT used of $total$UNIT ($usage_percentage%)."
    exit 2
elif [ "$usage_percentage" -ge "$WARNING_THRESHOLD" ]; then
    echo "WARNING: Memory usage is at or above the warning threshold ($WARNING_THRESHOLD%). $used$UNIT used of $total$UNIT ($usage_percentage%)."
    exit 1
else
    echo "OK: Memory usage is within bounds. $used$UNIT used of $total$UNIT ($usage_percentage%)."
    exit 0
fi
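Bash arithmetic works only with integers, so the order in which a percentage is computed matters. The sketch below uses a hypothetical helper, `pct_used`, that computes the in-use percentage directly from byte counts, before any conversion to K, M, or G.

```bash
#!/usr/bin/env bash
# Hypothetical helper: whole-number percentage of memory in use, computed
# from raw byte counts so unit conversion cannot truncate the inputs first.
pct_used() {
    local total_bytes=$1 available_bytes=$2
    if (( total_bytes <= 0 )); then
        echo "total must be positive" >&2
        return 3
    fi
    echo $(( (total_bytes - available_bytes) * 100 / total_bytes ))
}

# 1.5 GiB total, 0.9 GiB available
pct_used 1610612736 966367641   # prints 40
```

Converting to whole units first loses this information. With `-u G` on a host that has 1.5 GiB total and 0.9 GiB available, the two values become 1 and 0, and `100 - available * 100 / total` reports 100% instead of 40%. On a host with less than 1 GiB of total memory the divisor becomes 0, and Bash reports a division-by-zero error.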
Key Features
- Threshold Alerts: Define custom warning and critical levels for memory usage as a percentage of total memory.
- Flexible Units: Display memory usage in units that best fit your monitoring needs.
- Nagios Integration: Follows the Nagios plugin conventions (a one-line status message and exit codes 0, 1, 2, and 3), so it can be added to existing Nagios or NRPE setups without modification.
Implementation Guide
- Download the Script: Clone the GitHub repository or download the script directly.
- Permissions: Ensure the script is executable by running:
chmod +x check_memory.sh
- Nagios Configuration: Integrate the script into your Nagios monitoring environment. Define a command in your Nagios configuration that points to the script, and set up service checks for your hosts to utilize this command.
- NRPE Configuration: To monitor memory on remote Linux hosts, run the script through the NRPE agent installed on each host, which lets the Nagios server execute it remotely.
- Testing: Test the script manually to ensure it triggers alerts as expected. Adjust the thresholds and units as necessary to fine-tune the monitoring.
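For manual testing, it helps to see the exit status next to the message, because that status, not the text, is what Nagios acts on. The sketch below is a hypothetical test helper, `run_check`, that is not part of the original script. Choosing extreme thresholds forces each state regardless of current memory usage.

```bash
#!/usr/bin/env bash
# Hypothetical test helper (not part of check_memory.sh): run a Nagios-style
# plugin and translate its exit status into the state name Nagios displays.
run_check() {
    local output status state
    output=$("$@")
    status=$?
    case $status in
        0) state=OK ;;
        1) state=WARNING ;;
        2) state=CRITICAL ;;
        *) state=UNKNOWN ;;
    esac
    printf '%s (exit %d): %s\n' "$state" "$status" "$output"
    return "$status"
}

# Force each state with thresholds the current usage cannot miss:
#   run_check ./check_memory.sh -w 100 -c 100 -u M   # OK (unless usage is 100%)
#   run_check ./check_memory.sh -w 0 -c 100 -u M     # WARNING (unless usage is 100%)
#   run_check ./check_memory.sh -w 0 -c 0 -u M       # CRITICAL
```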
Integration with Nagios
To monitor memory on the Nagios server itself, define a new command in your Nagios configuration:
define command{
    command_name    check_memory_usage
    command_line    /path/to/check_memory.sh -w $ARG1$ -c $ARG2$ -u $ARG3$
}
Then, use this command in your service definitions to monitor memory usage on your Nagios host. Replace /path/to/ with the actual path to the script.
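In the service definition, the arguments are appended to the command name, separated by exclamation marks: `check_memory_usage!80!90!M` fills `$ARG1$`, `$ARG2$`, and `$ARG3$` with 80, 90, and M. The function below is a simplified, hypothetical model of that substitution, useful for checking which command line a given `check_command` produces. It is not Nagios's implementation, and it ignores escaped `\!` as well as other macros such as `$USER1$` or `$HOSTADDRESS$`.

```bash
#!/usr/bin/env bash
# Simplified, hypothetical model of Nagios $ARGn$ substitution: split the
# service's check_command on '!'; field 0 names the command and fields
# 1..n replace $ARG1$..$ARGn$ in the command_line template.
expand_check_command() {
    local check_command=$1 template=$2
    local IFS='!'
    local -a fields
    read -r -a fields <<< "$check_command"
    local i
    for ((i = 1; i < ${#fields[@]}; i++)); do
        # Quoted pattern and replacement are matched and inserted literally
        template=${template//"\$ARG$i\$"/"${fields[i]}"}
    done
    printf '%s\n' "$template"
}

expand_check_command 'check_memory_usage!80!90!M' \
    '/path/to/check_memory.sh -w $ARG1$ -c $ARG2$ -u $ARG3$'
# prints: /path/to/check_memory.sh -w 80 -c 90 -u M
```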
The next section explains how to monitor memory on remote hosts through the NRPE client.
Integration with NRPE Client
To integrate the memory monitoring script with Nagios through NRPE (Nagios Remote Plugin Executor), you’ll need to add a command definition to the nrpe.cfg file on the remote host where NRPE is running. This definition will instruct NRPE on how to execute the script when requested by the Nagios server.
- NRPE Client Configuration: Add the following entry to the nrpe.cfg file on the remote host (adjust the path if you installed the script elsewhere), then restart the NRPE service so it loads the new command:
command[check_memory]=/usr/lib/nagios/plugins/check_memory.sh -w 80 -c 90 -u M
- Configure Nagios Server: On your Nagios server, define a service that uses the check_nrpe command to request the execution of check_memory on the remote host. An example service definition might look like this:
define service{
    use                     generic-service
    host_name               remote_host_name
    service_description     Memory Usage
    check_command           check_nrpe!check_memory
}
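Replace remote_host_name with the name of the host as defined in your Nagios configuration. The example assumes a check_nrpe command is already defined on the Nagios server, typically with a command_line of $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$.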
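When the check is rolled out to many hosts, the nrpe.cfg entry shown above can be added from a script instead of by hand. The sketch below uses a hypothetical helper, `add_nrpe_command`, that appends the definition only if no command with that name exists yet, so running it twice does not create duplicate entries.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append an NRPE command definition to a config file
# unless a command with the same name is already defined there.
add_nrpe_command() {
    local cfg=$1 name=$2 cmdline=$3
    if grep -qs "^command\[${name}]=" "$cfg"; then
        echo "command[$name] already defined in $cfg; leaving it unchanged"
        return 0
    fi
    printf 'command[%s]=%s\n' "$name" "$cmdline" >> "$cfg"
    echo "added command[$name] to $cfg"
}

# Example (run as root on the monitored host; the nrpe.cfg location and the
# NRPE service name vary by distribution, e.g. nagios-nrpe-server on
# Debian/Ubuntu and nrpe on RHEL-family systems):
#   add_nrpe_command /etc/nagios/nrpe.cfg check_memory \
#       '/usr/lib/nagios/plugins/check_memory.sh -w 80 -c 90 -u M'
#   systemctl restart nagios-nrpe-server
```

After NRPE restarts, you can confirm from the Nagios server that the remote host answers by running check_nrpe -H remote_host_name -c check_memory. The message and exit status should match a local run of the script on that host.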
Conclusion
Effective memory monitoring is a cornerstone of maintaining system performance and stability. By leveraging this Bash script, system administrators can proactively manage memory resources, ensuring that servers remain healthy and responsive. Its integration with Nagios and NRPE makes it a practical option for monitoring memory across the Linux hosts in your infrastructure.
As we continue to rely on complex systems to support our applications and services, tools like this Bash script become invaluable in our monitoring toolkit. Its simplicity, coupled with the power of Nagios, offers a straightforward yet effective approach to monitoring memory usage, helping to prevent potential system issues before they arise.