Docker volumes are an essential part of a containerized architecture, commonly used to persist data across container lifecycles. To safeguard against data loss, it is vital to back up these volumes regularly. This article provides a shell script that automates daily backups of Docker volumes, uploads them to AWS S3, and cleans up old backups.
Prerequisites
- Docker installed and running.
- AWS Command Line Interface (CLI) installed and configured with appropriate permissions.
- The `jq` tool installed to parse the JSON output of the AWS CLI during the S3 cleanup step (see the quick checks after this list).
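Before relying on the script, it can help to confirm these prerequisites from a shell. The following checks are a minimal sketch; the bucket name is a placeholder to replace with your own:

docker --version
aws --version
jq --version
# Confirm the AWS CLI has working credentials
aws sts get-caller-identity
# Confirm the target bucket is reachable (placeholder bucket name)
aws s3 ls s3://your-bucket-name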
Script Overview
The script will do the following:
- Loop over each Docker volume.
- Create a backup of the volume with tar, run inside a temporary container that mounts the volume (see the sketch after this list).
- Compress the backup.
- Upload the compressed backup to an S3 bucket.
- Remove local backups older than 30 days.
- Remove S3 backups older than 30 days.
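The heart of the backup step is a single command: a throwaway busybox container mounts the volume and a host directory, and tar archives the volume's contents into that directory. As a standalone sketch, where my_volume and /path/to/backup/dir are placeholders:

# Archive one named volume into a host directory; the volume is mounted read-only
docker run --rm \
  -v my_volume:/volume:ro \
  -v /path/to/backup/dir:/backup \
  busybox tar -czf /backup/my_volume.tar.gz -C /volume .

The full script below wraps this command with logging, an S3 upload, and retention cleanup.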
The Script
#!/bin/bash
set -euo pipefail
# Set variables
BACKUP_DIR="/path/to/backup/dir"
S3_BUCKET="s3://your-bucket-name"
DAYS_TO_KEEP=30
LOG_FILE="/var/log/backup_volumes.log"
TIMESTAMP=$(date +%Y%m%d%H%M%S)
# Ensure backup directory exists
mkdir -p "${BACKUP_DIR}"
# Function to log messages
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "${LOG_FILE}"
}
# Function to clean up the temporary container that pins a volume during backup
cleanup_container() {
    local container_id=$1
    if [[ -n "${container_id}" ]]; then
        docker rm -v "${container_id}" >/dev/null 2>&1 || log "Warning: Failed to remove container ${container_id}"
    fi
}
# Function to validate S3 bucket existence
check_s3_bucket() {
if ! aws s3 ls "${S3_BUCKET}" >/dev/null 2>&1; then
log "Error: S3 bucket ${S3_BUCKET} does not exist or is inaccessible"
exit 1
fi
}
# Validate dependencies
command -v docker >/dev/null 2>&1 || { log "Error: Docker is not installed"; exit 1; }
command -v aws >/dev/null 2>&1 || { log "Error: AWS CLI is not installed"; exit 1; }
command -v tar >/dev/null 2>&1 || { log "Error: tar is not installed"; exit 1; }
command -v jq >/dev/null 2>&1 || { log "Error: jq is not installed"; exit 1; }
# Check S3 bucket
check_s3_bucket
log "Starting backup of Docker volumes"
# Process each Docker volume
while IFS= read -r volume; do
    [[ -z "${volume}" ]] && continue  # Skip empty volume names
    log "Backing up volume: ${volume}"
    backup_name="${volume}_${TIMESTAMP}.tar.gz"
    container_id=""
    # Clean up the placeholder container if anything in this iteration fails
    trap 'cleanup_container "${container_id}"' ERR
    # Start a placeholder container that references the volume so it cannot be
    # removed while the backup runs; it exits immediately after running "true"
    container_id=$(docker run -d -v "${volume}:/volume" --name "backup_${volume}_${TIMESTAMP}" busybox true)
    # Archive the volume contents directly into the backup directory; the volume
    # is mounted read-only so the backup cannot modify it
    docker run --rm -v "${volume}:/volume:ro" -v "${BACKUP_DIR}:/backup" busybox \
        tar -czf "/backup/${backup_name}" -C /volume . || {
        log "Error: Failed to create backup for volume ${volume}"
        cleanup_container "${container_id}"
        continue
    }
    cleanup_container "${container_id}"
    # Upload to S3
    if aws s3 cp "${BACKUP_DIR}/${backup_name}" "${S3_BUCKET}/${backup_name}" >/dev/null 2>&1; then
        log "Successfully uploaded ${backup_name} to S3"
        rm -f "${BACKUP_DIR}/${backup_name}"  # Remove local backup after successful upload
    else
        log "Error: Failed to upload ${backup_name} to S3"
    fi
done < <(docker volume ls -q | sort -u)
# Remove local backups older than DAYS_TO_KEEP
log "Cleaning up local backups older than ${DAYS_TO_KEEP} days"
find "${BACKUP_DIR}" -type f -name "*.tar.gz" -mtime +${DAYS_TO_KEEP} -delete -exec log "Deleted local backup: {}" \;
# Remove S3 backups older than DAYS_TO_KEEP
log "Cleaning up S3 backups older than ${DAYS_TO_KEEP} days"
older_than_date=$(date -d "-${DAYS_TO_KEEP} days" +%Y%m%d)  # GNU date syntax
aws s3api list-objects-v2 --bucket "${S3_BUCKET#*://}" --query 'Contents[].{Key:Key,LastModified:LastModified}' --output json | \
jq -r '.[]? | select(.Key | endswith(".tar.gz")) | (.LastModified[:10] | gsub("-"; "")) + " " + .Key' | \
while read -r obj_date key; do
    if [[ "${obj_date}" -lt "${older_than_date}" ]]; then
        aws s3 rm "${S3_BUCKET}/${key}" >/dev/null 2>&1 && log "Deleted S3 backup: ${key}" || log "Error: Failed to delete S3 backup: ${key}"
    fi
done
log "Backup process completed"
Execution
Save the script to a file, for example `docker_vol_backup.sh`.
- Make the script executable:
chmod +x docker_vol_backup.sh
- Schedule the script to run daily using cron:
0 2 * * * /path/to/docker_vol_backup.sh >> /path/to/logfile.log 2>&1
This cron configuration will run the script every day at 2 AM and log the output to a specified logfile.
Conclusion
Automating the backup process of Docker volumes ensures data safety and minimizes human intervention. By leveraging AWS S3, data can be stored in a scalable, secure, and accessible environment. Periodically cleaning up old backups both locally and in S3 helps manage storage costs and prevent unnecessary clutter. Always ensure you test backup and restore procedures in a safe environment before relying on them for production data.
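As a starting point for such a test, the following sketch restores one backup into a fresh volume. The object key, volume name, and paths are placeholders, and it assumes the archive was created by the tar command used in the script above:

# Download a backup from S3 (adjust the key and local path)
aws s3 cp s3://your-bucket-name/my_volume_20240101020000.tar.gz /tmp/restore.tar.gz
# Create the target volume if it does not exist yet
docker volume create my_volume_restored
# Unpack the archive into the volume using a throwaway container
docker run --rm \
  -v my_volume_restored:/volume \
  -v /tmp:/backup:ro \
  busybox tar -xzf /backup/restore.tar.gz -C /volume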
1 Comment
It should be noted that if versioning is enabled on the bucket, AWS does not delete these files permanently, but only marks them as deleted. The old versions still take up space and cost money, which accumulates over time. You have to set up lifecycle rules to permanently delete those versions within S3. Of course this would be a no-brainer for someone proficient with S3, but someone just looking for an easy way to back up data (like me) might wonder why the costs go up every month.
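To address the point in the comment above: when versioning is enabled on the bucket, a lifecycle rule is needed to permanently expire old object versions and clean up delete markers. The sketch below uses the AWS CLI; the bucket name, rule ID, and 30-day retention are assumptions to adjust for your own setup:

# Permanently expire noncurrent versions after 30 days and remove expired
# delete markers (bucket name and rule ID are placeholders)
aws s3api put-bucket-lifecycle-configuration \
  --bucket your-bucket-name \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "expire-old-backup-versions",
        "Filter": {"Prefix": ""},
        "Status": "Enabled",
        "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
        "Expiration": {"ExpiredObjectDeleteMarker": true}
      }
    ]
  }'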