# How do I use SSH for automated operations? What are the common automation tools and scripts?
SSH automation is a core part of modern DevOps and operations work. With it you can manage servers in batches, deploy applications automatically, and drive monitoring and alerting.

## SSH Automation Tools

### 1. Ansible

Ansible is one of the most popular SSH automation tools; it requires no agent on the target servers.

Installation:

```bash
# Ubuntu/Debian
sudo apt-get install ansible

# CentOS/RHEL
sudo yum install ansible

# macOS
brew install ansible

# pip
pip install ansible
```

Configuration:

```ini
# /etc/ansible/hosts
[webservers]
web1.example.com
web2.example.com
web3.example.com

[dbservers]
db1.example.com
db2.example.com

[all:vars]
ansible_user=admin
ansible_ssh_private_key_file=~/.ssh/ansible_key
```

Usage examples:

```bash
# Run a command
ansible webservers -m shell -a "uptime"

# Copy a file
ansible webservers -m copy -a "src=/tmp/file dest=/tmp/"

# Install a package
ansible all -m apt -a "name=nginx state=present"

# Run a playbook
ansible-playbook deploy.yml
```

Playbook example:

```yaml
# deploy.yml
---
- hosts: webservers
  become: yes
  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes

    - name: Install nginx
      apt:
        name: nginx
        state: present

    - name: Start nginx service
      service:
        name: nginx
        state: started
        enabled: yes

    - name: Copy configuration file
      copy:
        src: nginx.conf
        dest: /etc/nginx/nginx.conf
      notify: restart nginx

  handlers:
    - name: restart nginx
      service:
        name: nginx
        state: restarted
```

### 2. Fabric

Fabric is a Python library that simplifies SSH automation tasks.

Installation:

```bash
pip install fabric
```

Usage example:

```python
# fabfile.py
from fabric import Connection, task

@task
def deploy(c):
    """Deploy the application to the server"""
    with Connection('user@server') as conn:
        # Update code
        conn.run('git pull origin main')
        # Install dependencies
        conn.run('pip install -r requirements.txt')
        # Restart the service
        conn.sudo('systemctl restart myapp')

@task
def update(c, server):
    """Update a given server"""
    with Connection(f'user@{server}') as conn:
        conn.run('apt-get update && apt-get upgrade -y')

@task
def backup(c):
    """Back up the database"""
    with Connection('user@server') as conn:
        conn.run('mysqldump -u root -p database > backup.sql')
        conn.get('backup.sql', './backups/')
```

Running tasks:

```bash
# Run a single task
fab deploy

# Run a task with an argument (Fabric 2 flag syntax)
fab update --server web1.example.com

# Run multiple tasks
fab deploy backup
```
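When Fabric is not available, the same pattern can be driven from Python's standard library by invoking the `ssh` binary directly. A minimal sketch, with hypothetical helper names (`build_ssh_argv`, `run_remote`); `shlex.quote` is worth using whenever untrusted values are interpolated into the remote command:

```python
import shlex
import subprocess

def build_ssh_argv(user, host, command, key=None, timeout=10):
    """Build an argv list for a non-interactive ssh invocation.
    BatchMode=yes makes ssh fail fast instead of prompting for a password."""
    argv = ["ssh", "-o", "BatchMode=yes",
            "-o", f"ConnectTimeout={timeout}"]
    if key:
        argv += ["-i", key]
    argv += [f"{user}@{host}", command]
    return argv

def run_remote(user, host, command, key=None):
    """Run `command` on host; return (exit code, stdout)."""
    result = subprocess.run(build_ssh_argv(user, host, command, key=key),
                            capture_output=True, text=True)
    return result.returncode, result.stdout

# Example (requires a reachable server; quote interpolated values):
#   path = "/var/log/my app.log"
#   code, out = run_remote("admin", "web1.example.com",
#                          "cat " + shlex.quote(path))
```

Because `build_ssh_argv` passes the command as a single argv element, there is no local shell expansion to worry about; only the remote shell interprets it.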
### 3. SSH Batch Scripts

Shell scripts cover simple SSH batching.

Example script:

```bash
#!/bin/bash
# batch_ssh.sh

SERVERS=(
    "user@server1.example.com"
    "user@server2.example.com"
    "user@server3.example.com"
)

COMMAND="uptime"

for server in "${SERVERS[@]}"; do
    echo "=== $server ==="
    ssh "$server" "$COMMAND"
    echo ""
done
```

Advanced script:

```bash
#!/bin/bash
# advanced_batch_ssh.sh

# Configuration
SERVERS_FILE="servers.txt"
SSH_KEY="$HOME/.ssh/batch_key"   # use $HOME: tilde does not expand inside quotes
SSH_USER="admin"
TIMEOUT=10

# Function: run a command on one server
execute_command() {
    local server=$1
    local command=$2
    echo "Executing on $server: $command"
    if timeout "$TIMEOUT" ssh -i "$SSH_KEY" -o StrictHostKeyChecking=no \
        "$SSH_USER@$server" "$command"; then
        echo "Success"
    else
        echo "Failed"
    fi
}

# Function: run a command on all servers in parallel
parallel_execute() {
    local command=$1
    while read -r server; do
        execute_command "$server" "$command" &
    done < "$SERVERS_FILE"
    wait
}

# Main
case "$1" in
    "update")
        parallel_execute "apt-get update && apt-get upgrade -y"
        ;;
    "restart")
        parallel_execute "systemctl restart nginx"
        ;;
    "status")
        parallel_execute "systemctl status nginx"
        ;;
    *)
        echo "Usage: $0 {update|restart|status}"
        exit 1
        ;;
esac
```

### 4. Pexpect

Pexpect is a Python module for automating interactive programs.

Installation:

```bash
pip install pexpect
```

Usage example:

```python
import pexpect

def ssh_interactive(host, user, password, command):
    """Automate an interactive SSH session.
    Assumes the host key is already known (no first-connection prompt)."""
    ssh = pexpect.spawn(f'ssh {user}@{host}')

    # Handle the password prompt
    ssh.expect('password:')
    ssh.sendline(password)

    # Wait for a shell prompt; "$" must be escaped in the regex
    ssh.expect(r'\$')
    ssh.sendline(command)

    # Capture the output up to the next prompt
    ssh.expect(r'\$')
    output = ssh.before.decode()
    print(output)

    ssh.close()

# Usage
ssh_interactive('server.example.com', 'user', 'password', 'ls -la')
```

## Automation Scenarios

### Scenario 1: Batch deployment

```bash
#!/bin/bash
# deploy.sh

APP_DIR="/var/www/myapp"
REPO="https://github.com/user/myapp.git"
BRANCH="main"

# Server list
SERVERS=(
    "web1.example.com"
    "web2.example.com"
    "web3.example.com"
)

for server in "${SERVERS[@]}"; do
    echo "Deploying to $server..."
    ssh admin@$server << EOF
        cd $APP_DIR
        git pull origin $BRANCH
        npm install
        npm run build
        pm2 restart myapp
EOF
    echo "Deployment to $server completed"
done
```

### Scenario 2: Batch monitoring

```python
#!/usr/bin/env python3
# monitor.py
import paramiko
import time

SERVERS = [
    {'host': 'server1.example.com', 'user': 'admin'},
    {'host': 'server2.example.com', 'user': 'admin'},
    {'host': 'server3.example.com', 'user': 'admin'},
]

def check_server(server):
    """Check server status"""
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    try:
        ssh.connect(server['host'], username=server['user'])

        # Check CPU
        stdin, stdout, stderr = ssh.exec_command('top -bn1 | grep "Cpu(s)"')
        cpu_usage = stdout.read().decode()

        # Check memory
        stdin, stdout, stderr = ssh.exec_command('free -m')
        memory = stdout.read().decode()

        # Check disk
        stdin, stdout, stderr = ssh.exec_command('df -h')
        disk = stdout.read().decode()

        print(f"=== {server['host']} ===")
        print(f"CPU: {cpu_usage.strip()}")
        print(f"Memory: {memory.strip()}")
        print(f"Disk: {disk.strip()}")
        print("")
    except Exception as e:
        print(f"Error connecting to {server['host']}: {e}")
    finally:
        ssh.close()

# Main loop
while True:
    for server in SERVERS:
        check_server(server)
    time.sleep(300)  # check every 5 minutes
```

### Scenario 3: Automated backups

```bash
#!/bin/bash
# backup.sh

BACKUP_DIR="/backups"
DATE=$(date +%Y%m%d)
RETENTION_DAYS=7

SERVERS=(
    "db1.example.com"
    "db2.example.com"
)

for server in "${SERVERS[@]}"; do
    echo "Backing up $server..."

    # Create the backup directory
    mkdir -p "$BACKUP_DIR/$server"

    # Back up the database
    ssh admin@$server "mysqldump -u root -p'password' database | gzip" > \
        "$BACKUP_DIR/$server/database_$DATE.sql.gz"

    # Back up files
    rsync -avz --delete admin@$server:/var/www/ "$BACKUP_DIR/$server/files/"

    echo "Backup of $server completed"
done

# Remove old backups
find $BACKUP_DIR -name "*.sql.gz" -mtime +$RETENTION_DAYS -delete
```

## Best Practices
### 1. Security

```bash
# Use key authentication; disable password authentication
ssh-keygen -t ed25519 -f ~/.ssh/automation_key
ssh-copy-id -i ~/.ssh/automation_key.pub user@server
```

Restrict what an automation key may do:

```
# ~/.ssh/authorized_keys
command="/usr/local/bin/automation-wrapper.sh",no-port-forwarding,no-X11-forwarding ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAI...
```

### 2. Error handling

```bash
#!/bin/bash
# Error-handling example
set -e           # exit immediately on error
set -u           # error on undefined variables
set -o pipefail  # fail if any command in a pipeline fails

# Function: error handler
error_exit() {
    echo "Error: $1" >&2
    exit 1
}

# Usage
ssh user@server "command" || error_exit "SSH command failed"
```

### 3. Logging

```bash
#!/bin/bash
# Logging example
LOG_FILE="/var/log/automation.log"

log() {
    local message=$1
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $message" | tee -a "$LOG_FILE"
}

# Usage
log "Starting deployment"
ssh user@server "command"
log "Deployment completed"
```

### 4. Configuration management

Keep connection details in a configuration file instead of hard-coding them:

```ini
# config.ini
[general]
user=admin
key=~/.ssh/automation_key
timeout=30

[servers]
web1=web1.example.com
web2=web2.example.com
db1=db1.example.com
```

### 5. Idempotency

Make sure automation tasks can be re-run without side effects.

```bash
#!/bin/bash
# Idempotency example

# Install nginx only if it is not already running
if ! systemctl is-active --quiet nginx; then
    apt-get install -y nginx
fi

# Update the configuration only if it actually changed
if ! diff -q nginx.conf /etc/nginx/nginx.conf > /dev/null; then
    cp nginx.conf /etc/nginx/nginx.conf
    systemctl reload nginx
fi
```

## Monitoring and Alerting

### 1. Automated monitoring

```bash
#!/bin/bash
# Monitoring script
ALERT_EMAIL="admin@example.com"
ALERT_SUBJECT="SSH Automation Alert"

check_service() {
    local server=$1
    local service=$2
    if ! ssh admin@$server "systemctl is-active --quiet $service"; then
        send_alert "$service is down on $server"
    fi
}

send_alert() {
    local message=$1
    echo "$message" | mail -s "$ALERT_SUBJECT" "$ALERT_EMAIL"
}

# Main
for server in web1 web2 web3; do
    check_service "$server.example.com" nginx
    check_service "$server.example.com" mysql
done
```

### 2. Integrating monitoring tools

```yaml
# Prometheus + Grafana
# prometheus.yml
scrape_configs:
  - job_name: 'ssh_automation'
    static_configs:
      - targets: ['localhost:9090']
```

SSH automation can greatly improve operational efficiency, but pay close attention to security, reliability, and maintainability.
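As a closing illustration of best practice 4 above, a file in the `config.ini` layout shown there can be consumed with Python's standard `configparser` module. A minimal sketch; the section and key names mirror that example, and `load_targets` is a hypothetical helper:

```python
import configparser

# Inlined here for the sketch; a real script would read config.ini from disk.
CONFIG_TEXT = """
[general]
user=admin
key=~/.ssh/automation_key
timeout=30

[servers]
web1=web1.example.com
web2=web2.example.com
db1=db1.example.com
"""

def load_targets(text):
    """Parse the config; return (general settings dict, {name: hostname})."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    return dict(cfg["general"]), dict(cfg["servers"])

general, servers = load_targets(CONFIG_TEXT)
for name, host in servers.items():
    # In a real script each entry would feed an ssh invocation, e.g.:
    #   ssh -i <general['key']> <general['user']>@<host> <command>
    print(name, "->", f"{general['user']}@{host}")
```

Centralizing hosts and credentials this way means adding a server is a one-line config change rather than an edit to every script.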