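SSH automation is a core part of modern DevOps and operations work. It enables batch server management, automated deployment, monitoring and alerting, and similar tasks.
SSH Automation Tools
1. Ansible
Ansible is one of the most popular SSH automation tools. It is agentless: no agent needs to be installed on the target servers.
Installation:
```bash
# Ubuntu/Debian
sudo apt-get install ansible

# CentOS/RHEL (may require the EPEL repository)
sudo yum install ansible

# macOS
brew install ansible

# pip
pip install ansible
```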
Configuration:
```ini
# /etc/ansible/hosts
[webservers]
web1.example.com
web2.example.com
web3.example.com

[dbservers]
db1.example.com
db2.example.com

[all:vars]
ansible_user=admin
ansible_ssh_private_key_file=~/.ssh/ansible_key
```
Usage examples:
```bash
# Run a command
ansible webservers -m shell -a "uptime"

# Copy a file
ansible webservers -m copy -a "src=/tmp/file dest=/tmp/"

# Install a package (package installation needs root, hence --become)
ansible all -m apt -a "name=nginx state=present" --become

# Run a playbook
ansible-playbook deploy.yml
```
Playbook example:
```yaml
# deploy.yml
---
- hosts: webservers
  become: yes
  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes

    - name: Install nginx
      apt:
        name: nginx
        state: present

    - name: Start nginx service
      service:
        name: nginx
        state: started
        enabled: yes

    - name: Copy configuration file
      copy:
        src: nginx.conf
        dest: /etc/nginx/nginx.conf
      notify: restart nginx

  handlers:
    - name: restart nginx
      service:
        name: nginx
        state: restarted
```
2. Fabric
Fabric is a Python library that simplifies SSH automation tasks.
Installation:
```bash
pip install fabric
```
Usage example:
```python
# fabfile.py
from fabric import Connection, task


@task
def deploy(c):
    """Deploy application to server"""
    with Connection('user@server') as conn:
        # Update code
        conn.run('git pull origin main')

        # Install dependencies
        conn.run('pip install -r requirements.txt')

        # Restart service
        conn.sudo('systemctl restart myapp')


@task
def update(c, server):
    """Update the server named with --server"""
    with Connection(f'user@{server}') as conn:
        conn.run('apt-get update && apt-get upgrade -y')


@task
def backup(c):
    """Backup database"""
    with Connection('user@server') as conn:
        # -p makes mysqldump prompt for the password; pty=True lets the
        # prompt reach the local terminal
        conn.run('mysqldump -u root -p database > backup.sql', pty=True)
        # Download the dump to an explicit local file path
        conn.get('backup.sql', './backups/backup.sql')
```
Running tasks:
```bash
# Run a single task
fab deploy

# Run a task with an argument (Fabric 2+ uses per-task flags)
fab update --server=web1.example.com

# Run several tasks
fab deploy backup
```
3. SSH Batch Scripts
A shell script is enough for simple SSH batch jobs.
Example script:
```bash
#!/bin/bash
# batch_ssh.sh

SERVERS=(
    "user@server1.example.com"
    "user@server2.example.com"
    "user@server3.example.com"
)

COMMAND="uptime"

for server in "${SERVERS[@]}"; do
    echo "=== $server ==="
    ssh "$server" "$COMMAND"
    echo ""
done
```
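The same sequential loop can be sketched in Python with the standard library. This is only an illustration, not part of the original script: the function names `build_ssh_command` and `run_on_hosts` are hypothetical, and the sketch assumes the OpenSSH client is on the PATH and key-based login is already set up. `BatchMode=yes` makes `ssh` fail instead of waiting for a password prompt, which is usually what a batch job wants.

```python
import subprocess


def build_ssh_command(target, command, connect_timeout=10):
    """Return the argv list for a non-interactive ssh call (hypothetical helper)."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                      # never prompt for a password
        "-o", f"ConnectTimeout={connect_timeout}",  # give up on unreachable hosts
        target,
        command,
    ]


def run_on_hosts(targets, command, runner=subprocess.run):
    """Run `command` on each target in turn and return {target: exit code}.

    `runner` defaults to subprocess.run; it is a parameter only so the loop
    can be exercised without real servers.
    """
    results = {}
    for target in targets:
        completed = runner(
            build_ssh_command(target, command),
            capture_output=True,
            text=True,
        )
        print(f"=== {target} ===")
        print(completed.stdout, end="")
        results[target] = completed.returncode
    return results
```

Calling `run_on_hosts(["user@server1.example.com"], "uptime")` would then return a dictionary of exit codes, which makes it easy to report which hosts failed.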
Advanced script:
```bash
#!/bin/bash
# advanced_batch_ssh.sh

# Configuration
SERVERS_FILE="servers.txt"
SSH_KEY="$HOME/.ssh/batch_key"   # "~" is not expanded inside quotes, so use $HOME
SSH_USER="admin"
TIMEOUT=10

# Run one command on one server and report the result
execute_command() {
    local server=$1
    local command=$2

    echo "Executing on $server: $command"
    # -n keeps ssh from reading the server list on stdin;
    # accept-new (OpenSSH 7.6+) trusts new hosts but rejects changed keys
    if timeout "$TIMEOUT" ssh -n -i "$SSH_KEY" \
        -o StrictHostKeyChecking=accept-new \
        "$SSH_USER@$server" "$command"; then
        echo "Success: $server"
    else
        echo "Failed: $server"
    fi
}

# Run the command on every server in parallel
parallel_execute() {
    local command=$1

    while read -r server; do
        [ -n "$server" ] || continue   # skip blank lines
        execute_command "$server" "$command" &
    done < "$SERVERS_FILE"

    wait
}

# Entry point
case "$1" in
    "update")
        parallel_execute "apt-get update && apt-get upgrade -y"
        ;;
    "restart")
        parallel_execute "systemctl restart nginx"
        ;;
    "status")
        parallel_execute "systemctl status nginx"
        ;;
    *)
        echo "Usage: $0 {update|restart|status}"
        exit 1
        ;;
esac
```
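The parallel fan-out can also be sketched in Python with a thread pool, which caps the number of simultaneous connections (the shell version starts one background job per server with no limit). The names `read_servers`, `ssh_exit_code`, and `parallel_execute` below are hypothetical, and the `max_workers=8` limit is an arbitrary assumption.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_servers(text):
    """Parse a servers.txt body: one host per line, blanks and # comments skipped."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            hosts.append(line)
    return hosts


def ssh_exit_code(host, command, user="admin", timeout=10):
    """Run one command over ssh and return its exit code."""
    argv = ["ssh", "-n", "-o", "BatchMode=yes", f"{user}@{host}", command]
    try:
        return subprocess.run(argv, capture_output=True, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return 124  # the exit code GNU timeout(1) uses for a timed-out command


def parallel_execute(hosts, command, run=ssh_exit_code, max_workers=8):
    """Run `command` on all hosts concurrently and return {host: exit code}.

    `run` is injectable so the fan-out logic can be tried without servers.
    """
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = pool.map(lambda host: run(host, command), hosts)
        return dict(zip(hosts, codes))
```

Collecting exit codes in a dictionary, rather than printing "Success" or "Failed" from each job, keeps the output from interleaving and lets the caller decide how to report failures.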
4. Pexpect
Pexpect is a Python module for automating interactive programs.
Installation:
```bash
pip install pexpect
```
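Usage example:
```python
import pexpect


def ssh_interactive(host, user, password, command):
    """Automate interactive SSH session"""
    # Regex for a typical shell prompt; a bare '$' would match immediately,
    # so adjust this pattern to the remote shell's actual prompt
    prompt = r'[$#] '

    ssh = pexpect.spawn(f'ssh {user}@{host}')

    # Answer the password prompt
    ssh.expect('[Pp]assword:')
    ssh.sendline(password)

    # Wait for the shell prompt, then run the command
    ssh.expect(prompt)
    ssh.sendline(command)

    # Everything printed before the next prompt is the echoed command plus its output
    ssh.expect(prompt)
    output = ssh.before.decode()
    print(output)

    ssh.close()


# Usage
ssh_interactive('server.example.com', 'user', 'password', 'ls -la')
```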
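Automation Scenarios
Scenario 1: Batch deployment
```bash
#!/bin/bash
# deploy.sh

APP_DIR="/var/www/myapp"
REPO="https://github.com/user/myapp.git"
BRANCH="main"

# Server list
SERVERS=(
    "web1.example.com"
    "web2.example.com"
    "web3.example.com"
)

for server in "${SERVERS[@]}"; do
    echo "Deploying to $server..."

    # The unquoted EOF lets $APP_DIR and $BRANCH expand locally before sending;
    # set -e stops the remote script at the first failing step
    if ssh "admin@$server" << EOF
set -e
cd "$APP_DIR"
git pull origin "$BRANCH"
npm install
npm run build
pm2 restart myapp
EOF
    then
        echo "Deployment to $server completed"
    else
        echo "Deployment to $server failed" >&2
    fi
done
```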
Scenario 2: Batch monitoring
```python
#!/usr/bin/env python3
# monitor.py

import time

import paramiko

SERVERS = [
    {'host': 'server1.example.com', 'user': 'admin'},
    {'host': 'server2.example.com', 'user': 'admin'},
    {'host': 'server3.example.com', 'user': 'admin'},
]


def check_server(server):
    """Check server status"""
    ssh = paramiko.SSHClient()
    # AutoAddPolicy trusts unknown host keys: fine for a demo, risky in production
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())

    try:
        ssh.connect(server['host'], username=server['user'])

        # Check CPU
        _stdin, stdout, _stderr = ssh.exec_command('top -bn1 | grep "Cpu(s)"')
        cpu_usage = stdout.read().decode()

        # Check memory
        _stdin, stdout, _stderr = ssh.exec_command('free -m')
        memory = stdout.read().decode()

        # Check disk
        _stdin, stdout, _stderr = ssh.exec_command('df -h')
        disk = stdout.read().decode()

        print(f"=== {server['host']} ===")
        print(f"CPU: {cpu_usage.strip()}")
        print(f"Memory: {memory.strip()}")
        print(f"Disk: {disk.strip()}")
        print("")

    except Exception as e:
        print(f"Error connecting to {server['host']}: {e}")
    finally:
        ssh.close()


# Main loop
if __name__ == '__main__':
    while True:
        for server in SERVERS:
            check_server(server)
        time.sleep(300)  # check every 5 minutes
```
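The monitoring script prints the raw `free -m` text, which is hard to alert on. A minimal sketch of turning that text into a number follows; the helper names and the 90% threshold are hypothetical, and the parser assumes the usual layout with a header row followed by a `Mem:` row.

```python
def parse_free_m(output):
    """Parse `free -m` output into {column name: MiB} for the Mem: row."""
    lines = output.strip().splitlines()
    header = lines[0].split()  # e.g. total, used, free, shared, buff/cache, available
    for line in lines[1:]:
        if line.startswith("Mem:"):
            values = [int(value) for value in line.split()[1:]]
            return dict(zip(header, values))
    raise ValueError("no 'Mem:' row found in free output")


def memory_used_percent(output):
    """Return used memory as a percentage of total memory."""
    mem = parse_free_m(output)
    return 100.0 * mem["used"] / mem["total"]


def memory_alert(output, threshold=90.0):
    """Return True when usage reaches the (assumed) alert threshold."""
    return memory_used_percent(output) >= threshold
```

In `check_server`, the string read from `free -m` could be passed to `memory_alert` so that only servers above the threshold are reported.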
Scenario 3: Automated backup
```bash
#!/bin/bash
# backup.sh

BACKUP_DIR="/backups"
DATE=$(date +%Y%m%d)
RETENTION_DAYS=7

SERVERS=(
    "db1.example.com"
    "db2.example.com"
)

for server in "${SERVERS[@]}"; do
    echo "Backing up $server..."

    # Create the backup directory
    mkdir -p "$BACKUP_DIR/$server"

    # Back up the database ('password' is a placeholder; a ~/.my.cnf on the
    # server keeps the real one off the command line)
    ssh "admin@$server" "mysqldump -u root -p'password' database | gzip" > \
        "$BACKUP_DIR/$server/database_$DATE.sql.gz"

    # Back up files
    rsync -avz --delete "admin@$server:/var/www/" "$BACKUP_DIR/$server/files/"

    echo "Backup of $server completed"
done

# Remove old backups
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +"$RETENTION_DAYS" -delete
```
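The retention step at the end of the script can be sketched in Python as well, which makes it easy to log what was deleted. The function name `prune_old_backups` is hypothetical. It uses a plain "older than N × 24 hours" cutoff, so it is close to, but not exactly the same as, the day-rounding that `find -mtime +N` applies.

```python
import time
from pathlib import Path


def prune_old_backups(backup_dir, retention_days, now=None, pattern="*.sql.gz"):
    """Delete matching files older than `retention_days`; return the removed paths.

    `now` is a Unix timestamp and defaults to the current time; it is a
    parameter so the cutoff can be tested deterministically.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400  # 86400 seconds per day
    removed = []
    # rglob searches subdirectories too, like find without -maxdepth
    for path in sorted(Path(backup_dir).rglob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

For the layout used above, `prune_old_backups("/backups", 7)` would remove dumps in every per-server subdirectory and return the list of deleted files.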
Best Practices
1. Security
```bash
# Use key-based authentication and disable password authentication
ssh-keygen -t ed25519 -f ~/.ssh/automation_key
ssh-copy-id -i ~/.ssh/automation_key.pub user@server

# Restrict what the key may do
# ~/.ssh/authorized_keys
command="/usr/local/bin/automation-wrapper.sh",no-port-forwarding,no-X11-forwarding ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAI...
```
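The source does not show what `automation-wrapper.sh` contains. One common pattern, sketched here in Python rather than shell, is an allowlist: when a key carries a forced `command=`, sshd places the command the client asked for in the `SSH_ORIGINAL_COMMAND` environment variable, and the wrapper runs it only if it is on the list. The `ALLOWED` table and the function names are hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical allowlist: requested command -> argv that is actually executed
ALLOWED = {
    "status": ["systemctl", "status", "nginx"],
    "restart": ["systemctl", "restart", "nginx"],
}


def resolve(original_command, allowed=ALLOWED):
    """Return the argv for an allowed request, or None if it is rejected."""
    if not original_command:
        return None
    # Exact match only: no shell parsing, so nothing can be appended or chained
    return allowed.get(original_command.strip())


def main(environ=os.environ):
    """Run the requested command if allowed; return the exit code to use."""
    argv = resolve(environ.get("SSH_ORIGINAL_COMMAND", ""))
    if argv is None:
        print("command not permitted", file=sys.stderr)
        return 1
    return subprocess.run(argv).returncode
```

A real wrapper script would finish with `sys.exit(main())`. With this in place, `ssh user@server status` runs the mapped command, while anything else, such as `ssh user@server 'cat /etc/shadow'`, is refused.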
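2. Error handling
```bash
#!/bin/bash
# Error handling example

set -e           # exit immediately when a command fails
set -u           # treat undefined variables as an error
set -o pipefail  # a pipeline fails if any command in it fails

# Print an error message and exit
error_exit() {
    echo "Error: $1" >&2
    exit 1
}

# Usage
ssh user@server "command" || error_exit "SSH command failed"
```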
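For automation written in Python, a rough counterpart to `set -e` plus `error_exit` is to run each step with `check=True` and convert a non-zero exit code into an exception that names the step. The `StepFailed` class and `run_step` function below are hypothetical names for this sketch.

```python
import subprocess


class StepFailed(RuntimeError):
    """Raised when an automation step exits with a non-zero status."""


def run_step(argv, description):
    """Run one command; return its stdout, or raise StepFailed with context."""
    try:
        completed = subprocess.run(
            argv,
            check=True,           # raise CalledProcessError on non-zero exit
            capture_output=True,
            text=True,
        )
    except subprocess.CalledProcessError as exc:
        raise StepFailed(
            f"{description} failed with exit code {exc.returncode}: "
            f"{exc.stderr.strip()}"
        ) from exc
    return completed.stdout
```

A call such as `run_step(["ssh", "user@server", "command"], "SSH command")` then either returns the remote output or stops the script with a message that says which step failed and why.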
3. Logging
```bash
#!/bin/bash
# Logging example

LOG_FILE="/var/log/automation.log"

# Write a timestamped message to the terminal and the log file
log() {
    local message=$1
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $message" | tee -a "$LOG_FILE"
}

# Usage
log "Starting deployment"
ssh user@server "command"
log "Deployment completed"
```
4. Configuration management
```ini
# Use a configuration file
# config.ini
[general]
user=admin
key=~/.ssh/automation_key
timeout=30

[servers]
web1=web1.example.com
web2=web2.example.com
db1=db1.example.com
```
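A file in this format can be read with Python's standard `configparser` module. The sketch below is one possible loader, not something the original specifies; the function name `load_config` and the shape of the returned dictionary are assumptions.

```python
import configparser
import os


def load_config(text):
    """Parse config.ini text into a plain dictionary (hypothetical layout)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    general = parser["general"]
    return {
        "user": general["user"],
        # Expand "~" here, since ssh libraries may not do it for us
        "key": os.path.expanduser(general["key"]),
        "timeout": general.getint("timeout"),
        # Maps a short name such as "web1" to its hostname
        "servers": dict(parser["servers"]),
    }
```

In a script this would typically be called as `load_config(open("config.ini").read())`, after which `config["servers"]["web1"]` gives the hostname to connect to.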
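5. Idempotency
Make sure automation tasks can be run repeatedly without unintended side effects.
```bash
#!/bin/bash
# Idempotency example

# Install nginx only if the package is missing
# (is-active would test whether it is running, not whether it is installed)
if ! dpkg -s nginx > /dev/null 2>&1; then
    apt-get install -y nginx
fi

# Update the configuration only if it differs
if ! diff -q nginx.conf /etc/nginx/nginx.conf > /dev/null; then
    cp nginx.conf /etc/nginx/nginx.conf
    systemctl reload nginx
fi
```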
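The "compare first, change only if needed" pattern from the configuration check can be sketched as a small Python function that reports whether it changed anything. The name `ensure_file_content` is hypothetical. Returning a changed flag mirrors how the shell version reloads nginx only after a real change.

```python
from pathlib import Path


def ensure_file_content(path, desired):
    """Write `desired` to `path` only if the content differs.

    Returns True when the file was created or changed, False when it was
    already up to date, so the caller can skip a reload.
    """
    target = Path(path)
    if target.exists() and target.read_text() == desired:
        return False
    target.write_text(desired)
    return True
```

Running the function twice with the same arguments changes the file at most once: the first call may return True, and every later call returns False.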
Monitoring and Alerting
1. Automated monitoring
```bash
#!/bin/bash
# Monitoring script

ALERT_EMAIL="admin@example.com"
ALERT_SUBJECT="SSH Automation Alert"

# Send an alert if a service is not active on a server
check_service() {
    local server=$1
    local service=$2

    if ! ssh "admin@$server" "systemctl is-active --quiet $service"; then
        send_alert "$service is down on $server"
    fi
}

# Email the alert message
send_alert() {
    local message=$1
    echo "$message" | mail -s "$ALERT_SUBJECT" "$ALERT_EMAIL"
}

# Entry point
for server in web1 web2 web3; do
    check_service "$server.example.com" nginx
    check_service "$server.example.com" mysql
done
```
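2. Integrating monitoring tools
```yaml
# Prometheus + Grafana
# prometheus.yml
scrape_configs:
  - job_name: 'ssh_automation'
    static_configs:
      - targets: ['localhost:9090']
```
SSH automation can greatly improve operational efficiency, but it demands attention to security, reliability, and maintainability.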