May 30, 00:10

What are the common text processing tools in shell scripts, and how do you use grep, sed, awk, and cut?

The most common text processing tools in shell scripts are grep (searching), sed (stream editing), awk (field-oriented processing), and cut (extracting columns).

grep - Text Search Tool

Basic Usage

```bash
# Search for text in a file
grep "pattern" file.txt

# Search multiple files
grep "pattern" file1.txt file2.txt

# Recursive search in a directory
grep -r "pattern" /path/to/directory

# Case-insensitive search
grep -i "pattern" file.txt

# Show line numbers
grep -n "pattern" file.txt

# Invert match (exclude matching lines)
grep -v "pattern" file.txt

# Show only the names of files that contain a match
grep -l "pattern" *.txt

# Count matching lines
grep -c "pattern" file.txt
```
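As a small worked illustration of the options above, the sketch below builds a throwaway file and captures what each flag returns. The file name `menu.txt` and its contents are made up for the example.

```shell
# Build a small sample file in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf '%s\n' 'apple pie' 'Banana split' 'apple tart' 'cherry cake' > "$tmpdir/menu.txt"

# -c counts matching lines; -i ignores case; -v inverts the match.
apple_count=$(grep -c 'apple' "$tmpdir/menu.txt")
banana_lines=$(grep -i 'banana' "$tmpdir/menu.txt")
non_apple_count=$(grep -vc 'apple' "$tmpdir/menu.txt")

# -n prefixes each matching line with its line number.
numbered=$(grep -n 'tart' "$tmpdir/menu.txt")

echo "apple lines: $apple_count"          # apple lines: 2
echo "banana match: $banana_lines"        # banana match: Banana split
echo "non-apple lines: $non_apple_count"  # non-apple lines: 2
echo "numbered: $numbered"                # numbered: 3:apple tart

rm -rf "$tmpdir"
```

Note that `-c` counts lines, not occurrences: a line containing the pattern twice still counts once.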

Regular Expressions

```bash
# Match start of line
grep '^start' file.txt

# Match end of line (single quotes keep the shell from touching $)
grep 'end$' file.txt

# Match lines containing a digit
grep '[0-9]' file.txt

# Match a specific repeat count
grep 'a\{3\}' file.txt   # Match three consecutive a's

# Use extended regular expressions
grep -E 'pattern1|pattern2' file.txt
```
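The difference between basic and extended syntax is easiest to see side by side. The following is a minimal sketch on made-up sample words: the same interval is written with escaped braces in basic mode and bare braces under `-E`.

```shell
tmpfile=$(mktemp)
printf '%s\n' 'cat' 'dog' 'caaat' 'bird' > "$tmpfile"

# Basic regular expressions (default): interval braces must be escaped.
bre_hits=$(grep -c 'a\{3\}' "$tmpfile")

# Extended regular expressions (-E): braces, parentheses, and | work unescaped.
ere_hits=$(grep -Ec 'a{3}' "$tmpfile")
alt_hits=$(grep -Ec '^(cat|dog)$' "$tmpfile")

echo "BRE a\\{3\\}: $bre_hits"   # BRE a\{3\}: 1
echo "ERE a{3}: $ere_hits"       # ERE a{3}: 1
echo "cat or dog: $alt_hits"     # cat or dog: 2

rm -f "$tmpfile"
```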

Practical Applications

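```bash
# Find a process (the [n] bracket keeps grep from matching its own command line)
ps aux | grep "[n]ginx"

# Find errors in logs
grep "ERROR" /var/log/syslog

# Find files containing specific content
grep -r "TODO" ./src

# Count lines of code per file (every line matches "^")
grep -c "^" *.py
```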
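The recursive TODO search can be tried safely on a scratch directory. This sketch assumes a hypothetical `src` tree with three invented files; it lists the files that contain a TODO and counts the TODO lines overall.

```shell
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/src/lib"
printf '%s\n' '# TODO: handle errors' 'print("hi")' > "$tmpdir/src/main.py"
printf '%s\n' 'def add(a, b):' '    return a + b' > "$tmpdir/src/lib/math.py"
printf '%s\n' '# TODO: add tests' '# TODO: add docs' > "$tmpdir/src/lib/util.py"

# -r recurses into subdirectories; -l prints each matching file name once.
todo_files=$(cd "$tmpdir" && grep -rl 'TODO' src | sort)

# -h drops the file-name prefix so wc counts only the matching lines.
todo_total=$(grep -rh 'TODO' "$tmpdir/src" | wc -l | tr -d ' ')

echo "$todo_files"
echo "TODO lines: $todo_total"   # TODO lines: 3

rm -rf "$tmpdir"
```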

sed - Stream Editor

Basic Usage

```bash
# Replace the first match on each line
sed 's/old/new/' file.txt

# Replace every match on each line
sed 's/old/new/g' file.txt

# Delete lines
sed '3d' file.txt           # Delete line 3
sed '/pattern/d' file.txt   # Delete matching lines

# Print specific lines
sed -n '5p' file.txt        # Print line 5
sed -n '1,5p' file.txt      # Print lines 1-5

# Insert and append (this one-line form is a GNU sed extension)
sed '2i\new line' file.txt  # Insert before line 2
sed '2a\new line' file.txt  # Append after line 2
```
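To make the effect of the `g` flag and the line-addressing commands concrete, here is a short sketch on a three-line scratch file (the contents are invented for the example). None of these commands modify the file; sed writes the result to standard output.

```shell
tmpfile=$(mktemp)
printf '%s\n' 'old old' '# comment' 'keep old' > "$tmpfile"

# Without g only the first match per line is replaced; with g, all of them.
first_only=$(sed 's/old/new/' "$tmpfile" | head -n 1)
all_matches=$(sed 's/old/new/g' "$tmpfile" | head -n 1)

# Deleting comment lines leaves two of the three lines.
remaining=$(sed '/^#/d' "$tmpfile" | wc -l | tr -d ' ')

# -n suppresses default output, so only the explicitly printed line appears.
second_line=$(sed -n '2p' "$tmpfile")

echo "$first_only"    # new old
echo "$all_matches"   # new new
echo "$remaining"     # 2
echo "$second_line"   # # comment

rm -f "$tmpfile"
```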

Advanced Usage

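```bash
# Use regular expressions (\+ is a GNU extension; the POSIX form is \{1,\})
sed 's/[0-9]\+//g' file.txt

# Multiple replacements
sed -e 's/old1/new1/g' -e 's/old2/new2/g' file.txt

# In-place editing (modifies the original file; BSD/macOS sed needs -i '')
sed -i 's/old/new/g' file.txt

# In-place editing with a backup saved as file.txt.bak
sed -i.bak 's/old/new/g' file.txt

# Use shell variables (double quotes let the shell expand $var)
var="pattern"
sed "s/$var/replacement/g" file.txt
```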
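One pitfall with variables in sed is a value that contains the delimiter itself, such as a file path. The sketch below, using made-up paths, switches the delimiter from `/` to `|`; this assumes neither value contains a `|`. Regex metacharacters inside the variable are still interpreted by sed.

```shell
tmpfile=$(mktemp)
printf '%s\n' 'root=/var/www/html' > "$tmpfile"

old_path='/var/www/html'
new_path='/srv/site'

# With / as the delimiter, the slashes inside the paths would be read as
# extra delimiters and sed would reject the command. sed accepts any
# character after s as the delimiter, so | is used here instead.
result=$(sed "s|$old_path|$new_path|" "$tmpfile")

echo "$result"   # root=/srv/site

rm -f "$tmpfile"
```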

Practical Applications

```bash
# Replace a value in a config file
sed -i 's/port=8080/port=9090/' config.ini

# Delete comment lines
sed '/^#/d' file.txt

# Delete empty lines
sed '/^$/d' file.txt

# Collapse runs of whitespace into a single space (portable POSIX form)
sed 's/[[:space:]]\{1,\}/ /g' file.txt
```

awk - Text Processing Tool

Basic Usage

```bash
# Print a specific column
awk '{print $1}' file.txt

# Print multiple columns
awk '{print $1, $3}' file.txt

# Specify the field delimiter
awk -F: '{print $1}' /etc/passwd

# Print line numbers
awk '{print NR, $0}' file.txt

# Conditional printing
awk '$3 > 100 {print $0}' file.txt
```

Built-in Variables

```bash
NR      # Current record number (line number)
NF      # Number of fields in the current record
$0      # The complete record
$1, $2  # The 1st and 2nd fields
FS      # Input field separator (default: runs of spaces and tabs)
OFS     # Output field separator (default: a single space)
RS      # Record separator (default: newline)
ORS     # Output record separator (default: newline)
```
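The built-in variables are easier to remember once seen together. This sketch runs on invented colon-separated records: it sets `FS` and `OFS` in a `BEGIN` block and prints the record number, field count, first field, and last field (`$NF`) of each line.

```shell
tmpfile=$(mktemp)
printf '%s\n' 'alice:30:admin' 'bob:25' 'carol:41:dev' > "$tmpfile"

# FS splits input on ":"; OFS is placed between the comma-separated print arguments.
report=$(awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1, $NF }' "$tmpfile")

echo "$report"
# 1 | 3 | alice | admin
# 2 | 2 | bob | 25
# 3 | 3 | carol | dev

rm -f "$tmpfile"
```

Because the second record has only two fields, `$NF` there is `25` rather than a role name, which shows why checking `NF` matters on ragged input.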

Patterns and Actions

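```bash
# Pattern matching
awk '/pattern/ {print $0}' file.txt

# BEGIN and END blocks run before the first and after the last record
awk 'BEGIN {print "Start"} {print $0} END {print "End"}' file.txt

# Calculate the sum of column 1
awk '{sum += $1} END {print sum}' file.txt

# Calculate the average of column 1 (the guard avoids dividing by zero on empty input)
awk '{sum += $1; count++} END {if (count > 0) print sum / count}' file.txt
```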
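Blank lines and empty files are the usual ways a sum or average goes wrong. The following sketch wraps the calculation in a hypothetical helper named `average` that skips blank lines and reports `no data` instead of dividing by zero.

```shell
# average FILE: print "<sum> <mean>" for column 1, ignoring blank lines.
# The function name and output format are illustrative choices.
average() {
    awk 'NF > 0 { sum += $1; count++ }
         END {
             if (count > 0) { printf "%d %.2f\n", sum, sum / count } else { print "no data" }
         }' "$1"
}

tmpfile=$(mktemp)
printf '%s\n' '10' '20' '' '30' > "$tmpfile"

stats=$(average "$tmpfile")
empty_stats=$(average /dev/null)

echo "$stats"         # 60 20.00
echo "$empty_stats"   # no data

rm -f "$tmpfile"
```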

Practical Applications

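```bash
# Calculate total file size (column 5 of ls -l is the size in bytes)
ls -l | awk '{sum += $5} END {print sum}'

# Find the maximum value (seeding max from the first line also handles negative input)
awk 'NR == 1 || $1 > max {max = $1} END {print max}' file.txt

# Format output in aligned columns
awk '{printf "%-10s %10s\n", $1, $2}' file.txt

# Process a simple CSV file (does not handle quoted fields containing commas)
awk -F, '{print $1, $3}' data.csv
```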
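A maximum that starts from an uninitialized variable silently fails when every value is negative, because the unset variable compares as zero. This sketch, on made-up numbers, contrasts that naive form with one that seeds the maximum from the first record.

```shell
tmpfile=$(mktemp)
printf '%s\n' '-7' '-3' '-12' > "$tmpfile"

# Naive: max starts unset (treated as 0), so no negative value ever replaces it.
naive_max=$(awk '{ if ($1 > max) max = $1 } END { print max }' "$tmpfile")

# Seeded: the first record always initializes max.
safe_max=$(awk 'NR == 1 || $1 > max { max = $1 } END { print max }' "$tmpfile")

echo "naive: '$naive_max'"
echo "seeded: '$safe_max'"   # seeded: '-3'

rm -f "$tmpfile"
```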

cut - Text Cutting Tool

Basic Usage

```bash
# Cut by characters
cut -c 1-5 file.txt           # Extract characters 1-5
cut -c 1,5,10 file.txt        # Extract characters 1, 5, and 10

# Cut by bytes
cut -b 1-10 file.txt

# Cut by fields
cut -d: -f1 /etc/passwd       # Extract the 1st field
cut -d: -f1,3 /etc/passwd     # Extract the 1st and 3rd fields
```
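A quick way to see how field and character selection behave is to run cut on a single known line. The line below is an invented passwd-style entry, not taken from a real system.

```shell
line='alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# -d sets the delimiter, -f selects fields; selected fields stay joined by the delimiter.
user=$(printf '%s\n' "$line" | cut -d: -f1)
user_and_shell=$(printf '%s\n' "$line" | cut -d: -f1,7)

# -c selects by character position instead of by field.
prefix=$(printf '%s\n' "$line" | cut -c 1-5)

echo "$user"             # alice
echo "$user_and_shell"   # alice:/bin/bash
echo "$prefix"           # alice
```

cut treats every delimiter as a field boundary, so it suits single-character separators like `:` better than columns padded with runs of spaces.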

Practical Applications

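```bash
# Extract usernames
cut -d: -f1 /etc/passwd

# Extract IP addresses (assumes the legacy ifconfig format "inet addr:192.168.1.10")
ifconfig | grep "inet " | cut -d: -f2 | cut -d' ' -f1

# Extract a file extension (works only when the name contains a single dot)
echo "file.txt" | cut -d. -f2
```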
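The extension example depends on the name having exactly one dot. As a sketch of the limitation, the snippet below compares cut against two alternatives that take the last dot-separated part; the file name is made up.

```shell
name='archive.tar.gz'

# cut -f2 returns the second dot-separated field, not the final one.
second_field=$(printf '%s\n' "$name" | cut -d. -f2)

# awk's $NF is always the last field.
last_field=$(printf '%s\n' "$name" | awk -F. '{ print $NF }')

# POSIX parameter expansion strips the longest prefix ending in a dot, with no extra process.
expansion=${name##*.}

echo "$second_field"   # tar
echo "$last_field"     # gz
echo "$expansion"      # gz
```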

Combined Usage Examples

Log Analysis

```bash
# Count errors
grep "ERROR" /var/log/app.log | wc -l

# Find logs for a specific time period (both timestamps must appear in the log)
sed -n '/2024-01-01 10:00/,/2024-01-01 11:00/p' /var/log/app.log

# Extract IP addresses (assumes field 5 looks like "client:10.0.0.1")
grep "ERROR" /var/log/app.log | awk '{print $5}' | cut -d: -f2

# Count error types (assumes field 6 holds the error type)
grep "ERROR" /var/log/app.log | awk '{print $6}' | sort | uniq -c
```
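Field positions in a pipeline like this depend entirely on the log layout. The sketch below assumes a hypothetical format of `date time LEVEL message...` and tallies the first word of each ERROR message, most frequent first.

```shell
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
2024-01-01 10:00:01 INFO server started
2024-01-01 10:05:12 ERROR timeout contacting db
2024-01-01 10:07:45 ERROR disk almost full
2024-01-01 10:09:30 WARN retrying
2024-01-01 10:15:02 ERROR timeout contacting cache
EOF

# Match on the level column (field 3) rather than anywhere in the line,
# print field 4, then count duplicates and sort by count, highest first.
error_summary=$(awk '$3 == "ERROR" { print $4 }' "$logfile" \
    | sort | uniq -c | sort -rn | awk '{ print $2 "=" $1 }')

echo "$error_summary"
# timeout=2
# disk=1

rm -f "$logfile"
```

Testing the level with `$3 == "ERROR"` avoids false positives from messages that merely mention the word ERROR.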

Text Processing

```bash
# Delete empty lines and comments
sed '/^$/d; /^#/d' file.txt

# Replace runs of whitespace with a single space (portable POSIX form)
sed 's/[[:space:]]\{1,\}/ /g' file.txt

# Extract a specific column and deduplicate
awk '{print $1}' file.txt | sort -u

# Calculate the average of column 1 (the guard avoids dividing by zero on empty input)
awk '{sum += $1} END {if (NR > 0) print sum / NR}' file.txt
```

System Administration

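```bash
# Find processes with the highest CPU usage (numeric sort on column 3 only)
ps aux | sort -nrk 3,3 | head -n 5

# Find processes with the highest memory usage (numeric sort on column 4 only)
ps aux | sort -nrk 4,4 | head -n 5

# Count processes per user (NR > 1 skips the header row)
ps aux | awk 'NR > 1 {print $1}' | sort | uniq -c

# Find the PID of the process using a specific port (NR > 1 skips the header row)
lsof -i :8080 | awk 'NR > 1 {print $2}'
```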
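Sorting a numeric column needs the `-n` flag and a key limited to that column, otherwise values compare as text. This sketch uses an invented three-row listing in place of real `ps` output and pins the locale with `LC_ALL=C` so the text comparison is predictable.

```shell
listing=$(printf '%s\n' 'alice 101 9.5' 'bob 102 10.2' 'carol 103 2.0')

# Text sort: "9.5" ranks above "10.2" because "9" sorts after "1".
lexical_top=$(printf '%s\n' "$listing" | LC_ALL=C sort -rk 3 | head -n 1)

# Numeric sort restricted to column 3 (-k 3,3) ranks 10.2 first.
numeric_top=$(printf '%s\n' "$listing" | LC_ALL=C sort -nrk 3,3 | head -n 1)

echo "$lexical_top"   # alice 101 9.5
echo "$numeric_top"   # bob 102 10.2
```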

Best Practices

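  1. Combine tools with pipes: for example, grep | awk | sort | uniq
  2. Prefer grep for searching: it is usually the fastest choice for simple pattern searches
  3. Use sed for replacement: it is the natural tool for line-oriented substitution and deletion
  4. Use awk for column data: it is the best fit for structured, field-based text and for calculations
  5. Use cut for fixed positions: it handles simple extraction by character position or single-character delimiter
  6. Be aware of regex syntax: grep and sed default to basic regular expressions, while grep -E, sed -E, and awk use extended regular expressions
  7. Test commands first: run them without in-place options, or on a copy, before processing important files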
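The last practice can be sketched as a three-step routine: preview the change, apply it with a backup, then compare. The file name `config.ini` and its contents are invented here; the `-i.bak` form with no space is accepted by both GNU and BSD sed.

```shell
workdir=$(mktemp -d)
config="$workdir/config.ini"
printf '%s\n' 'host=localhost' 'port=8080' > "$config"

# 1. Preview: without -i, sed only prints the result and leaves the file alone.
preview=$(sed 's/^port=8080$/port=9090/' "$config")

# 2. Apply in place, keeping the original as config.ini.bak.
sed -i.bak 's/^port=8080$/port=9090/' "$config"

# 3. Verify: diff marks removed lines with < and added lines with >.
changed_lines=$(diff "$config.bak" "$config" | grep -c '^[<>]')
new_port=$(grep '^port=' "$config")

echo "$new_port"                      # port=9090
echo "changed lines: $changed_lines"  # changed lines: 2

rm -rf "$workdir"
```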
Tags: Shell