awk, sed, and grep for Linux Sysadmins: Practical Guide with Real Examples (2026)
Three tools have survived decades of Unix history and remain central to daily sysadmin work: grep, sed, and awk. Every Linux administrator uses them constantly — to dig through logs, edit configuration files in-place, parse structured output, and build quick analysis pipelines. Despite their age, no modern tool replaces them entirely; they ship on every Linux system and work over SSH without any installation overhead.
This guide covers each tool with real sysadmin examples, then shows how to chain them into the kind of production pipelines that actually appear on the command line.
grep: Finding What You Need
grep searches files or stdin for lines matching a pattern. Its name comes from the ed command g/re/p (globally match a regular expression and print). The basic usage is grep PATTERN FILE, but the flags are where it becomes powerful.
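A self-contained example of the basic form, using a throwaway file (the sample log content is made up for illustration):

```shell
# Create a small sample file to search
cat > /tmp/demo.log <<'EOF'
INFO service started
ERROR disk full
INFO heartbeat ok
EOF

# Basic form: grep PATTERN FILE -- prints every matching line
grep "ERROR" /tmp/demo.log
# ERROR disk full
```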
Essential grep Flags
# Case-insensitive search
grep -i "error" /var/log/syslog
# Recursive search through a directory tree
grep -r "database_url" /etc/myapp/
# List only filenames that match (not the lines); needs -r when given a directory
grep -rl "PermitRootLogin" /etc/ssh/
# Count matching lines instead of printing them
grep -c "FAILED" /var/log/auth.log
# Show context lines around a match: Before (-B), After (-A), or both (-C)
grep -B 3 "segfault" /var/log/kern.log
grep -A 5 "OOM killer" /var/log/syslog
grep -C 2 "panic" /var/log/kern.log
# Extended regular expressions (no need to escape + ? | etc.)
grep -E "error|warn|crit" /var/log/nginx/error.log
# Limit search to specific file types in a recursive search
grep -r --include="*.conf" "Listen 80" /etc/
Find Failed SSH Login Attempts
This is a common on-call task. Brute-force attacks appear in /var/log/auth.log (Debian/Ubuntu) or /var/log/secure (RHEL/Fedora).
# Show all failed password attempts
grep "Failed password" /var/log/auth.log
# Find which IPs are attacking most aggressively (pipeline preview)
grep "Failed password" /var/log/auth.log | grep -oE "[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+" | sort | uniq -c | sort -rn | head -20
# Find invalid users being tried
grep "Invalid user" /var/log/auth.log | awk '{print $8}' | sort | uniq -c | sort -rn
# Check if a specific IP ever logged in successfully after failing
grep "192.168.1.100" /var/log/auth.log | grep "Accepted"
grep with Regex
The -E flag unlocks full extended regex. Useful patterns for sysadmins:
# Match lines with HTTP 4xx or 5xx status codes
grep -E " [45][0-9]{2} " /var/log/nginx/access.log
# Match IPv4 addresses
grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" /var/log/nginx/access.log
# Lines that do NOT match a pattern
grep -v "127.0.0.1" /var/log/nginx/access.log
# Combine -v and -E to exclude noisy lines
grep -vE "healthcheck|monitoring-bot|favicon" /var/log/nginx/access.log
sed: Stream Editing Files
sed processes text line by line, applying editing commands. Its most important use for sysadmins is in-place substitution in config files — changing a value without opening an editor.
Substitution: s/old/new/
# Replace first occurrence on each line
sed 's/http:/https:/' config.txt
# Replace ALL occurrences on each line (global flag)
sed 's/foo/bar/g' config.txt
# Case-insensitive replacement
sed 's/error/ERROR/gi' log.txt
# Edit a file in-place (modifies the file, no output)
sed -i 's/Listen 80/Listen 8080/' /etc/apache2/ports.conf
# In-place edit with a backup (backup gets .bak extension)
sed -i.bak 's/max_connections = 100/max_connections = 500/' /etc/postgresql/15/main/postgresql.conf
Edit Config Values In-Place
A common pattern: update a value in a config file that uses key = value or key=value format.
# Update a specific parameter in postgresql.conf
sed -i 's/^shared_buffers = .*/shared_buffers = 2GB/' /etc/postgresql/15/main/postgresql.conf
# Enable a commented-out setting
sed -i 's/^#PermitRootLogin yes/PermitRootLogin no/' /etc/ssh/sshd_config
# Change a value only on lines matching a specific pattern
sed -i '/^worker_processes/s/auto/4/' /etc/nginx/nginx.conf
# Replace across multiple files
find /etc/nginx/sites-enabled/ -name "*.conf" -exec sed -i 's/server_name example.com/server_name mysite.com/g' {} \;
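In-place edits are risky on live configs, so a common habit is to run the same expression without -i first and inspect the output before committing. A self-contained sketch on a throwaway file (the config keys are hypothetical):

```shell
# Build a throwaway config to practice on
cat > /tmp/app.conf <<'EOF'
max_connections = 100
log_level = info
EOF

# Preview: run WITHOUT -i; the edited text goes to stdout, the file is untouched
sed 's/^max_connections = .*/max_connections = 500/' /tmp/app.conf

# Apply for real, keeping a .bak copy of the original
sed -i.bak 's/^max_connections = .*/max_connections = 500/' /tmp/app.conf

grep "max_connections" /tmp/app.conf      # now shows 500
grep "max_connections" /tmp/app.conf.bak  # backup still shows 100
```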
Delete Lines and Print Ranges
# Delete blank lines
sed '/^$/d' file.txt
# Delete lines matching a pattern
sed '/^#/d' /etc/nginx/nginx.conf # Remove comment lines
# Print only lines 10 through 20
sed -n '10,20p' /var/log/syslog
# Print lines matching a pattern (like grep, but you have sed's other powers too)
sed -n '/ERROR/p' /var/log/app.log
# Delete lines between two patterns (inclusive)
sed '/^BEGIN/,/^END/d' file.txt
awk: Structured Text Processing
awk is a full programming language designed for column-oriented text. It splits each line into fields ($1, $2, ... $NF) separated by whitespace by default (or a custom delimiter with -F). It runs a BEGIN block before processing, a pattern-action block per line, and an END block after.
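All three blocks can be seen in one self-contained command; the sample numbers are fed in via printf:

```shell
# BEGIN runs once before any input, the middle block runs once per line,
# and END runs once after the last line
printf '10\n20\n30\n' | awk '
BEGIN { print "summing..." }
      { sum += $1 }
END   { print "total:", sum }'
# summing...
# total: 60
```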
Field Processing Basics
# Print the first and fifth fields of ps aux output
ps aux | awk '{print $1, $5}'
# Use a custom field separator (colon for /etc/passwd)
awk -F: '{print $1, $3}' /etc/passwd # username and UID
# NR is the current line number
awk 'NR==1 {print "Header:", $0}' file.txt
awk 'NR>1 {print $0}' file.txt # Skip header line
Filter Rows Based on Field Values
# Show processes using more than 50% CPU (field 3 in ps aux)
ps aux | awk '$3 > 50 {print $1, $2, $3, $11}'
# Show processes owned by root using more than 100MB memory
ps aux | awk '$1 == "root" && $6 > 100000 {print $2, $6, $11}'
# Show access log lines where response time (last field) > 2 seconds
# NOTE: nginx does not log response time by default; this assumes a custom
# log_format with $request_time appended as the last field
awk '$NF > 2.0' /var/log/nginx/access.log
Parse nginx Access Logs
The default nginx combined log format begins like this (the trailing referer and user-agent fields are omitted here; they don't change the field numbering below):
127.0.0.1 - frank [10/Oct/2025:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326
Fields: $1=IP, $4=timestamp, $7=request_path, $9=status_code, $10=bytes.
# Count requests per status code
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn
# Show only requests that returned 500
awk '$9 == 500 {print $1, $7, $9}' /var/log/nginx/access.log
# Calculate total bytes served
awk '{sum += $10} END {print "Total bytes:", sum}' /var/log/nginx/access.log
# Average bytes per request
awk '{sum += $10; count++} END {printf "Average: %.0f bytes\n", sum/count}' /var/log/nginx/access.log
# Top 10 most requested URLs
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
# Top 10 client IPs by request count
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
Sum Columns and Use the END Block
The END block runs after all lines have been processed — perfect for totals and summaries.
# Sum the size of all files listed by ls -l
ls -l | awk '{sum += $5} END {print "Total size:", sum, "bytes"}'
# Count lines matching a condition, report at the end
awk '$9 >= 500 {errors++} END {print "5xx errors:", errors}' /var/log/nginx/access.log
# Print a formatted summary table
awk '{codes[$9]++} END {for (code in codes) print code, codes[code]}' /var/log/nginx/access.log | sort
Real Sysadmin Pipelines
The real power comes from chaining these tools together. The classic pattern is:
grep | awk | sort | uniq -c | sort -rn
This pattern: filter relevant lines, extract the field you care about, sort to group duplicates, count them, then sort by count descending.
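A self-contained run on made-up sample input (standing in for the field a grep | awk stage would extract) makes each stage visible:

```shell
# sort groups identical lines together, uniq -c collapses each group into
# "count value", and sort -rn puts the highest counts first
printf 'root\nadmin\nroot\nguest\nroot\nadmin\n' | sort | uniq -c | sort -rn
# -> 3 root, 2 admin, 1 guest (counts first; leading whitespace varies)
```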
Log Analysis: Top Failed SSH Usernames
# Most common usernames in failed SSH logins ($9 is the username on
# "Failed password for USER" lines; "invalid user" lines shift it to $11)
grep "Failed password" /var/log/auth.log \
| awk '{print $9}' \
| sort | uniq -c | sort -rn \
| head -20
Find High-CPU Processes and Alert
# Print a warning for any process over 80% CPU
ps aux | awk 'NR>1 && $3 > 80 {printf "HIGH CPU: %s (PID %s) %.1f%%\n", $11, $2, $3}'
Detect Repeated nginx 404s for the Same Path
A spike of 404s on a specific path often indicates a scanner or misconfigured redirect.
awk '$9 == 404 {print $7}' /var/log/nginx/access.log \
| sed 's/?.*$//' \
| sort | uniq -c | sort -rn \
| head -20
The sed step strips query strings so /search?q=foo and /search?q=bar are counted together as /search.
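That stripping step is easy to sanity-check in isolation:

```shell
# 's/?.*$//' deletes everything from the first "?" to the end of the line
# ("?" is a literal character in sed's default basic regex syntax)
printf '/search?q=foo\n/search?q=bar\n/about\n' | sed 's/?.*$//'
# /search
# /search
# /about
```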
Extract and Summarize Disk Usage
# Find directories over 1GB and format output cleanly
du -sh /var/log/* 2>/dev/null \
| awk '$1 ~ /G/ {print}' \
| sort -rh \
| head -10
Build a Quick Access Report from Apache/nginx Logs
# Requests per hour for today, with status breakdown
grep "$(date +'%d/%b/%Y')" /var/log/nginx/access.log \
| awk '{
split($4, t, ":");
hour = t[2];
hours[hour]++;
if ($9 >= 500) errors[hour]++;
}
END {
for (h in hours)
printf "Hour %s: %d requests, %d errors\n", h, hours[h], errors[h]+0
}' \
| sort
Replace Multiple Config Values in One Pass
# Multiple sed substitutions with -e
sed -i \
-e 's/^max_connections = .*/max_connections = 200/' \
-e 's/^log_level = .*/log_level = warn/' \
-e 's/^timeout = .*/timeout = 30/' \
/etc/myapp/config.ini
Quick Reference
| Tool | Primary Use | Key Flag |
|---|---|---|
| grep -i | Case-insensitive search | -i |
| grep -r | Recursive search | -r --include="*.log" |
| grep -E | Extended regex | -E "pat1\|pat2" |
| grep -C 3 | Context around match | -A, -B, -C |
| sed -i | In-place file edit | -i.bak for backup |
| sed 's/a/b/g' | Global substitution | g flag |
| awk -F: | Custom delimiter | -F"," for CSV |
| awk '$3>50' | Filter on field value | numeric comparisons |
| awk 'END{}' | Summarize after all lines | accumulators |
These three tools cover the vast majority of text-processing tasks a sysadmin encounters. Master them and you will be able to answer most log analysis questions in seconds — without installing anything, even over a bare SSH connection.