Linux Process Management 2026: ps, top, htop, btop, kill, nice, and OOM Killer
Every Linux system administrator and DevOps engineer needs a solid command of process management. Whether you are diagnosing a runaway process consuming all CPU, hunting zombie processes, or tuning application priorities for a production workload, the toolset available on Linux in 2026 is richer than ever. This guide covers the full spectrum, from classic tools like ps and top, through the widely adopted htop, to the modern btop that is rapidly becoming the default choice for interactive monitoring. We also cover signals, priority scheduling, background job control, the /proc filesystem, and critical troubleshooting scenarios including zombie processes, the OOM Killer, D-state processes, and cgroup resource accounting.
Understanding Process States
Before reaching for any tool, it helps to understand what state a process can be in. The Linux kernel tracks each process with a state flag visible in tools like ps and top.
| State code | Name | Meaning |
|---|---|---|
| R | Running | Actively executing on a CPU, or in the run queue waiting for a CPU slot |
| S | Sleeping (interruptible) | Waiting for an event (I/O, timer, signal). Can be woken by a signal. This is the normal idle state for most processes. |
| D | Sleeping (uninterruptible) | Waiting on hardware I/O or a kernel lock. Cannot be killed with SIGKILL. A process stuck in D state for a long time usually indicates an I/O or NFS problem. |
| T | Stopped | Suspended by a signal (SIGSTOP) or by the debugger. Resumes on SIGCONT. |
| Z | Zombie | The process has exited but its parent has not yet called wait() to collect the exit status. Takes no resources except a PID slot. |
Additional modifiers appear after the main letter: < (high priority), N (low priority), s (session leader), l (multi-threaded), + (foreground process group).
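As a quick way to see how these states are distributed on a live system, the sketch below tallies processes by the first letter of their STAT value. The function name count_states is made up for this example; it reads STAT values from stdin so it can be fed from ps.

```bash
# Tally processes by their primary state letter (R, S, D, T, Z, ...).
# Reads STAT values from stdin, one per line.
count_states() {
    # Only the first character is the state; modifiers such as s, l, + follow it.
    awk 'NF { n[substr($1, 1, 1)]++ } END { for (s in n) print s, n[s] }' | sort
}
```

A typical invocation would be ps -eo stat= | count_states, which prints one line per state letter followed by the number of processes in that state.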
The ps Command
ps (process status) reads a snapshot of the process table from the /proc filesystem and prints it. It is available on every Linux system without any additional installation.
Common Invocations
# BSD-style: show all processes with user, CPU, and memory columns
ps aux
# UNIX-style: show all processes with PPID and full command line
ps -ef
The two formats overlap in coverage but differ in column layout. ps aux is the most commonly used form in practice.
Understanding the Output Columns
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.1 169400 13040 ? Ss Apr20 0:12 /sbin/init
www-data 1842 0.3 1.2 312448 98320 ? S 09:01 0:45 nginx: worker
| Column | Meaning |
|---|---|
| USER | Effective user owning the process |
| PID | Process ID — the kernel's unique numeric identifier |
| PPID | Parent Process ID (visible with -ef) |
| %CPU | CPU usage averaged since the process started |
| %MEM | Resident set size as a percentage of total RAM |
| VSZ | Virtual memory size in kilobytes (includes all mapped memory, often misleading) |
| RSS | Resident Set Size — actual physical RAM in use, in kilobytes |
| STAT | Process state (see table above) |
| START | Time or date when the process started |
| TIME | Cumulative CPU time consumed |
| COMMAND | Command line that launched the process |
Filtering and Searching
# Find nginx processes using grep (note: grep itself also shows up — exclude with [n])
ps aux | grep '[n]ginx'
# Use pgrep for a cleaner result — returns just PIDs
pgrep nginx
# pgrep with -l shows the name, -a shows the full command line
pgrep -la python
# Match against the full command line instead of just the process name (-f takes a regular expression)
pgrep -f "python manage.py runserver"
pgrep reads /proc directly and is faster and safer to use in scripts than piping ps through grep.
Process Trees
Understanding parent-child relationships is essential when diagnosing spawning issues or cleaning up zombie processes.
# ASCII forest view with ps
ps auxf
# Dedicated tree tool — cleaner output, shows thread counts
pstree
# Show PIDs in pstree
pstree -p
# Show only the trees rooted at processes owned by alice (-u also marks UID transitions)
pstree -u alice
Example ps auxf excerpt showing nginx master and workers:
root 1234 0.0 0.1 nginx: master process /usr/sbin/nginx
www-data 1235 0.0 0.5 \_ nginx: worker process
www-data 1236 0.0 0.5 \_ nginx: worker process
top — The Classic Interactive Monitor
top provides a continuously refreshed view of the system and its processes. It ships with virtually every Linux distribution as part of the procps package.
top
The header section shows uptime, number of users, load averages for 1, 5, and 15 minutes, overall CPU breakdown, and memory/swap usage. The process list below is sorted by CPU usage by default.
Load Average Explained
The three load average numbers (e.g., 0.72 1.15 0.89) represent the average number of processes that are runnable or in uninterruptible sleep over the last 1, 5, and 15 minutes. On a system with N CPU cores, a load average consistently above N usually indicates more demand than the machine can serve. A load of 4.0 on a 4-core machine means roughly one task per core; on an 8-core machine the same value leaves plenty of headroom. Because tasks in uninterruptible sleep are counted too, a high load average with mostly idle CPUs points to an I/O bottleneck rather than CPU saturation.
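The per-core comparison can be sketched as a small helper. The function name load_per_core is made up for this example; it takes a line in /proc/loadavg format and a core count as arguments, so it makes no assumptions about where the numbers come from.

```bash
# Print the 1-minute load average divided by the number of CPU cores.
# $1 = a line in /proc/loadavg format, $2 = core count.
load_per_core() {
    echo "$1" | awk -v cores="$2" '{ printf "%.2f\n", $1 / cores }'
}
```

On a live system this could be called as load_per_core "$(cat /proc/loadavg)" "$(nproc)". As a rough reading, values that stay above 1.00 suggest more runnable or blocked tasks than cores.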
Interactive Keys in top
| Key | Action |
|---|---|
| 1 | Toggle per-CPU breakdown in the header |
| P | Sort by CPU usage (default) |
| M | Sort by memory (RSS) |
| T | Sort by cumulative CPU time |
| k | Kill a process — prompts for PID then signal number |
| r | Renice a process — prompts for PID then new nice value |
| u | Filter by username |
| f | Field management — add/remove columns |
| c | Toggle full command line vs process name |
| H | Toggle thread view |
| q | Quit |
htop — The Friendlier Interactive Monitor
htop improves on top with a color-coded interface, mouse support, horizontal/vertical scrolling, and a more intuitive function-key menu bar.
Installation
# Debian / Ubuntu
sudo apt install htop
# RHEL / Fedora / Rocky Linux
sudo dnf install htop
# Alpine (run as root; Alpine does not ship sudo by default)
apk add htop
Navigating htop
When htop opens you see per-CPU bars across the top, memory and swap bars, uptime and load average, and the process list below. Use arrow keys to navigate the list.
Color coding (default theme):
- Green CPU bar: user-space processes
- Red CPU bar: kernel (system) time
- Blue CPU bar: low-priority (nice) processes
- Green memory: used memory
- Blue memory: buffers
- Yellow memory: page cache (htop 3.x shows shared memory separately, in magenta)
Key Function Keys
| Key | Action |
|---|---|
| F1 | Help |
| F2 | Setup — change colors, columns, display options |
| F3 | Search — incremental process name search |
| F4 | Filter — show only processes matching a string |
| F5 | Tree view — toggle process tree |
| F6 | Sort — choose sort column interactively |
| F7 / F8 | Decrease / increase nice value of selected process |
| F9 | Kill — opens signal picker for the selected process |
| F10 | Quit |
| Space | Tag a process (for batch kill/renice) |
| U | Untag all |
You can also click column headers with the mouse to sort, and click process rows to select them.
btop — The Modern Choice for 2026
btop (and its Python predecessor bpytop) has become the go-to interactive monitor for many sysadmins in 2026. It offers smooth animated graphs, a full resource overview on a single screen, a responsive interface that works well in high-resolution terminals, and is actively maintained.
Why btop is Replacing htop
- Rich resource graphs: CPU, memory, disk I/O, and network throughput are all displayed as real-time graphs, not just numbers or static bars.
- Process tree built in: tree view is first-class, not a toggled afterthought.
- Network statistics: per-interface upload/download rates with graphs — no need to open a separate tool.
- Disk I/O: read/write speeds per device visible at a glance.
- Themeable: ships with many color themes; fully configurable.
- Low overhead: written in C++, negligible CPU cost.
Installation
# Ubuntu 22.04+ / Debian Bookworm+
sudo apt install btop
# RHEL 9 / Rocky 9 / AlmaLinux 9 (via EPEL)
sudo dnf install epel-release && sudo dnf install btop
# Fedora 36+
sudo dnf install btop
# macOS (for those who also manage macOS servers)
brew install btop
# From source (latest release)
git clone https://github.com/aristocratos/btop.git
cd btop && make && sudo make install
Using btop
Launch with:
btop
The default layout shows:
- Top panel: CPU usage graph plus per-core percentages
- Left, upper: memory and swap usage, with disk usage and I/O alongside
- Left, lower: network interface graphs (upload/download)
- Right: process list with tree toggle
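Key bindings in btop (defaults; the built-in help lists the bindings for your version):
| Key | Action |
|---|---|
| Up / Down (hjkl if vim_keys is enabled) | Navigate the process list |
| Left / Right | Select previous / next sort column |
| r | Reverse the sort order |
| f | Filter/search processes |
| e | Toggle process tree |
| t | Send SIGTERM to the selected process |
| k | Send SIGKILL to the selected process |
| s | Choose a signal to send to the selected process |
| + / - | Change update interval |
| ESC | Open / close menu, go back |
| q | Quit |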
For a deeper look at Linux performance methodology, Brendan Gregg's work at brendangregg.com and his book Systems Performance are the definitive references.
Signals and Killing Processes
Signals are asynchronous notifications sent to processes. The kernel delivers them; the process can choose to handle, ignore, or be terminated by them (with two exceptions: SIGKILL and SIGSTOP cannot be caught or ignored).
Key Signals
| Signal | Number | Default action | Use case |
|---|---|---|---|
| SIGHUP | 1 | Terminate (or reload if handled) | Reload config without restart (nginx, sshd) |
| SIGINT | 2 | Terminate | Keyboard Ctrl+C |
| SIGQUIT | 3 | Terminate with core dump | Ctrl+\ for debugging |
| SIGTERM | 15 | Terminate (graceful) | Polite shutdown; the default signal sent by kill |
| SIGKILL | 9 | Terminate (forced, kernel) | Last resort; cannot be caught or ignored |
| SIGSTOP | 19 | Suspend | Cannot be caught; pauses the process |
| SIGCONT | 18 | Resume | Resume a stopped process |
| SIGUSR1/2 | 10/12 | Terminate (unless handled) | Application-specific actions |
Sending Signals
# Send SIGTERM (15) — graceful shutdown request
kill 1842
# Send SIGKILL (9) — immediate forced termination
kill -9 1842
kill -KILL 1842 # same thing, by name
# Send SIGHUP — tell nginx to reload its configuration
kill -HUP $(cat /run/nginx.pid)
kill -1 1234 # equivalent numeric form
# Kill all processes named nginx (careful on a shared system)
killall nginx
# Kill by pattern match on the full command line
pkill -f "python manage.py runserver"
# Send SIGTERM to all processes owned by a user
pkill -u olduser
# Verify a process responded before resorting to -9
kill 1842 && sleep 3 && kill -0 1842 && echo "still alive" || echo "gone"
Best practice: always try SIGTERM first and wait a few seconds. Only escalate to SIGKILL if the process does not respond. SIGKILL skips any cleanup code (flushing buffers, releasing locks), which can leave data in an inconsistent state.
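The escalation described above can be sketched as a shell function. The name stop_gracefully and the default 5-second timeout are choices made for this example, not a standard tool; treat it as a starting point rather than a definitive implementation.

```bash
# Send SIGTERM, wait up to $2 seconds (default 5), then fall back to SIGKILL.
# $1 = PID to stop.
stop_gracefully() {
    local pid=$1 timeout=${2:-5}
    # Fails if the PID does not exist or we lack permission to signal it.
    kill -TERM "$pid" 2>/dev/null || return 1
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0   # PID is gone after SIGTERM
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0
    echo "PID $pid still present after SIGTERM, sending SIGKILL" >&2
    kill -KILL "$pid" 2>/dev/null
}
```

For example, stop_gracefully 1842 10 gives the process ten seconds to shut down cleanly. Note that kill -0 also succeeds for a zombie that has not been reaped yet, so the function can escalate to SIGKILL for a process that has in fact already exited; that extra signal is harmless.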
Process Priority with nice and renice
Linux uses a scheduling priority called niceness ranging from -20 (highest priority, least nice to others) to +19 (lowest priority, most nice). Regular users can only increase niceness (lower priority); only root can set negative values.
Launching with a Custom Priority
# Start a CPU-intensive job at low priority (nice = 10)
nice -n 10 gzip -9 large-archive.tar
# Run a backup script at the lowest possible priority
nice -n 19 /usr/local/bin/backup.sh
# Start an important job at higher priority (negative nice values require root)
sudo nice -n -10 /usr/sbin/critical-service
Changing Priority of a Running Process
# Lower the priority of PID 4321 (users can raise the nice value of their own processes)
renice -n 5 -p 4321
# Set priority for all processes owned by a user
renice -n 10 -u worker
# Increase priority of PID 5000 (requires root)
sudo renice -n -5 -p 5000
You can also renice interactively from top (press r) or htop (press F7/F8).
Background and Foreground Job Control
The shell provides job control for managing processes you launched from the current session.
# Start a command in the background
long-running-job &
# List background and suspended jobs in the current shell
jobs
# Bring job 1 to the foreground
fg %1
# Resume a suspended job in the background
bg %1
# Suspend the foreground process (the terminal sends SIGTSTP, not SIGSTOP)
# Press Ctrl+Z
# Run a command immune to terminal hangup (SIGHUP)
nohup python server.py > server.log 2>&1 &
# Disown a running background job so it survives shell exit
long-running-job &
disown %1
For persistent background services not tied to a terminal session at all, use systemd units, tmux, or screen.
lsof — List Open Files
Every process has file descriptors for open files, network sockets, pipes, and devices. lsof (list open files) exposes this information.
# All files opened by a specific PID
lsof -p 1842
# Which process is listening on port 8080?
lsof -i :8080
# All network connections of a specific process
lsof -i -a -p 1842
# All files opened by a specific user
lsof -u alice
# All processes with a specific file open (useful when unmounting fails)
lsof /var/log/app.log
# Find processes holding deleted files (common cause of disk space not freeing)
# +L1 lists open files whose link count is below 1, i.e. unlinked but still open
lsof +L1
ss (and the older netstat) are faster for pure network queries and can show owning PIDs with -p, but lsof remains the most convenient way to see a process's sockets alongside its other open files and the owning user account.
The /proc Filesystem
/proc is a virtual filesystem that the kernel maintains in memory. It is the data source for ps, top, and most other process tools. You can inspect it directly for information that higher-level tools do not expose.
# View the status of PID 1842: includes State, PPid, threads, UIDs, and memory usage (VmSize, VmRSS)
cat /proc/1842/status
# Read the full command line (null-separated arguments)
cat /proc/1842/cmdline | tr '\0' ' '
# List file descriptors — symlinks to actual files/sockets
ls -la /proc/1842/fd
# Read environment variables passed at launch
cat /proc/1842/environ | tr '\0' '\n'
# Current working directory
readlink /proc/1842/cwd
# Executable path (resolves even for replaced binaries)
readlink /proc/1842/exe
# Memory map — virtual address layout
cat /proc/1842/maps
# I/O statistics for the process
cat /proc/1842/io
# CPU scheduling statistics
cat /proc/1842/schedstat
/proc/PID/status key fields:
Name: nginx
State: S (sleeping)
Pid: 1842
PPid: 1234
VmRSS: 98320 kB ← resident RAM
VmSize: 312448 kB ← virtual size
Threads: 4
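When a script needs just one of these fields, a small awk wrapper is usually enough. The sketch below uses a made-up helper name, status_field, and takes the path to the status file as an argument so it works for any PID.

```bash
# Print the value of one field from a /proc/<PID>/status-style file.
# $1 = path to the status file, $2 = field name without the colon.
status_field() {
    awk -v key="$2:" '$1 == key { $1 = ""; sub(/^[ \t]+/, ""); print; exit }' "$1"
}
```

For example, status_field /proc/1842/status VmRSS prints the resident memory together with its kB unit, and status_field /proc/self/status State reports the state of the calling process.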
Brendan Gregg's USE Method — checking Utilization, Saturation, and Errors per resource — pairs naturally with /proc exploration. See brendangregg.com/usemethod.html.
Troubleshooting Scenarios
Scenario 1: Zombie Processes
A zombie (Z state) is a process that has exited but whose exit status has not been collected by its parent via wait(). Zombies consume no CPU or memory — only a PID slot — but a large accumulation indicates a bug in the parent process.
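# Find zombie processes (STAT starts with Z; it may carry modifiers such as Z+)
ps aux | awk '$8 ~ /^Z/'
# Or with an explicit column list that also shows the parent PID
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'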
Cleanup: You cannot kill a zombie directly — it is already dead. The solution is to signal its parent to call wait():
# Find the parent of zombie PID 9876
ps -o ppid= -p 9876
# Send SIGCHLD to parent (asks it to reap children)
kill -CHLD <parent_pid>
# If the parent is buggy and ignores SIGCHLD, kill the parent
kill <parent_pid>
When the parent exits or is killed, the zombie is adopted by PID 1 (init/systemd), which immediately reaps it.
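Since the fix always involves the parent, it can help to group zombies by parent PID first. The following is a minimal sketch with a made-up function name, zombies_by_parent; it expects rows of "PID PPID STAT COMMAND" on stdin.

```bash
# Count zombie processes per parent PID.
# Expects "PID PPID STAT COMMAND" rows on stdin; prints "PPID COUNT",
# with the parent holding the most zombies first.
zombies_by_parent() {
    awk '$3 ~ /^Z/ { count[$2]++ }
         END { for (ppid in count) print ppid, count[ppid] }' |
        sort -k2,2nr -k1,1n
}
```

A typical invocation would be ps -eo pid=,ppid=,stat=,comm= | zombies_by_parent. A parent with a steadily growing count is the likely place to look for a missing wait() call.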
Scenario 2: OOM Killer
When the Linux kernel runs out of available memory and cannot reclaim enough through page eviction, it invokes the OOM (Out Of Memory) Killer. The kernel selects a process with a high oom_score and kills it to free memory.
Detecting OOM events:
# Check kernel log for OOM kills
sudo grep -i 'oom\|killed process\|out of memory' /var/log/kern.log
# On systems using journald
sudo journalctl -k | grep -i 'oom\|killed process'
# dmesg is faster for recent events
sudo dmesg | grep -i 'oom\|killed'
A typical OOM entry looks like this (the exact wording varies between kernel versions):
kernel: Out of memory: Kill process 7823 (java) score 892 or sacrifice child
kernel: Killed process 7823 (java) total-vm:4194304kB, anon-rss:3145728kB
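To pull just the victims out of a long kernel log, a sed filter along these lines may help. It assumes the "Killed process PID (name)" wording shown above, and oom_victims is a name chosen for this example.

```bash
# Extract "PID NAME" pairs from kernel log lines reporting an OOM kill.
# Reads log text on stdin; lines without "Killed process" are ignored.
oom_victims() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}
```

For example, sudo journalctl -k | oom_victims prints one line per kill, which makes repeated victims easy to spot.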
Understanding oom_score:
# View the current OOM score for a process (0–1000, higher = more likely to be killed)
cat /proc/1842/oom_score
# View the adjustment value (root can set -1000 to +1000)
cat /proc/1842/oom_score_adj
# Protect a critical process from being OOM-killed (root only)
echo -1000 | sudo tee /proc/1842/oom_score_adj
Prevention: Size your workloads correctly, set MemoryHigh and MemoryMax in systemd unit files, and configure application-level heap limits (e.g., -Xmx for JVM apps).
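To see which processes the kernel currently rates as the most likely victims, the oom_score files can be ranked directly. This is a sketch under stated assumptions: top_oom_candidates is a made-up name, and the optional second argument (an alternative proc root) exists only to make the function easy to try against a test directory.

```bash
# List the processes with the highest oom_score as "SCORE PID NAME".
# $1 = number of rows (default 5), $2 = proc root (default /proc).
top_oom_candidates() {
    local limit=${1:-5} root=${2:-/proc} dir score name
    for dir in "$root"/[0-9]*; do
        # Processes can exit mid-scan, so skip entries that vanish.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || continue
        printf '%s %s %s\n' "$score" "${dir##*/}" "$name"
    done | sort -k1,1nr -k2,2n | head -n "$limit"
}
```

Running top_oom_candidates 10 before a memory-heavy deployment gives a rough idea of what would be killed first if memory ran out.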
Scenario 3: High-CPU Process Investigation
# 1. Identify the top CPU consumer
top # look at the %CPU column, press P to sort
# 2. Get the PID (e.g., 3344) and look at its command line
cat /proc/3344/cmdline | tr '\0' ' '
# 3. Trace system calls to see what it is doing
sudo strace -p 3344 -c # summary of syscall counts and time
sudo strace -p 3344 # live stream of syscalls (verbose)
# 4. Check if it is CPU-bound or I/O-bound
cat /proc/3344/io # read_bytes / write_bytes
# 5. Profile with perf (requires the perf tool, packaged as perf or linux-tools depending on the distribution)
sudo perf top -p 3344
# 6. Check application logs for errors driving a retry loop
Scenario 4: Stuck Process in D State (Uninterruptible Sleep)
A process in the D state is waiting on a kernel operation — most commonly disk or network I/O. It cannot be killed with kill -9 because the signal is not delivered until the process wakes up from the kernel wait.
# Confirm the state (STAT starts with D; it may carry modifiers such as D+ or Ds)
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'
# Check what it is waiting on
sudo cat /proc/<PID>/wchan # kernel wait channel name
# Check for I/O issues
iostat -x 1 # look for high %util on a device
dmesg | tail -50 # look for I/O errors or storage timeouts
# For NFS-related D states, check NFS mounts
df -h # if this hangs, you have a stale NFS mount
cat /proc/mounts # if df hangs, read /proc directly
If a device is failing, resolving the underlying storage issue is what releases the D-state process. Detaching the filesystem with a lazy unmount (umount -l), or forcing the unmount (umount -f) when an NFS server is unreachable, can help in some cases. In the worst case, a reboot may be the only option.
cgroups and systemd-cgtop
Control groups (cgroups) are a Linux kernel feature that groups processes and enforces resource limits (CPU, memory, I/O, number of processes) on those groups. systemd uses cgroups v2 extensively: every service, user session, and slice maps to a cgroup hierarchy.
Viewing Resource Usage by Service
# Interactive top-like view of cgroup resource usage
systemd-cgtop
# Non-interactive snapshot in batch mode (two iterations, because CPU % needs two samples)
systemd-cgtop -b -n 2
# Filter to a specific service's cgroup
systemd-cgtop /system.slice/nginx.service
systemd-cgtop columns show the cgroup path, number of tasks, CPU %, memory in use, and I/O.
Inspecting cgroup Limits
# Show current cgroup for a process
cat /proc/1842/cgroup
# List memory limits for the nginx service cgroup (cgroups v2)
cat /sys/fs/cgroup/system.slice/nginx.service/memory.max
cat /sys/fs/cgroup/system.slice/nginx.service/memory.current
# CPU quota: prints "QUOTA PERIOD" in microseconds (e.g., "50000 100000" = 50% of one core; "max" = no limit)
cat /sys/fs/cgroup/system.slice/nginx.service/cpu.max
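The cpu.max file holds two values, the quota and the period, both in microseconds. Turning them into a percentage is simple arithmetic, sketched here with a made-up helper name, cpu_max_percent.

```bash
# Convert the contents of a cgroup v2 cpu.max file ("QUOTA PERIOD", in
# microseconds) into a percentage of one CPU. "max" means no quota is set.
cpu_max_percent() {
    echo "$1" | awk '$1 == "max" { print "unlimited"; next }
                     { printf "%.0f%%\n", $1 / $2 * 100 }'
}
```

For example, cpu_max_percent "$(cat /sys/fs/cgroup/system.slice/nginx.service/cpu.max)" reports the quota for the nginx service. Values above 100% mean the group may use more than one core.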
Setting Limits via systemd
# Set a memory limit for a running service (applied immediately and persisted under /etc/systemd/system.control/)
sudo systemctl set-property nginx.service MemoryMax=512M
# Set CPU quota to 50% of one core
sudo systemctl set-property nginx.service CPUQuota=50%
# Apply a limit only until the next reboot (stored under /run/systemd/system.control/)
sudo systemctl set-property --runtime nginx.service MemoryMax=512M
For detailed cgroup tuning, refer to the Red Hat RHEL 9 resource management guide.
Quick Reference Cheat Sheet
# Snapshot of all processes
ps aux
ps -ef
# Process tree
ps auxf
pstree -p
# Find a process
pgrep -la nginx
pgrep -f "manage.py"
# Interactive monitors
top
htop
btop
# Send signals
kill PID # SIGTERM
kill -9 PID # SIGKILL
kill -HUP PID # SIGHUP (reload)
killall nginx
pkill -f pattern
# Priority
nice -n 10 command
renice -n 5 -p PID
# Jobs
command &
jobs
fg %1
bg %1
nohup command &
disown %1
# Open files
lsof -p PID
lsof -i :8080
lsof -u alice
# /proc inspection
cat /proc/PID/status
cat /proc/PID/cmdline | tr '\0' ' '
ls -la /proc/PID/fd
cat /proc/PID/oom_score
# cgroups
systemd-cgtop
cat /proc/PID/cgroup
Summary
Linux process management in 2026 is a layered discipline. At the foundation, the kernel tracks process states (R, S, D, T, Z) and exposes them through the /proc virtual filesystem. Classical tools — ps, top, and kill — remain indispensable for scripting and low-level inspection. htop added interactivity and color; btop now goes further with animated graphs, network and disk stats, and a polished interface that makes it the default interactive monitor for many engineers.
Understanding signals is essential for correct process control: always prefer SIGTERM (15) over SIGKILL (9), use SIGHUP for configuration reloads, and remember that SIGKILL cannot reach a process in D state. Niceness and cgroups give you coarse and fine-grained priority control respectively, and systemd-cgtop makes cgroup resource accounting visible at a glance.
For troubleshooting, keep these patterns in mind: zombie processes require killing or signaling the parent; OOM kills leave traces in the kernel log (kern.log, journalctl -k, dmesg), while /proc/PID/oom_score shows which processes are likely to be chosen next; D-state processes require fixing the underlying I/O issue, not sending more signals; and strace combined with /proc inspection is the fastest path to understanding what a misbehaving process is actually doing.