Playwright Python Tutorial 2026: Web Scraping, Browser Automation, and pytest Testing
Playwright has quietly overtaken Selenium as the go-to browser automation library for Python developers. Search interest has grown 60% year-over-year, and for good reason: it is faster, more reliable, and requires far less boilerplate. This tutorial walks you through everything — scraping JavaScript-rendered pages, automating browser actions, writing pytest tests, intercepting network requests, and running headless in Docker.
1. Why Playwright Beats Selenium in 2026
If you have used Selenium, you have written code like this:
time.sleep(2) # hope the element is ready by now
driver.find_element(By.ID, "submit").click()
That time.sleep is a symptom of Selenium's fundamental design: it does not know when the page is ready, so you guess. Playwright solves this with auto-waiting. Every action — click, fill, press — automatically waits for the target element to be visible, stable, and enabled before acting. Flaky tests caused by timing issues drop dramatically.
Other reasons Playwright has pulled ahead of Selenium:
- Multi-browser support out of the box. A single install covers Chromium, Firefox, and WebKit (Safari's engine). No separate driver binaries to manage.
- Network interception built in. Mock API responses, block third-party trackers, or capture request payloads — all with a few lines of code.
- Modern async API. Playwright's async/await API is a first-class citizen, not an afterthought.
- 60% YoY search growth. The community, tooling ecosystem, and documentation are expanding rapidly.
- codegen test recorder. Record any browser session and export it as runnable Python code in seconds.
2. Installation
pip install playwright
playwright install chromium
To install all three browsers at once:
playwright install
On a fresh Linux server or CI environment, also pull the OS-level dependencies:
playwright install-deps chromium
Verify the install:
python -c "from playwright.sync_api import sync_playwright; print('OK')"
3. First Script: Scrape a JavaScript-Rendered Page
Many modern sites render their content via JavaScript, which means requests + BeautifulSoup sees an empty shell. This is the core use case for web scraping with Playwright in Python: running a real browser that executes JavaScript before you extract data.
import asyncio
from playwright.async_api import async_playwright

async def scrape_hn():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto("https://news.ycombinator.com")
        # Wait explicitly for the story list to appear
        await page.wait_for_selector(".athing")
        titles = await page.eval_on_selector_all(
            ".athing .titleline > a",
            "elements => elements.map(el => el.textContent)"
        )
        for i, title in enumerate(titles[:10], 1):
            print(f"{i}. {title}")
        await browser.close()

asyncio.run(scrape_hn())
The sync API (sync_playwright) is also available if you prefer synchronous code, but the async API integrates cleanly with asyncio and is the recommended approach for scraping pipelines where you may run many pages concurrently.
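As a rough sketch of what running many pages concurrently can look like, the helper below caps the number of pages open at once with an asyncio.Semaphore. The names gather_bounded and scrape_titles and the limit of 5 are illustrative assumptions, not part of Playwright's API.

```python
import asyncio
from typing import Awaitable, Callable


async def gather_bounded(
    urls: list[str],
    fetch: Callable[[str], Awaitable[str]],
    max_concurrency: int = 5,
) -> list[str]:
    """Run fetch(url) for every URL, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url: str) -> str:
        async with semaphore:
            return await fetch(url)

    # asyncio.gather returns results in the same order as the input URLs
    return await asyncio.gather(*(worker(u) for u in urls))


async def scrape_titles(urls: list[str]) -> list[str]:
    """Fetch the <title> of each URL using one shared Chromium instance."""
    # Imported lazily so gather_bounded stays usable without Playwright installed
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)

        async def fetch_title(url: str) -> str:
            page = await browser.new_page()
            try:
                await page.goto(url)
                return await page.title()
            finally:
                await page.close()

        try:
            return await gather_bounded(urls, fetch_title, max_concurrency=5)
        finally:
            await browser.close()
```

Calling asyncio.run(scrape_titles([...])) with a list of URLs would drive the whole batch. Keeping the concurrency limit separate from the Playwright code makes it easy to tune the limit for the memory available on the machine.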
4. Selectors: CSS, Text, and Role-Based
Playwright supports multiple selector strategies. Prefer the semantic ones — they are more resilient to markup changes and mirror how users actually interact with a page.
# CSS selector (classic, works everywhere)
page.locator("button.submit-btn")
# Visible text content
page.get_by_text("Sign in")
# ARIA role — most robust for interactive elements
page.get_by_role("button", name="Submit")
page.get_by_role("link", name="Home")
page.get_by_role("textbox", name="Email")
# Label text — ideal for form fields
page.get_by_label("Password")
# Placeholder text
page.get_by_placeholder("Search...")
# Test ID attribute — use data-testid in your HTML
page.get_by_test_id("checkout-button")
page.get_by_role() and page.get_by_text() are the recommended defaults. They survive refactoring because they match what users see, not internal CSS class names that developers change freely.
5. Actions: Click, Fill, Screenshot, PDF
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # Set headless=False to watch it run, but note that page.pdf() below
    # only works in headless Chromium
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com/login")
    # Fill a login form
    page.get_by_label("Email").fill("user@example.com")
    page.get_by_label("Password").fill("secret123")
    page.get_by_role("button", name="Log in").click()
    # Wait for navigation to complete
    page.wait_for_url("**/dashboard")
    # Take a full-page screenshot
    page.screenshot(path="dashboard.png", full_page=True)
    # Save as PDF (Chromium only, headless mode)
    page.pdf(path="dashboard.pdf", format="A4")
    browser.close()
Every action auto-waits. When .click() is called, Playwright waits for the element to be visible, stable (not mid-animation), and enabled before dispatching the event. You almost never need time.sleep() or manual wait_for_selector() calls.
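To make the idea of auto-waiting concrete, here is a conceptual sketch of a poll-until-actionable loop. It is not Playwright's implementation, and the element object with is_visible(), is_stable(), and is_enabled() methods is hypothetical, as are the helper names wait_until and click_when_actionable.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> None:
    """Poll condition() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


def click_when_actionable(element, timeout: float = 5.0) -> None:
    """Click only once the (hypothetical) element is visible, stable, and enabled."""
    wait_until(
        lambda: element.is_visible() and element.is_stable() and element.is_enabled(),
        timeout=timeout,
    )
    element.click()
```

The difference from time.sleep(2) is that the loop returns as soon as the checks pass and raises a clear error when they never do, instead of always waiting a fixed guess.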
Use expect() assertions to verify state between steps:
from playwright.sync_api import expect
expect(page.get_by_role("heading", name="Dashboard")).to_be_visible()
6. Network Interception: Mock API Responses and Block Trackers
page.route() intercepts any request the browser makes. This is invaluable for scraping (block ads and trackers to speed up crawls) and for testing (mock the backend to test error states).
import re

async def run(page):
    # Mock a REST API endpoint with custom data
    await page.route(
        "**/api/v1/products",
        lambda route: route.fulfill(
            status=200,
            content_type="application/json",
            body='[{"id": 1, "name": "Widget"}]'
        )
    )
    # Block all analytics and tracker requests. Glob patterns have no
    # (a|b) alternation, so match with a compiled regex instead.
    await page.route(
        re.compile(r"(google-analytics|googletagmanager|hotjar|mixpanel)\."),
        lambda route: route.abort()
    )
    await page.goto("https://example.com/shop")
You can also observe traffic without modifying it:
page.on("request", lambda req: print(req.method, req.url))
page.on("response", lambda res: print(res.status, res.url))
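When the blocking rule grows beyond a single pattern, a named route handler can be easier to test than a lambda. The sketch below matches on the request's hostname; the blocklist contents and the names is_tracker and block_trackers are assumptions for illustration, and the handler is written for the async API.

```python
from urllib.parse import urlparse

# Hypothetical blocklist for illustration; extend it for your own crawl
BLOCKED_HOST_SUFFIXES = (
    "google-analytics.com",
    "googletagmanager.com",
    "hotjar.com",
    "mixpanel.com",
)


def is_tracker(url: str) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == s or host.endswith("." + s) for s in BLOCKED_HOST_SUFFIXES)


async def block_trackers(route) -> None:
    """Route handler: abort tracker requests and let everything else through."""
    if is_tracker(route.request.url):
        await route.abort()
    else:
        await route.continue_()


# Usage with the async API, where page is a playwright.async_api.Page:
#     await page.route("**/*", block_trackers)
```

Matching on the parsed hostname avoids blocking unrelated URLs that merely contain a tracker's name somewhere in their path or query string.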
7. Playwright + pytest
The pytest-playwright plugin provides browser fixtures and makes writing test suites straightforward. Install it alongside pytest-xdist for parallel execution:
pip install pytest-playwright pytest-xdist
Built-in Fixtures
pytest-playwright ships three core fixtures:
- page: a fresh page per test (function scope)
- browser: a single browser instance shared across the session
- context: an isolated browser context (cookies, storage) per test
# tests/test_login.py
import re
from playwright.sync_api import expect

def test_login_success(page):
    page.goto("https://example.com/login")
    page.get_by_label("Email").fill("user@example.com")
    page.get_by_label("Password").fill("correct-password")
    page.get_by_role("button", name="Log in").click()
    # to_have_url compares strings exactly, so use a regex for a partial match
    expect(page).to_have_url(re.compile(r".*/dashboard"))

def test_login_failure(page):
    page.goto("https://example.com/login")
    page.get_by_label("Email").fill("user@example.com")
    page.get_by_label("Password").fill("wrong-password")
    page.get_by_role("button", name="Log in").click()
    expect(page.get_by_text("Invalid credentials")).to_be_visible()
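The two tests above repeat the same form-filling steps. One common way to factor them out is a small page object; the sketch below is one possible shape, with LoginPage and its method names chosen for illustration rather than taken from pytest-playwright.

```python
class LoginPage:
    """Thin wrapper around the login form used in the tests above."""

    URL = "https://example.com/login"

    def __init__(self, page):
        self.page = page
        # Locators are lazy: nothing is looked up until an action is performed
        self.email = page.get_by_label("Email")
        self.password = page.get_by_label("Password")
        self.submit = page.get_by_role("button", name="Log in")

    def open(self) -> "LoginPage":
        """Navigate to the login page and return self for chaining."""
        self.page.goto(self.URL)
        return self

    def log_in(self, email: str, password: str) -> None:
        """Fill in both fields and submit the form."""
        self.email.fill(email)
        self.password.fill(password)
        self.submit.click()


# Possible use inside a test that receives the page fixture:
#     def test_login_failure(page):
#         LoginPage(page).open().log_in("user@example.com", "wrong-password")
#         expect(page.get_by_text("Invalid credentials")).to_be_visible()
```

If the form's labels change, only the three locators in the constructor need updating, not every test.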
Parallel Test Execution
With pytest-xdist and --numprocesses auto, pytest spawns one worker per CPU core. Each worker process launches its own browser instance, so tests share no state and cannot race each other.
# Auto-detect CPU count
pytest --numprocesses auto
# Explicit worker count
pytest --numprocesses 4
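For large suites in CI, you can also split the run across multiple machines. The --shard flag belongs to the Node.js Playwright Test runner and is not provided by pytest-playwright, so in Python this needs a separate plugin such as pytest-split:
# On machine 1: run group 1 of 4 (requires pip install pytest-split)
pytest --splits 4 --group 1 --browser chromium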
Record Tests with codegen
Instead of writing selectors by hand, record your actions in a live browser:
playwright codegen https://example.com
A browser window opens alongside the Playwright Inspector. Everything you click and type is captured, and the Inspector shows the generated Python code as you go. Use it as a starting point and refine from there.
# Save output to a file, generated as a pytest test rather than a standalone script
playwright codegen --target python-pytest --output tests/test_recorded.py https://example.com
8. Headless in Docker
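Running headless Playwright in Docker requires the system libraries that the bundled Chromium binary depends on. The official Microsoft image ships them together with the browsers. Pin the playwright package in requirements.txt to the version in the image tag (1.44.0 here), since each Playwright release expects its own browser builds. A Dockerfile based on that image: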
FROM mcr.microsoft.com/playwright/python:v1.44.0-jammy
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["pytest", "--numprocesses", "auto", "-v"]
If you prefer a minimal custom image based on python:slim:
FROM python:3.12-slim
RUN apt-get update && apt-get install -y \
    libnss3 libatk1.0-0 libatk-bridge2.0-0 libcups2 \
    libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 \
    libxrandr2 libgbm1 libpango-1.0-0 libcairo2 libasound2 \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
RUN playwright install chromium
RUN playwright install-deps chromium
COPY . .
CMD ["pytest", "--numprocesses", "auto", "-v"]
requirements.txt:
playwright
pytest
pytest-playwright
pytest-xdist
Build and run:
docker build -t playwright-tests .
docker run --rm playwright-tests
9. Playwright vs Selenium: Comparison Table
| Feature | Playwright | Selenium |
|---|---|---|
| Auto-waiting | Yes — built-in on every action | No — manual WebDriverWait required |
| Browsers supported | Chromium, Firefox, WebKit | Chrome, Firefox, Edge, Safari (via drivers) |
| Driver management | Not needed | Required — WebDriver binaries per browser |
| Network interception | Built-in page.route() | Requires proxy or browser extension |
| Parallel execution | pytest-xdist, one browser per worker | Selenium Grid (separate infrastructure) |
| Test recording | playwright codegen | Third-party tools only |
| Async API | First-class async_playwright | Limited, tacked on |
| Active maintenance | Microsoft — frequent releases | Selenium HQ — slower release cadence |
| Speed | Generally faster: persistent connection to the browser | Generally slower: one HTTP request per WebDriver command |
| Python install | pip install playwright | pip install selenium + driver binary |
The main scenario where Selenium still wins is when you need to test against obscure or legacy browsers that Playwright does not support. For most other projects in 2026, Playwright is the better choice.
10. FAQ
Q: Does Playwright work with requests and BeautifulSoup? They serve different purposes. Use requests + BeautifulSoup for static HTML pages — it is much faster. Switch to Playwright when the page requires JavaScript execution to render its content.
Q: Can I use Playwright for scraping at scale? Yes. Use the async API (async_playwright) and open multiple browser contexts concurrently to maximize throughput. Run workers inside Docker containers for isolation and reproducibility. Always respect robots.txt and the site's terms of service.
Q: sync vs async API — which should I use? Use the async API for scraping pipelines and when integrating with async frameworks like FastAPI. Use the sync API for quick scripts and for pytest suites — pytest-playwright uses sync fixtures under the hood.
Q: How do I handle authentication across tests without re-logging in every time? Log in once in a session-scoped fixture, save the context with context.storage_state(path="auth.json"), then load it for each test: browser.new_context(storage_state="auth.json").
Q: Is Playwright free? Yes. Playwright is open source under the Apache 2.0 license and maintained by Microsoft.
Q: Where do I go next after this tutorial? The official Playwright Python docs cover tracing, video recording, mobile emulation, and the full API reference. The playwright codegen command is the fastest way to explore any new site's structure.
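For the authentication question above, a minimal sketch of the log-in-once pattern with the sync API might look like the following. The helper name ensure_auth_state and the caller-supplied login function are assumptions for illustration; only new_context(), new_page(), and storage_state(path=...) are Playwright calls.

```python
from pathlib import Path

AUTH_FILE = Path("auth.json")  # same file name as in the FAQ answer


def ensure_auth_state(browser, login, auth_file: Path = AUTH_FILE) -> Path:
    """Log in once and cache cookies and local storage; reuse the file afterwards."""
    if auth_file.exists():
        return auth_file
    context = browser.new_context()
    try:
        page = context.new_page()
        login(page)  # caller-supplied function that fills in and submits the form
        context.storage_state(path=str(auth_file))
    finally:
        context.close()
    return auth_file


# Possible use in a test, assuming a do_login(page) helper of your own:
#     state = ensure_auth_state(browser, do_login)
#     context = browser.new_context(storage_state=str(state))
```

Delete auth.json when the session expires or the credentials change, since this sketch does not check whether the cached state is still valid.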
Summary
Playwright has become the standard for browser automation in Python. Its auto-waiting eliminates the most common source of test flakiness, its network interception makes API mocking trivial, and pytest-playwright plugs directly into the testing workflows most Python developers already use. Whether you are scraping JavaScript-heavy sites, writing a pytest suite, or choosing between Playwright and Selenium for a new project, Playwright is the higher-leverage choice in 2026.