Playwright Python Tutorial 2026: Web Scraping, Browser Automation, and pytest Testing

Playwright has quietly overtaken Selenium as the go-to browser automation library for Python developers. Search interest has grown 60% year-over-year, and for good reason: it is faster, more reliable, and requires far less boilerplate. This tutorial walks you through everything — scraping JavaScript-rendered pages, automating browser actions, writing pytest tests, intercepting network requests, and running headless in Docker.


1. Why Playwright Beats Selenium in 2026

If you have used Selenium, you have written code like this:

time.sleep(2)  # hope the element is ready by now
driver.find_element(By.ID, "submit").click()

That time.sleep is a symptom of how Selenium scripts tend to be written: without an explicit wait, the script does not know when the page is ready, so you guess. Playwright solves this with auto-waiting. Before an action such as click or fill, it runs actionability checks on the target element (for a click: visible, stable, enabled, and able to receive events) and retries until they pass or the timeout, 30 seconds by default, expires. Flaky tests caused by timing issues drop dramatically.

Other reasons Playwright has pulled ahead of Selenium:

  • Multi-browser support out of the box. A single install covers Chromium, Firefox, and WebKit (Safari's engine). No separate driver binaries to manage.
  • Network interception built in. Mock API responses, block third-party trackers, or capture request payloads — all with a few lines of code.
  • Modern async API. Playwright's async/await API is a first-class citizen, not an afterthought.
  • 60% YoY search growth. The community, tooling ecosystem, and documentation are expanding rapidly.
  • codegen test recorder. Record any browser session and export it as runnable Python code in seconds.

2. Installation

pip install playwright
playwright install chromium

To install all three browsers at once:

playwright install

On a fresh Linux server or CI environment, also pull the OS-level dependencies:

playwright install-deps chromium

Verify the install:

python -c "from playwright.sync_api import sync_playwright; print('OK')"

3. First Script: Scrape a JavaScript-Rendered Page

Many modern sites render their content via JavaScript, which means requests + BeautifulSoup sees an empty shell. This is the core use case for web scraping with Playwright in Python: running a real browser that executes JavaScript before you extract data.

import asyncio
from playwright.async_api import async_playwright

async def scrape_hn():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        await page.goto("https://news.ycombinator.com")

        # Wait explicitly for the story list: eval_on_selector_all() does not auto-wait
        await page.wait_for_selector(".athing")

        titles = await page.eval_on_selector_all(
            ".athing .titleline > a",
            "elements => elements.map(el => el.textContent)"
        )

        for i, title in enumerate(titles[:10], 1):
            print(f"{i}. {title}")

        await browser.close()

asyncio.run(scrape_hn())

The sync API (sync_playwright) is also available if you prefer synchronous code, but the async API integrates cleanly with asyncio and is the recommended approach for scraping pipelines where you may run many pages concurrently.
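For comparison, here is a minimal sketch of the same scrape with the sync API. The helper names top_titles and scrape_hn_sync are illustrative and not part of Playwright, and the selector is carried over from the async example above. Splitting the extraction step into a function that takes a page is one way to make it easy to exercise against a stub page in unit tests.

```python
def top_titles(page, limit=10):
    """Return up to `limit` story titles from an already-loaded page."""
    links = page.locator(".athing .titleline > a")
    # all_text_contents() does not auto-wait, so wait for the first match.
    links.first.wait_for()
    return [text.strip() for text in links.all_text_contents()[:limit]]


def scrape_hn_sync(limit=10):
    """Launch headless Chromium, load the page, and return the titles."""
    # Imported here so top_titles() can be used without a browser install.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto("https://news.ycombinator.com")
            return top_titles(page, limit)
        finally:
            browser.close()
```

Calling scrape_hn_sync() from a plain script returns a list of strings; no event loop is involved.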


4. Selectors: CSS, Text, and Role-Based

Playwright supports multiple selector strategies. Prefer the semantic ones — they are more resilient to markup changes and mirror how users actually interact with a page.

# CSS selector (classic, works everywhere)
page.locator("button.submit-btn")

# Visible text content
page.get_by_text("Sign in")

# ARIA role — most robust for interactive elements
page.get_by_role("button", name="Submit")
page.get_by_role("link", name="Home")
page.get_by_role("textbox", name="Email")

# Label text — ideal for form fields
page.get_by_label("Password")

# Placeholder text
page.get_by_placeholder("Search...")

# Test ID attribute — use data-testid in your HTML
page.get_by_test_id("checkout-button")

page.get_by_role() and page.get_by_text() are the recommended defaults. They survive refactoring because they match what users see, not internal CSS class names that developers change freely.
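Locators can also be chained and filtered, which helps when the same button appears once per row or card. The sketch below assumes a hypothetical product list in which each product is a list item containing an "Add to cart" button; the function name, the role names, and the button label are assumptions about that page, not something Playwright provides.

```python
def add_to_cart(page, product_name):
    """Click the 'Add to cart' button inside the list item naming the product."""
    (
        page.get_by_role("listitem")
        .filter(has_text=product_name)  # keep only the item containing this text
        .get_by_role("button", name="Add to cart")
        .click()
    )
```

Because the chain is resolved when click() runs, the same call keeps working if the list is re-rendered before the click.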


5. Actions: Click, Fill, Screenshot, PDF

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)  # headless=False to watch it run
    page = browser.new_page()
    page.goto("https://example.com/login")

    # Fill a login form
    page.get_by_label("Email").fill("user@example.com")
    page.get_by_label("Password").fill("secret123")
    page.get_by_role("button", name="Log in").click()

    # Wait for navigation to complete
    page.wait_for_url("**/dashboard")

    # Take a full-page screenshot
    page.screenshot(path="dashboard.png", full_page=True)

    # Save as PDF — Chromium only
    page.pdf(path="dashboard.pdf", format="A4")

    browser.close()

Every action auto-waits. When .click() is called, Playwright waits for the element to be visible, stable (not mid-animation), and enabled before dispatching the event. You almost never need time.sleep() or manual wait_for_selector() calls.

Use expect() assertions to verify state between steps:

from playwright.sync_api import expect

expect(page.get_by_role("heading", name="Dashboard")).to_be_visible()

6. Network Interception: Mock API Responses and Block Trackers

page.route() intercepts any request the browser makes. This is invaluable for scraping (block ads and trackers to speed up crawls) and for testing (mock the backend to test error states).

import re

async def run(page):
    # Mock a REST API endpoint with custom data
    await page.route(
        "**/api/v1/products",
        lambda route: route.fulfill(
            status=200,
            content_type="application/json",
            body='[{"id": 1, "name": "Widget"}]'
        )
    )

    # Block analytics and tracker requests. Glob patterns do not support
    # (a|b) alternation, so match the URL with a compiled regex instead.
    await page.route(
        re.compile(r"(google-analytics|googletagmanager|hotjar|mixpanel)\."),
        lambda route: route.abort()
    )

    await page.goto("https://example.com/shop")
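Besides glob strings and compiled regular expressions, page.route() also accepts a predicate that receives the request URL. A minimal sketch, assuming a small hand-maintained blocklist (TRACKER_DOMAINS, is_tracker, and block_trackers are hypothetical names for this example), matches on the hostname so that unrelated URLs that merely contain a tracker's name are not blocked:

```python
from urllib.parse import urlparse

# Hypothetical blocklist for illustration; extend it for your own crawl.
TRACKER_DOMAINS = (
    "google-analytics.com",
    "googletagmanager.com",
    "hotjar.com",
    "mixpanel.com",
)


def is_tracker(url):
    """True if the URL's host is a listed domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in TRACKER_DOMAINS)


async def block_trackers(page):
    # The predicate is called with each request URL as a string.
    await page.route(is_tracker, lambda route: route.abort())
```

Call block_trackers(page) before page.goto() so the route is in place when the first requests go out.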

You can also observe traffic without modifying it:

page.on("request", lambda req: print(req.method, req.url))
page.on("response", lambda res: print(res.status, res.url))
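Printing every response gets noisy quickly. One possible refinement, sketched here with a hypothetical helper name, is a handler factory that records only the responses whose URL contains a given fragment, which is handy for discovering the JSON endpoints behind a page:

```python
def make_response_recorder(store, url_fragment):
    """Build a 'response' handler that appends (status, url) for matching URLs."""
    def on_response(response):
        # response.url and response.status are plain properties.
        if url_fragment in response.url:
            store.append((response.status, response.url))
    return on_response


# Usage with a real page (the "/api/" fragment is an assumption about the site):
#   seen = []
#   page.on("response", make_response_recorder(seen, "/api/"))
#   page.goto("https://example.com/shop")
```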

7. Playwright + pytest

The pytest-playwright plugin provides browser fixtures and makes writing test suites straightforward. Install it alongside pytest-xdist for parallel execution:

pip install pytest-playwright pytest-xdist

Built-in Fixtures

pytest-playwright ships three core fixtures:

  • page: a fresh page per test (function scope)
  • browser: a single browser instance shared across the session
  • context: an isolated browser context (cookies, storage) per test

# tests/test_login.py
import re

from playwright.sync_api import expect

def test_login_success(page):
    page.goto("https://example.com/login")
    page.get_by_label("Email").fill("user@example.com")
    page.get_by_label("Password").fill("correct-password")
    page.get_by_role("button", name="Log in").click()
    # to_have_url() does not treat a string as a glob, so match with a regex
    expect(page).to_have_url(re.compile(r".*/dashboard$"))

def test_login_failure(page):
    page.goto("https://example.com/login")
    page.get_by_label("Email").fill("user@example.com")
    page.get_by_label("Password").fill("wrong-password")
    page.get_by_role("button", name="Log in").click()
    expect(page.get_by_text("Invalid credentials")).to_be_visible()

Parallel Test Execution

With pytest-xdist and --numprocesses auto, pytest spawns one worker process per CPU core. Each worker launches its own browser instance, so tests running in different workers share no browser state.

# Auto-detect CPU count
pytest --numprocesses auto

# Explicit worker count
pytest --numprocesses 4

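For large suites in CI, you can also split the run across multiple machines. Note that the --shard=1/4 flag belongs to the Node.js Playwright Test runner; pytest-playwright does not provide it. With pytest, sharding comes from a separate plugin such as pytest-shard:

# On machine 1 of 4 (pytest-shard uses zero-based shard IDs)
pytest --shard-id=0 --num-shards=4 --browser chromium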
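Where a ready-made shard option is not available, the idea behind sharding is simple enough to sketch in a conftest.py: hash each test ID and keep only the tests whose hash falls into this machine's bucket. The environment variable names SHARD_INDEX and SHARD_TOTAL are hypothetical, chosen for this example; pytest_collection_modifyitems is a standard pytest hook.

```python
import hashlib
import os


def in_shard(nodeid, index, total):
    """Deterministically assign a test ID to exactly one of `total` shards."""
    # hashlib is stable across processes and machines, unlike built-in hash().
    digest = hashlib.sha256(nodeid.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % total == index


def pytest_collection_modifyitems(config, items):
    """Keep only the tests that belong to this machine's shard."""
    total = int(os.environ.get("SHARD_TOTAL", "1"))
    index = int(os.environ.get("SHARD_INDEX", "0"))
    if total <= 1:
        return  # sharding disabled: run everything
    kept = [item for item in items if in_shard(item.nodeid, index, total)]
    dropped = [item for item in items if not in_shard(item.nodeid, index, total)]
    items[:] = kept  # pytest expects the list to be modified in place
    config.hook.pytest_deselected(items=dropped)
```

With this in place, machine 1 of 4 would run something like SHARD_INDEX=0 SHARD_TOTAL=4 pytest --numprocesses auto. Hash-based assignment balances test counts, not run time, so shards can still finish at different times.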

Record Tests with codegen

Instead of writing selectors by hand, record your actions in a live browser:

playwright codegen https://example.com

A browser window opens alongside the Playwright Inspector. Everything you click and type is captured, and the generated Python code appears in the Inspector as you go. Use it as a starting point and refine from there.

# Save output to a file
playwright codegen --output tests/test_recorded.py https://example.com

8. Headless in Docker

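Running headless Playwright in Docker requires the system libraries that the bundled Chromium binary depends on. The official Microsoft image ships them together with the browsers. Pin the playwright package in requirements.txt to the same version as the image tag (1.44.0 here); otherwise the installed package may look for browser builds that the image does not contain: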

FROM mcr.microsoft.com/playwright/python:v1.44.0-jammy

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["pytest", "--numprocesses", "auto", "-v"]

If you prefer a minimal custom image based on python:slim:

FROM python:3.12-slim

RUN apt-get update && apt-get install -y \
    libnss3 libatk1.0-0 libatk-bridge2.0-0 libcups2 \
    libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 \
    libxrandr2 libgbm1 libpango-1.0-0 libcairo2 libasound2 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
RUN playwright install chromium
RUN playwright install-deps chromium

COPY . .

CMD ["pytest", "--numprocesses", "auto", "-v"]

requirements.txt:

playwright==1.44.0  # keep in sync with the Docker image tag
pytest
pytest-playwright
pytest-xdist

Build and run:

docker build -t playwright-tests .
docker run --rm playwright-tests

9. Playwright vs Selenium: Comparison Table

Feature | Playwright | Selenium
Auto-waiting | Yes, built in on every action | No, explicit WebDriverWait required
Browsers supported | Chromium, Firefox, WebKit | Chrome, Firefox, Edge, Safari (via drivers)
Driver management | Not needed | Driver binaries required; resolved automatically by Selenium Manager since 4.6
Network interception | Built-in page.route() | Requires a proxy or browser extension
Parallel execution | pytest-xdist | Selenium Grid (separate infrastructure)
Test recording | playwright codegen | Selenium IDE (separate browser extension)
Async API | First-class async_playwright | No native async API in the Python bindings
Active maintenance | Microsoft, frequent releases | Selenium HQ, slower release cadence
Speed | Faster: persistent connection to the browser (CDP for Chromium) | Slower: WebDriver HTTP protocol
Python install | pip install playwright, then playwright install | pip install selenium

The main scenario where Selenium still wins is testing against obscure or legacy browsers that Playwright does not support. For most other decisions between the two in 2026, Playwright is the better choice.


10. FAQ

Q: Does Playwright work with requests and BeautifulSoup? They serve different purposes. Use requests + BeautifulSoup for static HTML pages — it is much faster. Switch to Playwright when the page requires JavaScript execution to render its content.

Q: Can I use Playwright for scraping at scale? Yes. Use the async API (async_playwright) and open multiple browser contexts concurrently to maximize throughput. Run workers inside Docker containers for isolation and reproducibility. Always respect robots.txt and the site's terms of service.
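As a rough sketch of that pattern, the coroutine below gives every URL its own browser context and caps how many are open at once with an asyncio.Semaphore. The function names and the default limit of 5 are assumptions for illustration; browser is expected to be an already-launched async Playwright browser.

```python
import asyncio


async def fetch_title(browser, url, limiter):
    """Open the URL in its own context and return (url, page title)."""
    async with limiter:  # wait here if too many contexts are already open
        context = await browser.new_context()
        try:
            page = await context.new_page()
            await page.goto(url)
            return url, await page.title()
        finally:
            await context.close()  # also closes the page and frees memory


async def scrape_titles(browser, urls, max_concurrency=5):
    """Scrape all URLs concurrently; results keep the order of `urls`."""
    limiter = asyncio.Semaphore(max_concurrency)
    tasks = (fetch_title(browser, url, limiter) for url in urls)
    # return_exceptions=True puts a failed URL's exception in its result slot
    # instead of cancelling the whole batch.
    return await asyncio.gather(*tasks, return_exceptions=True)
```

A fresh context per URL keeps cookies and storage isolated between targets, at the cost of slightly more overhead than reusing one context.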

Q: sync vs async API — which should I use? Use the async API for scraping pipelines and when integrating with async frameworks like FastAPI. Use the sync API for quick scripts and for pytest suites — pytest-playwright uses sync fixtures under the hood.

Q: How do I handle authentication across tests without re-logging in every time? Log in once in a session-scoped fixture, save the context with context.storage_state(path="auth.json"), then load it for each test: browser.new_context(storage_state="auth.json").
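A minimal sketch of that flow with the sync API follows. The helper names, the one-hour freshness window, and the login callback are assumptions for this example; storage_state() and new_context(storage_state=...) are the Playwright calls named in the answer above.

```python
import os
import time


def auth_state_is_fresh(path, max_age_seconds=3600):
    """True if the saved state file exists and is newer than the cutoff."""
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError:  # file missing or unreadable
        return False
    return age < max_age_seconds


def new_authenticated_context(browser, path="auth.json", login=None):
    """Reuse saved cookies and storage when fresh, otherwise log in and save."""
    if auth_state_is_fresh(path):
        return browser.new_context(storage_state=path)
    context = browser.new_context()
    if login is not None:
        login(context.new_page())  # caller-supplied function that fills the form
        context.storage_state(path=path)  # writes cookies and localStorage
    return context
```

In a pytest suite this would typically be called from a session-scoped fixture. Treat auth.json like a credential: keep it out of version control.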

Q: Is Playwright free? Yes. Playwright is open source under the Apache 2.0 license and maintained by Microsoft.

Q: Where do I go next after this tutorial? The official Playwright Python docs cover tracing, video recording, mobile emulation, and the full API reference. The playwright codegen command is the fastest way to explore any new site's structure.


Summary

Playwright has become the standard for browser automation in Python. Its auto-waiting eliminates the most common source of test flakiness, its network interception makes API mocking trivial, and pytest-playwright plugs directly into the testing workflows most Python developers already use. Whether you are scraping JavaScript-heavy sites, writing a pytest suite, or choosing between Playwright and Selenium for a new project, Playwright is the higher-leverage choice in 2026.

Leonardo Lazzaro

Software engineer and technical writer. 10+ years experience in DevOps, Python, and Linux systems.
