How to Bypass Cloudflare When Web Scraping in 2026

Sloane Merritt

17 Mar 2026 — 11 min read

Cloudflare protects over 20% of all websites on the internet. If you have spent any time building web scrapers, you have almost certainly hit it — the instant 403, the 1020 Access Denied page, or the spinning interstitial that your scraper cannot get past. In 2024 and 2025, Cloudflare's bot detection became significantly more aggressive. Sites that were easy to scrape two years ago now require the full stack.

The good news: Cloudflare is bypassable. The bad news: not with the approaches most tutorials describe. This post gives you the accurate picture of what Cloudflare actually checks in 2026, why common workarounds fail, and what actually works — with production-ready Python code you can run today.

TL;DR — Cloudflare blocks scrapers at six layers: IP reputation, TLS fingerprint, HTTP/2 fingerprint, browser headers, JavaScript challenge, and behavioural scoring. A plain Python requests call fails all six instantly. The reliable fix in 2026 is a managed API that handles all layers transparently. This post explains each detection layer, what ScrapeUp's premium=true and unlock=true parameters do to beat them, and includes production-ready Python code for five common scenarios.

Why your scraper gets blocked: the 1020 error

Cloudflare's 1020 error ('Access Denied') and 403 responses are not random. They are the output of a layered scoring system that evaluates every incoming request across six independent signals before deciding whether to serve content or block. Most scrapers fail on the first two layers before Cloudflare even looks at anything else.

Here is what happens when a plain Python requests call hits a Cloudflare-protected site:

# This is why your scraper fails against Cloudflare
import requests

# Attempt 1: plain requests — fails immediately
r = requests.get("https://cloudflare-protected-site.com/data")
print(r.status_code)   # 403 or 1020
print(r.text[:200])    # "Error 1020 Access Denied" or CAPTCHA HTML

# What Cloudflare sees when this request arrives:
# - IP:         AWS datacenter range (pre-flagged)
# - TLS:        Python urllib3 JA3 fingerprint (known bot signature)
# - User-Agent: "python-requests/2.31.0" (instant flag)
# - Headers:    Missing sec-ch-ua, sec-fetch-*, accept-language
# - Cookies:    No cf_clearance, no prior session
# - Behaviour:  Cold session, no referrer chain
# Combined bot score: ~0.98 (Cloudflare blocks at 0.5+)

The comment in that code is accurate: your request arrives with a bot confidence score near 1.0 before Cloudflare reads a single header. That is why the 403 is instantaneous — there is no decision to make.

The six Cloudflare detection layers (2026)

Understanding the detection stack is what separates developers who solve this from developers who spend weeks adding random workarounds that do not address the actual problem. Here is what each layer checks and why standard scrapers fail it:

Detection layer	What Cloudflare checks	Python requests result	ScrapeUp fix
IP reputation	ASN / datacenter range	Blocked (AWS pre-flagged)	Residential IP pool
TLS fingerprint	JA3 / JA4 cipher suite hash	urllib3 hash = known bot	Chrome 122 TLS match
HTTP/2 fingerprint	SETTINGS frames, header order	Non-browser signature	Real Chrome HTTP/2 stack
Browser headers	sec-ch-ua, sec-fetch-*, accept	Missing or wrong values	Full Chrome header set
JS challenge	navigator.webdriver, canvas, WebGL	Cannot execute JS	Headless Chrome execution
Behavioural / Turnstile	Session history, mouse entropy	Cold session, bot score 0.98	Session warm-up + CAPTCHA solver

Each layer is independent — failing one is enough to get blocked. Most DIY approaches address one or two layers but miss the others. A correctly rotated residential IP with wrong browser headers is still blocked. A correct browser header set with a datacenter IP is still blocked. You need all six.

# cloudflare_layers.py — what each detection layer checks

# LAYER 1: IP REPUTATION (checked in <1ms, before headers are read)
# Cloudflare maintains ASN-level blocklists.
# AWS us-east-1 (AS16509), GCP (AS15169), DigitalOcean (AS14061)
# are pre-scored as high-bot-probability.
# Fix: residential or ISP proxies only.

# LAYER 2: TLS FINGERPRINT (JA3/JA4 hash)
# Python urllib3 has a unique cipher suite order.
# JA3 hash for python-requests: 6734f37431670b3ab4292b8f60f29984
# Chrome 122 JA3:               abc123... (completely different)
# Fix: use curl-impersonate or a managed API that spoofs TLS.

# LAYER 3: HTTP/2 FINGERPRINT (AKAMAI fingerprint)
# Header pseudo-order, SETTINGS frames, WINDOW_UPDATE values
# differ between real Chrome and any HTTP library.
# Fix: full headless browser or API with HTTP/2 spoofing.

# LAYER 4: BROWSER HEADERS
# Missing or wrong values trigger immediate flag:
bad_headers = {
    "User-Agent": "python-requests/2.31.0",  # instant flag
    # Missing: sec-ch-ua, sec-ch-ua-mobile, sec-ch-ua-platform
    # Missing: sec-fetch-dest, sec-fetch-mode, sec-fetch-site
    # Missing: accept-language (or wrong casing)
}

good_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    "sec-ch-ua": '"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"macOS"',
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "none",
    "accept-language": "en-US,en;q=0.9",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
}

# LAYER 5: JAVASCRIPT CHALLENGE
# Cloudflare's interstitial page runs JS to:
# - Check navigator.webdriver (Selenium/Playwright leak this)
# - Measure canvas fingerprint
# - Verify WebGL renderer string
# - Test timezone / locale consistency
# - Validate performance.now() timing patterns
# Fix: stealth browser or API with JS challenge bypass.

# LAYER 6: BEHAVIOURAL ANALYSIS (Turnstile + Bot Management)
# Cloudflare tracks: mouse movement entropy, click patterns,
# scroll behaviour, time-on-page, navigation depth.
# Cold sessions with no history get higher bot scores.
# Fix: session warm-up before scraping target pages.

What ScrapeUp does differently

ScrapeUp abstracts all six detection layers into two parameters. You do not manage proxies, TLS configuration, header sets, JavaScript execution environments, or CAPTCHA solvers. You pass a URL and get back clean HTML.

ScrapeUp parameter	What it does	Best for	Credit cost
`premium=true`	Residential IP + Chrome TLS + browser headers	Most Cloudflare sites	25 credits/request
`unlock=true`	Full bypass: JS challenges, Turnstile, cf_clearance	Enterprise Bot Management	50 credits/request
`session_number`	Reuses same proxy IP across requests	Session warm-up strategy	No extra cost
`keep_headers=true`	Forwards your custom headers verbatim	Sites checking specific UA or cookies	No extra cost

The key distinction between premium and unlock is the JavaScript challenge layer. Sites using standard Cloudflare protection (the majority) are handled by premium=true alone — residential IP plus correct TLS and headers is enough. Sites using Cloudflare Enterprise Bot Management, which deploys per-customer machine learning models and mandatory Turnstile challenges, require unlock=true which activates a full headless Chrome session with CAPTCHA solving.

Scenario 1: basic Cloudflare bypass (most sites)

The majority of Cloudflare-protected sites — news sites, standard e-commerce, directories, most SaaS pricing pages — are handled by premium=true alone. This is the starting point for any Cloudflare scraping task:

# bypass_cloudflare_scrapeup.py — the one-line fix
import requests
from bs4 import BeautifulSoup

API_KEY = "your_scrapeup_key"   # get free key at scrapeup.com

def scrape_cloudflare_site(url: str) -> str:
    """
    ScrapeUp handles every Cloudflare detection layer automatically:
    - Routes through residential IPs (not datacenter)
    - Matches Chrome TLS + HTTP/2 fingerprint
    - Injects correct browser headers
    - Solves JS challenges and Turnstile CAPTCHAs
    - Manages session continuity and cookies

    You just pass the URL and get back clean HTML.
    """
    r = requests.get("https://api.scrapeup.com", params={
        "api_key": API_KEY,
        "url":     url,
        "premium": "true",   # residential IP — bypasses IP reputation check
    }, timeout=90)

    if r.status_code != 200:
        raise Exception(f"API error: {r.status_code} — {r.text[:200]}")

    return r.text

# Run it against a Cloudflare-protected site
html = scrape_cloudflare_site("https://www.example-cf-protected.com/products")
soup = BeautifulSoup(html, "html.parser")
print(f"Got {len(html):,} bytes — page title: {soup.title.string if soup.title else 'N/A'}")

Scenario 2: aggressive sites with Enterprise Bot Management

Sites like Realtor.com, Ticketmaster, Shein, and some financial data providers use Cloudflare Enterprise with custom bot scoring. These require the full bypass stack. ScrapeUp's unlock=true parameter activates its Web Unlocker, which deploys a real Chrome browser session with CAPTCHA-solving capability:

# bypass_cloudflare_advanced.py — for sites with aggressive Cloudflare Enterprise
import requests
import time

API_KEY = "your_scrapeup_key"

def scrape_with_escalation(url: str) -> str:
    """
    Tier-escalating strategy for Cloudflare-protected sites:
      Attempt 1: premium=true (residential IP + browser fingerprint)
      Attempt 2: unlock=true (full Cloudflare bypass + CAPTCHA solver)

    Use unlock=true for sites with:
    - Cloudflare Enterprise Bot Management
    - Turnstile CAPTCHA challenges
    - Per-customer ML scoring models
    - Sites like Ticketmaster, Zillow, Realtor.com
    """
    # First attempt: premium residential proxy
    r = requests.get("https://api.scrapeup.com", params={
        "api_key": API_KEY,
        "url":     url,
        "premium": "true",
    }, timeout=90)

    html = r.text
    # Check if we got a real page or a Cloudflare block
    if r.status_code == 200 and "cf-error" not in html and "Access denied" not in html:
        print("Success on attempt 1 (premium proxy)")
        return html

    print("Premium proxy blocked — escalating to Web Unlocker...")
    time.sleep(2)

    # Second attempt: full Cloudflare bypass
    r = requests.get("https://api.scrapeup.com", params={
        "api_key": API_KEY,
        "url":     url,
        "unlock":  "true",   # CAPTCHA solving + full fingerprint bypass
    }, timeout=90)

    if r.status_code == 200:
        print("Success on attempt 2 (Web Unlocker)")
        return r.text

    raise Exception(f"Both attempts failed: HTTP {r.status_code}")

# Example: scrape a heavily protected e-commerce site
html = scrape_with_escalation("https://www.heavily-protected-site.com/products")
print(f"Retrieved {len(html):,} bytes")

The escalation strategy in that code is deliberate. unlock=true costs 50 credits per request versus 25 for premium=true. For sites where premium works, there is no reason to pay more. Only escalate when the block signals tell you to.

Scenario 3: session warm-up for behavioural scoring

Cloudflare's Bot Management scores sessions based on history. A cold session that jumps directly to a target page — no referrer, no prior cookies, no navigation history — arrives with a high bot score regardless of IP quality. The fix is session warm-up: hitting the site's homepage and a few internal pages on the same proxy IP before scraping the target.

ScrapeUp's session_number parameter makes this straightforward. All requests sharing a session_number route through the same residential IP, so cookies accumulate and the session builds a browsing history:

# session_warmup.py — defeat Cloudflare's behavioural scoring
import requests
import time
import random

API_KEY = "your_scrapeup_key"

def create_session_id() -> str:
    """Generate a unique session ID for ScrapeUp's session_number param.
    Same session ID = same proxy IP across all requests."""
    import uuid
    return str(uuid.uuid4())[:16]

def warm_up_session(domain: str, session_id: str) -> None:
    """
    Hit the homepage and a few internal pages before scraping the target.
    Cloudflare's Bot Management scores sessions based on prior behaviour.
    A session that 'browsed' the site first has a much lower bot score.
    """
    warm_up_urls = [
        f"https://{domain}/",
        f"https://{domain}/about",
    ]

    for url in warm_up_urls:
        print(f"  Warming up: {url}")
        requests.get("https://api.scrapeup.com", params={
            "api_key":        API_KEY,
            "url":            url,
            "premium":        "true",
            "session_number": session_id,
        }, timeout=60)
        # Human-like delay between page visits
        time.sleep(random.uniform(2.5, 5.0))

def scrape_with_warm_session(target_url: str, domain: str) -> str:
    session_id = create_session_id()

    # Step 1: warm up the session with realistic browsing
    print(f"Creating session {session_id}...")
    warm_up_session(domain, session_id)

    # Step 2: scrape the target using the same session (same IP)
    print(f"Scraping target: {target_url}")
    r = requests.get("https://api.scrapeup.com", params={
        "api_key":        API_KEY,
        "url":            target_url,
        "premium":        "true",
        "session_number": session_id,   # reuses same IP as warm-up
    }, timeout=90)

    return r.text

html = scrape_with_warm_session(
    target_url="https://www.example.com/protected-data",
    domain="example.com"
)
print(f"Retrieved {len(html):,} bytes with warmed session")

Use session warm-up any time you are scraping a site where premium=true gets a 403 on the first attempt but succeeds if you try again a few seconds later. That pattern indicates behavioural scoring — the site is waiting to see navigation history before serving content.

Scenario 4: Cloudflare Turnstile CAPTCHA

Turnstile is Cloudflare's CAPTCHA system, deployed across millions of sites. Unlike reCAPTCHA v2 which shows a visible checkbox, Turnstile often runs invisibly in the background — it evaluates browser signals and generates a verification token without user interaction. When your scraper gets a page with Turnstile active but no visible CAPTCHA, it still cannot access the underlying content without a valid token.

Detection is straightforward: look for the Turnstile script or widget in the page HTML. The fix is unlock=true:

# turnstile_bypass.py — handle Cloudflare Turnstile CAPTCHA
import requests
from bs4 import BeautifulSoup

API_KEY = "your_scrapeup_key"

def detect_turnstile(html: str) -> bool:
    """Check if the page contains a Turnstile CAPTCHA challenge."""
    indicators = [
        "challenges.cloudflare.com/turnstile",
        "cf-turnstile",
        "data-sitekey",
        "cf_challenge",
    ]
    return any(ind in html for ind in indicators)

def scrape_turnstile_page(url: str) -> str:
    """
    Turnstile is Cloudflare's CAPTCHA system. Unlike reCAPTCHA,
    Turnstile is often invisible — it runs in the background and
    generates a verification token the page needs to function.

    ScrapeUp's unlock=true parameter handles Turnstile automatically:
    it solves the challenge, obtains the cf_clearance cookie, and
    returns the fully unlocked page HTML in a single API call.
    """
    # First try premium proxy
    r = requests.get("https://api.scrapeup.com", params={
        "api_key": API_KEY,
        "url":     url,
        "premium": "true",
    }, timeout=90)

    if r.status_code == 200 and not detect_turnstile(r.text):
        return r.text   # No Turnstile — premium proxy was enough

    # Turnstile detected — use Web Unlocker with CAPTCHA solving
    print("Turnstile detected — activating CAPTCHA solver...")
    r = requests.get("https://api.scrapeup.com", params={
        "api_key": API_KEY,
        "url":     url,
        "unlock":  "true",   # Handles Turnstile, JS challenges, cf_clearance
    }, timeout=120)

    soup = BeautifulSoup(r.text, "html.parser")

    # Verify we got past the challenge
    if detect_turnstile(r.text):
        raise Exception("Turnstile bypass failed — try again or check unlock credits")

    print(f"Turnstile bypassed — got {len(r.text):,} bytes")
    return r.text

html = scrape_turnstile_page("https://turnstile-protected-site.com/data")

How to choose the right approach for your target site

Cloudflare protection level by site type

News & blogs

Easy premium only

E-commerce (general)

Moderate premium + session

Job boards (Indeed, LinkedIn)

Moderate premium + warm-up

Real estate (Zillow, Realtor)

Hard unlock required

Tickets (Ticketmaster)

Very hard unlock + warm-up

The rule of thumb: start with premium=true on everything. If you get blocks (403, 1020, or Cloudflare challenge pages), add session_number and a warm-up hit on the homepage. If blocks persist, escalate to unlock=true. This sequence minimises credit spend while ensuring you find the right level for each target.

Scenario 5: production-grade scraper with automatic escalation

For production use where reliability matters more than per-request cost, the right architecture is automatic tier escalation with block detection. This handles all five scenarios above in a single reusable function:

# production_cf_scraper.py — production-grade Cloudflare scraper
import requests
import time
import random
import logging
import uuid
from bs4 import BeautifulSoup
from typing import Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cf_scraper")

API_KEY = "your_scrapeup_key"

CF_BLOCK_SIGNALS = [
    "cf-error", "cf_chl_", "Access denied", "Error 1020",
    "challenges.cloudflare.com", "cf-turnstile", "__cf_bm",
]

def is_blocked(html: str) -> bool:
    return any(sig in html for sig in CF_BLOCK_SIGNALS)

def get_session_id(prefix: str = "") -> str:
    return f"{prefix}-{uuid.uuid4().hex[:10]}"[:45]

def scrape(
    url: str,
    session_id: Optional[str] = None,
    warm_up_domain: Optional[str] = None,
) -> str:
    """
    Production Cloudflare scraper with:
      - Session warm-up (optional)
      - Tier escalation: premium → unlock
      - Exponential backoff on failure
      - Block detection before returning
    """
    if not session_id:
        session_id = get_session_id()

    # Optional: warm up the session first
    if warm_up_domain:
        log.info(f"Warming up session {session_id} on {warm_up_domain}")
        requests.get("https://api.scrapeup.com", params={
            "api_key": API_KEY, "url": f"https://{warm_up_domain}/",
            "premium": "true", "session_number": session_id,
        }, timeout=60)
        time.sleep(random.uniform(2, 4))

    configs = [
        {"premium": "true", "session_number": session_id},             # Tier 1
        {"premium": "true", "session_number": get_session_id()},       # Tier 2: fresh session
        {"unlock":  "true"},                                            # Tier 3: full bypass
    ]

    for i, params in enumerate(configs, 1):
        base = {"api_key": API_KEY, "url": url}
        r = requests.get("https://api.scrapeup.com", params={**base, **params}, timeout=120)

        if r.status_code == 200 and not is_blocked(r.text) and len(r.text) > 1000:
            log.info(f"Success on attempt {i} ({list(params.keys())[0]})")
            return r.text

        log.warning(f"Attempt {i} blocked (HTTP {r.status_code})")
        if i < len(configs):
            delay = 2 ** i + random.uniform(0, 2)
            time.sleep(delay)

    raise Exception(f"All {len(configs)} attempts failed for {url}")

# Run it
html = scrape(
    url="https://protected-site.com/products",
    warm_up_domain="protected-site.com",
)
soup = BeautifulSoup(html, "html.parser")
products = soup.select("div.product-item")
print(f"Scraped {len(products)} products")

The block signal detection in that code is important. Cloudflare sometimes returns HTTP 200 with a block page rather than a 403 — the status code is not a reliable success indicator. Checking for known block strings in the response body before returning the data prevents silent failures from entering your pipeline.

Common mistakes and what they actually mean

Rotating user agents alone does not help. Cloudflare does not primarily score on User-Agent — it scores on TLS fingerprint, which does not change when you rotate a header. A Python requests call with a Chrome User-Agent still has a Python TLS fingerprint.

Adding delays between requests also does not help against Cloudflare Bot Management. The bot score is evaluated per-request based on the request's properties, not on request frequency. Slowing down does not change the TLS fingerprint, IP reputation, or header set.

Playwright and Selenium in their default configurations are detected. Cloudflare checks navigator.webdriver (set to true by default in browser automation tools), the Chrome DevTools Protocol signatures in the browser's network behaviour, and specific JavaScript properties that differ between real browsers and headless instances. Use unlock=true rather than trying to patch these manually — Cloudflare updates its detection faster than open-source stealth libraries keep up.

Datacenter proxies with good reputation scores still get blocked on sites using Cloudflare's IP Intelligence. This is a paid Cloudflare feature that uses ASN-level classification to score IP addresses before the request is even processed. The only reliable bypass is genuine residential or ISP proxy IP addresses.

A note on legality: Bypassing Cloudflare to access publicly available data is generally legal under US and EU case law — Cloudflare is a technical barrier, not a legal one, and the hiQ v. LinkedIn ruling protects access to public information. That said, always respect the target site's terms of service, do not scrape at volumes that degrade service, and do not scrape data behind authentication walls. When in doubt, consult legal counsel for your specific use case.

Stop fighting Cloudflare. Let ScrapeUp handle it.

Free accounts include 1,000 credits per month — enough to test all six bypass strategies in this post against real Cloudflare-protected sites. No credit card required.

Get Your Free API Key