Scrapy

scrapy-playwright adds browser rendering to Scrapy. Connecting it to rayobrowse swaps the default, easily detected Chromium for a stealth-fingerprinted browser.

Setup

Install dependencies
```
pip install scrapy scrapy-playwright
```
You don’t need to run playwright install, because the browser runs inside the rayobrowse container.

Configure settings.py

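# Connect to the rayobrowse browser over CDP instead of launching a local one.
PLAYWRIGHT_CDP_URL = "ws://localhost:9222/connect?headless=true&os=windows"

# Route both HTTP and HTTPS requests through scrapy-playwright.
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright requires the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# None keeps the browser's own request headers instead of overriding them with Scrapy's.
PLAYWRIGHT_PROCESS_REQUEST_HEADERS = None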
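
Because the browser lives in the rayobrowse container rather than on the host, the crawl cannot start if nothing is listening on the CDP port. The following is a minimal pre-flight sketch, assuming the localhost:9222 endpoint from the settings above. The helper name cdp_port_open is hypothetical, and it only checks that the TCP port accepts connections, not that rayobrowse itself is healthy.

```python
import socket


def cdp_port_open(host: str = "localhost", port: int = 9222, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling cdp_port_open() before scrapy crawl gives a quick yes/no answer about whether the container port is reachable from the machine running the spider.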

Write your spider

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/js/"]

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                # "playwright": True renders the page in the browser;
                # "playwright_include_page": True exposes the Page object in parse().
                meta={"playwright": True, "playwright_include_page": True},
            )

    async def parse(self, response):
        # The response already holds the rendered HTML, so close the page
        # right away to free the browser tab.
        page = response.meta["playwright_page"]
        await page.close()

        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }

Run
```
scrapy crawl quotes -o quotes.json
```

Using a proxy

PLAYWRIGHT_CDP_URL = (
    "ws://localhost:9222/connect"
    "?headless=true"
    "&os=windows"
    "&proxy=http://user:pass@proxy.example.com:8080"
)
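
Proxy credentials sometimes contain characters such as & or # that are ambiguous inside a query string. One way to avoid assembling the URL by hand is urllib.parse.urlencode, which percent-encodes each value. This is only a sketch: build_cdp_url is a hypothetical helper, the password p&ss is made up for illustration, and it assumes rayobrowse decodes percent-encoded query parameters in the standard way, which this page does not state.

```python
from urllib.parse import urlencode


def build_cdp_url(base: str, **params: str) -> str:
    """Append percent-encoded query parameters to a CDP WebSocket base URL."""
    return f"{base}?{urlencode(params)}"


# Hypothetical credentials; the "&" in the password would otherwise
# be read as the start of a new query parameter.
PLAYWRIGHT_CDP_URL = build_cdp_url(
    "ws://localhost:9222/connect",
    headless="true",
    os="windows",
    proxy="http://user:p&ss@proxy.example.com:8080",
)
```

If the encoded form is not accepted by your rayobrowse version, fall back to the literal string shown above.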

Remote mode

PLAYWRIGHT_CDP_URL = (
    "ws://your-server.example.com/connect"
    "?headless=true"
    "&os=windows"
    "&api_key=your-secret-key"
)
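
Hard-coding the API key in settings.py makes it easy to leak through version control. A common alternative is to read it from the environment when the settings load. In this sketch the variable name RAYOBROWSE_API_KEY and the helper remote_cdp_url are hypothetical; the path and query parameters are the ones shown above.

```python
import os


def remote_cdp_url(host: str, env_var: str = "RAYOBROWSE_API_KEY") -> str:
    """Build the remote CDP URL, reading the API key from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail early with a clear message rather than at connection time.
        raise RuntimeError(f"Set {env_var} before starting the crawl")
    return f"ws://{host}/connect?headless=true&os=windows&api_key={api_key}"
```

In settings.py this would be used as PLAYWRIGHT_CDP_URL = remote_cdp_url("your-server.example.com").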

A ready-to-run example project is available at integrations/scrapy/ in the GitHub repository.