Allow custom PageMethod callbacks #318

jdemaeyer · 2024-09-12T08:47:40Z

Hi @elacuesta, still loving this library! :)

I often find myself handling the Playwright page in my request callback because I need to perform page actions involving loops or conditionals, which currently can't be expressed with the `playwright_page_methods` list. For example, this "click the 'load more' button while it's visible" logic ends up mixing response preparation with parsing:

```python
import scrapy
from playwright.async_api import expect


class PageActionSpider(scrapy.Spider):
    name = "pageaction"

    def start_requests(self):
        yield scrapy.Request(
            "https://example.com/",
            meta={
                "playwright": True,
                "playwright_include_page": True,
            },
        )

    async def parse(self, response):
        page = response.meta["playwright_page"]
        load_button = page.locator(".loadMore")
        loading_overlay = page.locator(".loadingOverlay")
        # Keep clicking "load more" until the button disappears
        while await load_button.is_visible():
            await load_button.click()
            await expect(loading_overlay).to_be_hidden()
        # The response body is stale now, so parse the live page content instead
        sel = scrapy.Selector(text=await page.content())
        await page.close()
        print(sel.css(".interestingData").getall())
```

This PR allows passing a callable instead of a string as `PageMethod.method`. The callable is called with the page as its first argument, so all page-related async actions are handled by the download handler again, and I no longer have to close the page myself or build a custom `Selector` instead of using `response.css`:

```python
import scrapy
from playwright.async_api import expect
from scrapy_playwright.page import PageMethod


class PageActionSpider(scrapy.Spider):
    name = "pageaction"

    def start_requests(self):
        yield scrapy.Request(
            "https://example.com/",
            meta={
                "playwright": True,
                "playwright_page_methods": [
                    # A callable instead of a method name: receives the page first
                    PageMethod(self.extend_feed),
                ],
            },
        )

    async def extend_feed(self, page):
        load_button = page.locator(".loadMore")
        loading_overlay = page.locator(".loadingOverlay")
        while await load_button.is_visible():
            await load_button.click()
            await expect(loading_overlay).to_be_hidden()

    def parse(self, response):
        # The response already reflects the fully extended page
        print(response.css(".interestingData").getall())
```
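To make the dispatch concrete, here is a simplified model of the behaviour described above: a string is looked up as a method on the page, while a callable receives the page as its first argument. This is a hypothetical sketch, not scrapy-playwright's actual code; `apply_page_method`, `FakePage` and `scroll_n_times` are invented names, and it assumes that any extra `PageMethod` arguments are forwarded after the page and that awaitable results are awaited.

```python
import asyncio
import inspect
from typing import Any, Callable, Union


async def apply_page_method(
    page: Any, method: Union[str, Callable[..., Any]], *args: Any, **kwargs: Any
) -> Any:
    """Hypothetical dispatcher: a string names a page method, a callable gets the page first."""
    if callable(method):
        # Custom callback: pass the page as the first positional argument (assumption).
        result = method(page, *args, **kwargs)
    else:
        # Classic behaviour: look up the named method on the page object.
        result = getattr(page, method)(*args, **kwargs)
    # Playwright page methods and user callbacks may both return coroutines.
    if inspect.isawaitable(result):
        result = await result
    return result


class FakePage:
    """Stand-in for a Playwright page, for illustration only."""

    def __init__(self) -> None:
        self.calls: list = []

    async def wait_for_timeout(self, ms: int) -> None:
        self.calls.append(("wait_for_timeout", ms))


async def scroll_n_times(page: FakePage, n: int) -> int:
    # A custom callback can loop, which a flat list of method names cannot express.
    for _ in range(n):
        await page.wait_for_timeout(10)
    return n


async def main() -> FakePage:
    page = FakePage()
    await apply_page_method(page, "wait_for_timeout", 100)  # string: named method
    count = await apply_page_method(page, scroll_n_times, 2)  # callable: gets page first
    assert count == 2
    return page


page = asyncio.run(main())
print(page.calls)
# [('wait_for_timeout', 100), ('wait_for_timeout', 10), ('wait_for_timeout', 10)]
```

Keeping the string path unchanged means existing `PageMethod("wait_for_timeout", ...)` style usage would be unaffected; only callables take the new branch.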

elacuesta · 2024-09-18T23:32:42Z

Amazing, thank you for the contribution @jdemaeyer 😄

I've added a simple test, I'll also mention it in the docs shortly.
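The test itself isn't reproduced in this thread. As a rough illustration only, a callable page method like `extend_feed` can be exercised without launching a browser by handing it a stand-in page. `FakeLocator`, `FakePage` and the test function below are hypothetical, and the overlay wait is left out for brevity.

```python
import asyncio


class FakeLocator:
    """Hypothetical stand-in for a Playwright locator that is visible a fixed number of times."""

    def __init__(self, visible_times: int) -> None:
        self.remaining = visible_times
        self.clicks = 0

    async def is_visible(self) -> bool:
        return self.remaining > 0

    async def click(self) -> None:
        # Each click "loads more" and brings the button one step closer to disappearing.
        self.clicks += 1
        self.remaining -= 1


class FakePage:
    """Hypothetical stand-in for a Playwright page; only the ".loadMore" locator is modelled."""

    def __init__(self, visible_times: int) -> None:
        self.button = FakeLocator(visible_times)

    def locator(self, selector: str) -> FakeLocator:
        assert selector == ".loadMore"
        return self.button


async def extend_feed(page: FakePage) -> int:
    # Same loop shape as the spider's callback, minus the loading-overlay wait.
    load_button = page.locator(".loadMore")
    while await load_button.is_visible():
        await load_button.click()
    return load_button.clicks


def test_extend_feed_clicks_until_hidden() -> None:
    page = FakePage(visible_times=3)
    assert asyncio.run(extend_feed(page)) == 3
    assert not asyncio.run(page.button.is_visible())


test_extend_feed_clicks_until_hidden()
```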


elacuesta · 2024-11-06T13:01:52Z

Thank you @jdemaeyer!

jdemaeyer · 2024-11-06T14:50:22Z

No thank you!

jdemaeyer and others added 2 commits September 12, 2024 10:34

Allow custom PageMethod callbacks

b3ad70d

Add test for callable page methods

f6201db

Adjust typing for PageMethod

5cf130c

elacuesta reviewed Sep 18, 2024

scrapy_playwright/page.py (outdated, resolved)

elacuesta added 4 commits September 18, 2024 20:44

Remove trailing commas (thank you pylint)

f1fa327

Update docstring

b762584

Update docs, tests & types

0321d97

Remove unused import

28d8ffc

elacuesta merged commit 5500a6e into scrapy-plugins:main Nov 6, 2024
12 checks passed
