
Error: Target page, context or browser has been closed #44

Closed

EthanZ1996 opened this issue Dec 27, 2021 · 4 comments

Comments

@EthanZ1996

Hi, elacuesta,

I use your handler in my Scrapy project and it runs well and crawls the information I need. However, some errors occur before the items are processed in the item pipeline. Here is an example:

2021-12-27 16:50:26 [asyncio] ERROR: Task exception was never retrieved
future: <Task finished name='Task-184' coro=<Route.continue_() done, defined at /home/ethanz/.local/lib/python3.8/site-packages/playwright/async_api/_generated.py:710> exception=Error('Target page, context or browser has been closed')>
Traceback (most recent call last):
  File "/home/ethanz/.local/lib/python3.8/site-packages/playwright/async_api/_generated.py", line 748, in continue_
    await self._async(
  File "/home/ethanz/.local/lib/python3.8/site-packages/playwright/_impl/_network.py", line 239, in continue_
    await self._channel.send("continue", cast(Any, overrides))
  File "/home/ethanz/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 39, in send
    return await self.inner_send(method, params, False)
  File "/home/ethanz/.local/lib/python3.8/site-packages/playwright/_impl/_connection.py", line 63, in inner_send
    result = next(iter(done)).result()
playwright._impl._api_types.Error: Target page, context or browser has been closed

Sometimes this error occurs 5 or 6 times per run, but I have also had runs with no errors at all. The only difference among these errors is the task number, i.e. Task-180, Task-181, Task-182, and so on.

I guess the error is related to the coroutine or asyncio, but I am not familiar with them. Do you know what is going on? Do I need to change any settings? Thanks! BTW, I am using a VM running Ubuntu 20.04 on Windows 10.

Regards,
Ethan

@lime-n

lime-n commented Jan 6, 2022

I have received the same message when trying to reach the next few pages of a URL. I'll provide some further information on my approach here:

I'm building a scraper that follows the link for each post, then moves on to the next results page, keeps doing this, and finally grabs the info from each page it linked to.

import hashlib
import logging
from pathlib import Path
from typing import Generator, Optional
from scrapy import Spider
from scrapy.crawler import CrawlerProcess
from scrapy.http.response import Response
import scrapy
from scrapy_playwright.page import PageCoroutine

cookies = {
    'VISITOR_ID': '3c553849d1dc612f60515f04d9316813',
    'INEU': '1',
    'PJBJOBSEEKER': '1',
    'LOCATIONJOBTYPEID': '3079',
    'AnonymousUser': 'MemberId=c3e94bc3-3fcf-423d-bc25-e5a5818cd2b9&IsAnonymous=True',
    'visitorid': '46315422-d835-4a38-b428-4b9c5d6243d3',
    's_fid': '6468EA5E39AF374B-2F7C971BB196D965',
    'sc_vid': '7c12948068d2cb92c1f1622aeaabc62d',
    'listing_page__qualtrics': 'empty',
    'SsaSessionCookie': 'fea6276c-cee9-43b3-9d57-9f00d6bcd32b',
    's_cc': 'true',
    'SessionCookie': '1d7806bc-4b79-47e9-839d-2d94ec224abb',
    'FreshUserTemp': 'https://www.jobsite.co.uk/',
    'bm_mi': '6BF6AA183A047F87BAC664C92ACA8E41~1Fku4TDwEBxz2+fwhUGUWjUhP3vaQED08Ala3VmmARyewb9/OjQUmvPEWw88MUA7USOzt+0MSpdyPmY/3N+iY08InyOy4DnNHgTq88AWwBigf1XhufLstD/eUhUJBgXQRSa1rVlO5SB5mlkhezcDRmv8bL+Gt4NZdsjVC4ZlVc3ptkbKY9cBB65yW2tyjZBLtxsQnz/rFJXo4a9PTKOvF/Betnb8S/XQrpNDsXOojdhtQrrU9V6XSziX+tHXT6xj1osB8XQtm0VGC7L6+4+bgQ==',
    'gpv_pn': '%2FJobSearch%2FResults.aspx',
    'ak_bmsc': '77506B6768E0463D238EEE24AE5B3A72~000000000000000000000000000000~YAAQFsITAnaZtJ99AQAA4LdCLw4PU4xFjE3/FbxxIG7pSjNqX9TClutWaS1MLKKy/9hAM9d6bcEN5Mr9Fbb8+1Jy3rrCsFO5TvxstcVAjaGbbvDCF/mXxeqJQAU1h/cvrZEH68FZyDuslnE+Ae7DuCs1QmNkNP6+0dvA4GT+/MENayQQk8szCo8ch3IfCK1j5/JL+jjbb04pmnpibV3XvUcLeqTJMY1IG9PlTuBIFWF8gXREI+ug2bb8pL+r7T1v1s9gVmfo633B0BoVcXIfWcDgtyFJjFNVayz2lHxUdtnInaWvi1ubzsjQ7cfUDdHTorHsJ0rP1RXB0utZ80GIBNbGdAzd1jkWy9BMIqdIcbBXM4+rCf3fbPw+qui+0Sr4RIxM5N41mvrOQ6W8s9bPR7GySeJr/2HGSmxTjf+4QDVY',
    'TJG-Engage': '1',
    'CONSENTMGR': 'c1:0%7Cc2:0%7Cc3:0%7Cc4:0%7Cc5:0%7Cc6:0%7Cc7:0%7Cc8:0%7Cc9:1%7Cc10:0%7Cc11:0%7Cc12:0%7Cc13:0%7Cc14:0%7Cc15:0%7Cts:1641470409597%7Cconsent:true',
    'utag_main': 'v_id:017e24e57d970023786b817ac51005079001e071009e2_sn:16$_se:5$_ss:0$_st:1641472209641$ses_id:1641470390747%3Bexp-session$_pn:3%3Bexp-session$PersistedFreshUserValue:0.1%3Bexp-session$PersistedClusterId:OTHER--9999%3Bexp-session',
    's_ppvl': '%2FJobSearch%2FResults.aspx%2C13%2C13%2C741%2C409%2C741%2C1600%2C900%2C2%2CL',
    's_ppv': '%2FJobSearch%2FResults.aspx%2C100%2C13%2C6616%2C423%2C741%2C1600%2C900%2C2%2CL',
    's_sq': 'stepstone-jobsite-uk%3D%2526c.%2526a.%2526activitymap.%2526page%253D%25252FJobSearch%25252FResults.aspx%2526link%253DNext%2526region%253Dapp-unifiedResultlist-db2486f4-fb7d-469f-8cfd-f31a3eafb692%2526pageIDType%253D1%2526.activitymap%2526.a%2526.c%2526pid%253D%25252FJobSearch%25252FResults.aspx%2526pidt%253D1%2526oid%253Dhttps%25253A%25252F%25252Fwww.jobsite.co.uk%25252Fjobs%25253Fpage%25253D3%252526action%25253Dpaging_next%2526ot%253DA',
    'bm_sv': '4C178898519D2A4ADEBB840C0B682999~sanqWSDI/ZT0KWrdWhNRc7UtVtqAZ61oPSoLv/MnCD1e0a7vUTSzpggIj9dt/bN4nXEmOaM48hugBFRwdBveJlobrjEcMZ1gHS3S3KXYaHfZPjq6IIf8/Fs1QUlg0s7oLp6DsZbkAWWOnNQiI/uaq7XT7EHnd+n/46ra5jgwfhA=',
    '_abck': '508823E0A454CEF8D6A48101DB66BDB8~0~YAAQFsITAjWdtJ99AQAAr21ELwcAj2rhoIgnvsHOkxQPuREHCA9mDMHsyk68FBhxQ0Jto+6FqaEHJkrrVEUGuYveQAjVJ7CGS+2ajmbcVkG/KIQn8ttCaGvn58jkwzpWm6Fjx4FsLBJyLsceRWSqw5rV2ezEeLrBd/ZToRMpdZop4yqixh5vquandn+h9ysqacaeHPO90VnvctIfvKTUvY5GrrHubGVMkD9/elxRI5whsBdH7ovATyGsLEgYx+e604lY2sQIahSvweclTI4Ud1hTQbSQTebWs52PiYdSU5wq9+YC/7Sr0JuQZCUMyGGqZgtXpfAdc9LDa8X3JfcdO25EZQHxsfEfT/pp7tjbxaXD/pgun9ozymRMy/hBuCj5/Bfln/LzAqOsdDv7q6WVerNr6qivHGDE0m2/~-1~-1~-1',
    'bm_sz': '747D15CEB59AC2C0003BD8479C4BF482~YAAQFsITAjadtJ99AQAAr21ELw5vDH+lMq9NICfxNXHGiXcPcBSrWov2Hy8Y0wgN/OAL7NJWfJ7Lkum/OqG3WNj9/+e8oJhNRQ96ksn+zk0N0gNnoPhUv46am0wktHih1PPfYRlqdPSQSdgE92eHwG3CsFaSeRROKu/1q89aNDH4+JBUk/TDdTmeBqsvJffzvP0S1gAv54dOecx0z2LSW6PEj0e0VtWqmBjFSQxCqH8LZ4r7TwqDpxAKArzWGDMlqR/xZcWvAm8ijUTG+mIuF3N7aBEDGdB90wdyaJGt2CGP3VinhBNtxV7vT8ebY9oWu2rJ+UmugGgJ/dasQP8=~3424837~3354936',
    'EntryUrl': '/jobs?page=3&action=paging_next',
    'SearchResults': '96094529,96094530,96094528,96094527,96094526,96094525,96094524,96094522,96094521,96094520,96094519,96094517,96094518,96094514,96094513,96094509,96094510,96094511,96094508,96094507,96094506,96094503,96094502,96094500,96094499',
}
headers = {
    'authority': 'www.jobsite.co.uk',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"',
    'accept': 'application/json',
    'content-type': 'application/json',
    'sec-ch-ua-mobile': '?0',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
    'sec-ch-ua-platform': '"macOS"',
    'origin': 'https://www.jobsite.co.uk',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.jobsite.co.uk/jobs?page=3&action=paging_next',
    'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
}


class JobSpider(scrapy.Spider):
    name = 'job_pages'
    start_urls = ['https://www.jobsite.co.uk/jobs/Degree-Accounting-and-Finance']
    
    custom_settings = {
        'USER_AGENT':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.2 Safari/605.1.15'
    }
    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url = url,
                callback = self.parse,
                dont_filter = True,
                meta= dict(
                    playwright = True,
                    playwright_include_page = True,
                    playwright_page_coroutines = [
                        PageCoroutine('wait_for_selector', 'div.row.job-results-row')
                        ]
                )
            )
    def parse(self, response):
       stuff = response.xpath("//div[@class='ResultsSectionContainer-sc-gdhf14-0 kteggz']/div[@class='Wrapper-sc-11673k2-0 gIBPSk']")
       
       for items in stuff:
           for jobs in items.xpath('//article//div//div[position() mod 7 = 6]/a//@href'):
               yield response.follow(
                   jobs, 
                   callback = self.parse_jobs,
                   meta={
                    "playwright": True,
                    "playwright_include_page": True})

       next_page = response.xpath('(//div)[position() mod 5=3][83]/a[2]//@href').get()
       if next_page:
           yield scrapy.Request(
               url=next_page,
               callback=self.parse,
               meta=dict(
                   playwright=True,
                   playwright_include_page=True,
                   playwright_page_coroutines=[
                       PageCoroutine('wait_for_selector', 'div.row.job-results-row')
                   ],
               ),
           )


    async def parse_jobs(self, response):
        url_sha256 = hashlib.sha256(response.url.encode("utf-8")).hexdigest()
        page = response.meta["playwright_page"]
        await page.screenshot(
            path=Path(__file__).parent / "job_test" / f"{url_sha256}.png", full_page=True
        )
        await page.close()
        yield {
            "url": response.url,
            "title": response.xpath("//h1[@class='brand-font']//text()").get(),
            "price": response.xpath("//li[@class='salary icon']//div//text()").get(),
            "organisation": response.xpath("//a[@id='companyJobsLink']//text()").get(),
            "image": f"job_test/{url_sha256}.png",
        }
if __name__ == "__main__":
    process = CrawlerProcess(
        settings={
            "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
            "DOWNLOAD_HANDLERS": {
                "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
                "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
            },
            "CONCURRENT_REQUESTS": 32,
            "CLOSESPIDER_ITEMCOUNT": 100,
            "FEED_URI":'jobs.jl',
            "FEED_FORMAT":'jsonlines',
        }
    )
    process.crawl(JobSpider)
    logging.getLogger("scrapy.core.engine").setLevel(logging.WARNING)
    logging.getLogger("scrapy.core.scraper").setLevel(logging.WARNING)
    process.start()

Here's the error output:

    result = next(iter(done)).result()
playwright._impl._api_types.Error: Target page, context or browser has been closed
2022-01-06 15:23:14 [scrapy-playwright] INFO: Closing browser
2022-01-06 15:23:14 [scrapy-playwright] INFO: Closing browser
2022-01-06 15:23:14 [scrapy-playwright] DEBUG: Browser context closed: 'default'

@elacuesta
Member

Please, provide a minimal, reproducible example (the provided code sample is hardly minimal).

@lime-n

lime-n commented Feb 28, 2022

@elacuesta
It's been a while, but I remember clearly that the error was in my script rather than in scrapy_playwright. It's a long shot, but I presume the author of this issue likely has a similar problem, i.e. that their script may be the cause.

@elacuesta
Member

elacuesta commented Mar 27, 2022

Upon closer inspection this seems like a duplicate of #15, which I'm aiming to solve at #74. Feel free to reopen with more information if that's not the case.
I would suggest defining a request errback or a spider middleware with a process_spider_exception method to recover from these errors.
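As a rough sketch of both options (assuming a spider like the one above; the errback and middleware names below are only placeholders, not something shipped with scrapy-playwright):

import scrapy
from playwright.async_api import Error as PlaywrightError


class JobSpider(scrapy.Spider):
    name = "job_pages"

    def start_requests(self):
        yield scrapy.Request(
            "https://www.jobsite.co.uk/jobs/Degree-Accounting-and-Finance",
            callback=self.parse,
            errback=self.errback_close_page,  # invoked when the request fails
            meta={"playwright": True, "playwright_include_page": True},
        )

    def parse(self, response):
        ...

    async def errback_close_page(self, failure):
        # Close the page that travelled with the failed request, so it does not
        # leak once the target/context is already gone.
        page = failure.request.meta.get("playwright_page")
        if page:
            await page.close()


class HandleClosedTargetMiddleware:
    # Spider middleware variant: swallow the Playwright error raised while the
    # callback output is being consumed, so the crawl keeps going.
    def process_spider_exception(self, response, exception, spider):
        if isinstance(exception, PlaywrightError) and "has been closed" in str(exception):
            spider.logger.warning("Ignoring closed target for %s", response.url)
            return []  # returning an iterable stops the exception from propagating
        return None

The middleware would need to be enabled through the SPIDER_MIDDLEWARES setting, as usual for Scrapy spider middlewares.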
