[BUG] PDF directly access does not emit download event #1757

Closed
wizpresso-steve-cy-fan opened this issue Feb 9, 2023 · 2 comments

Comments

@wizpresso-steve-cy-fan

@mxschmitt This is just an example; I will try a download interaction based on a button click later. Clicking a button to download the file also uses `goto` behind the scenes, so I think both should behave the same. I just wanted to simplify the reproduction.

So far, this seems to be working:

import asyncio
from playwright.async_api import async_playwright
import json
from anyio import Path
from aiofiles.tempfile import TemporaryDirectory

preference = {
    "plugins": {
        "always_open_pdf_externally": True,
    },
}


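async def handle(route):
    # Fetch the real response, then mark PDFs as attachments so Chromium downloads them.
    response = await route.fetch()
    # Copy the headers: assigning into response.headers may not change what fulfill() sends.
    headers = dict(response.headers)
    if headers.get('content-type', '').startswith('application/pdf'):
        headers['content-disposition'] = 'attachment'
    await route.fulfill(response=response, headers=headers)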


async def main():
    async with TemporaryDirectory() as d:
        preference_dir = Path(d) / "Default"
        await preference_dir.mkdir(0o777, parents=True, exist_ok=True)  # the mode must be octal
        await (preference_dir / "Preferences").write_text(json.dumps(preference))
        
        async with async_playwright() as p:
            context = await p.chromium.launch_persistent_context(d, headless=False, accept_downloads=True)
            try:
                # "**" also matches "/", so this pattern intercepts every URL; a bare "*" does not.
                await context.route("**/*", handle)
                page = await context.new_page()
                async with page.expect_download() as download_info:
                    try:
                        await page.goto("https://www1.hkexnews.hk/listedco/listconews/gem/2023/0209/2023020900150_c.pdf")
                    except Exception:
                        # goto() raises when the navigation turns into a download; that is expected here.
                        pass
                download = await download_info.value
                print(await download.path())
            finally:
                await context.close()

asyncio.run(main())

This combines the trick from microsoft/playwright#3509 (comment) with https://stackoverflow.com/a/75201448/3289081
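The header half of that combination can be pulled out into a small pure function, which makes it easier to check on its own. This is only a sketch: `force_attachment` is a hypothetical helper name, and it assumes that treating header names case-insensitively and ignoring media-type parameters (such as `; charset=...`) is the desired behavior.

```python
from typing import Dict


def force_attachment(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `headers` that marks PDF responses as downloads."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Drop parameters such as "; charset=binary" before comparing the media type.
    media_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    if media_type == "application/pdf":
        lowered["content-disposition"] = "attachment"
    return lowered
```

Inside a route handler it would be used as `await route.fulfill(response=response, headers=force_attachment(response.headers))`.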

My end goal is to capture the PDF download and send the file stream into stdout/remote pipe.
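For that last step, a minimal sketch of copying the saved download to a binary stream in chunks might look like the following. `stream_file` and `CHUNK_SIZE` are hypothetical names, and the sketch assumes the download has already been saved to disk and that its path is known.

```python
import sys
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size chosen for this sketch


def stream_file(path, out: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy the file at `path` to `out` in chunks and return the byte count."""
    total = 0
    with open(path, "rb") as src:
        # read() returns b"" at end of file, which ends the loop.
        while chunk := src.read(chunk_size):
            out.write(chunk)
            total += len(chunk)
    out.flush()
    return total
```

With the script above, this would be called as `stream_file(await download.path(), sys.stdout.buffer)`; any other writable binary stream, such as a pipe or socket file object, could take the place of `sys.stdout.buffer`.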

N.B.
In headless mode I can trigger the PDF download without a persistent context, but that apparently does not behave well in non-headless mode, so the suggestion at microsoft/playwright#3509 (comment) does not work there.

Originally posted by @wizpresso-steve-cy-fan in microsoft/playwright#20771 (comment)

@wizpresso-steve-cy-fan changed the title to "[BUG] PDF directly access does not emit download event" on Feb 9, 2023
@mxschmitt
Member

Sounds like a duplicate of microsoft/playwright#7822 to me, which is exactly about PDFs in headed mode not ending up in downloads.

@aslushnikov
Contributor

Merging into microsoft/playwright#7822
