Strategies for anti-ad blocker web pages? #177

Open
phochste opened this issue Nov 7, 2024 · 0 comments

phochste commented Nov 7, 2024

What strategies can I use for web pages that use "anti-ad blocker" tactics? For example, when I try to retrieve the metadata for https://www.reuters.com/investigates/special-report/usa-riteaid-software/ through translation-server, I get:

```
(3)(+0486231): POST /web?single=1 from 127.0.0.1 "node-fetch/1.0 (+https://github.com/bitinn/node-fetch)"
(3)(+0000019): HTTP GET https://www.reuters.com/investigates/special-report/usa-riteaid-software/
(1)(+0000172): Error: HTTP request to https://www.reuters.com/investigates/special-report/usa-riteaid-software/ rejected with status 401

  InternalServerError: An error occurred retrieving the document
      at Object.throw (/Users/hochsten/github.com/zotero/translation-server/node_modules/koa/lib/context.js:97:11)
      at WebSession.handleURL (/Users/hochsten/github.com/zotero/translation-server/src/webSession.js:219:19)
      at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
      at async Object.handle (/Users/hochsten/github.com/zotero/translation-server/src/webEndpoint.js:85:3)
      at async bodyParser (/Users/hochsten/github.com/zotero/translation-server/node_modules/koa-bodyparser/index.js:95:5)
      at async module.exports (/Users/hochsten/github.com/zotero/translation-server/src/cors.js:22:3)
```
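For reference, the failing request in the log can be reproduced with a small client. This is a minimal sketch, assuming a local translation-server on its default port 1969 and Node 18+ (global `fetch`); `fetchMetadata` is a hypothetical helper name, and `fetchImpl` is injectable only so the function can be exercised without a live server. The `single=1` query parameter from the log is left out.

```javascript
// Hypothetical client helper; host/port are assumptions (translation-server default).
const TRANSLATION_SERVER = "http://127.0.0.1:1969";

async function fetchMetadata(url, fetchImpl = fetch) {
  // The /web endpoint takes the page URL as a plain-text request body.
  const res = await fetchImpl(`${TRANSLATION_SERVER}/web`, {
    method: "POST",
    headers: { "Content-Type": "text/plain" },
    body: url,
  });
  if (!res.ok) {
    // When the target site rejects the server's own HTTP GET (the 401 above),
    // the client only sees a generic server error, not the upstream status.
    throw new Error(`translation-server returned HTTP ${res.status}`);
  }
  // On success the server responds with JSON (an array of items).
  return res.json();
}
```

Note that the upstream 401 is only visible in the server log, so a client cannot tell a blocked page apart from other retrieval failures from the response status alone.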

When I fetch the page directly with curl (`curl https://www.reuters.com/investigates/special-report/usa-riteaid-software/`), I do indeed get a 401, with this content:

```html
<html><head><title>reuters.com</title><style>#cmsg{animation: A 1.5s;}@keyframes A{0%{opacity:0;}99%{opacity:0;}100%{opacity:1;}}</style></head><body style="margin:0"><p id="cmsg">Please enable JS and disable any ad blocker</p><script data-cfasync="false">var dd={'rt':'c','cid':'AHrlqAAAAAMA663TFE6abY4AncHwgw==','hsh':'2013457ADA70C67D6A4123E0A76873','t':'bv','s':46356,'e':'f821f4289bd2422ec97e8f6d79d54fd4a41b74ccbb9abab0df58bc633802e6e3','host':'geo.captcha-delivery.com','cookie':'l0gxS3PlDjwTQb88t6e_Izbuj0VFzjmcr95~8KEkQ_fCvBakUNEPGMlKibBTGkRwsFcm1sa4EKb3~5_mmnt6cImKDxRdqBdiOSY8sQ50dxQECrruBobI44~BBVuKdLky'}</script><script data-cfasync="false" src="https://ct.captcha-delivery.com/c.js"></script></body></html>
```
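This response is not the article at all: it appears to be a bot-protection challenge page (the `captcha-delivery.com` host appears to belong to the DataDome service) that only resolves after its JavaScript runs. One possible first step is to recognize such pages explicitly instead of treating them as generic failures. The sketch below is a hypothetical heuristic; the function name `looksLikeBotChallenge` and the marker strings are assumptions taken from this single response, not a documented signature of the service.

```javascript
// Hypothetical heuristic: does this HTTP response look like a JS/CAPTCHA
// challenge page rather than real content? Markers are assumptions based on
// the Reuters response above and may change at any time.
const CHALLENGE_MARKERS = [
  "captcha-delivery.com", // challenge script host seen in the response
  "Please enable JS and disable any ad blocker", // visible message in the response
];

function looksLikeBotChallenge(status, body) {
  // Only consider "access denied"-style statuses; 200 pages are treated as real content.
  if (status !== 401 && status !== 403) return false;
  return CHALLENGE_MARKERS.some((marker) => body.includes(marker));
}
```

Since passing such a challenge requires executing its script (and possibly solving a CAPTCHA), simply retrying the same plain HTTP request is unlikely to help; a caller that detects it could instead fall back to a minimal record built from the URL alone, or save the page from a real browser session.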