Lua page script timeouts when trying to render binary pages #51

Open
lopuhin opened this issue Apr 29, 2016 · 5 comments

Comments

@lopuhin
Contributor

lopuhin commented Apr 29, 2016

When the page is not an HTML page but binary content (we cannot know this for sure when extracting links), the Lua script times out (even without HH enabled).
Not only do we fail to download such pages, but this also slows down the whole crawl a lot.
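
One possible workaround on the crawler side, sketched here under assumptions, is to look at the Content-Type response header (for example from a HEAD request) before handing a URL to Splash. The helper name `looks_like_html` and the set of accepted media types are hypothetical and are not part of Splash or of this project; the sketch also assumes the server reports Content-Type accurately, which is not guaranteed.

```python
# Hypothetical helper: not part of Splash or of the crawler discussed here.
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def looks_like_html(content_type):
    """Return True if a Content-Type header value names an HTML media type.

    A missing or empty header returns True, so pages of unknown type
    are still sent to the renderer.
    """
    if not content_type:
        return True
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in HTML_TYPES
```

A crawler could skip rendering, or fall back to a plain download, when this returns False. Whether the extra request is worth the cost depends on how many binary links the crawl encounters.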

@kmike
Contributor

kmike commented Apr 29, 2016

What happens? Is it because the timeout is not large enough to download a file, or is it a problem because Splash doesn't handle non-HTML content in splash:go?

@lopuhin
Contributor Author

lopuhin commented Apr 29, 2016

It's the latter - just plain

function main(splash)
  local url = splash.args.url
  assert(splash:go{url})
end

fails with a timeout.
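
To reproduce this outside the crawler, the script above can be submitted to Splash's `/execute` HTTP endpoint. The following is a minimal sketch that only builds the request and does not send it. The Splash address `http://localhost:8050`, the 30-second `timeout` value, and the function name `build_execute_request` are assumptions chosen for illustration.

```python
import json
import urllib.request

# The Lua script from the comment above, unchanged.
LUA_SOURCE = """
function main(splash)
  local url = splash.args.url
  assert(splash:go{url})
end
"""


def build_execute_request(page_url, splash_url="http://localhost:8050", timeout=30):
    """Build (but do not send) a POST request for Splash's /execute endpoint.

    splash_url and timeout are assumed defaults for illustration only.
    """
    payload = {"lua_source": LUA_SOURCE, "url": page_url, "timeout": timeout}
    return urllib.request.Request(
        splash_url.rstrip("/") + "/execute",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` against a running Splash instance would execute the script with the given URL available as `splash.args.url`.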

@lopuhin
Contributor Author

lopuhin commented Apr 29, 2016

But it looks like not all binary content causes splash:go to fail; I will try to narrow it down.

@kmike
Contributor

kmike commented Apr 29, 2016

Splash doesn't currently handle unsupported content (http://doc.qt.io/archives/qt-5.5/qwebpage.html#forwardUnsupportedContent-prop); to fix it, we need to add an API for that to Splash.

@lopuhin
Contributor Author

lopuhin commented Apr 29, 2016

The link was extracted from an <a href="URL" id="ctl00_MasterMain_Hot_rpHot_ctl10_navImg" onclick="return hs.expand(this, {captionId: \'caption1\'})"> element, so the extraction is correct (I don't think we should drop links with onclick if we don't click on them).
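
As an illustration of the behaviour described above, here is a small sketch of a link extractor that keeps every `<a href>` regardless of `onclick`, and records whether the handler was present so later stages can treat such links differently. It uses only the standard library; the names `AnchorCollector` and `extract_links` are hypothetical, and this is not the project's actual extractor.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (href, has_onclick) pairs from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href:
            # Keep the link even when an onclick handler is present.
            self.links.append((href, "onclick" in attr_map))


def extract_links(html):
    """Return a list of (href, has_onclick) tuples found in the markup."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

The `has_onclick` flag is one way to lower the priority of such links or check their content type first, without discarding them at extraction time.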
