knownUrls processing logic is incorrectly using underscore #50
Comments
Thanks for reviewing the code. Looks like at least in the second case the check always returns
should most likely be changed to
as Will check this more and fix this/update the tests.
For the first case, you've got two I can't see any reason to keep the As an aside, I wasn't sure whether this was still an actively developed repo and wanted to make some other changes to it, so I've forked the repo and done a load of work on it, as I wanted to integrate it with headless Chrome (as mentioned in #49). I pulled request out from being an explicit thing and created two separate 'pluginable' engines, one for request and one for Chromium (via puppeteer). None of this is up on GitHub yet, just on my machine. I'm happy to look at merging it or passing it over; however, I have also converted the whole thing to TypeScript (which is what highlighted this issue to me in the first place), and I don't know whether you'd be happy to make that big a change. Pulling out request also required some slight changes, so there are some breaking changes to the API.
Yes, you are right, will check both cases. Most likely we should switch Thank you for pointing out the problematic places and finding these issues.
I'm not sure about switching to TypeScript and adding support for headless Chromium with breaking API changes. These are quite large changes, but they can probably be done, at least in a separate experimental branch. I have thought about developing the crawler further, supporting other request engines, and refactoring the implementation. But it is probably a good idea to avoid huge, potentially destabilizing changes in the 'master' branch and to make sure the implementation is well tested, with unit and end-to-end tests. Once the new implementation is stable and tested, it can be released as a new major version. From my side, I'm not sure how much time I can dedicate to the development of the crawler; this whole process, including releasing a new version, could take at least a few months. I will try to find some time to do this. If you need something faster and want to add more experimental features or make more changes, then it is probably a good idea to fork the repository.
crawler.js#L178 and crawler.js#L197 are using underscore to determine whether the provided URL is a known URL.
This is incorrect: underscore's `contains` method doesn't handle objects in the way this code expects, so the check always returns false. It should be using the same form as crawler.js#L203. Changing it breaks a unit test, though, and I'm not sure how it will affect the expected flow.
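To illustrate why a check like this can always return false, here is a minimal sketch. It assumes `knownUrls` is a plain object keyed by URL, which the issue does not show, and it uses a hypothetical stand-in `containsValue` instead of underscore itself. The stand-in mirrors how underscore's `contains` is documented to behave: for a plain object it searches the values, not the keys, using strict equality.

```javascript
// Hypothetical stand-in for underscore's `contains` (not the library code):
// for a plain object it looks through the values, never the keys.
function containsValue(collection, value) {
  const values = Array.isArray(collection)
    ? collection
    : Object.values(collection);
  return values.indexOf(value) !== -1; // strict equality, like indexOf
}

// Assumed shape of the crawler's state: URLs as keys, a flag as the value.
const knownUrls = { 'http://example.com/': true };
const url = 'http://example.com/';

// Value lookup: the values are [true], so the URL string is never found.
const byValue = containsValue(knownUrls, url);

// Key lookup: asks whether the URL exists as a property of the object.
const byKey = Object.prototype.hasOwnProperty.call(knownUrls, url);

console.log(byValue, byKey);
```

Under these assumptions the value lookup misses while the key lookup hits, which matches the symptom described above. Whether crawler.js#L203 uses a key lookup of this kind is not stated in the issue.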