-
-
Notifications
You must be signed in to change notification settings - Fork 416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add spider for Natal/RN #192
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@giuliocc how about review this PR ? ;) |
Sure. Thanks :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is very clean and concise. Very good work! Congratulations for that 🎉.
Regarding documentation I think the commits could be improved. There are some commits that don't provide enough information about what is being done and why. Some of the "Refactor" commits, for example. I think they could be at least squashed.
As a general tip, try to make the most out of the commit messages. It will make it easier for you or other people try to understand what was done here in the future.
Since we are talking about the future, if there is any information that you think is important about the implementation and/or the system you're scraping (and it can't be expressed in the code) use docstrings (file or methods) to convey that information. Spiders deal with systems we don't have control of. Documenting is key when trying to solve a bug in this case or make any refactoring easier.
Sorry for the big text, hope I was able to give some good insight :)
EDIT: Just realized you work at Scrapinghub 🤦 . I will not delete what I said about spiders here since it can be helpful for other people. But you can consider it irrelevant 😝
Co-authored-by: Giulio Carvalho <[email protected]>
Co-authored-by: Giulio Carvalho <[email protected]>
Thank you for your review, @giuliocc. I've tried not to make substantial changes to the original author's source code. Also, this is a very simple spider. I've sketched a longer and more organized (read complex) version of this spider but I don't think it's worth doing it for a spider that's so small and which complexity is very low. In this case, it doesn't pay the effort to write and maintain a more complex spider in the long run. Especially because when websites change, they probably completely break the old extraction logic. If that happens with small spiders like this, it's often quicker to write a new one from scratch than trying to fix the old one :) Regarding documentation, well, there might be a knowledge bias playing its role here, but I think the spider is pretty self-documented. It has ~30 lines and it's very easy to see what's going on here. Adding docs would be a little bit redundant in this case, but again, that's my opinion and it might be biased. I could have dedicated more attention to the commit messages, but feel free to squash it before the merge if it gets approved. Wish I had more time to dedicate to this project :) Cheers, |
No problems, if a comment is just providing noise it should not be made :) And I agree with you that a simpler spider is generally better than a complex one. It is good as it is, nicely done! |
@jvanz I approved the PR. About the commits, I don't have write permissions so I leave it up to you to decide the best course of action :) |
Thanks for your first contribution @victor-torres ! :) |
This pull request takes over development started by @dannnylo on #60
This should fix #189