Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add spider for Natal/RN #192

Merged
merged 18 commits into from
Jun 28, 2020
Merged

Add spider for Natal/RN #192

merged 18 commits into from
Jun 28, 2020

Conversation

victor-torres
Copy link
Contributor

@victor-torres victor-torres commented Jun 26, 2020

This pull request takes over development started by @dannnylo on #60
This should fix #189

@victor-torres victor-torres marked this pull request as ready for review June 26, 2020 14:39
Copy link
Collaborator

@jvanz jvanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jvanz
Copy link
Collaborator

jvanz commented Jun 27, 2020

@giuliocc how about review this PR ? ;)

@jvanz jvanz added enhancement Melhoria, novo recurso ou ferramenta spider Adiciona robô raspador para município(s) labels Jun 27, 2020
@ogecece
Copy link
Member

ogecece commented Jun 27, 2020

Sure. Thanks :)

Copy link
Member

@ogecece ogecece left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is very clean and concise. Very good work! Congratulations for that 🎉.

Regarding documentation I think the commits could be improved. There are some commits that don't provide enough information about what is being done and why. Some of the "Refactor" commits, for example. I think they could be at least squashed.

As a general tip, try to make the most out of the commit messages. It will make it easier for you or other people try to understand what was done here in the future.

Since we are talking about the future, if there is any information that you think is important about the implementation and/or the system you're scraping (and it can't be expressed in the code) use docstrings (file or methods) to convey that information. Spiders deal with systems we don't have control of. Documenting is key when trying to solve a bug in this case or make any refactoring easier.

Sorry for the big text, hope I was able to give some good insight :)

EDIT: Just realized you work at Scrapinghub 🤦 . I will not delete what I said about spiders here since it can be helpful for other people. But you can consider it irrelevant 😝

processing/data_collection/gazette/spiders/rn_natal.py Outdated Show resolved Hide resolved
processing/data_collection/gazette/spiders/rn_natal.py Outdated Show resolved Hide resolved
processing/data_collection/gazette/spiders/rn_natal.py Outdated Show resolved Hide resolved
@victor-torres
Copy link
Contributor Author

victor-torres commented Jun 27, 2020

Thank you for your review, @giuliocc.

I've tried not to make substantial changes to the original author's source code.

Also, this is a very simple spider. I've sketched a longer and more organized (read complex) version of this spider but I don't think it's worth doing it for a spider that's so small and which complexity is very low.

In this case, it doesn't pay the effort to write and maintain a more complex spider in the long run. Especially because when websites change, they probably completely break the old extraction logic. If that happens with small spiders like this, it's often quicker to write a new one from scratch than trying to fix the old one :)

Regarding documentation, well, there might be a knowledge bias playing its role here, but I think the spider is pretty self-documented. It has ~30 lines and it's very easy to see what's going on here. Adding docs would be a little bit redundant in this case, but again, that's my opinion and it might be biased.

I could have dedicated more attention to the commit messages, but feel free to squash it before the merge if it gets approved.

Wish I had more time to dedicate to this project :)

Cheers,
Victor Torres

@ogecece
Copy link
Member

ogecece commented Jun 27, 2020

No problems, if a comment is just providing noise it should not be made :) And I agree with you that a simpler spider is generally better than a complex one. It is good as it is, nicely done!

@ogecece
Copy link
Member

ogecece commented Jun 28, 2020

@jvanz I approved the PR.

About the commits, I don't have write permissions so I leave it up to you to decide the best course of action :)

@jvanz jvanz merged commit 304cf0e into okfn-brasil:master Jun 28, 2020
@jvanz
Copy link
Collaborator

jvanz commented Jun 28, 2020

Thanks for your first contribution @victor-torres ! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Melhoria, novo recurso ou ferramenta spider Adiciona robô raspador para município(s)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Natal spider
4 participants