Add spider for Natal/RN #192

victor-torres · 2020-06-26T13:31:41Z

This pull request takes over the development started by @dannnylo in #60.
This should fix #189.

jvanz

LGTM

jvanz · 2020-06-27T17:44:16Z

@giuliocc how about reviewing this PR? ;)

ogecece · 2020-06-27T21:59:12Z

Sure. Thanks :)

ogecece

The code is very clean and concise. Very good work! Congratulations on that 🎉.

Regarding documentation, I think the commits could be improved. Some commits, such as the "Refactor" ones, don't provide enough information about what is being done and why. I think they could at least be squashed.

As a general tip, try to make the most of your commit messages. It will make it easier for you or other people to understand what was done here in the future.

Speaking of the future, if there is any information that you think is important about the implementation and/or the system you're scraping (and it can't be expressed in the code), use docstrings (at the file or method level) to convey it. Spiders deal with systems we don't control, so documentation is key when trying to fix a bug or to make refactoring easier.
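
As a minimal sketch of the docstring tip (the class name, URL pattern, and the note about zero-padded months are hypothetical and not taken from this PR's spider), a docstring can record what the code alone cannot say about the scraped system:

```python
class ExampleGazetteSpider:
    """Hypothetical spider used only to illustrate docstring usage.

    Notes about the target system (assumed for illustration):
    - The site lists gazettes one month per page.
    - Month numbers in the URL must be zero-padded ("06", not "6").
    """

    # Hypothetical URL pattern; this is not the real Natal/RN endpoint.
    BASE_URL = "https://example.com/gazettes?year={year}&month={month:02d}"

    def month_url(self, year, month):
        """Return the listing URL for the given year and month.

        The month is zero-padded because the (assumed) site does not
        accept single-digit month numbers.
        """
        return self.BASE_URL.format(year=year, month=month)
```

Here the class docstring captures assumptions about the external site, while the method docstring explains why the formatting is done the way it is.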

Sorry for the big text, hope I was able to give some good insight :)

EDIT: Just realized you work at Scrapinghub 🤦. I won't delete what I said about spiders, since it can be helpful to other people. But you can consider it irrelevant 😝

processing/data_collection/gazette/spiders/rn_natal.py

Co-authored-by: Giulio Carvalho <[email protected]>

victor-torres · 2020-06-27T23:26:36Z

Thank you for your review, @giuliocc.

I've tried not to make substantial changes to the original author's source code.

Also, this is a very simple spider. I've sketched a longer and more organized (read: complex) version of it, but I don't think it's worth doing for a spider that is so small and whose complexity is so low.

In this case, writing and maintaining a more complex spider isn't worth the effort in the long run, especially because when websites change, they will probably break the old extraction logic completely. If that happens with small spiders like this one, it's often quicker to write a new one from scratch than to fix the old one :)

Regarding documentation, well, there might be a knowledge bias at play here, but I think the spider is pretty self-documenting. It has ~30 lines and it's very easy to see what's going on. Adding docs would be a little redundant in this case, but again, that's my opinion and it might be biased.

I could have paid more attention to the commit messages, but feel free to squash them before the merge if the PR gets approved.

Wish I had more time to dedicate to this project :)

Cheers,
Victor Torres

ogecece · 2020-06-27T23:42:32Z

No problem. If a comment just adds noise, it should not be made :) And I agree with you that a simpler spider is generally better than a complex one. It is good as it is, nicely done!

ogecece · 2020-06-28T01:23:49Z

@jvanz I approved the PR.

About the commits, I don't have write permissions, so I'll leave it up to you to decide the best course of action :)

jvanz · 2020-06-28T20:21:16Z

Thanks for your first contribution, @victor-torres! :)

dannnylo added 5 commits June 26, 2020 10:41

Add spider

b31e788

Mark as done

ce86356

Changed to BaseGazetteSpider

5c6d0c5

Fix some mistakes

6cb2ee2

Change municipality_id to territory_id

4b0b58d

victor-torres force-pushed the rn_natal branch from 2dd5cef to 4b0b58d Compare June 26, 2020 13:48

victor-torres added 10 commits June 26, 2020 10:53

Refactor

2092178

Organize imports

2de8b86

Move from extract_first() to get()

462293d

Refactor

a3a7b3c

Fix bug with month numbers with single digit

92f284f

Refactor

b19ec50

Remove unused imports

216ff64

Make spider more resilient

d20b976

Update pull request number

a64b1d7

Black

967d42c

victor-torres marked this pull request as ready for review June 26, 2020 14:39

victor-torres mentioned this pull request Jun 26, 2020

Add spider for Natal/RN #60

Closed

jvanz approved these changes Jun 27, 2020

jvanz assigned victor-torres Jun 27, 2020

jvanz added enhancement (Improvement, new feature, or tool) spider (Adds a scraper bot for municipality(ies)) labels Jun 27, 2020

ogecece requested changes Jun 27, 2020

victor-torres and others added 2 commits June 27, 2020 19:59

Update processing/data_collection/gazette/spiders/rn_natal.py

5a3f108

Co-authored-by: Giulio Carvalho <[email protected]>

Update processing/data_collection/gazette/spiders/rn_natal.py

168db23

Co-authored-by: Giulio Carvalho <[email protected]>

Move territory id to a spider constant to follow project pattern

c2b5243

ogecece approved these changes Jun 28, 2020

jvanz merged commit 304cf0e into okfn-brasil:master Jun 28, 2020

victor-torres deleted the rn_natal branch June 29, 2020 00:52

victor-torres mentioned this pull request Jun 29, 2020

Should we make dateutil.rrule a development pattern? #193

Closed
