Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we make dateutil.rrule a development pattern? #193

Closed
victor-torres opened this issue Jun 29, 2020 · 4 comments
Closed

Should we make dateutil.rrule a development pattern? #193

victor-torres opened this issue Jun 29, 2020 · 4 comments

Comments

@victor-torres
Copy link
Contributor

During code review of pull request #192, @giuliocc suggested the use of dateutil.rrule instead of the built-in function range to generate date candidates used to search for Gazettes. I'm creating this issue to discuss the pros and cons of using it.

[...] you can express your intent much more and have more power on the upper and lower date limits. Here is the examples page. Look for "MONTHLY".

Originally posted by @giuliocc in #192

Indeed, it has a nice interface. We just need to discuss some details before deciding if we're turning it into a development pattern or not.

We'll likely have to use datetime objects (date and datetime) to make use of rrule. We need to agree on how we're importing those objects. Otherwise, we would be generating more code diversity.

  • import datetime
  • import datetime as dt
  • from datetime import date, datetime

Also, we need to consider if turning this into a development pattern would make it more difficult for first-time contributors or beginners to contribute to the project.

@jvanz
Copy link
Collaborator

jvanz commented Jul 3, 2020

I like the idea of using rrule. But I do not think we need to write this in stone. We already have a low number of contributors and the potential contributors, many times, are not developers. Thus, between using range or not, I choose the approach which make the contribution easier. This does not mean that we cannot suggest improvements in the review process. Instead of You're doing wrong, check the development patterns we can have an approach like: Hey! Awesome work! May I suggest an improvement? Try this in this line using rrule. This is how you can do it... .

About the imports, I believe this is a little bit pedantic. I prefer what makes the code more easy to read. Which can change in different situations. Furthermore, I do think this problem does not exists yet. We do not have regular contributions for that happens. Again, nothing prevent us to make suggestion during the review process. If I would choose I will with the from sth import sth

What do you think?

@victor-torres
Copy link
Contributor Author

Totally agree, @jvanz. That's my thought.

@ogecece
Copy link
Member

ogecece commented Jul 8, 2020

Sorry for the delayed response. I had personal problems.

First, to be sure we are on the same page, I believe the correct theme we are discussing here is a "Style Guide". When talking about both rrule and from datetime import datetime.

I couldn't find any research on the specific topic (effects of style guides for contributors in open source projects) that could make discussing about it pretty clear :/

Although, this is a good read and their systematic review gives way too many insights about documentation, lack of domain expertise, etc. I'm loosely trying to apply some of the stuff they talk about in the review to this project when making this comment.

This is also a good read and many points are already implemented in this project!


On Style Guide

My general opinion is (I separated in topics so I can elaborate on each one later):

  1. new contributors should benefit from a simple and concise style guide;
  2. a style guide should only address the most common patterns in the project;
  3. the style guide should be enforced automatically whenever possible (and notify contributors about it);
  4. rules which are not automatically enforceable should be few and simple, as they should only be enforced during the review process;
  5. a well-enforced style guide would make project readability and maintenance easier.

1. new contributors should benefit from a simple and concise style guide

This topic addresses

We already have a low number of contributors and the potential contributors, many times, are not developers

A style guide should make life easier for those who are not used to these type of domains, both scraping and gazettes. There is no need to reinvent the wheel or go searching in the codebase what is acceptable if there are solutions that could be documented and readily available.

Whatever we do to take away the burden of learning Scrapy, the Gazettes domain of expertise and project's tacit rules, should make an improvement on casual contribution (and the next topic is about making sure this does not get out of hand).

2. a style guide should only address the most common patterns in the project

Too much documentation is daunting for a casual contributor and should be avoided.

Things like "every spider needs to inherit from BaseGazetteSpider" are rules that are general enough as it applies to every written spider in the project.

Styling guides for "date ranges" and "pagination" are desirable in my opinion (if there are simple solutions) as those are the most common ways of link generation/discovery for spiders in this domain. Two sets of rules would cover most of the spiders and would make the code more consistent.

Method naming could be a thing too. There is no need to decide between "get" or "parse" (or any other verb) if it has already been decided what to use.

Staying limited to the most common of the patterns in the project could make a quick skim through the Style Guide page and then keeping it open as a reference through development much easier for the casual contributor. Avoiding the need of looking through the project to discover the tacit rules and acceptable solutions for the problem at hand.

3. the style guide should be enforced automatically whenever possible (and notify contributors about it)

The importing rules could be easily enforced with a formatter and pre-commit. I agree with both of you that if we were to choose something it would be from datetime import datetime (and `from x import y, z in general). I can't think of a reason not to do it. It would be awesome to have nobody ever needing to think about it anymore.

4. rules which are not automatically enforceable should be few and simple, as they should only be enforced during the review process

Only easily identifiable and common stuff about the style guide should be left to the reviewer to enforce. Everything that can be automated should be automated and the review should focus more on the correctness of the data being scraped, performance tips, general readability tips, stuff like that.

5. a well-enforced style guide would make project readability and maintenance easier.

This is just a wrap-around, I already covered this explaining what I meant in the above topics.


On rrule

I believe we should make rrule one of the rules in a style guide. Although its absence is hard to detect automatically, it is a common pattern in this domain and could be easily identified in the review. If some use cases were already documented, there would be no need to explain much about it for people who do not know it yet. It avoids nested loops like a boss, keeps the dates in date objects and it can avoid troublesome date range code blocks if any appear (I don't think that would be the case, though).


On from datetime import datetime

As I explained in topic 3, for me it is a no-brainer.


This comment came out more of a "we would really benefit from a style guide" than "dateutil.rrule" and "from datetime import datetime". I tried to make it into a base for something we could agree than to diverge about specific rules and not constructing something useful out of it.

I don't mean to be patronizing, and telling you what a style guide is or whatever, I really think you already know it and considered it. I really just wanted to explain what are my thoughts about it as everyone can have different meanings in their heads for the same concepts. And I'm up for implementing and documenting this stuff if a decision to do something about it is reached.

Sorry for the looooooong text, as always :)


TL;DR; I think rrule should be enforced, from datetime import datetime should too, and we should make a style guide to make life easier for casual contributors and project maintainers.

@jvanz
Copy link
Collaborator

jvanz commented Jul 14, 2020

As a Open Source contributor for while already, I agree that the style guide it is important and know that it may facilitate and save time in many sitautions. And I think nobody is against it. What I was trying to explain in my previous comment was this is not in our top of priorities. One of the reasons for that is because the need for something for formal does not happened yet.

Well, this is a open source project. Open an PR with some suggestions, then we can discuss on something more concrete. ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants