Library should use either "data set" or "dataset" consistently #533
Comments
I vaguely remember there was a discussion on this before we open-sourced. @stichbury, any thoughts on "dataset" vs "data set"?
I feel like "dataset" has been adopted by a lot of major players. A few top hits off a search for
Definitions often list both forms:
Here's some data (biased towards books, FWIW): https://books.google.com/ngrams/graph?content=data+set%2Cdataset&year_start=1800&year_end=2019&corpus=26&smoothing=3&case_insensitive=true
Some more, driven by the all-knowing Twitterverse: https://twitter.com/randal_olson/status/824702008007557121?lang=en
However, a consideration against switching to "dataset" is that it messes with all of the classes. You could deprecate the two-word form now and remove it in 0.17, but I'm sure it'll cause some pain. Granted, that pain would be incurred if you ever wanted to make that switch.
For me, I'm happy as long as it's (mostly) consistent (i.e. OK with the
Yep @lorenabalan @deepyaman It is indeed a dataset :) if you follow our docs style guide. https://github.com/quantumblacklabs/private-kedro/blob/master/docs/README.md#kedro-lexicon
I think we are pretty much consistent in the docs in using the single word, no hyphenation. I have no opinion on the choice used for classes and in code TBH, since I don't believe the code and docs have to follow each other exactly. Consistency is key, though.
I'm just following up: is it okay to close this issue?
💯 from me
@yetudada I don't think it's resolved in the code. At a minimum, can we change:
This way, at least module naming is consistent and people don't have to question which form to use. I'm happy to do this, given a green light. I assume we're not good to change
Hi @deepyaman, yes that sounds good. We're very happy to accept a PR from you for this 😄
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Spotted in kedro-org/kedro-starters#137: Line 430 in fd8162d
We've decided to use "dataset" in prose, and This was partially achieved in https://github.com/quantumblacklabs/private-kedro/pull/1211 (private), and then a series of other PRs recently, including gh-2500, gh-2673, gh-2724, gh-2735, mostly by @deepyaman. @noklam collected some context in gh-2740; let's continue the conversation there.
Description
I'm always frustrated when I want to create a new data set... or is it dataset?
Context
In both cases, it bothers/confuses me to see inconsistencies. Maybe it's just me. 🤷‍♂️
List of Inconsistencies
SomeDataSet (data set as two words), but the module name is almost always some_dataset.py (dataset as one word).
some_data_set.py.
cached_dataset.py ...
partitioned_data_set.py and test_partitioned_dataset.py).
catalog.datasets is a single word, as it could be argued that it's done for convenience. Similarly, extras.datasets makes sense as a single word, since it's more aligned with Python package naming convention.
Much of this is trivial to address, although you do need to update some tests that rely on module name (e.g. test_partitioned_dataset.py).
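As a rough illustration of how the inconsistencies listed above could be spotted mechanically, the sketch below classifies a name by which spelling it uses and reports module/class pairs that disagree. The helper names `spelling` and `mismatches` are hypothetical, and the regular expressions only cover the snake_case and CamelCase forms mentioned in this issue.

```python
import re

# Two-word spelling: snake_case "data_set" or CamelCase "DataSet".
TWO_WORD = re.compile(r"data_set|DataSet")
# One-word spelling: snake_case "dataset" or CamelCase "Dataset".
ONE_WORD = re.compile(r"dataset|Dataset")


def spelling(name: str) -> str:
    """Classify a name as 'two-word', 'one-word', or 'neither'."""
    if TWO_WORD.search(name):
        return "two-word"
    if ONE_WORD.search(name):
        return "one-word"
    return "neither"


def mismatches(pairs):
    """Return the (module, class) pairs whose spellings disagree."""
    return [
        (module, cls)
        for module, cls in pairs
        if "neither" not in (spelling(module), spelling(cls))
        and spelling(module) != spelling(cls)
    ]


if __name__ == "__main__":
    # Names taken from the list above; the pairing is for illustration only.
    pairs = [
        ("some_dataset.py", "SomeDataSet"),
        ("some_data_set.py", "SomeDataSet"),
    ]
    for module, cls in mismatches(pairs):
        print(f"{module} ({spelling(module)}) vs {cls} ({spelling(cls)})")
```

A check along these lines could run in CI to keep module and class names aligned once a single form is chosen.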