[DataCatalog]: Error message is confusing when using DataSet instead of Dataset #3909

Closed
ElenaKhaustova opened this issue Jun 3, 2024 · 6 comments · Fixed by #3952
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@ElenaKhaustova
Contributor

ElenaKhaustova commented Jun 3, 2024

Description

Users confuse the old DataSet spelling with the new Dataset spelling, and the error message is not informative when the old name is used. The classes were renamed in 0.19, but people miss that when switching to the new version.

Relates to #2401

Context

Example of the current error message:

DatasetError: An exception occurred when parsing config for dataset 'companies':
Class 'pandas.CSVDataSet' not found, is this a typo?
@ElenaKhaustova ElenaKhaustova added the Issue: Feature Request New feature or improvement to existing feature label Jun 3, 2024
@datajoely
Contributor

datajoely commented Jun 3, 2024

I wonder if we could auto-fix this with a big warning.
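
A minimal sketch of what such an auto-fix could look like, assuming the fix is applied to the `type` string before the class is looked up. The helper name `normalize_dataset_type` is hypothetical and not part of Kedro; it only rewrites a trailing `DataSet` and emits a standard `DeprecationWarning`.

```python
import warnings


def normalize_dataset_type(type_name: str) -> str:
    """Rewrite a trailing 'DataSet' to 'Dataset' and warn about it.

    Hypothetical helper for illustration; not part of Kedro.
    """
    old_suffix, new_suffix = "DataSet", "Dataset"
    if not type_name.endswith(old_suffix):
        # Already uses the new spelling (or is unrelated): leave untouched.
        return type_name

    fixed = type_name[: -len(old_suffix)] + new_suffix
    warnings.warn(
        f"'{type_name}' uses the old spelling; using '{fixed}' instead. "
        "Please update your catalog.",
        DeprecationWarning,
        stacklevel=2,
    )
    return fixed
```

A blanket rewrite like this would also rename a custom class that really is spelled `...DataSet`, so it is probably only safe as a fallback once the original name has failed to resolve.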

@astrojuanlu
Member

I have a couple questions on this one:

  • As far as I understand, projects created with our starters have a kedro_init_version that limits what version of Kedro can be used. If, say, someone created a project with Kedro 0.18 (uppercase S datasets from kedro.extras) and then tried to use Kedro 0.19 (no kedro.extras at all, need to install kedro-datasets), they would get an error, right?
    • I also reckon though that the versioning of kedro-datasets is not limited by kedro_init_version
    • In other words, how does this problem manifest itself nowadays? What sequence of steps gets us to here?
  • It is well known that upgrading a Kedro version is hard in general (but I could not locate an issue for it). By looking at this problem from that angle, and considering that clearly it arises from people not reading our existing migration guides, can we provide linters ("kedro-lint") or semi-automatic migration utils ("kedro-modernize") to help with this task, rather than limiting ourselves to improving the traceback?
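
As a rough illustration of the semi-automatic migration idea, here is a sketch that rewrites old-style `type:` entries in catalog YAML text. It assumes unquoted `type:` values on their own line; the function name `modernize_catalog` and the sample path are hypothetical, and no such tool exists in Kedro as far as this thread shows.

```python
import re

# Matches lines such as "  type: pandas.CSVDataSet" (optionally followed by a
# YAML comment) and captures everything before the old suffix unchanged.
_TYPE_LINE = re.compile(
    r"^([ \t]*type:[ \t]*[\w.]+?)DataSet([ \t]*(?:#.*)?)$",
    re.MULTILINE,
)


def modernize_catalog(text: str) -> tuple[str, int]:
    """Return the rewritten catalog text and the number of entries changed."""
    return _TYPE_LINE.subn(r"\1Dataset\2", text)


if __name__ == "__main__":
    # Hypothetical catalog entry using the old spelling.
    old = (
        "companies:\n"
        "  type: pandas.CSVDataSet\n"
        "  filepath: data/01_raw/companies.csv\n"
    )
    new, changed = modernize_catalog(old)
    print(f"{changed} entry rewritten")
    print(new)
```

Because it works on text rather than parsed YAML, comments and formatting are preserved, but quoted values and custom classes that genuinely end in `DataSet` would need extra handling (for example, a dry-run mode that only reports candidates).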

@ElenaKhaustova
Contributor Author

So far, we know that this still happens when users have a Kedro project created for an older version and then upgrade Kedro to a newer version. Another reason mentioned by interviewees is that our old blog posts contain examples with the old naming, which is fair, because the naming was correct at the time. But some users still follow those examples and get confused.

I've also requested some extra details from the user side to better answer your questions.

@ElenaKhaustova
Contributor Author

@astrojuanlu the blog post mentioned above: https://kedro.org/blog/add-kedro-to-your-data-science-notebook

@astrojuanlu
Member

Very good point about old training material using the old names, didn't think about that... This might be a problem that will need some time to go away then, and we might indeed need to take some action on our side.

@merelcht
Member

Looking at the error:

DatasetError: An exception occurred when parsing config for dataset 'companies':
Class 'pandas.CSVDataSet' not found, is this a typo?

I would still argue that the error isn't confusing: it states exactly what the problem is, namely spelling DataSet with a capital S instead of a lowercase s, which is indeed a typo. The question now is whether we can add some clarification so that people check the upper/lowercase spelling. At the same time, it will be tricky to match specifically on DataSet endings, because users could have custom datasets that use that spelling and work fine.
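
One way to square these two concerns is to leave working names alone and only add a hint after the lookup has already failed. The sketch below assumes a simple name-to-class mapping called `available`, which is a stand-in for illustration; Kedro's real class resolution works through imports, and `resolve_dataset_class` is a hypothetical name.

```python
class DatasetError(Exception):
    """Stand-in for Kedro's DatasetError so the sketch is self-contained."""


def resolve_dataset_class(type_name: str, available: dict[str, type]) -> type:
    """Look up a dataset class; suggest the new spelling only on failure."""
    if type_name in available:
        # Custom classes that really end in 'DataSet' resolve here untouched.
        return available[type_name]

    message = f"Class '{type_name}' not found, is this a typo?"
    if type_name.endswith("DataSet"):
        candidate = type_name[: -len("DataSet")] + "Dataset"
        # Only hint when the lowercase-s variant actually exists.
        if candidate in available:
            message += (
                f" Did you mean '{candidate}'? Dataset classes were renamed "
                "from 'DataSet' to 'Dataset' in Kedro 0.19."
            )
    raise DatasetError(message)
```

With this ordering, a custom `MyDataSet` keeps working, while `pandas.CSVDataSet` fails with the existing message plus a pointer to the renamed class.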
