[DataCatalog]: Error message is confusing when using `DataSet` instead of `Dataset` #3909

ElenaKhaustova · 2024-06-03T10:50:38Z

Description

There is confusion between the DataSet and Dataset spellings, and the error message is not informative when the old naming is used. The dataset classes were renamed in Kedro 0.19 (for example, `pandas.CSVDataSet` became `pandas.CSVDataset`), but people miss this when switching to the new version.

Relates to #2401

Context

Example of the current error message:

DatasetError: An exception occurred when parsing config for dataset 'companies':
Class 'pandas.CSVDataSet' not found, is this a typo?


datajoely · 2024-06-03T11:01:35Z

I wonder if we could auto-fix this with a big warning.
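A minimal sketch of what "auto-fix with a warning" could look like. `resolve_dataset_type` and the injectable `Resolver` callable are hypothetical names for illustration; this does not reproduce how Kedro actually looks up dataset classes. If the configured class cannot be found but the same path with a `Dataset` suffix can, the sketch warns loudly and falls back to the renamed class.

```python
import warnings
from typing import Callable, Optional

# Hypothetical resolver: maps a class path such as "pandas.CSVDataset"
# to the class it names, or returns None if it cannot be found.
Resolver = Callable[[str], Optional[type]]


def resolve_dataset_type(dataset_type: str, resolver: Resolver) -> type:
    """Resolve a catalog `type:` entry, auto-fixing the pre-0.19 `DataSet` spelling."""
    cls = resolver(dataset_type)
    if cls is not None:
        # The name resolves as written (e.g. a custom class that really ends
        # in "DataSet"), so it is used unchanged and no warning is emitted.
        return cls

    if dataset_type.endswith("DataSet"):
        fixed = dataset_type[: -len("DataSet")] + "Dataset"
        cls = resolver(fixed)
        if cls is not None:
            # Fall back to the renamed class, but make the fix visible.
            warnings.warn(
                f"'{dataset_type}' was not found; using '{fixed}' instead. "
                "Dataset classes were renamed from 'DataSet' to 'Dataset' in "
                "Kedro 0.19, please update your catalog.",
                UserWarning,
                stacklevel=2,
            )
            return cls

    raise LookupError(f"Class '{dataset_type}' not found, is this a typo?")
```

For example, with `known = {"pandas.CSVDataset": CSVDataset}`, calling `resolve_dataset_type("pandas.CSVDataSet", known.get)` emits a `UserWarning` and returns `CSVDataset`. The trade-off is that accepting the old name, even with a warning, may keep the outdated spelling alive in projects longer.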

astrojuanlu · 2024-06-06T06:56:29Z

I have a couple questions on this one:

1. As far as I understand, projects created with our starters have a kedro_init_version that limits what version of Kedro can be used. If, say, someone created a project with Kedro 0.18 (uppercase S datasets from kedro.extras) and then tried to use Kedro 0.19 (no kedro.extras at all, need to install kedro-datasets), they would get an error, right?
   - I also reckon, though, that the versioning of kedro-datasets is not limited by kedro_init_version.
   - In other words, how does this problem manifest itself nowadays? What sequence of steps gets us here?
2. It is well known that upgrading a Kedro version is hard in general (but I could not locate an issue for it). Looking at this problem from that angle, and considering that it clearly arises from people not reading our existing migration guides, can we provide linters ("kedro-lint") or semi-automatic migration utils ("kedro-modernize") to help with this task, rather than limiting ourselves to improving the traceback?
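A "kedro-lint"-style check could start as simply as walking the parsed catalog config and flagging `type:` values that end in `DataSet`. The sketch below assumes the catalog YAML has already been loaded into a plain dict; `lint_catalog` is a hypothetical name, not an existing Kedro tool. Since custom datasets may legitimately end in `DataSet`, it only reports findings and never rewrites the config.

```python
from typing import Any, Iterator, List, Tuple


def _walk_types(node: Any, path: str) -> Iterator[Tuple[str, str]]:
    """Yield (config path, type value) for every `type` key in a nested catalog config."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            if key == "type" and isinstance(value, str):
                yield path, value
            else:
                # Recurse so nested configs (e.g. a `dataset:` block) are checked too.
                yield from _walk_types(value, child)


def lint_catalog(catalog: dict) -> List[str]:
    """Return warnings for `type:` entries that still use the pre-0.19 `DataSet` suffix."""
    findings = []
    for where, dataset_type in _walk_types(catalog, ""):
        if dataset_type.endswith("DataSet"):
            suggestion = dataset_type[: -len("DataSet")] + "Dataset"
            findings.append(
                f"{where}: '{dataset_type}' uses the old 'DataSet' spelling; "
                f"did you mean '{suggestion}'?"
            )
    return findings
```

Run against a catalog containing `companies: {type: pandas.CSVDataSet, ...}`, it would report `companies: 'pandas.CSVDataSet' uses the old 'DataSet' spelling; did you mean 'pandas.CSVDataset'?`.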

ElenaKhaustova · 2024-06-06T14:49:57Z

> I have a couple questions on this one:
>
> 1. As far as I understand, projects created with our starters have a kedro_init_version that limits what version of Kedro can be used. If, say, someone created a project with Kedro 0.18 (uppercase S datasets from kedro.extras) and then tried to use Kedro 0.19 (no kedro.extras at all, need to install kedro-datasets), they would get an error, right?
>    - I also reckon, though, that the versioning of kedro-datasets is not limited by kedro_init_version.
>    - In other words, how does this problem manifest itself nowadays? What sequence of steps gets us here?
> 2. It is well known that upgrading a Kedro version is hard in general (but I could not locate an issue for it). Looking at this problem from that angle, and considering that it clearly arises from people not reading our existing migration guides, can we provide linters ("kedro-lint") or semi-automatic migration utils ("kedro-modernize") to help with this task, rather than limiting ourselves to improving the traceback?

So far, we know that this still happens when users already have a Kedro project created with an older version and then upgrade Kedro to a newer one. Interviewees mentioned another reason: our old blog posts have examples with the old naming, which is understandable because that naming was correct at the time. But some users still follow those examples and get confused.

I've also requested some extra details from the user side to better answer your questions.

ElenaKhaustova · 2024-06-06T15:55:15Z

@astrojuanlu the blog post mentioned above: https://kedro.org/blog/add-kedro-to-your-data-science-notebook

astrojuanlu · 2024-06-07T10:24:26Z

Very good point about old training material using the old names; I didn't think about that. This might be a problem that will take some time to go away, and we might indeed need to take some action on our side.

merelcht · 2024-06-10T14:11:15Z

Looking at the error:

DatasetError: An exception occurred when parsing config for dataset 'companies':
Class 'pandas.CSVDataSet' not found, is this a typo?

I would still argue that the error isn't confusing: it states exactly what the problem is, spelling DataSet with a capital S instead of a lowercase s, which is indeed a typo. The question now is whether we can add some clarification so that people check the upper/lowercase spelling. At the same time, it will be tricky to match DataSet endings specifically, because users could have custom datasets with that spelling that work fine.
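One way to add that clarification without affecting custom datasets is sketched below; it is not the change that was eventually merged. The hint is only appended when the lookup has already failed and the same path with a lowercase-s `Dataset` suffix does resolve, so a working custom `...DataSet` class never reaches this code. `class_not_found_message` and the injectable `resolver` are hypothetical names.

```python
from typing import Callable, Optional

# Hypothetical resolver: returns the class named by a path, or None if not found.
Resolver = Callable[[str], Optional[type]]


def class_not_found_message(dataset_type: str, resolver: Resolver) -> str:
    """Build the 'class not found' error text, with a hint for the old `DataSet` spelling."""
    message = f"Class '{dataset_type}' not found, is this a typo?"
    if dataset_type.endswith("DataSet"):
        candidate = dataset_type[: -len("DataSet")] + "Dataset"
        # Only hint when the renamed class actually exists, so a missing custom
        # class that genuinely ends in "DataSet" gets no misleading suggestion.
        if resolver(candidate) is not None:
            message += (
                "\nHint: dataset classes were renamed from 'DataSet' to "
                f"'Dataset' in Kedro 0.19. Did you mean '{candidate}'?"
            )
    return message
```

With a resolver that knows `pandas.CSVDataset`, the message for `pandas.CSVDataSet` keeps the existing first line and adds the "Did you mean" hint; for `my.CustomDataSet` it is unchanged.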

ElenaKhaustova added the "Issue: Feature Request" label Jun 3, 2024

merelcht added this to the Improve Developer Experience milestone Jun 3, 2024

ElenaKhaustova mentioned this issue Jun 6, 2024

Research summary of insights for redesigning Kedro's data catalog API #3934 (closed)

ankatiyar mentioned this issue Jun 13, 2024

Update error message when kedro-datasets is not installed or DataSet spelling is used #3952 (merged)

ankatiyar closed this as completed in #3952 Jun 19, 2024

github-actions bot mentioned this issue Jul 1, 2024

Monthly issue metrics report #3975 (closed)
