Documentation improvement about create a custom dataset #3654
Comments
Thanks for opening this issue @avonarret! Adding it to our backlog.
@avonarret Could you create a minimal project that we can use to reproduce this? I have done this many times and it doesn't require If the modules are importable, datasets is just one of the modules, so there is nothing special about it. i.e.
I am running into the same issue when accessing the catalog while debugging the standard
@laurensversluis could you give a bit more detail on your setup? Going back to the original @avonarret description, I am in line with @avonarret and I'm not sure one needs kedro/kedro/framework/startup.py Lines 137 to 141 in 63d7516
Also, eventually we should move towards higher level tools like
@noklam Sorry for the late reply. In the course of our internal developments, we have realized that we currently do not need any preprocessing for our needs. We realized that the use of Kedro would be a bit overkill and that we are already well served with Airflow DAGs for our “simpler” tasks. At least in the current project - who knows what's to come. I have now tried again to reproduce the problem I originally described. I followed the recent (most probably unchanged) documentation at https://docs.kedro.org/en/stable/data/how_to_create_a_custom_dataset.html#project-setup again in a new setup environment:
The created starter examples ran cleanly without any problems. Then I created the Result: I was able to successfully load the catalog with the custom dataset with
@astrojuanlu Since I didn't have to run a separate pip install when retesting, this probably won't be necessary for now? But in principle I support the approach, should a similar need arise in other situations.
Description
The documentation at Advanced: Tutorial to create a custom dataset describes how custom datasets can be created. However, the documentation still lacks some details on how to "register" your own dataset, so it can be imported and used by the catalog.load method.
Documentation page (if applicable)
https://docs.kedro.org/en/stable/data/how_to_create_a_custom_dataset.html
Context
According to the current documentation I have created the following file structure:
After configuring the catalog.yml as described in the docs and running the catalog.load method in a Jupyter notebook, the custom dataset was not recognized. The following error occurred:
DatasetError: An exception occurred when parsing config for dataset 'catalogname': Class 'projectname.datasets.CustomDataset' not found, is this a typo?
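The "Class ... not found" message suggests that the dotted path could not be imported from the running Python environment. A minimal sketch for checking this outside of Kedro is shown below; resolve_class is a hypothetical helper written for illustration, not part of Kedro, and Kedro's own lookup logic may differ.

```python
import importlib


def resolve_class(dotted_path: str):
    """Return the object named by dotted_path, or None if it cannot be found.

    Hypothetical helper for illustration only; it checks plain importability
    and does not reproduce Kedro's internal class lookup.
    """
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        # No dot in the path, so there is no module to import.
        return None
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        # The package or module is not importable in this environment.
        return None
    # The module exists; the class may still be missing from it.
    return getattr(module, class_name, None)


# Placeholder names from the issue; replace with your own project and class.
result = resolve_class("projectname.datasets.CustomDataset")
print("importable" if result is not None else "not importable")
```

If this reports "not importable" in the same kernel that runs catalog.load, the problem is likely with how the project package is made available to Python rather than with the catalog entry itself.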
When using this structure, according to @astrojuanlu a
pip install .
is required (Slack conversation).
Possible steps to consider for the docs:
- Create the dataset in src/projectname/datasets/datasetname.py with the __init__.py alongside.
- __init__.py: this file probably needs
- Run cd projectname, then pip install .
- In conf/base/catalog.yml, the type should be projectname.datasets.CustomDataset
Here, projectname and CustomDataset must be replaced with the corresponding names of the respective project.
With these steps I was able to successfully call catalog.load within the Jupyter notebook.
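The layout and import path from the steps above can be sketched without Kedro as follows. Several things here are assumptions made for illustration: the placeholder class body, the re-export line written into datasets/__init__.py, and the use of sys.path as a stand-in for pip install . (which makes the package importable by other means).

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Placeholder names from the issue; replace with your own project and class.
PACKAGE = "projectname"
CLASS_NAME = "CustomDataset"


def build_layout(root: Path) -> Path:
    """Create src/<package>/datasets/ with a dataset module and __init__.py files."""
    src = root / "src"
    datasets_dir = src / PACKAGE / "datasets"
    datasets_dir.mkdir(parents=True)
    (src / PACKAGE / "__init__.py").write_text("")
    # Placeholder class; a real dataset would subclass Kedro's dataset base class.
    (datasets_dir / "datasetname.py").write_text(f"class {CLASS_NAME}:\n    pass\n")
    # Assumed content: re-export the class so the short dotted path resolves.
    (datasets_dir / "__init__.py").write_text(
        f"from .datasetname import {CLASS_NAME}\n"
    )
    return src


def load_class(dotted_path: str):
    """Import the module part of dotted_path and return the named attribute."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


with tempfile.TemporaryDirectory() as tmp:
    src_dir = build_layout(Path(tmp))
    # Stand-in for `pip install .`: make the package importable for this process.
    sys.path.insert(0, str(src_dir))
    importlib.invalidate_caches()
    try:
        cls = load_class(f"{PACKAGE}.datasets.{CLASS_NAME}")
        resolved_name = cls.__name__
    finally:
        sys.path.remove(str(src_dir))

print(resolved_name)
```

The dotted path passed to load_class has the same shape as the type value in conf/base/catalog.yml, so once the package is importable in the notebook's environment, the catalog entry should be resolvable in the same way.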