FAQ
You can find high-level FAQs about Kedro on our website and technical FAQs in the developer documentation.
If you have a different question which isn't answered here, check out the searchable archive of Slack discussions or the older archive of discussions on Discord.
To ask your own question, join Kedro's Slack organisation and use the #questions channel.
Bruce Philp and Guilherme Braccialli devised the layered data-engineering convention as a model for managing data. You can find an in-depth walkthrough of their convention in a blog post on Medium.
The table below gives a high-level guide to each layer's purpose.
Note: The data layers don’t have to exist locally in the `data` folder within your project, but we recommend that you structure your S3 buckets or other data stores in a similar way.
Folder in data | Description |
---|---|
Raw | Initial start of the pipeline, containing the sourced data model(s) that should never be changed. It forms your single source of truth to work from. These data models are typically un-typed, e.g. csv, but this will vary from case to case |
Intermediate | Optional data model(s), which are introduced to type your `raw` data model(s), e.g. converting string-based values into their typed representation |
Primary | Domain-specific data model(s) containing cleansed, transformed and wrangled data from either `raw` or `intermediate`, which forms the layer that you input into your feature engineering |
Feature | Analytics-specific data model(s) containing a set of features defined against the `primary` data, which are grouped by feature area of analysis and stored against a common dimension |
Model input | Analytics-specific data model(s) containing all `feature` data against a common dimension and, in the case of live projects, against an analytics run date to ensure that you track the historical changes of the features over time |
Models | Stored, serialised pre-trained machine learning models |
Model output | Analytics specific data model(s) containing the results generated by the model based on the model input data |
Reporting | Reporting data model(s) that combine a set of `primary`, `feature`, `model input` and `model output` data to drive the dashboard and the views constructed. This layer encapsulates the blending and joining of data so that it does not need to be defined elsewhere, improves performance, and allows the presentation layer to be replaced without redefining the data models |
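
As a rough illustration of the layers in the table, the sketch below creates one sub-folder per layer under a `data` directory. The numbered folder names (`01_raw`, `02_intermediate`, and so on) are an assumption based on the naming commonly seen in Kedro project templates; the table itself only names the layers. The helper functions `layer_folder_names` and `create_layer_folders` are hypothetical and are not part of Kedro.

```python
from pathlib import Path
import tempfile

# Layer names taken from the table above, in pipeline order.
LAYERS = [
    "raw",
    "intermediate",
    "primary",
    "feature",
    "model_input",
    "models",
    "model_output",
    "reporting",
]


def layer_folder_names(layers=LAYERS):
    """Return numbered folder names such as '01_raw' (assumed naming scheme)."""
    return [f"{index:02d}_{name}" for index, name in enumerate(layers, start=1)]


def create_layer_folders(data_root):
    """Create one sub-folder per layer under data_root and return the paths."""
    root = Path(data_root)
    paths = [root / folder for folder in layer_folder_names()]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


if __name__ == "__main__":
    # Run against a throwaway directory so no real project is touched.
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_layer_folders(Path(tmp) / "data"):
            print(path.name)
```

The same prefixes could be used as key prefixes in an S3 bucket or another data store, in line with the note above.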
This is a list of queries that we see commonly on our Slack channel (and previously on Discord). We are aiming to answer each of these in documentation or blog posts, but for now, it's handy to have a list of previous answers to draw upon.
- General guidelines for unit testing a Kedro project
- https://discord.com/channels/778216384475693066/931533715291648041
- https://discord.com/channels/778216384475693066/778998585454755870/864551888137486336
- https://discord.com/channels/778216384475693066/846330075535769601/1035544689388036106
- https://kedro-org.slack.com/archives/C03RKP2LW64/p1673531471566119 (it’s more of an opinion piece, possibly even a philosophical discussion of tradeoffs, and then a compromise on how you can achieve it)
- https://discord.com/channels/778216384475693066/778998585454755870/1030410874466340895
- https://discord.com/channels/778216384475693066/846330075535769601/1019966086088761455
- https://discord.com/channels/778216384475693066/778998585454755870/1006200150823280751
- https://discord.com/channels/778216384475693066/928337378345640009/968078377292537866
- https://discord.com/channels/778216384475693066/846330075535769601/950582437409349652
- https://discord.com/channels/778216384475693066/846330075535769601/1010521600740835449
- https://discord.com/channels/778216384475693066/846330075535769601/1031896717630652416
- https://discord.com/channels/778216384475693066/846330075535769601/984042296720908298
- https://discord.com/channels/778216384475693066/941044759009587262/941093495236595792
Note that if you are running a debugger with tests, you may need to add the extra argument `--no-cov` to make it work properly.
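
A minimal sketch of how the `--no-cov` argument might be passed when launching tests under a debugger, assuming the project uses pytest with the pytest-cov plugin (which provides `--no-cov`). The `tests` path and the `debug_pytest_args` helper are hypothetical names used only for illustration.

```python
def debug_pytest_args(test_path="tests"):
    """Build a pytest argument list with coverage disabled for debugging."""
    # --no-cov comes from the pytest-cov plugin and disables coverage
    # collection, which can otherwise interfere with debugger breakpoints.
    return [test_path, "--no-cov"]


if __name__ == "__main__":
    # Show the equivalent command line: pytest tests --no-cov
    print("pytest " + " ".join(debug_pytest_args()))
```

In an IDE, the same list can usually be entered as the additional arguments of the pytest run or debug configuration.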
VS Code: https://docs.kedro.org/en/stable/development/set_up_vscode.html. PyCharm: https://docs.kedro.org/en/stable/development/set_up_pycharm.html?highlight=ide
- Contribute to Kedro
- Guidelines for contributing developers
- Contribute changes to Kedro that are tested on Databricks
- Backwards compatibility and breaking changes
- Contribute to the Kedro documentation
- Kedro documentation style guide
- Creating developer documentation
- Kedro new project creation - how it works
- The CI Setup: GitHub Actions
- The Performance Test Setup: Airspeed Velocity
- Kedro Framework team norms & ways of working ⭐️
- Kedro Framework Pull Request and Review team norms ✍️