FAQ

Frequently asked questions

You can find high-level FAQs about Kedro on our website and technical FAQs in the developer documentation.

If your question isn't answered here, check out the searchable archive of Slack discussions or the older archive of discussions on Discord (https://linen-discord.kedro.org).

To ask your own question, join Kedro's Slack organisation and use the #questions channel.

What is the data engineering convention?

Bruce Philp and Guilherme Braccialli are the brains behind a layered data-engineering convention as a model for managing data. You can find an in-depth walk-through of their convention in a blog post on Medium.

Refer to the table below for a high-level guide to each layer's purpose.

The data layers don’t have to exist locally in the `data` folder within your project, but we recommend that you structure your S3 buckets or other data stores in a similar way.

| Folder in `data` | Description |
| --- | --- |
| Raw | Start of the pipeline, containing the sourced data model(s) that should never be changed; it forms your single source of truth to work from. These data models are typically untyped (e.g. CSV), but this varies from case to case |
| Intermediate | Optional data model(s), which are introduced to type your `raw` data model(s), e.g. converting string-based values into their typed representation |
| Primary | Domain-specific data model(s) containing cleansed, transformed and wrangled data from either `raw` or `intermediate`, which forms the layer that you input into your feature engineering |
| Feature | Analytics-specific data model(s) containing a set of features defined against the `primary` data, which are grouped by feature area of analysis and stored against a common dimension |
| Model input | Analytics-specific data model(s) containing all `feature` data against a common dimension and, in the case of live projects, against an analytics run date to ensure that you track the historical changes of the features over time |
| Models | Stored, serialised pre-trained machine learning models |
| Model output | Analytics-specific data model(s) containing the results generated by the model based on the `model input` data |
| Reporting | Reporting data model(s) that combine a set of `primary`, `feature`, `model input` and `model output` data to drive the dashboard and the views constructed. This layer removes the need to define any blending or joining of data, improves performance, and allows the presentation layer to be replaced without having to redefine the data models |
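
The layering above can be mirrored in a local folder or in the key prefixes of a remote store. The following is a minimal sketch rather than an official Kedro utility: the layer names come from the table, while the numbered folder names (`01_raw`, `02_intermediate`, and so on), the helper functions, and the example bucket URI are assumptions made for illustration.

```python
from pathlib import Path

# Layer names in pipeline order, taken from the table above.
LAYERS = [
    "raw",
    "intermediate",
    "primary",
    "feature",
    "model_input",
    "models",
    "model_output",
    "reporting",
]


def layer_folders(layers=LAYERS):
    """Return numbered folder names such as '01_raw' (the numbering is an assumption)."""
    return [f"{i:02d}_{name}" for i, name in enumerate(layers, start=1)]


def create_layers(root):
    """Create one sub-folder per layer under `root` and return the created paths."""
    root = Path(root)
    paths = [root / folder for folder in layer_folders()]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


def layer_prefix(base_uri, layer):
    """Build a key prefix for a remote store (hypothetical helper, e.g. for an S3 bucket)."""
    folder = layer_folders()[LAYERS.index(layer)]
    return f"{base_uri.rstrip('/')}/{folder}/"
```

For example, `create_layers("data")` creates the eight layer folders under a local `data` directory, and `layer_prefix("s3://my-bucket/data", "primary")` returns `s3://my-bucket/data/03_primary/`, where `my-bucket` is a placeholder name.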

Who maintains Kedro?

The documentation about Kedro's Technical Steering Committee describes in detail how Kedro is maintained and introduces the team behind the project.

We also want to thank all the open-source contributors whose work goes into Kedro releases.

How can I cite Kedro?

If you're an academic, Kedro can also help you, for example as a tool for making research reproducible. Use the "Cite this repository" button on our repository to generate a citation from the CITATION.cff file.
