We have conducted extensive research to understand people's motivations for using or not using Kedro. We want to improve Kedro to provide more value to data scientists, data engineers, machine learning engineers, and other users. We've compiled the research insights and potential improvement ideas in this GitHub issue so that we can prioritise concepts that make Kedro an attractive option for users across roles and skill levels.
Our next step is to conduct value testing to identify the most impactful concepts. This user-centred process should guide us toward three high-potential solutions that can meaningfully solve pain points based on evidence directly from Kedro users.
What terminology will I be using?
To focus our research, we will look at two representative user profiles that encompass vital segments of the data science community:
"Notebook-focussed users" primarily use notebooks for analysis and may be less familiar with IDE-based coding workflows; this could include many data analysts and data scientists.
"IDE-focussed users" are comfortable using IDEs for development, indicating intermediate to advanced software engineering skills; this may include some data scientists, machine-learning engineers, and data engineers.
"Using Kedro as a framework" encapsulates using the framework (project template, session, context and CLI) and library components (ConfigLoader, Pipeline, Runner, Data Catalog and Datasets).
"Using Kedro as a library" refers to using one or more of Kedro's library components (ConfigLoader, Pipeline, Runner, Data Catalog and Datasets) in Python scripts or notebooks. Here, a user is leveraging Kedro modular components for their capabilities and is not using the framework.
What are some of our learnings?
IDE-focussed users want to adopt Kedro in an existing use case
IDE-focussed users will try to learn how to use Kedro by refactoring an existing use case into a project that uses Kedro. Their objective is to learn how to leverage Kedro as a framework or to adopt Kedro in stages by incorporating library components into their work.
IDE-focussed users want to incorporate Kedro in an existing project template
IDE-focussed users leverage internal project templates provided by CookieCutter or tools that provide project templates like Poetry, Hydra and DVC. This user group might bypass Kedro because of the high switching cost when adopting Kedro's project template or the challenges with integrating Kedro with those tools. We recommend that our users start from a Kedro project template or starter, but this may not be possible for them.
IDE-focussed users want to choose the features included in the project template generated by Kedro
IDE-focussed users have a lot of opinions about how they want their project template to be structured; there was a lot of variance on #208. Even when an IDE-focussed user has committed to the project template created by Kedro, they still want the flexibility to choose which features are enabled and visible in their personalised template.
IDE- and notebook-focussed users will pass over Kedro for use on collaborative projects when they're the only ones that want it
Kedro is positioned as an all-or-nothing overhaul. Our users will choose not to use Kedro when they are placed on a collaborative project and are the only ones who want to use it. Most of these perspectives are associated with adopting the framework.
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories
Our project template has a lot of software engineering concepts embedded in it, some more necessary than others. It is reasonable to expect that a notebook-focussed user, unfamiliar with this paradigm from software engineering frameworks, would need help understanding what each directory and file does - either by using our documentation or speaking to an expert user of Kedro. This user group also needed help understanding the role of configuration, and some preferred writing their code in a single file, a notebook.
Notebook-focussed and some IDE-focussed users don't know that they can use our Data Catalog; they think that using it requires a commitment to the framework
Users assume an all-or-nothing use of the Kedro framework and do not realise they can use the Data Catalog as a stand-alone item. Our documentation for Kedro as a data registry is a very unpopular page, and we also do not talk about this functionality at all with our users.
IDE-focussed users leverage our Data Catalog to help notebook-focussed users or people who don't want to use our framework
Kedro's modular architecture provides opportunities to delight users by incrementally integrating specific components like the Data Catalog. For example, IDE-focussed users have used the Catalog to empower analysts on their teams. Additionally, users who found the framework restrictive or just wanted to use Kedro for data exploration have benefited from the Data Catalog.
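To make the idea of a stand-alone data registry more concrete, here is a minimal, stdlib-only sketch of what "load a dataset by name, without a project template" looks like in a script or notebook. It is deliberately not Kedro's DataCatalog API: `MiniCatalog`, `register`, `load` and `csv_text_loader` are hypothetical names invented for this illustration, and the CSV content is made up.

```python
import csv
import io
from typing import Callable, Dict, List

Rows = List[Dict[str, str]]


class MiniCatalog:
    """Hypothetical stand-in for a data registry; NOT Kedro's DataCatalog API."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Rows]] = {}

    def register(self, name: str, loader: Callable[[], Rows]) -> None:
        # Map a dataset name to a zero-argument function that returns its rows.
        self._loaders[name] = loader

    def load(self, name: str) -> Rows:
        # Look the dataset up by name, so calling code never hard-codes a path.
        if name not in self._loaders:
            raise KeyError(f"Dataset '{name}' is not registered")
        return self._loaders[name]()


def csv_text_loader(text: str) -> Callable[[], Rows]:
    # In-memory CSV keeps the sketch self-contained; a real loader would read a file.
    def _load() -> Rows:
        return list(csv.DictReader(io.StringIO(text)))

    return _load


catalog = MiniCatalog()
catalog.register("companies", csv_text_loader("id,name\n1,Acme\n2,Globex\n"))
companies = catalog.load("companies")
```

The point of the sketch is the usage pattern rather than the implementation: an analyst only needs the dataset name, while whoever maintains the registry decides where and how the data is stored.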
IDE-focussed users work around our ConfigLoader's assumptions
IDE-focussed users run into errors because our ConfigLoader requires a conf directory, makes users place their configuration in conf/base and needs conf/local to be present. We expected our users to make ConfigLoaders without these assumptions, but we have yet to see evidence that they have done this. Our users choose to use other tools instead of our ConfigLoader or have workarounds for the errors that we create. We've assumed that users would always start from a Kedro project, and that's not always true.
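As an illustration of the kind of loader users seem to expect, the sketch below merges configuration from environment folders but tolerates a missing folder and any root directory name. It is not Kedro's ConfigLoader: `load_config` and its `environments` parameter are hypothetical, and JSON is used only to keep the example stdlib-only, whereas Kedro projects typically use YAML.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Sequence


def load_config(
    conf_dir: Path, environments: Sequence[str] = ("base", "local")
) -> Dict[str, Any]:
    """Merge top-level keys from *.json files; later environments override earlier ones."""
    merged: Dict[str, Any] = {}
    for env in environments:
        env_dir = conf_dir / env
        if not env_dir.is_dir():
            # A missing folder (for example, no "local") is skipped instead of raising.
            continue
        for path in sorted(env_dir.glob("*.json")):
            merged.update(json.loads(path.read_text()))
    return merged


with tempfile.TemporaryDirectory() as tmp:
    conf = Path(tmp) / "settings"  # any folder name works, not only "conf"
    (conf / "base").mkdir(parents=True)
    (conf / "base" / "parameters.json").write_text('{"test_size": 0.2, "seed": 42}')
    # No "local" folder is created on purpose.
    params = load_config(conf)
```

The design choice worth noting is that the directory layout is treated as a default rather than a requirement, which is the assumption the research suggests users stumble over.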
How are we trying to address these insights?
We have compiled a table of adoption opportunities, consolidated past and future concepts and solutions, and new ideas to build on learnings. This table catalogues challenges identified through user research and outlines the rationale behind solutions we have prototyped or proposed to address each obstacle.
What are some of our learnings?
User profile focus
What potential solutions or past approaches could address these learnings?
How does this concept help our users? What are known limitations?
IDE-focussed users want to adopt Kedro in an existing use case
kedro init assumes that IDE-focussed users want to adopt the framework. It only adds files to an existing project so that kedro recognises it as a project; these files are detailed in our architecture overview. Users will still need to make significant changes to their code, e.g. create pure Python functions, create a Python package for src, take out hard-coded configuration values, remove I/O, figure out how to integrate tools such as MLflow or DVC, and more.
IDE-focussed users want to adopt Kedro in an existing use case
Users would leverage Kedro's library components rather than adopting the framework. It's possible to do this today with our Data Catalog, Pipeline and Runner. The ConfigLoader has framework assumptions and requires using the conf structure from the project template. Users would not be able to use the CLI or Kedro-Viz.
IDE-focussed users want to incorporate Kedro in an existing project template
kedro init allows users to add the minimum files required for us to recognise that it's a Kedro project. This design will not address integration between the tools; for example, look at Databricks' MLOps stack and try to add the files for Kedro to it. Nor will it provide flexibility for customising the project template created by Kedro (#2553).
IDE-focussed users want to incorporate Kedro in an existing project template
Users would leverage Kedro's library components rather than adopting the framework. It's possible to do this today with our Data Catalog, Pipeline and Runner. The ConfigLoader has framework assumptions and requires using the conf structure from the project template. Users would not be able to use the CLI or Kedro-Viz.
IDE-focussed users want to choose the features included in the project template generated by Kedro
IDE
Utility modules (#2388), also known as Kedro Incremental Starters (#2054)
For IDE-focussed users, this design assumes the user has adopted the framework and wants to limit the features (and therefore folders and files) we include in their project template. This design does not create more users with this profile, because they are already using our framework, but helps with their user experience of Kedro.
IDE- and notebook-focussed users will pass over Kedro for use on collaborative projects when they're the only ones that want it
This might help increase the adoption of Kedro as a library among users who are still determining whether they can adopt the framework for their collaborative work. Rather than facing an all-or-nothing choice, users would leverage Kedro's library components without adopting the framework. Users would not be able to use the CLI or Kedro-Viz.
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories
This concept does not help with learning new software engineering concepts; our users would still need to do this. However, it does make it possible to avoid the IDE (a known challenge for this user profile), and users would not get overwhelmed by the project template. The trade-off is that users could not use the CLI or Kedro-Viz.
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories
Notebook
Utility modules (#2388), also known as Kedro Incremental Starters (#2054)
This solution might make the project template more manageable to new users because it has fewer files and folders. Users opt-in for features to understand why specific files and folders get added to their projects.
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories
Very few people used this solution, even when they were aware of it. They did not find it troubling to copy code into their project template. It's a user experience improvement feature and does aid the adoption of Kedro. This design also assumes that you know how Kedro comes together - awareness of nodes.py in the project template - and that you're not intimidated by the framework.
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories
This idea builds on using Kedro as a library, and all it additionally allows users to do is visualise their Kedro pipeline. This idea inherits the benefits and downsides of using Kedro as a library.
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories
Exporting code from a notebook (kedro jupyter convert) into a Kedro project was part of this concept; it's essentially a "framework in the notebook". Users did not use kedro jupyter convert. This plugin idea was one of the worst-rated ideas in our Kedro IDE exploratory concept tests because users wondered how to revert the code from the framework into the notebook and whether it would always work.
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories
Notebook
Refactor Jupyter notebooks into Kedro projects, using for example GenAI (#2820)
GenAI would guide users as they learn how to convert their Jupyter notebooks into framework use cases. The limitation of this idea is that we need to know how the notebook/s are structured. Users have created video walkthroughs on how to do this.
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories
The concept would allow users to create nodes, pipelines and reuse code without exposure to the framework. A benefit is that they would get a Kedro framework project. Users were unsure how this would compete with other tooling like Alteryx; it was one of the lowest-rated ideas in the Kedro IDE exploratory concept tests.
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories
This solution was supposed to make it easy for people to use Kedro for EDA (Data Catalog and ConfigLoader) by generating a mini-project template with the conf directory and a notebook. It also considered a journey into the full Kedro project template. We don't have evidence to suggest this feature is well adopted: users solely leverage the Data Catalog (and even use alternative libraries for loading configuration), and @Galileo-Galilel created a custom starter for his teams. It also expects users to have a partial understanding of the project template (conf).
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories
Gives users the ability to use the DataCatalog and ConfigLoader as standalone tools. Requires more information.
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories
This design assumes there is an existing Kedro framework user on the team, while the user of this feature might not be a framework user and only wants to tweak parameters to get insights. Part of the team still needs to know the framework, but not everyone does.
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories
Notebook-focussed and some IDE-focussed users don't know that they can use our Data Catalog; they think that using it requires a commitment to the framework
IDE-focussed users know how to use the Data Catalog this way, but this user group wants the AbstractDataset to exist in kedro-datasets. They don't want to import kedro or install dependencies related to kedro when leveraging this functionality (#1758).
This would allow users to leverage the ConfigLoader in their work, especially with the DataCatalog or Parameters. Once again, this promotes using Kedro as a library, which might translate into something other than framework adoption. But this should help notebook-focussed users with simple projects adopt some best practices, and they will not need an IDE.
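Several rows above mention the code changes users still have to make themselves when adopting the framework: create pure Python functions, take out hard-coded configuration values and remove I/O. A minimal sketch of that refactoring, using only the standard library, could look like this. The function name `filter_rows`, the `min_price` parameter and the sample rows are hypothetical and only serve the illustration.

```python
from typing import Dict, Iterable, List

Row = Dict[str, float]


def filter_rows(rows: Iterable[Row], min_price: float) -> List[Row]:
    """Pure function: the threshold is a parameter and no file is read or written."""
    return [row for row in rows if row["price"] >= min_price]


# I/O and configuration stay at the edges, outside the function.
parameters = {"min_price": 10.0}  # would come from configuration
raw_rows = [{"price": 5.0}, {"price": 12.5}]  # would come from a data loader
filtered = filter_rows(raw_rows, parameters["min_price"])
```

Because the function neither reads files nor embeds the threshold, the same code can be called from a notebook, a script or a pipeline node without modification.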
Introduction
What is this?
In part, this issue addresses: kedro-org/kedro-viz#1448
What's in the scope of this work?
What terminology will I be using?
It's helpful to define two existing ways of using Kedro, to ensure we have a shared understanding when discussing Kedro's architecture:
What are some of our learnings?
IDE-focussed users work around our ConfigLoader's assumptions
IDE-focussed users run into errors because our ConfigLoader requires a conf directory, makes users place their configuration in conf/base and needs conf/local to be present. We expected our users to make ConfigLoaders without these assumptions, but we have yet to see evidence that they have done this. Our users choose to use other tools instead of our ConfigLoader or have workarounds for the errors that we create. We've assumed that users would always start from a Kedro project, and that's not always true.
How are we trying to address these insights?
kedro init (#2512): kedro init assumes that IDE-focussed users want to adopt the framework. It only adds files to an existing project so that kedro recognises it as a project; these files are detailed in our architecture overview. Users will still need to make significant changes to their code, e.g. create pure Python functions, create a Python package for src, take out hard-coded configuration values, remove I/O, figure out how to integrate tools e.g. MLflow or DVC, and more.
kedro init (#2512): kedro init allows users to add the minimum files required for us to recognise that it's a Kedro project. This design will not address integration between the tools, e.g. look at Databricks' MLOps stack and try to add files for Kedro to this. Nor will it provide flexibility for customising the project template created by Kedro (#2553).
conf base and local confused some users, but we never completed the rename because the results were inconclusive. @idanov considered allowing users to choose their own names (#770).
AbstractDataset to kedro-datasets (#2409): IDE-focussed users know how to use the Data Catalog this way, but this user group wants the AbstractDataset to exist in kedro-datasets. They don't want to import kedro or install dependencies related to kedro when leveraging this functionality (#1758).