Insights and opportunities related to helping Kedro impact more users
We have conducted extensive research to understand people's motivations for using or not using Kedro. We want to improve Kedro to provide more value to data scientists, data engineers, machine learning engineers, and other users. We've compiled the research insights and potential improvement ideas in this GitHub issue so that we can prioritise concepts that make Kedro an attractive option for users across roles and skill levels.
In part, this issue addresses: https://github.com/kedro-org/kedro-viz/issues/1448
Our next step is to conduct value testing to identify the most impactful concepts. This user-centred process should guide us toward three high-potential solutions that can meaningfully solve pain points based on evidence directly from Kedro users.
To focus our research, we will look at two representative user profiles that encompass vital segments of the data science community:
- "Notebook-focussed users" primarily use notebooks for analysis and may be less familiar with IDE-based coding workflows; this could include many data analysts and data scientists.
- "IDE-focussed users" are comfortable using IDEs for development, indicating intermediate to advanced software engineering skills; this may include some data scientists, machine-learning engineers, and data engineers.
It's helpful to define two existing ways of using Kedro, to ensure we have a shared understanding when discussing Kedro's architecture:
- "Using Kedro as a framework" encapsulates using the framework (project template, session, context and CLI) and library components (ConfigLoader, Pipeline, Runner, Data Catalog and Datasets).
- "Using Kedro as a library" refers to using one or more of Kedro's library components (ConfigLoader, Pipeline, Runner, Data Catalog and Datasets) in Python scripts or notebooks. Here, a user is leveraging Kedro modular components for their capabilities and is not using the framework.
IDE-focussed users will try to learn how to use Kedro by refactoring an existing use case into a project that uses Kedro. Their objective is to learn how to leverage Kedro as a framework or to adopt Kedro in stages by incorporating library components into their work.
IDE-focussed users leverage internal project templates provided by CookieCutter or tools that provide project templates like Poetry, Hydra and DVC. This user group might bypass Kedro because of the high switching cost when adopting Kedro's project template or the challenges with integrating Kedro with those tools. We recommend that our users start from a Kedro project template or starter, but this is not always possible for them.
IDE-focussed users have strong opinions about how they want their project template to be structured. There was a lot of variance on #208. Even when an IDE-focussed user has committed to the project template created by Kedro, they still want the flexibility to choose which features are enabled and visible in their personalised template.
IDE- and notebook-focussed users will pass over Kedro for use on collaborative projects when they're the only ones that want it
Kedro is positioned as an all-or-nothing overhaul. Our users will choose not to use Kedro when they are placed on a collaborative project and are the only ones who want to use it. Most of these perspectives are associated with adopting the framework.
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories
Our project template has a lot of software engineering concepts embedded in it, some more necessary than others. It is reasonable to expect that a notebook-focussed user, unfamiliar with this paradigm from software engineering frameworks, would need help understanding what each directory and file does - either by using our documentation or speaking to an expert user of Kedro. This user group also needed help understanding the role of configuration, and some preferred writing their code in a single file, a notebook.
Notebook-focussed and some IDE-focussed users don't know that they can use our Data Catalog; they think that using it requires a commitment to the framework
Users assume an all-or-nothing use of the Kedro framework and do not realise they can use the Data Catalog as a stand-alone item. Our documentation for Kedro as a data registry is a very unpopular page, and we do not talk about this functionality at all with our users.
IDE-focussed users leverage our Data Catalog to help notebook-focussed users or people who don't want to use our framework
Kedro's modular architecture provides opportunities to delight users by incrementally integrating specific components like the Data Catalog. For example, IDE-focussed users have used the Catalog to empower analysts on their teams. Additionally, users who found the framework restrictive or just wanted to use Kedro for data exploration have benefited from the Data Catalog.
IDE-focussed users run into errors because our ConfigLoader requires a conf directory, makes users place their configuration in conf/base and needs conf/local to be present. We expected our users to make ConfigLoaders without these assumptions, but we have yet to see evidence that they have done this. Our users choose to use other tools instead of our ConfigLoader or have workarounds for the errors that we create. We've assumed that users would always start from a Kedro project, and that's not always true.
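To make these assumptions concrete, here is a minimal standard-library sketch of a loader that does not make them: environment directories are optional, and a missing conf/local is skipped instead of raising. The helper `load_config` is hypothetical and is not Kedro's ConfigLoader API. It reads JSON rather than YAML only to avoid third-party dependencies, and it merges top-level keys only.

```python
import json
import tempfile
from pathlib import Path


def load_config(conf_root, environments=("base", "local")):
    """Hypothetical helper, not part of Kedro.

    Merges the JSON files found in each environment directory that exists.
    Missing directories are skipped, and later environments override
    earlier ones (top-level keys only).
    """
    merged = {}
    root = Path(conf_root)
    for env in environments:
        env_dir = root / env
        if not env_dir.is_dir():
            continue  # tolerate a missing conf/base or conf/local
        for path in sorted(env_dir.glob("*.json")):
            merged.update(json.loads(path.read_text()))
    return merged


# Usage: a conf root that has "base" but no "local" directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "base"
    base.mkdir()
    (base / "parameters.json").write_text(json.dumps({"test_size": 0.2}))
    params = load_config(tmp)  # does not fail on the missing "local"
```

The design choice here is to treat every environment directory as optional, so the same call works in a notebook folder that has no project template around it.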
We have compiled a table of adoption opportunities, consolidated past and future concepts and solutions, and new ideas to build on learnings. This table catalogues challenges identified through user research and outlines the rationale behind solutions we have prototyped or proposed to address each obstacle.
What are some of our learnings? | User profile focus | What potential solutions or past approaches could address these learnings? | How does this concept help our users? What are known limitations? |
---|---|---|---|
IDE-focussed users want to adopt Kedro in an existing use case | IDE | kedro init (#2512) | kedro init assumes that IDE-focussed users want to adopt the framework. It only adds files to an existing project so that Kedro recognises it as a project; these files are detailed in our architecture overview. Users will still need to make significant changes to their code, e.g. create pure Python functions, create a Python package for src, take out hard-coded configuration values, remove I/O, figure out how to integrate tools such as MLflow or DVC, and more. |
IDE-focussed users want to adopt Kedro in an existing use case | IDE | Use Kedro as a library (in part in #2819) | Users would leverage Kedro's library components rather than adopting the framework. It's possible to do this today with our Data Catalog, Pipeline and Runner. The ConfigLoader has framework assumptions and requires using the conf structure from the project template. Users would not be able to use the CLI or Kedro-Viz. |
IDE-focussed users want to incorporate Kedro in an existing project template | IDE | kedro init (#2512) | kedro init allows users to add the minimum files required for us to recognise that it's a Kedro project. This design will not address integration between the tools; for example, consider trying to add Kedro's files to Databricks' MLOps stack. Nor will it provide flexibility for customising the project template created by Kedro (#2553). |
IDE-focussed users want to incorporate Kedro in an existing project template | IDE | Starters | Starters enable merging project templates from different tools and help with integration but introduce a maintenance burden for users (#1961). |
IDE-focussed users want to incorporate Kedro in an existing project template | IDE | Use Kedro as a library (in part in #2819) | Users would leverage Kedro's library components rather than adopting the framework. It's possible to do this today with our Data Catalog, Pipeline and Runner. The ConfigLoader has framework assumptions and requires using the conf structure from the project template. Users would not be able to use the CLI or Kedro-Viz. |
IDE-focussed users want to choose the features included in the project template generated by Kedro | IDE | Utility modules (#2388), also known as Kedro Incremental Starters (#2054) | For IDE-focussed users, this design assumes the user has adopted the framework and wants to limit the features (and therefore folders and files) we include in their project template. This design does not create more users with this profile, because they are already using our framework, but helps with their user experience of Kedro. |
IDE- and notebook-focussed users will pass over Kedro for use on collaborative projects when they're the only ones that want it | IDE + notebook | Use Kedro as a library (in part in #2819) | This might help increase the adoption of Kedro as a library where users are still determining if they can adopt the framework for their collaborative work. Rather than all-or-nothing, users would leverage Kedro's library components rather than adopting the framework. Users would not be able to use the CLI or Kedro-Viz. |
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Use Kedro as a library (in part in #2819) | This concept does not help with learning new software engineering concepts; our users would still need to do this. However, it does make it possible to avoid the IDE (a known challenge for this user profile), and users would not get overwhelmed by the project template. The limitation is that users could not use the CLI or Kedro-Viz. |
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Utility modules (#2388), also known as Kedro Incremental Starters (#2054) | This solution might make the project template more manageable for new users because it has fewer files and folders. Users opt in to features, which helps them understand why specific files and folders get added to their projects. |
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Export nodes from a notebook | Very few people used this solution, even when they were aware of it. They did not find it troubling to copy code into their project template. It's a user experience improvement feature and does aid the adoption of Kedro. This design also assumes that you know how Kedro comes together - awareness of nodes.py in the project template - and that you're not intimidated by the framework. |
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Make it possible to use Kedro-Viz without the framework (https://github.com/kedro-org/kedro-viz/issues/1459); similar to Kedro-Light | This idea builds on using Kedro as a library, and all it additionally allows users to do is visualise their Kedro pipeline. This idea inherits the benefits and downsides of using Kedro as a library. |
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Kedro-Jupyter plugin (Slide 20) | Exporting code from a notebook (kedro jupyter convert) into a Kedro project was part of this concept; it's essentially a "framework in the notebook". Users did not use kedro jupyter convert. This plugin idea was one of the worst-rated ideas in our Kedro IDE exploratory concept tests because users wondered how to revert the code from the framework into the notebook and whether it would always work. |
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Refactor Jupyter notebooks into Kedro projects, using for example GenAI (#2820) | Gen AI would guide users as they learn how to convert their Jupyter notebooks into framework use cases. The limitation of this idea is that we need to know how the notebook/s are structured. Users have created video walkthroughs on how to do this. |
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Pipeline Builder UI also known as Kedro Lab, Pipeline Builder GUI or even two-way communication between framework & viz | The concept would allow users to create nodes, pipelines and reuse code without exposure to the framework. A benefit is that they would get a Kedro framework project. Users were unsure how this would compete with other tooling like Alteryx; it was one of the lowest-rated ideas in the Kedro IDE exploratory concept tests. |
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Standalone Data Catalog also known as Mini-Kedro | This solution was supposed to make it easy for people to use Kedro for EDA (Data Catalog and ConfigLoader) by generating a mini-project template with the conf directory and a notebook. It also considered a journey into the full Kedro project template. We don't have evidence to suggest this feature is well adopted: users solely leverage the Data Catalog (and even use alternative libraries for loading configuration), and @Galileo-Galilei created a custom starter for his teams. It also expects users to have a partial understanding of the project template (conf). |
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Kedro Plugins proposed by @noklam | Gives users the ability to use the DataCatalog and ConfigLoader as standalone tools. Requires more information. |
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Supporting the reporting use case on Kedro-Viz also known as the Parameter Editor (Slide 18) | This design assumes that someone on the team already uses the Kedro framework. The person using this feature might not be a framework user; they want to tweak parameters to get insights. Part of the team still needs to know the framework, but not everyone does. |
Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Rename the directories under conf | Users would still have to know much about the framework, specifically the project template, so this only helps a little. We discovered that base and local confused some users, but we never completed the rename because the results were inconclusive. @idanov considered allowing users to choose their own names (#770). |
Notebook-focussed and some IDE-focussed users don't know that they can use our Data Catalog; they think that using it requires a commitment to the framework | Notebook | Use Kedro as a library (in part in #2819) | This is possible; it needs to be promoted. Users need to know they can use the Data Catalog without worry - the ConfigLoader is out of scope. |
IDE-focussed users leverage our Data Catalog to help notebook-focussed users or people who don't want to use our framework | IDE + notebook | Move AbstractDataset to kedro-datasets (#2409) | IDE-focussed users know how to use the Data Catalog this way, but this user group wants the AbstractDataset to exist in kedro-datasets. They don't want to import kedro or install dependencies related to kedro when leveraging this functionality (#1758). |
IDE-focussed users work around our ConfigLoader's assumptions | IDE + notebook | Make it easier to use the ConfigLoader (#2819) | This would allow users to leverage the ConfigLoader in their work, especially with the DataCatalog or Parameters. Once again, this promotes using Kedro as a library, which might translate into something other than framework adoption. But this should help notebook-focussed users with simple projects adopt some best practices, and they will not need an IDE. |