Insights and opportunities related to helping Kedro impact more users #2901

yetudada · 2023-08-07T09:54:13Z

Introduction

What is this?

We have conducted extensive research to understand people's motivations for using or not using Kedro. We want to improve Kedro to provide more value to data scientists, data engineers, machine learning engineers, and other users. We've compiled the research insights and potential improvement ideas in this GitHub issue so that we can prioritise concepts that make Kedro an attractive option for users across roles and skill levels.

In part, this issue addresses: kedro-org/kedro-viz#1448

What's in the scope of this work?

Our next step is to conduct value testing to identify the most impactful concepts. This user-centred process should guide us toward three high-potential solutions that can meaningfully solve pain points based on evidence directly from Kedro users.

What terminology will I be using?

To focus our research, we will look at two representative user profiles that encompass vital segments of the data science community:

"Notebook-focussed users" primarily use notebooks for analysis and may be less familiar with IDE-based coding workflows; this could include many data analysts and data scientists.
"IDE-focussed users" are comfortable using IDEs for development, indicating intermediate to advanced software engineering skills; this may include some data scientists, machine-learning engineers, and data engineers.

It's helpful to define two existing ways of using Kedro, to ensure we have a shared understanding when discussing Kedro's architecture:

"Using Kedro as a framework" encapsulates using the framework (project template, session, context and CLI) and library components (ConfigLoader, Pipeline, Runner, Data Catalog and Datasets).
"Using Kedro as a library" refers to using one or more of Kedro's library components (ConfigLoader, Pipeline, Runner, Data Catalog and Datasets) in Python scripts or notebooks. Here, a user is leveraging Kedro modular components for their capabilities and is not using the framework.

What are some of our learnings?

IDE-focussed users want to adopt Kedro in an existing use case

IDE-focussed users will try to learn how to use Kedro by refactoring an existing use case into a project that uses Kedro. Their objective is to learn how to leverage Kedro as a framework or to adopt Kedro in stages by incorporating library components into their work.

IDE-focussed users want to incorporate Kedro in an existing project template

IDE-focussed users leverage internal project templates provided by CookieCutter or tools that provide project templates like Poetry, Hydra and DVC. This user group might bypass Kedro because of the high switching cost of adopting Kedro's project template or the challenges of integrating Kedro with those tools. We recommend that our users start from a Kedro project template or starter, but this is not always possible for them.

IDE-focussed users want to choose the features included in the project template generated by Kedro

IDE-focussed users have strong opinions about how their project template should be structured, and the preferences expressed on #208 varied widely. Even when an IDE-focussed user has committed to the project template created by Kedro, they still want the flexibility to choose which features are enabled and visible in their personalised template.

IDE- and notebook-focussed users will pass over Kedro for use on collaborative projects when they're the only ones that want it

Kedro is positioned as an all-or-nothing overhaul. Users who are placed on a collaborative project and are the only ones who want to use Kedro will choose not to use it. Most of these perspectives are associated with adopting the framework.

Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories

Our project template has a lot of software engineering concepts embedded in it, some more necessary than others. It is reasonable to expect that a notebook-focussed user, unfamiliar with this paradigm from software engineering frameworks, would need help understanding what each directory and file does - either by using our documentation or speaking to an expert user of Kedro. This user group also needed help understanding the role of configuration, and some preferred writing their code in a single file, a notebook.

Notebook-focussed and some IDE-focussed users don't know that they can use our Data Catalog; they think that using it requires a commitment to the framework

Users assume an all-or-nothing use of the Kedro framework and do not realise they can use the Data Catalog as a stand-alone item. Our documentation for Kedro as a data registry is a very unpopular page, and we also do not talk about this functionality at all with our users.

IDE-focussed users leverage our Data Catalog to help notebook-focussed users or people who don't want to use our framework

Kedro's modular architecture provides opportunities to delight users by incrementally integrating specific components like the Data Catalog. For example, IDE-focussed users have used the Catalog to empower analysts on their teams. Additionally, users who found the framework restrictive or just wanted to use Kedro for data exploration have benefited from the Data Catalog.

IDE-focussed users work around our ConfigLoader's assumptions

IDE-focussed users run into errors because our ConfigLoader requires a `conf` directory, makes users place their configuration in `conf/base` and needs `conf/local` to be present. We expected our users to write their own ConfigLoaders without these assumptions, but we have yet to see evidence that they have done this. Instead, they use other tools or work around the errors that we create. We've assumed that users would always start from a Kedro project, and that's not always true.
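
To make the assumptions above concrete, here is a minimal sketch of what a loader without them could look like. It is not Kedro's ConfigLoader and does not use any Kedro API: the `load_config` function, the JSON file format (chosen only to stay within the standard library) and the shallow merge are all assumptions made for illustration. The point is only that a missing `conf` root or a missing `local` directory is skipped instead of raising an error.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Sequence


def load_config(
    conf_root: Path, environments: Sequence[str] = ("base", "local")
) -> Dict[str, Any]:
    """Hypothetical loader: merge JSON files from each environment that exists.

    Later environments override earlier ones (shallow merge). A missing
    root directory or environment directory is skipped, not treated as an error.
    """
    merged: Dict[str, Any] = {}
    for env in environments:
        env_dir = conf_root / env
        if not env_dir.is_dir():
            continue  # tolerate missing conf/, conf/base or conf/local
        for path in sorted(env_dir.glob("*.json")):
            merged.update(json.loads(path.read_text()))
    return merged


with tempfile.TemporaryDirectory() as tmp:
    conf_root = Path(tmp) / "conf"

    # Only conf/base exists at first; conf/local is absent.
    (conf_root / "base").mkdir(parents=True)
    (conf_root / "base" / "parameters.json").write_text(
        json.dumps({"test_size": 0.2, "seed": 42})
    )
    base_only = load_config(conf_root)

    # Adding conf/local overrides values from conf/base.
    (conf_root / "local").mkdir()
    (conf_root / "local" / "parameters.json").write_text(json.dumps({"seed": 7}))
    with_local = load_config(conf_root)

    # A project with no conf directory at all yields an empty configuration.
    no_conf = load_config(Path(tmp) / "missing")
```

Whether returning an empty configuration is the right behaviour for a project with no `conf` directory is a design choice; a real loader might prefer a warning.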

How are we trying to address these insights?

We have compiled a table of adoption opportunities, consolidated past and future concepts and solutions, and new ideas to build on learnings. This table catalogues challenges identified through user research and outlines the rationale behind solutions we have prototyped or proposed to address each obstacle.

| What are some of our learnings? | User profile focus | What potential solutions or past approaches could address these learnings? | How does this concept help our users? What are known limitations? |
| --- | --- | --- | --- |
| IDE-focussed users want to adopt Kedro in an existing use case | IDE | `kedro init` (#2512) | `kedro init` assumes that IDE-focussed users want to adopt the framework. It only adds files to an existing project so that `kedro` recognises it as a project; these files are detailed in our architecture overview. Users will still need to make significant changes to their code, e.g. create pure Python functions, create a Python package for `src`, take out hard-coded configuration values, remove I/O, figure out how to integrate tools e.g. MLflow or DVC, and more. |
| IDE-focussed users want to adopt Kedro in an existing use case | IDE | Use Kedro as a library (in part in #2819) | Users would leverage Kedro's library components rather than adopting the framework. It's possible to do this today with our Data Catalog, Pipeline and Runner. The ConfigLoader has framework assumptions and requires using the `conf` structure from the project template. Users would not be able to use the CLI or Kedro-Viz. |
| IDE-focussed users want to incorporate Kedro in an existing project template | IDE | `kedro init` (#2512) | `kedro init` allows users to add the minimum files required for us to recognise that it's a Kedro project. This design will not address integration between the tools, e.g. adding Kedro files to Databricks' MLOps Stack. Nor will it provide flexibility for customising the project template created by Kedro (#2553). |
| IDE-focussed users want to incorporate Kedro in an existing project template | IDE | Starters | Starters enable merging project templates from different tools and help with integration but introduce a maintenance burden for users (#1961). |
| IDE-focussed users want to incorporate Kedro in an existing project template | IDE | Use Kedro as a library (in part in #2819) | Users would leverage Kedro's library components rather than adopting the framework. It's possible to do this today with our Data Catalog, Pipeline and Runner. The ConfigLoader has framework assumptions and requires using the `conf` structure from the project template. Users would not be able to use the CLI or Kedro-Viz. |
| IDE-focussed users want to choose the features included in the project template generated by Kedro | IDE | Utility modules (#2388), also known as Kedro Incremental Starters (#2054) | For IDE-focussed users, this design assumes the user has adopted the framework and wants to limit the features (and therefore folders and files) we include in their project template. This design does not create more users with this profile, because they are already using our framework, but helps with their user experience of Kedro. |
| IDE- and notebook-focussed users will pass over Kedro for use on collaborative projects when they're the only ones that want it | IDE + notebook | Use Kedro as a library (in part in #2819) | This might help increase the adoption of Kedro as a library where users are still determining if they can adopt the framework for their collaborative work. Instead of an all-or-nothing choice, users would leverage Kedro's library components without adopting the framework. Users would not be able to use the CLI or Kedro-Viz. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Use Kedro as a library (in part in #2819) | This concept does not help with learning new software engineering concepts; our users would still need to do this. However, it does make it possible to avoid the IDE (a known challenge for this user profile). They also would not get overwhelmed by the project template. However, users could not use the CLI or Kedro-Viz. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Utility modules (#2388), also known as Kedro Incremental Starters (#2054) | This solution might make the project template more manageable to new users because it has fewer files and folders. Users opt in to features, so they understand why specific files and folders get added to their projects. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Export nodes from a notebook | Very few people used this solution, even when they were aware of it. They did not find it troubling to copy code into their project template. It's a user experience improvement feature and does aid the adoption of Kedro. This design also assumes that you know how Kedro comes together - awareness of `nodes.py` in the project template - and that you're not intimidated by the framework. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Make it possible to use Kedro-Viz without the framework (kedro-org/kedro-viz#1459); similar to Kedro-Light | This idea builds on using Kedro as a library, and all it additionally allows users to do is visualise their Kedro pipeline. This idea inherits the benefits and downsides of using Kedro as a library. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Kedro-Jupyter plugin (Slide 20) | Exporting code from a notebook (`kedro jupyter convert`) into a Kedro project was part of this concept; it's essentially a "framework in the notebook". Users did not use `kedro jupyter convert`. This plugin idea was one of the worst-rated ideas in our Kedro IDE exploratory concept tests because users wondered how to revert the code from the framework into the notebook and whether it would always work. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Refactor Jupyter notebooks into Kedro projects, using for example GenAI (#2820) | GenAI would guide users as they learn how to convert their Jupyter notebooks into framework use cases. The limitation of this idea is that we need to know how the notebook/s are structured. Users have created video walkthroughs on how to do this. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Pipeline Builder UI also known as Kedro Lab, Pipeline Builder GUI or even two-way communication between framework & viz | The concept would allow users to create nodes, pipelines and reuse code without exposure to the framework. A benefit is that they would get a Kedro framework project. Users were unsure how this would compete with other tooling like Alteryx; it was one of the lowest-rated ideas in the Kedro IDE exploratory concept tests. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Standalone Data Catalog also known as Mini-Kedro | This solution was supposed to make it easy for people to use Kedro for EDA (Data Catalog and ConfigLoader) by generating a mini-project template with the `conf` directory and a notebook. It also considered a journey into the full Kedro project template. We don't have evidence to suggest this feature is well adopted: users solely leverage the Data Catalog (and even use alternative libraries for loading configuration), and @Galileo-Galilei created a custom starter for his teams. It also expects users to have a partial understanding of the project template (`conf`). |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Kedro Plugins proposed by @noklam | Gives users the ability to use the DataCatalog and ConfigLoader as standalone tools. Requires more information. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Supporting the reporting use case on Kedro-Viz also known as the Parameter Editor (Slide 18) | This design assumes there is an existing Kedro framework user on the team; the user of this feature might not be a framework user and only wants to tweak parameters to get insights. It still requires part of the team to know the framework, but not everyone needs to know it. |
| Notebook-focussed users find our framework challenging to learn because we introduce software engineering concepts, and they are also not used to splitting a project into multiple files and directories | Notebook | Rename the directories under `conf` | Users would still have to know much about the framework, specifically the project template, so this only helps a little. We discovered that `base` and `local` confused some users, but we never completed the rename because the results were inconclusive. @idanov considered allowing users to choose their own names (#770). |
| Notebook-focussed and some IDE-focussed users don't know that they can use our Data Catalog; they think that using it requires a commitment to the framework | Notebook | Use Kedro as a library (in part in #2819) | This is possible; it needs to be promoted. Users need to know they can use the Data Catalog without worry - the ConfigLoader is out of scope. |
| IDE-focussed users leverage our Data Catalog to help notebook-focussed users or people who don't want to use our framework | IDE + notebook | Move `AbstractDataset` to `kedro-datasets` (#2409) | IDE-focussed users know how to use the Data Catalog this way, but this user group wants the `AbstractDataset` to exist in `kedro-datasets`. They don't want to import `kedro` or install dependencies related to `kedro` when leveraging this functionality (#1758). |
| IDE-focussed users work around our ConfigLoader's assumptions | IDE + notebook | Make it easier to use the ConfigLoader (#2819) | This would allow users to leverage the ConfigLoader in their work, especially with the DataCatalog or Parameters. Once again, this promotes using Kedro as a library, which might translate into something other than framework adoption. But this should help notebook-focussed users with simple projects adopt some best practices, and they will not need an IDE. |
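
The first row of the table lists changes a user still has to make after `kedro init`: create pure Python functions, take out hard-coded configuration values and remove I/O from the logic. As a rough illustration of that refactoring, here is a minimal sketch in plain Python. It uses no Kedro API; the order records, the `min_amount` parameter and both function names are hypothetical.

```python
from typing import Dict, Iterable, List


def filter_large_orders(orders: Iterable[Dict], min_amount: float) -> List[Dict]:
    """Pure function: the threshold is an argument, not a hard-coded value."""
    return [order for order in orders if order["amount"] >= min_amount]


def total_amount(orders: Iterable[Dict]) -> float:
    """Pure function: works on data passed in, performs no file access."""
    return sum(order["amount"] for order in orders)


# Configuration and data loading live at the edge of the program.
# In a real project these would come from a config file and a data source;
# here they are in-memory stand-ins.
params = {"min_amount": 100.0}
raw_orders = [
    {"id": 1, "amount": 250.0},
    {"id": 2, "amount": 40.0},
    {"id": 3, "amount": 100.0},
]

large_orders = filter_large_orders(raw_orders, params["min_amount"])
total = total_amount(large_orders)
```

Functions shaped like this take their inputs as arguments and return their outputs, which is the property the table row refers to when it mentions pure Python functions.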

merelcht · 2024-03-27T17:49:21Z

I have moved this to the Kedro wiki, because this is not an issue we would take on as Sprint work as is: https://github.com/kedro-org/kedro/wiki/Insights-and-opportunities-related-to-helping-Kedro-impact-more-users

yetudada added the Issue: Feature Request and Type: Parent Issue labels Aug 7, 2023

yetudada added this to the Improve onboarding experience for non-Kedro users (docs and examples) milestone Aug 7, 2023

yetudada removed the Issue: Feature Request label Aug 7, 2023

yetudada mentioned this issue Aug 7, 2023

Research summary of insights for improving Kedro's value #2902

Closed

yetudada mentioned this issue Aug 14, 2023

Convert an existing project into a project using Kedro as a Library + Framework #2924

Closed

noklam mentioned this issue Aug 21, 2023

Migration guide for switching to OmegaConfigLoader #2699

Closed

2 tasks

github-actions bot mentioned this issue Sep 1, 2023

Monthly issue metrics report #2996

Closed

merelcht closed this as not planned Mar 27, 2024

ElenaKhaustova mentioned this issue Jul 8, 2024

Design DataCatalog2.0 #3995

Open

3 tasks
