UnboundLocalError: cannot access local variable 'pipelines_package' where it is not associated with a value #3847

Open
JenspederM opened this issue May 2, 2024 · 8 comments
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed

Comments

@JenspederM

JenspederM commented May 2, 2024

Description

An error is thrown when trying to print the result of find_pipelines() from the kedro.framework.project module.

Context

Unable to use find_pipelines

Steps to Reproduce

  1. Add print(find_pipelines()) to the bottom of the pipeline_registry.py file
  2. Run the file: python ./src/<project>/pipeline_registry.py

Expected Result

A dict of pipelines.

Actual Result

I get the following error:

[05/02/24 18:05:49] WARNING  /Users/.../.venv/lib/python3.12/site-pac warnings.py:110
                             kages/kedro/framework/project/__init__.py:350: UserWarning: An error                      
                             occurred while importing the 'None.pipeline' module. Nothing defined                      
                             therein will be returned by 'find_pipelines'.                                             
                                                                                                                       
                             Traceback (most recent call last):                                                        
                               File                                                                                    
                             "/Users/.../.venv/lib/python3.12/site-pa                
                             ckages/kedro/framework/project/__init__.py", line 347, in find_pipelines                  
                                 pipeline_module = importlib.import_module(pipeline_module_name)                       
                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                       
                               File
                             "/Users/.../.rye/py/cpython@3.12.2/install/lib/python3.12/i
                             mportlib/__init__.py", line 90, in import_module                                          
                                 return _bootstrap._gcd_import(name[level:], package, level)                           
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                           
                               File "<frozen importlib._bootstrap>", line 1387, in _gcd_import                         
                               File "<frozen importlib._bootstrap>", line 1360, in _find_and_load                      
                               File "<frozen importlib._bootstrap>", line 1310, in                                     
                             _find_and_load_unlocked                                                                   
                               File "<frozen importlib._bootstrap>", line 488, in                                      
                             _call_with_frames_removed                                                                 
                               File "<frozen importlib._bootstrap>", line 1387, in _gcd_import                         
                               File "<frozen importlib._bootstrap>", line 1360, in _find_and_load                      
                               File "<frozen importlib._bootstrap>", line 1324, in                                     
                             _find_and_load_unlocked                                                                   
                             ModuleNotFoundError: No module named 'None'                                               
                                                                                                                       
                               warnings.warn(                                                                          
                                                                                                                       
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/.../project/src/project/pipeline_registy.py:21 in <module>                                                                             │
│                                                                                                  │
│   18                                                                                             │
│   19                                                                                             │
│   20 if __name__ == "__main__":                                                                  │
│ ❱ 21 │   print(register_pipelines())                                                             │
│   22                                                                                             │
│                                                                                                  │
│ /Users/.../project/src/project/pipeline_registry.py:15 in register_pipelines                                                                   │
│                                                                                                  │
│   12 │   Returns:                                                                                │
│   13 │   │   A mapping from pipeline names to ``Pipeline`` objects.                              │
│   14 │   """                                                                                     │
│ ❱ 15 │   pipelines = find_pipelines()                                                            │
│   16 │   pipelines["__default__"] = sum(pipelines.values())                                      │
│   17 │   return pipelines                                                                        │
│   18                                                                                             │
│                                                                                                  │
│ /Users/.../.venv/lib/python3.12/site-packages/kedro/framework/project/__init__.py:367 in find_pipelines                                                        │
│                                                                                                  │
│   364 │   │   if str(exc) == f"No module named '{PACKAGE_NAME}.pipelines'":                      │
│   365 │   │   │   return pipelines_dict                                                          │
│   366 │                                                                                          │
│ ❱ 367 │   for pipeline_dir in pipelines_package.iterdir():                                       │
│   368 │   │   if not pipeline_dir.is_dir():                                                      │
│   369 │   │   │   continue                                                                       │
│   370                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
UnboundLocalError: cannot access local variable 'pipelines_package' where it is not associated with a value
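
For readers following the traceback, the failure mode can be reproduced without Kedro. The sketch below is a simplified stand-in, not Kedro's actual source: `find_pipelines_sketch` and the module-level `PACKAGE_NAME = None` are assumptions that mimic an unconfigured project. The import fails with "No module named 'None'", which does not match the one message the handler treats as an early return, so execution continues with `pipelines_package` never assigned.

```python
import importlib

# Stands in for a project that was never configured (assumption for this sketch).
PACKAGE_NAME = None


def find_pipelines_sketch():
    """Simplified stand-in for the control flow shown in the traceback."""
    pipelines_dict = {}
    try:
        # With PACKAGE_NAME unset this tries to import "None.pipelines".
        pipelines_package = importlib.import_module(f"{PACKAGE_NAME}.pipelines")
    except ModuleNotFoundError as exc:
        # Only the exact "pipelines package is missing" message returns early.
        if str(exc) == f"No module named '{PACKAGE_NAME}.pipelines'":
            return pipelines_dict
        # Any other message (here: "No module named 'None'") falls through.
    # pipelines_package was never bound, so this line raises UnboundLocalError.
    return pipelines_package
```

Calling `find_pipelines_sketch()` raises `UnboundLocalError`, matching the last line of the report above.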

Your Environment

  • Kedro version used (pip show kedro or kedro -V): kedro, version 0.19.5
  • Python version used (python -V): Python 3.12.2 using rye as package manager
  • Operating system and version: M1 Mac with macOS Sonoma Version 14.4.1
@merelcht
Member

Hi @JenspederM, thanks for flagging this issue. Can I ask what your use case is for printing the result of find_pipelines()?

This method was added to enable auto-discovery of pipelines, and it does some work behind the scenes to make sure your project and its modules are discoverable (https://docs.kedro.org/en/stable/nodes_and_pipelines/pipeline_registry.html). It's meant to run as part of a "regular" Kedro flow, where it's preceded by certain project setup methods. You can fix your script by calling bootstrap_project() before find_pipelines() (https://docs.kedro.org/en/stable/kedro_project_setup/session.html#bootstrap-project-and-configure-project). However, I would only recommend doing that for exploration, not if you're planning to run that code in production.
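
A minimal sketch of that workaround, assuming Kedro is installed and the script is pointed at the project root. The helper name `discover_pipelines` is hypothetical; `bootstrap_project` and `find_pipelines` are the functions named above.

```python
from pathlib import Path


def discover_pipelines(project_root: Path) -> dict:
    """Return the pipelines found in the Kedro project at project_root."""
    # Imported lazily so this helper can be defined without Kedro installed.
    from kedro.framework.project import find_pipelines
    from kedro.framework.startup import bootstrap_project

    # bootstrap_project() reads the project metadata and configures the
    # project package, so find_pipelines() knows which package to scan.
    bootstrap_project(project_root)
    return find_pipelines()


# Hypothetical usage, run from the project root:
#     print(discover_pipelines(Path.cwd()))
```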

Let me know if this makes sense!

@merelcht merelcht added the Community Issue/PR opened by the open-source community label May 21, 2024
@JenspederM
Author

JenspederM commented May 21, 2024

Hi @merelcht,

Thank you for your reply.

I am using find_pipelines() to generate Databricks Asset Bundle resources. I am working on a template for asset bundles that uses Kedro for defining pipelines and dependencies, and Databricks Workflows for scheduling. You can find the project here.

Thanks for suggesting bootstrap_project(). For now, I have been using configure_project(<package-name>), as used in databricks_run.py in the databricks-iris starter.

You can see my exact usage right here
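
For comparison, the configure_project() route might look roughly like the sketch below. It assumes the project package is importable (for example, installed into the environment); the helper name `discover_pipelines_for` is hypothetical and this is not a copy of databricks_run.py.

```python
def discover_pipelines_for(package_name: str) -> dict:
    """Configure the named package, then discover its pipelines."""
    # Imported lazily so this helper can be defined without Kedro installed.
    from kedro.framework.project import configure_project, find_pipelines

    # configure_project() tells Kedro which package to use, which is what
    # find_pipelines() relies on when it looks for "<package>.pipelines".
    configure_project(package_name)
    return find_pipelines()
```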

@JenspederM
Author

@merelcht

I have been thinking of making a cookiecutter for Kedro as well. Do you think there would be any interest in this?

I made the template based on my own experience of running large scale Databricks projects in production with many contributors of varying levels of experience.

@astrojuanlu
Member

I'd say that, regardless of use case, internal code should not raise an UnboundLocalError; it should raise a more informative error instead.

I have been thinking of making a cookiecutter for Kedro as well. Do you think there would be any interest in this?

Of course! When you get to do it, we can promote it on https://github.com/kedro-org/awesome-kedro

Also consider exploring https://github.com/copier-org/copier/, a modern alternative to cookiecutter

@astrojuanlu astrojuanlu added the Issue: Bug Report 🐞 Bug that needs to be fixed label May 29, 2024
@JenspederM
Author

The only problem that I haven't really found a solution for is how I would get the workspace host from the users' Databricks config without using the Databricks CLI.
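
One possible direction, sketched under assumptions rather than offered as a confirmed answer: if users keep a profile in the INI-style `~/.databrickscfg` file or export `DATABRICKS_HOST`, the host could be read with the standard library alone. The function name `read_workspace_host` and its parameters are hypothetical.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional


def read_workspace_host(
    profile: str = "DEFAULT",
    config_path: Optional[Path] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the workspace host, or None if it cannot be determined."""
    env = os.environ if environ is None else environ
    # An explicitly exported host takes precedence over the config file.
    if env.get("DATABRICKS_HOST"):
        return env["DATABRICKS_HOST"]

    path = config_path or Path.home() / ".databrickscfg"
    # Interpolation is disabled so "%" characters in values are read as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file is silently ignored
    # Returns None when the profile or the "host" key does not exist.
    return parser.get(profile, "host", fallback=None)
```

This only covers the file and environment-variable cases; any other authentication setup would need separate handling.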

@JenspederM
Author

I'd say that, regardless of use case, internal code should not raise an UnboundLocalError; it should raise a more informative error instead.

@astrojuanlu I also looked into the UnboundLocalError, and I see that it could be resolved by adding asserts or running validate_settings() in find_pipelines() and ParallelRunner._run().

Or does it deserve a greater redesign?

IMO global variables can be quite dangerous when used like this, so I would probably advise redesigning this logic to remove the use of globals.
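
To make the idea concrete, here is a rough sketch of what an up-front check could look like. Nothing here is Kedro API: `ProjectNotConfiguredError` and `require_package_name` are hypothetical names, and the package name is passed in explicitly rather than read from a module-level global.

```python
from typing import Optional


class ProjectNotConfiguredError(RuntimeError):
    """Hypothetical error type for this sketch; not part of Kedro."""


def require_package_name(package_name: Optional[str]) -> str:
    """Return package_name, or fail early with an actionable message."""
    if package_name is None:
        raise ProjectNotConfiguredError(
            "No package name is configured. Call configure_project() or "
            "bootstrap_project() before find_pipelines()."
        )
    return package_name
```

A check like this at the top of the discovery logic would replace the UnboundLocalError with a message that tells the user what to call first.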

@astrojuanlu
Member

Moving this to our Inbox so that we can look at it and it doesn't get lost.

@astrojuanlu
Member

IMO global variables can be quite dangerous when used like this, so I would probably advise redesigning this logic to remove the use of globals.

For the record, I agree

@merelcht merelcht removed the Community Issue/PR opened by the open-source community label Jul 15, 2024
@merelcht merelcht moved this to To Do in Kedro Framework Aug 5, 2024
Projects
Status: To Do
Development

No branches or pull requests

3 participants