From 80ad182a220d691c77efc05cc4e2b36dcecb6ac7 Mon Sep 17 00:00:00 2001
From: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>
Date: Mon, 12 Feb 2024 10:06:10 +0000
Subject: [PATCH] Streamline debugging documentation (#3608)

* Add first draft
Signed-off-by: lrcouto

* Remoe outdated kedro jupyter convert docs
Signed-off-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>

* Suggestion: Review edits
Signed-off-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>

* Update FAQs
Signed-off-by: Ahdra Merali

* Edit jupyter ipython debug section
Signed-off-by: lrcouto

* Change link to section that does not exist anymore
Signed-off-by: L. R. Couto

* Change link to section that does not exist anymore
Signed-off-by: L. R. Couto

* Change wording and formatting
Signed-off-by: lrcouto

* Lint
Signed-off-by: lrcouto

* Update docs/source/notebooks_and_ipython/kedro_and_notebooks.md
Co-authored-by: Jo Stichbury
Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>

* Update docs/source/notebooks_and_ipython/kedro_and_notebooks.md
Co-authored-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>
Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>

* Changes to the wording, remove unnecessary section
Signed-off-by: lrcouto

* Move docs on debugging with hooks to hooks section
Signed-off-by: Ahdra Merali

* Add links to main debugging page
Signed-off-by: Ahdra Merali

* Make notebook debugging an independent section
Signed-off-by: Ahdra Merali

* Update link in FAQs
Signed-off-by: Ahdra Merali

* Apply suggestions from code review - adjust wording
Co-authored-by: Jo Stichbury
Signed-off-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>

* Capitalise Hooks
Signed-off-by: Ahdra Merali

* Reorder links on debugging page
Signed-off-by: Ahdra Merali

* Use markdown admonitions
Signed-off-by: Ahdra Merali

* Add short explanations to debugging page
Signed-off-by: Ahdra Merali

---------

Signed-off-by: lrcouto
Signed-off-by: Ahdra Merali <90615669+AhdraMeraliQB@users.noreply.github.com>
Signed-off-by: Ahdra Merali
Signed-off-by: L. R. Couto
Signed-off-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>
Co-authored-by: lrcouto
Co-authored-by: L. R. Couto <57910428+lrcouto@users.noreply.github.com>
Co-authored-by: Jo Stichbury
---
 docs/source/development/debugging.md          | 85 ++-----------------
 docs/source/faq/faq.md                        |  2 +-
 docs/source/hooks/common_use_cases.md         | 76 ++++++++++++++++-
 .../kedro_and_notebooks.md                    | 14 +--
 4 files changed, 90 insertions(+), 87 deletions(-)

diff --git a/docs/source/development/debugging.md b/docs/source/development/debugging.md
index f0e600d56f..29c048f74a 100644
--- a/docs/source/development/debugging.md
+++ b/docs/source/development/debugging.md
@@ -1,83 +1,12 @@
 # Debugging
 
-## Introduction
+:::note
 
-If you're running your Kedro pipeline from the CLI or you can't/don't want to run Kedro from within your IDE debugging framework, it can be hard to debug your Kedro pipeline or nodes. This is particularly frustrating because:
+Our debugging documentation has moved. Please see our existing guides:
 
-* If you have long running nodes or pipelines, inserting `print` statements and running them multiple times quickly becomes time-consuming.
-* Debugging nodes outside the `run` session isn't very helpful because getting access to the local scope within the `node` can be hard, especially if you're dealing with large data or memory datasets, where you need to chain a few nodes together or re-run your pipeline to produce the data for debugging purposes.
+:::
 
-This guide provides examples on [how to instantiate a post-mortem debugging session](https://docs.python.org/3/library/pdb.html#pdb.post_mortem) with [`pdb`](https://docs.python.org/3/library/pdb.html) using [Kedro Hooks](../hooks/introduction.md) when an uncaught error occurs during a pipeline run. [ipdb](https://pypi.org/project/ipdb/) could be integrated in the same manner.
-
-For guides on how to set up debugging with IDEs, please visit the [guide for debugging in VSCode](./set_up_vscode.md#debugging) and the [guide for debugging in PyCharm](./set_up_pycharm.md#debugging).
-
-## Debugging a node
-
-To start a debugging session when an uncaught error is raised within your `node`, implement the `on_node_error` [Hook specification](/api/kedro.framework.hooks):
-
-```python
-import pdb
-import sys
-import traceback
-
-from kedro.framework.hooks import hook_impl
-
-
-class PDBNodeDebugHook:
-    """A hook class for creating a post mortem debugging with the PDB debugger
-    whenever an error is triggered within a node. The local scope from when the
-    exception occured is available within this debugging session.
-    """
-
-    @hook_impl
-    def on_node_error(self):
-        _, _, traceback_object = sys.exc_info()
-
-        # Print the traceback information for debugging ease
-        traceback.print_tb(traceback_object)
-
-        # Drop you into a post mortem debugging session
-        pdb.post_mortem(traceback_object)
-```
-
-You can then register this `PDBNodeDebugHook` in your project's `settings.py`:
-
-```python
-HOOKS = (PDBNodeDebugHook(),)
-```
-
-## Debugging a pipeline
-
-To start a debugging session when an uncaught error is raised within your `pipeline`, implement the `on_pipeline_error` [Hook specification](/api/kedro.framework.hooks):
-
-```python
-import pdb
-import sys
-import traceback
-
-from kedro.framework.hooks import hook_impl
-
-
-class PDBPipelineDebugHook:
-    """A hook class for creating a post mortem debugging with the PDB debugger
-    whenever an error is triggered within a pipeline. The local scope from when the
-    exception occured is available within this debugging session.
-    """
-
-    @hook_impl
-    def on_pipeline_error(self):
-        # We don't need the actual exception since it is within this stack frame
-        _, _, traceback_object = sys.exc_info()
-
-        # Print the traceback information for debugging ease
-        traceback.print_tb(traceback_object)
-
-        # Drop you into a post mortem debugging session
-        pdb.post_mortem(traceback_object)
-```
-
-You can then register this `PDBPipelineDebugHook` in your project's `settings.py`:
-
-```python
-HOOKS = (PDBPipelineDebugHook(),)
-```
+* [Debugging a Kedro project within a notebook](../notebooks_and_ipython/kedro_and_notebooks.md#debugging-a-kedro-project-within-a-notebook) for information on how to launch an interactive debugger in your notebook.
+* [Debugging in VSCode](./set_up_vscode.md#debugging) for information on how to set up VSCode's built-in debugger.
+* [Debugging in PyCharm](./set_up_pycharm.md#debugging) for information on using PyCharm's debugging tool.
+* [Debugging in the CLI with Kedro Hooks](../hooks/common_use_cases.md#use-hooks-to-debug-your-pipeline) for information on how to automatically launch an interactive debugger in the CLI when an error occurs in your pipeline run.
diff --git a/docs/source/faq/faq.md b/docs/source/faq/faq.md
index e995444fb7..52f6b28ac0 100644
--- a/docs/source/faq/faq.md
+++ b/docs/source/faq/faq.md
@@ -14,7 +14,7 @@ This is a growing set of technical FAQs. The [product FAQs on the Kedro website]
 
 ## Working with Jupyter
 
-* [How can I debug a Kedro project in a Jupyter notebook](../notebooks_and_ipython/kedro_and_notebooks.md#debugging-with-debug-and-pdb)?
+* [How can I debug a Kedro project in a Jupyter notebook](../notebooks_and_ipython/kedro_and_notebooks.md#debugging-a-kedro-project-within-a-notebook)?
 * [How do I connect a Kedro project kernel to other Jupyter clients like JupyterLab](../notebooks_and_ipython/kedro_and_notebooks.md#ipython-jupyterlab-and-other-jupyter-clients)?
 
 ## Kedro project development
diff --git a/docs/source/hooks/common_use_cases.md b/docs/source/hooks/common_use_cases.md
index c8a2df094f..0cda6c6731 100644
--- a/docs/source/hooks/common_use_cases.md
+++ b/docs/source/hooks/common_use_cases.md
@@ -201,7 +201,7 @@ HOOKS = (AzureSecretsHook(),)
 Note: `DefaultAzureCredential()` is Azure's recommended approach to authorise access to data in your storage accounts. For more information, consult the [documentation about how to authenticate to Azure and authorize access to blob data](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python).
 ```
 
-## Use a Hook to read `metadata` from `DataCatalog`
+## Use Hooks to read `metadata` from `DataCatalog`
 Use the `after_catalog_created` Hook to access `metadata` to extend Kedro.
 
 ```python
@@ -214,3 +214,77 @@ class MetadataHook:
         for dataset_name, dataset in catalog.datasets.__dict__.items():
             print(f"{dataset_name} metadata: \n {str(dataset.metadata)}")
 ```
+
+## Use Hooks to debug your pipeline
+You can use [Kedro Hooks](../hooks/introduction.md) to launch a [post-mortem debugging session](https://docs.python.org/3/library/pdb.html#pdb.post_mortem) with [`pdb`](https://docs.python.org/3/library/pdb.html) when an error occurs during a pipeline run. [ipdb](https://pypi.org/project/ipdb/) could be integrated in the same manner.
+
+### Debugging a node
+
+To start a debugging session when an error is raised within your `node` that is not caught, implement the `on_node_error` [Hook specification](/api/kedro.framework.hooks):
+
+```python
+import pdb
+import sys
+import traceback
+
+from kedro.framework.hooks import hook_impl
+
+
+class PDBNodeDebugHook:
+    """A hook class for creating a post mortem debugging session with the PDB debugger
+    whenever an error is triggered within a node. The local scope from when the
+    exception occurred is available within this debugging session.
+    """
+
+    @hook_impl
+    def on_node_error(self):
+        _, _, traceback_object = sys.exc_info()
+
+        # Print the traceback information for debugging ease
+        traceback.print_tb(traceback_object)
+
+        # Drop you into a post mortem debugging session
+        pdb.post_mortem(traceback_object)
+```
+
+You can then register this `PDBNodeDebugHook` in your project's `settings.py`:
+
+```python
+HOOKS = (PDBNodeDebugHook(),)
+```
+
+### Debugging a pipeline
+
+To start a debugging session when an error is raised within your `pipeline` that is not caught, implement the `on_pipeline_error` [Hook specification](/api/kedro.framework.hooks):
+
+```python
+import pdb
+import sys
+import traceback
+
+from kedro.framework.hooks import hook_impl
+
+
+class PDBPipelineDebugHook:
+    """A hook class for creating a post mortem debugging session with the PDB debugger
+    whenever an error is triggered within a pipeline. The local scope from when the
+    exception occurred is available within this debugging session.
+    """
+
+    @hook_impl
+    def on_pipeline_error(self):
+        # We don't need the actual exception since it is within this stack frame
+        _, _, traceback_object = sys.exc_info()
+
+        # Print the traceback information for debugging ease
+        traceback.print_tb(traceback_object)
+
+        # Drop you into a post mortem debugging session
+        pdb.post_mortem(traceback_object)
+```
+
+You can then register this `PDBPipelineDebugHook` in your project's `settings.py`:
+
+```python
+HOOKS = (PDBPipelineDebugHook(),)
+```
diff --git a/docs/source/notebooks_and_ipython/kedro_and_notebooks.md b/docs/source/notebooks_and_ipython/kedro_and_notebooks.md
index a6cf4590b5..66337143ce 100644
--- a/docs/source/notebooks_and_ipython/kedro_and_notebooks.md
+++ b/docs/source/notebooks_and_ipython/kedro_and_notebooks.md
@@ -209,14 +209,8 @@ You don't need to restart the kernel for the `catalog`, `context`, `pipelines` a
 
 For more details, run `%reload_kedro?`.
 
-## Useful to know (for advanced users)
-Each Kedro project has its own Jupyter kernel so you can switch between Kedro projects from a single Jupyter instance by selecting the appropriate kernel.
-
-To ensure that a Jupyter kernel always points to the correct Python executable, if one already exists with the same name `kedro_`, then it is replaced.
-
-You can use the `jupyter kernelspec` set of commands to manage your Jupyter kernels. For example, to remove a kernel, run `jupyter kernelspec remove `.
 
-### Debugging with %debug and %pdb
+## Debugging a Kedro project within a notebook
 You can use the `%debug` [line magic](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-debug) to launch an interactive debugger in your Jupyter notebook. Declare it before a single-line statement to step through the execution in debug mode. You can use the argument `--breakpoint` or `-b` to provide a breakpoint.
 
 The follow sequence occurs when `%debug` runs immediately after an error occurs:
@@ -264,6 +258,12 @@ Some examples of the possible commands that can be used to interact with the ipd
 
 For more information, use the `help` command in the debugger, or take at the [ipdb repository](https://github.com/gotcha/ipdb) for guidance.
 
+## Useful to know (for advanced users)
+Each Kedro project has its own Jupyter kernel so you can switch between Kedro projects from a single Jupyter instance by selecting the appropriate kernel.
+
+To ensure that a Jupyter kernel always points to the correct Python executable, if one already exists with the same name `kedro_`, then it is replaced.
+
+You can use the `jupyter kernelspec` set of commands to manage your Jupyter kernels. For example, to remove a kernel, run `jupyter kernelspec remove `.
 
 ### Managed services