Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create IDE plugins for major editors #2821

Closed
astrojuanlu opened this issue Jul 18, 2023 · 20 comments
Closed

Create IDE plugins for major editors #2821

astrojuanlu opened this issue Jul 18, 2023 · 20 comments
Labels
Issue: Feature Request New feature or improvement to existing feature Type: Parent Issue

Comments

@astrojuanlu
Copy link
Member

astrojuanlu commented Jul 18, 2023

We have evidence that users struggle when assigning catalog entries to node functions. For example, #2726:

when dataset names are strings, IDEs can't check that they actually match. I ran into this problem before when some code changes lead to the wrong dataset being used. When dataset names are variable names, this kind of issues will be caught sooner.

This is something that an IDE extension could help with. There is already a kedro-lsp extension (see #712 (comment)), but doesn't appear to be maintained anymore.

Such extension could help with other things, to be defined.

Evidence markers

  • Helps with retention of existing users; most users of this are IDE users and they're already Kedro users
  • Might help with visibility if it's featured on the marketplace (not sure)
@astrojuanlu astrojuanlu added the Issue: Feature Request New feature or improvement to existing feature label Jul 18, 2023
@astrojuanlu
Copy link
Member Author

@datajoely
Copy link
Contributor

@astrojuanlu excited to see this move ever so slightly forward.

The number one feature for me is the link between our "magic string" input/output/parameter references in node definitions and their YAML counterpart in the catalog. IDE users are used to ⌘ Command + Click symbols and jumping to their definition.

  • The pre-requisite for building this is to store the YAML line no/cursor position of the catalog entry at load-time.
  • I've held off building this myself (for a couple of years at least) since we the config loader was in flux and only now feels like it's reaching a stable future design.
    • I think some of this metadate we'd need is hidden by the OmeagaConf.load method
    • This also gets complicated when resolving the 'winning' key after environment resolution, potentially even more so once we take factories into account.
    • If you look at the implementation for @limdauto's kedro-lsp the majority of the work is spent building a YAML scanner.

If we were to store the file/line number reference in the live catalog object we could do exciting things.

@astrojuanlu
Copy link
Member Author

I had another user complain about hard navigation/autocomplete/typo detection for dataset names. Creating an IDE plugin as stated in this issue is of course one of the possible solutions, maybe others could be explored.

@noklam
Copy link
Contributor

noklam commented Sep 9, 2023 via email

@astrojuanlu
Copy link
Member Author

Nice work from the DVC folks on their VSCode extension

Screenshot 2023-12-07 at 12 28 16 Screenshot 2023-12-07 at 12 28 30

@astrojuanlu
Copy link
Member Author

The pre-requisite for building this is to store the YAML line no/cursor position of the catalog entry at load-time.

For reference, ruamel.yaml provides the cursor information:

In [11]: from ruamel.yaml import YAML

In [12]: yaml = YAML()

In [13]: data = yaml.load("""
    ...:      # testing line and column based on SO
    ...:      # http://stackoverflow.com/questions/13319067/
    ...:      - key1: item 1
    ...:        key2: item 2
    ...:      - key3: another item 1
    ...:        key4: another item 2
    ...:         """)

In [14]: data
Out[14]: [{'key1': 'item 1', 'key2': 'item 2'}, {'key3': 'another item 1', 'key4': 'another item 2'}]

In [15]: type(data)
Out[15]: ruamel.yaml.comments.CommentedSeq

In [16]: data[0].lc
Out[16]: LineCol(3, 7)

In [17]: type(data[0])
Out[17]: ruamel.yaml.comments.CommentedMap

However:

I think some of this metadate we'd need is hidden by the OmeagaConf.load method

Indeed, OmegaConf.load I/O is coupled with PyYAML unless an already loaded object is directly provided:

https://github.com/omry/omegaconf/blob/fd730509ef10a074f97b1738c630720157ceeeab/omegaconf/omegaconf.py#L183-L193

I think this supports the idea of separating the loading from the resolving part as proposed in #2481

@astrojuanlu
Copy link
Member Author

Similar case of trying to parse YAML and give good error messages:

The initial version of rattler-build used serde & serde_yaml to parse the recipe. That worked OK, but was limited because we could not get great error messages out of serde_yaml, since it doesn't report the locations (line & column) of the encountered issues.

https://prefix.dev/blog/rattler_build_a_new_parser

@astrojuanlu
Copy link
Member Author

However, the primary downside for me was the requirement to set up configurations using YAML.I would prefer it to be closed within a Python script because editor completion.

https://www.reddit.com/r/kaggle/comments/18j4c0e/what_pipeline_libraries_do_you_recommend_for/?utm_source=share&utm_medium=web2x&context=3

@inigohidalgo
Copy link
Contributor

Just saw this issue linked from the slack conversation.

This would solve my number one problem with kedro as a user. I raised this in a user interview some months back, and also in various conversations: the disconnect between string objects in python and the objects which those strings represent within kedro. Refactoring is a huge hassle whenever it involves changing the shape of pipelines as there always end up being orphan datasets.

So this issue has a huge upvote from me.

@noklam
Copy link
Contributor

noklam commented Mar 8, 2024

There are many technical complexity for this, i.e. dynamic generated config/catalog

  • dataset patterns
  • variable interpolation
  • resolver

Nonetheless, I believe it's a huge improvement and we don't necessary wait until we have a full solution. I spent quite a bit of time to look at the original kedro-lsp and finally make something that runs in 0.19.x. I'm pretty excited about this.
lsp

@datajoely
Copy link
Contributor

The tool tip is awesome here! I guess dataset factories would work the same, I still think it would be nice to have an equivalent of dbt compile where you could jump to the resolved config.

@noklam
Copy link
Contributor

noklam commented Mar 8, 2024

IDE Plugins are very broad, I have seen a few things mentions here and recalled some discussion in the past:

IDE support:

  • VSCode (seems to be the easiest)
  • PyCharm suport LSP
  • Notebook - There are jupyter-lsp, is it possible to make some features work on notebook too?
  • terminal (niche but possible)

Backward compatibility: I am not sure yet how to make VSCode plugin that backward compatible, or maybe we shouldn't care this so soon because this can be a good reason to drive more 0.19 adoption.

User Research: (??) - there are many possibilities and unknown here

@datajoely
Copy link
Contributor

Honestly I think VS Code is the right call for any initial MVP - you can just point to DVC and Databricks making the same call

@astrojuanlu
Copy link
Member Author

Yeah the title of this issue is too broad. Let's start with an LSP for Kedro, basically bringing kedro-lsp back to life and working with Kedro 0.19 + docs for how to set it up on VSCode. That's already a massive usability boost for folks.

@noklam shall we open a separate issue for it?

@noklam
Copy link
Contributor

noklam commented Mar 8, 2024

#3691 - let's move the LSP specific discussion here

@noklam
Copy link
Contributor

noklam commented Apr 22, 2024

https://blog.jetbrains.com/platform/2023/07/lsp-for-plugin-developers/

I did some research today and found that in theory Pycharm support LSPs, but it's only limited to paid users which is disappointing.

@datajoely
Copy link
Contributor

Disappointing but I think it's still a great step especially if we target an enterprise user persona

@astrojuanlu
Copy link
Member Author

VSCode plugin exists already, see #3691

@astrojuanlu
Copy link
Member Author

Considering this done for now, if we ever decide to target other editors we can model those after the existing VS Code extension.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Feature Request New feature or improvement to existing feature Type: Parent Issue
Projects
Archived in project
Development

No branches or pull requests

5 participants