Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: profiling in EDA tool #954

Merged
merged 47 commits into from
Apr 6, 2024
Merged

Conversation

SmiteDeluxe
Copy link
Contributor

@SmiteDeluxe SmiteDeluxe commented Mar 30, 2024

Closes #929

Summary of Changes

Communication from EDA -> Runner in own file that allows appending of code to the executed pipeline to get and generate new info.
This results in full profiling being done already, fetched when needed and displayed in webview.
Also includes start of from webview triggered actions that append code, in the form of filters. But this is to be picked up in new issue.

Other changes include refactorings and fixes for visual and edge case bugs.

Important to run this:

Some environments might need a change in the runner for images (plots) to work, that will be pushed to the Runner repo in PR Safe-DS/Runner#63.

To PipelineManager:

from safeds.data.image.containers import Image
import torch

To beginning of save_placeholder method:

if isinstance(value, Image):
    value = Image(value._image_tensor, torch.device("cpu"))

- runnerApi that now handles any eda vscode requests to runner, for starter getPlaceholderValue
     - edaPanel now renamed panelsMap to instancesMap that also saves RunnerApi instance
     - RunnerApi instance takes: services, pipelinePath (which is needed to get the document of pipeline that is to be extended)
- along with above no longer option for PanelIdentifier or pipelineId to be undefined, not needed anymore without dev methods that start blank eda sessions and this way more safe and solid
- revaling already exisiting panel bug where undefined state, found out it's because the _update() method is only fully executed after current state was already found
     - new variable updateHtmlDone that is set to false on either creating new panel or revealing current one and to true on _update() done; constructCurrentState will only be executed once that variable is true or with a timeout of 10s
- filters:
    - for the filter to decide between: 1. search string 2. value range 3. distinct value, the tableView must decide for each column if numerical => value range or categorical => many values => search string OR little Values => distinct value
        - for that profiling for numerical items (that show % for example) now have new property "interpretation" which can be "warn" (for missing val), "category" (for this) or "default"
        - faster than iterating through categorical cols to count values for huge data, since we have this info on profiling generation already
    - whereas for value range find min max ourselves, since otherwise would need more in pipeline and more placeholder value queries, and this should be faster
filters then in own component that decides what to show and calls vscode to initiate the runner code execution
- as deliberated before kind of: selections are now not cleared by clicks anywhere anymore but only clicks on main cells, as the global window listener to clear was getting too convoluted, now don't need the rowClicked or columnClicked params anymore
    - preventClicks store is now set to false on context menu close with 100ms delay to allow time to prevent clicks
    - handleRightClickEnd decides per menu if to close and set those cleanup things or not, like for filter it doesn't if clicked anywhere in context menu by looping over html elements and their parents
- pipelineId renamed to piprlineExecutionId for more sense
multiple pipelines:
    - eda from context now gets exact ast node of placeholder with range of the executed context and from there the pipeline container => pipeline name
    - pipeline name is passed to execute pipeline which is now needed
    - also sent to eda where 1. it is used for tableIdentifier as pipelineName + '.' + tableName (tableName new param used and passed outside of tableIdentifier) and 2. it is passed to runner for pipeline execution
    - in Runner the pipeline in question is found and in front of it's closing } the new code is then added
- getStateByPlaceholder now getTableByPlaceholder and transform to state obj in eda class, whee table name and table identifier are known, this method now only relevant stuff
- createOrShow async as well as calling register command methods
- runnerApi now instance var of panel
- got rid of an excess profiling banner div:
    - Now apparently now methods for table resize/TableSpace needed anymore, all handled by html itself
    - Meaning I also cannot change much about that mechanism, other than the min width, set by setting width on startup of the elements
    - Also means a lot less code and complexity!
    - savedColumnWidths now is a svelte store, that the table subscribes to, meaning it updates correctly, only needed on resize and reorder (when letting the col go); also now no manual setting of stlyle for this anymore
    - the automatic handling by html meant that reordering did not properly take out of table, so now a "reorderPrototype" of a column header that is used for the under cursor display and updates with relevant data, while column is made "display: none" in table
    - Min width maybe as initial width if it is being streched, then increasing size of a col will not result in others shrinking, but how to decide if in full view or not?
    - Also full view makes scrolling for fixed stuff lag?? Mabye visible scroll bar or extra tiny div
- Fixed that full view makes fixed stuff lag by making table width 100.1% instead of 100%, so always tiny bit out of view that causes scroll to exist
- increased scroll buffer a bit to make more fluent
- now you cannot see the table text through the borders of the headers/profiling anymore if table scrolled
    - have an absolute div at 100% with at top that is bright bg color
normal height = 2 * rowHeight
    - if profiling expanded then delayed (since height animation of profilingInfo) setting of height to 2 * rowHeight + profilingInfo height, not complete height as not including for example borders but enough to cover all bg space that let's text through
    - top prop of this also = scrollTop
- changed profiling to always include value, not name, as we display values
- thus image string also as "value" as well as string when just using "text" type (prev. "name" type)
- profiling placeholder name gen now not random but with codegen prefix + incr counter number
@SmiteDeluxe
Copy link
Contributor Author

@lars-reimann @WinPlay02 Now the Table hashing appears to be working but the getColumn() command fails because of another reason:
image

My code is still:

package b

from safeds.data.tabular.containers import Table

pipeline mainpipeline {
    val Tithhghgani = Table.fromCsvFile("/home/jonas/titanic.csv");
    val test = Tithhghgani.getColumn("Sex");
    val test2 = test.missingValueRatio();
    val Cereaddfkf5fl = Table.fromCsvFile("/home/jonas/cereal.csv");
}

pipeline secondary {
    val Titadssfsffjgfg3d832 = Table.fromCsvFile("/home/jonas/netflix_titles.csv");
    val Cereaf2gfgfff8dl = Table.fromCsvFile("/home/jonas/tweets.csv");
}

And my steps were to merge main into this branch, run npm i and npm run langium:generate and then pull Runner and run poetry install

@WinPlay02
Copy link
Contributor

@SmiteDeluxe Could you attach the full log here? The messages before the error would be helpful to find the issue

@SmiteDeluxe
Copy link
Contributor Author

@WinPlay02 Here you go
image

@WinPlay02
Copy link
Contributor

@WinPlay02 Here you go image

@SmiteDeluxe I see, thanks. I'm looking into a fix

packages/safe-ds-vscode/src/extension/mainClient.ts Outdated Show resolved Hide resolved
packages/safe-ds-vscode/src/extension/mainClient.ts Outdated Show resolved Hide resolved
packages/safe-ds-vscode/src/extension/mainClient.ts Outdated Show resolved Hide resolved
packages/safe-ds-vscode/src/extension/mainClient.ts Outdated Show resolved Hide resolved
packages/safe-ds-vscode/src/extension/mainClient.ts Outdated Show resolved Hide resolved
packages/safe-ds-vscode/src/extension/mainClient.ts Outdated Show resolved Hide resolved
lars-reimann pushed a commit that referenced this pull request Apr 3, 2024
- fix: generating memoized class member calls (on non-static members)
should still use the class for python code generation
- added testcase
------------------
Fixes the (last) issue in #954

---------

Co-authored-by: megalinter-bot <[email protected]>
WinPlay02 and others added 4 commits April 3, 2024 22:17
- addToAndExecutePipeline now also rejects on runtime error, thus don't have to wait for timeout
- RunnerAPI now gets passed pipelineNode from which then the end of pipeline is found, as previous approach could lead to bugs if not exactly matching pattern
- some refactoring
@SmiteDeluxe
Copy link
Contributor Author

@WinPlay02 Now the getColumn() call on it's own works, just like missingValueRatio(), the problem is that a chained call of .getColumn("Sex").missingValueRatio() does not work and throws the error below:
image

code:

package b

from safeds.data.tabular.containers import Table
from safeds.data.tabular.containers import Column

pipeline mainpipeline {
    val Titanghijjihfgk = Table.fromCsvFile("/home/jonas/titanic.csv");
    val test = Titanghijjihfgk.getColumn("Sex").missingValueRatio();
    val Cereaddfkf5fl = Table.fromCsvFile("/home/jonas/cereal.csv");
}

pipeline secondary {
    val Titadssfsffjgfg3d832 = Table.fromCsvFile("/home/jonas/netflix_titles.csv");
    val Cereaf2gfgfff8dl = Table.fromCsvFile("/home/jonas/tweets.csv");
}

This worked before with my own fake stubs.

@WinPlay02
Copy link
Contributor

@WinPlay02 Now the getColumn() call on it's own works, just like missingValueRatio(), the problem is that a chained call of .getColumn("Sex").missingValueRatio() does not work and throws the error below: image

code:

package b

from safeds.data.tabular.containers import Table
from safeds.data.tabular.containers import Column

pipeline mainpipeline {
    val Titanghijjihfgk = Table.fromCsvFile("/home/jonas/titanic.csv");
    val test = Titanghijjihfgk.getColumn("Sex").missingValueRatio();
    val Cereaddfkf5fl = Table.fromCsvFile("/home/jonas/cereal.csv");
}

pipeline secondary {
    val Titadssfsffjgfg3d832 = Table.fromCsvFile("/home/jonas/netflix_titles.csv");
    val Cereaf2gfgfff8dl = Table.fromCsvFile("/home/jonas/tweets.csv");
}

This worked before with my own fake stubs.

@SmiteDeluxe A fix is prepared in #987

@SmiteDeluxe
Copy link
Contributor Author

SmiteDeluxe commented Apr 5, 2024

@WinPlay02 @lars-reimann profiling is working again now!

@lars-reimann Can you whenever you have time look over the open conversations again and see if there is anything else needed before we can merge?

@lars-reimann lars-reimann changed the title feat: add code to pipelines (#929) & in the end full profiling and refactors/fixes feat: profiling in EDA tool Apr 6, 2024
Copy link
Member

@lars-reimann lars-reimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congratulations on finishing this. It looks great and works well.

@lars-reimann lars-reimann merged commit 854122c into main Apr 6, 2024
7 checks passed
@lars-reimann lars-reimann deleted the 929-eda-append-code-to-pipelines branch April 6, 2024 17:53
lars-reimann pushed a commit that referenced this pull request Apr 6, 2024
## [0.10.0](v0.9.0...v0.10.0) (2024-04-06)

### Features

* add settings to enable inlay hints individually ([#992](#992)) ([b0f3e62](b0f3e62))
* filter suggestions by node type ([#999](#999)) ([8d22e67](8d22e67)), closes [#998](#998)
* forbid instance and static class members with same name ([#988](#988)) ([7fa6fd4](7fa6fd4))
* improved completion provider ([#997](#997)) ([61e776b](61e776b)), closes [#41](#41)
* inlay hints for inferred types of lambda parameters ([#993](#993)) ([c064e0e](c064e0e))
* mark entire type cast as wrong if cast is impossible ([#991](#991)) ([72d4e2e](72d4e2e))
* profiling in EDA tool ([#954](#954)) ([854122c](854122c)), closes [#929](#929)
* require `safe-ds-runner>=0.8.0,<0.9.0` ([#976](#976)) ([1003e6c](1003e6c))
* resolve name paths in `{[@link](https://github.com/link) }` tags in documentation ([#978](#978)) ([b59d6f0](b59d6f0))

### Bug Fixes

* catch internal errors caused by wrong synthetic nodes created by completion provider ([#1001](#1001)) ([8a6ab99](8a6ab99))
* chained memoized calls ([#987](#987)) ([df89291](df89291))
* correctly import declarations for member functions ([#983](#983)) ([79f9b08](79f9b08))
* error in Python generator for assignments with class/enum variant call as RHS ([#977](#977)) ([46b2bb2](46b2bb2)), closes [#975](#975)
* generation of memoized class member calls ([#982](#982)) ([ed06aef](ed06aef))
* generation of Python imports ([#979](#979)) ([f69d836](f69d836)), closes [#974](#974)
* invalid Python code generated for constructor calls ([#981](#981)) ([c7d006f](c7d006f)), closes [#980](#980)
* Python generation for type casts ([#1000](#1000)) ([621ab86](621ab86))
@lars-reimann
Copy link
Member

🎉 This PR is included in version 0.10.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@lars-reimann lars-reimann added the released Included in a release label Apr 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement 💡 New feature or request released Included in a release vscode 🔨 Issues regarding tools like the VS Code extension
Projects
None yet
Development

Successfully merging this pull request may close these issues.

EDA: Create logic for appending code to pipelines
4 participants