Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace invalid symbols in the labels for metadata visualization #1670

Merged
merged 9 commits into from
Nov 13, 2023

Conversation

fealho
Copy link
Member

@fealho fealho commented Nov 7, 2023

CU-86ayfuk2q
Resolve #1625.

@codecov-commenter
Copy link

codecov-commenter commented Nov 7, 2023

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (3e0de8e) 96.97% compared to head (c1280ac) 96.95%.
Report is 5 commits behind head on main.

❗ Current head c1280ac differs from pull request most recent head 8e8a6e6. Consider uploading reports for the commit 8e8a6e6 to get more accurate results

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1670      +/-   ##
==========================================
- Coverage   96.97%   96.95%   -0.03%     
==========================================
  Files          47       47              
  Lines        4263     4265       +2     
==========================================
+ Hits         4134     4135       +1     
- Misses        129      130       +1     
Files Coverage Δ
sdv/metadata/visualization.py 41.30% <25.00%> (+0.39%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@fealho fealho marked this pull request as ready for review November 7, 2023 20:17
@fealho fealho requested a review from a team as a code owner November 7, 2023 20:17
@fealho fealho requested review from R-Palazzo and amontanez24 and removed request for a team November 7, 2023 20:17
@amontanez24
Copy link
Contributor

Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a test but otherwise looks good!

@@ -105,10 +107,10 @@ def visualize_graph(nodes, edges, filepath=None):
)

for name, label in nodes.items():
digraph.node(name, label=graphviz.escape(label))
digraph.node(name, label=_replace_special_characters(label))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it doesn't address the problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. It seems like it works if you replace '>' with '\>'. Then in the output graph the label looks correct
Screenshot 2023-11-08 at 9 51 51 PM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note, this approach removes the backslash from the column name. So if a column has \< in it, the output will be <.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's ok since prior to this change if they had '\>' in a label, it would only show '>' anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About this, can we add an integration test where we fit and sample a synthesizer after the visualize() to ensure we're not breaking the metadata validation or the fit and sample with the change

Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an integration test for this case

@@ -105,10 +107,10 @@ def visualize_graph(nodes, edges, filepath=None):
)

for name, label in nodes.items():
digraph.node(name, label=graphviz.escape(label))
digraph.node(name, label=_replace_special_characters(label))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's ok since prior to this change if they had '\>' in a label, it would only show '>' anyway.

Comment on lines 20 to 21
data1 = pd.DataFrame({'>': ['a', 'b', 'c']})
data2 = pd.DataFrame({'>': ['a', 'b', 'c']})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test both symbols? and maybe the symbols mixed with text

Copy link
Contributor

@R-Palazzo R-Palazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!



def test_visualize_graph_for_multi_table():
"""Test it runs when a column name contains symbols."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can put a different message that the test before to be more specific

@fealho fealho merged commit 9bd0e57 into main Nov 13, 2023
@fealho fealho deleted the issue-1625-update-visualization branch November 13, 2023 23:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to visualize metadata (Error: bad label format and CalledProcessError)
4 participants