Replace invalid symbols in the labels for metadata visualization #1670

Merged: 9 commits, Nov 13, 2023
Changes from 1 commit
6 changes: 4 additions & 2 deletions sdv/metadata/visualization.py
@@ -69,6 +69,8 @@ def _get_graphviz_extension(filepath):

     return None, None

+def _replace_special_characters(string):
+    return string.replace('<', '_less_than_').replace('>', '_greater_than_')

 def visualize_graph(nodes, edges, filepath=None):
     """Plot metadata usign graphviz.
@@ -105,10 +107,10 @@ def visualize_graph(nodes, edges, filepath=None):
     )

     for name, label in nodes.items():
-        digraph.node(name, label=graphviz.escape(label))
+        digraph.node(name, label=_replace_special_characters(label))
Contributor: [comment text not captured]

Member Author:

Yes, it doesn't address the problem.

Contributor:

I see. It seems to work if you replace '>' with '\>'; the label then looks correct in the output graph.
[Screenshot 2023-11-08 at 9:51:51 PM]

Member Author:

Just a note, this approach removes the backslash from the column name. So if a column has \< in it, the output will be <.

Contributor:

I think that's ok since prior to this change if they had '\>' in a label, it would only show '>' anyway.

Contributor:

About this, can we add an integration test that fits and samples a synthesizer after calling visualize(), to ensure the change doesn't break metadata validation or fit and sample?
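
The integration test requested here needs SDV itself (visualize, then fit and sample), which is not reproduced in this thread. As a dependency-free stand-in, the sketch below checks only one property such a test would presumably protect: preparing labels for display must not alter the names held in the source mapping. Every function name in it is hypothetical.

```python
import copy


def _escape_angle_brackets(label):
    # Hypothetical escaping helper, assumed for this sketch only.
    return label.replace('<', r'\<').replace('>', r'\>')


def build_display_labels(nodes):
    """Return a new dict of escaped labels, leaving `nodes` untouched."""
    return {name: _escape_angle_brackets(label) for name, label in nodes.items()}


def test_display_labels_do_not_mutate_nodes():
    nodes = {'users': 'age>18', 'orders': 'total<100'}
    snapshot = copy.deepcopy(nodes)

    display = build_display_labels(nodes)

    # The source mapping is unchanged, so later validation sees the real names.
    assert nodes == snapshot
    # Only the display copy carries the escaped characters.
    assert display == {'users': r'age\>18', 'orders': r'total\<100'}


test_display_labels_do_not_mutate_nodes()
```

A real test would replace `build_display_labels` with the library's own visualization call and follow it with fitting and sampling.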


     for parent, child, label in edges:
-        digraph.edge(parent, child, label=graphviz.escape(label), arrowhead='oinv')
+        digraph.edge(parent, child, label=_replace_special_characters(label), arrowhead='oinv')

     if filename:
         digraph.render(filename=filename, cleanup=True, format=graphviz_extension)