Make SDV compatible with SDMetrics 0.12.1 #1650

Closed
npatki opened this issue Oct 24, 2023 · 0 comments · Fixed by #1663
Labels
feature request Request for a new feature

Comments

@npatki
Contributor

npatki commented Oct 24, 2023

Problem Description

SDMetrics 0.12.0 hasn't been released yet (as of writing this issue). When it is released, we'll have to make some updates to the SDV visualization wrappers to support the new functionality.

Note that SDV currently pins sdmetrics<0.12, so we'll also need to bump this requirement when we're done.

Expected behavior

Update the visualization functions for both Single Table and Multi Table.

  • get_column_plot:
    • Add an optional parameter called plot_type
      • By default, we should determine the plot type based on the metadata: (a) if the sdtype is numerical/datetime, use 'distplot', (b) if the sdtype is categorical/boolean, use 'bar', (c) otherwise, the data is incompatible, so raise an error.
      • If the user provides a plot type, their value overrides the logic above.
    • For datetime columns, SDV should convert the data to a datetime64 dtype using the datetime_format provided in the metadata, then pass the converted data along to the SDMetrics visualization.
  • get_column_pair_plot:
    • Similar to above, add an optional parameter called plot_type
      • By default, determine the plot type based on the metadata: (a) if both sdtypes are numerical/datetime, use 'scatter', (b) if both sdtypes are categorical/boolean, use 'heatmap', (c) if one is numerical/datetime and the other is categorical/boolean, use 'box', (d) otherwise, raise an error because the data is incompatible.
      • If the user provides a plot type, their value overrides the logic above.
    • For datetime columns, SDV should convert the data to a datetime64 dtype using the datetime_format provided in the metadata, then pass the converted data along to the SDMetrics visualization.
  • (multi-table only) get_cardinality_plot:
    • Add an optional parameter called plot_type
      • By default, the plot type should be 'bar' in all cases.
      • If the user provides a plot type, pass that along instead.
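The default plot-type rules and the datetime conversion listed above can be sketched as a few plain Python helpers. This is a minimal sketch under stated assumptions, not SDV's implementation: the sdtype groupings, the plot-type names ('distplot', 'bar', 'scatter', 'heatmap', 'box') and the single-column error message come from this issue, while the helper names (resolve_column_plot_type, resolve_column_pair_plot_type, resolve_cardinality_plot_type, convert_datetime_columns), the locally defined VisualizationUnavailableError, the pair-plot error wording, and the shape of the columns dict are hypothetical. Datetime parsing uses pandas.to_datetime with the format taken from the metadata.

```python
import pandas as pd

# Hypothetical helpers sketching the defaults proposed in this issue.
# Names, argument shapes, and the pair-plot error wording are assumptions.

NUMERICAL_LIKE = {'numerical', 'datetime'}
CATEGORICAL_LIKE = {'categorical', 'boolean'}


class VisualizationUnavailableError(Exception):
    """Raised when there is no default plot type for the given sdtype(s)."""


def resolve_column_plot_type(column_name, sdtype, plot_type=None):
    """Default plot type for get_column_plot."""
    if plot_type is not None:
        return plot_type  # a user-provided value always overrides the defaults
    if sdtype in NUMERICAL_LIKE:
        return 'distplot'
    if sdtype in CATEGORICAL_LIKE:
        return 'bar'
    raise VisualizationUnavailableError(
        f"The column '{column_name}' has sdtype '{sdtype}', which does not have a "
        "supported visualization. To visualize this data anyway, please add a "
        "'plot_type'."
    )


def resolve_column_pair_plot_type(column_names, sdtypes, plot_type=None):
    """Default plot type for get_column_pair_plot (exactly two columns)."""
    if plot_type is not None:
        return plot_type
    numerical = [sdtype in NUMERICAL_LIKE for sdtype in sdtypes]
    categorical = [sdtype in CATEGORICAL_LIKE for sdtype in sdtypes]
    if all(numerical):
        return 'scatter'
    if all(categorical):
        return 'heatmap'
    if all(n or c for n, c in zip(numerical, categorical)):
        return 'box'  # one numerical/datetime and one categorical/boolean column
    raise VisualizationUnavailableError(
        f'The columns {list(column_names)} have sdtypes {list(sdtypes)}, '
        'which do not have a supported visualization. To visualize this data '
        "anyway, please add a 'plot_type'."
    )


def resolve_cardinality_plot_type(plot_type=None):
    """Default plot type for the multi-table get_cardinality_plot."""
    return 'bar' if plot_type is None else plot_type


def convert_datetime_columns(data, columns):
    """Return a copy of data with datetime columns cast to datetime64.

    columns mirrors the 'columns' section of a single-table metadata dict,
    e.g. {'signup': {'sdtype': 'datetime', 'datetime_format': '%Y-%m-%d'}}.
    """
    converted = data.copy()
    for name, spec in columns.items():
        if spec.get('sdtype') == 'datetime' and name in converted.columns:
            # format=None lets pandas infer the format when none is declared.
            converted[name] = pd.to_datetime(
                converted[name], format=spec.get('datetime_format')
            )
    return converted
```

In this sketch the user's plot_type is checked before any sdtype rule, so an explicit value bypasses both the defaults and the error, matching the override behavior requested above. The frames returned by convert_datetime_columns would then be handed to the SDMetrics visualization as-is.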

Additional Context

For the error messages, we may want to create a new error type for visualizations. For example:

>>> from sdv.evaluation.single_table import get_column_plot
>>> fig = get_column_plot(
...     real_data=real_data,
...     synthetic_data=synthetic_data,
...     column_name='user_id',
...     metadata=metadata,
... )
VisualizationUnavailableError: The column 'user_id' has sdtype 'id', which does not have a supported visualization. To visualize this data anyway, please add a 'plot_type'.
@npatki npatki added the feature request Request for a new feature label Oct 24, 2023
@amontanez24 amontanez24 added this to the 1.6.0 milestone Oct 31, 2023
@amontanez24 amontanez24 changed the title Make SDV compatible with SDMetrics 0.12.0 Make SDV compatible with SDMetrics 0.12.1 Nov 6, 2023