Address SettingWithCopyWarning (HMASynthesizer) #1557

Closed
npatki opened this issue Aug 23, 2023 · 3 comments · Fixed by #1685
Labels
data:multi-table Related to multi-table, relational datasets maintenance Tasks related to infrastructure & dependencies

Comments

@npatki
Contributor

npatki commented Aug 23, 2023

Environment Details

  • SDV version: 1.4.0 (latest)
  • Python version: 3.10
  • Operating System: Linux (Google Colab)

Error Description

When sampling from the HMASynthesizer, I receive a SettingWithCopyWarning that is coming directly from pandas. We should update the code based on the pandas recommendations.

Steps to reproduce

from sdv.datasets.demo import download_demo
from sdv.multi_table import HMASynthesizer

real_data, metadata = download_demo(
    modality='multi_table',
    dataset_name='fake_hotels'
)

synthesizer = HMASynthesizer(metadata)
synthesizer.fit(real_data)
synthetic_data = synthesizer.sample(scale=2)

Output:

/usr/local/lib/python3.10/dist-packages/sdv/sampling/hierarchical_sampler.py:107: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  sampled_rows[foreign_key] = parent_row[parent_key]
@npatki npatki added maintenance Tasks related to infrastructure & dependencies data:multi-table Related to multi-table, relational datasets labels Aug 23, 2023
@npatki
Contributor Author

npatki commented Nov 7, 2023

Confirming that this is also happening on the latest SDV, v1.6.0 that was just released today.

@pvk-developer
Member

@npatki I was not able to replicate it on this Google Colab. Could you share more details about your environment or Colab notebook so that I can replicate it?

@npatki
Contributor Author

npatki commented Nov 14, 2023

@pvk-developer you're right. I'm also unable to replicate now. Perhaps Google Colab updated some of its dependencies.
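Since the warning seems to depend on which dependency versions are installed, a small sketch like the one below can record them next to a reproduction attempt. The helper name `installed_versions` and the default package list are hypothetical choices for illustration; only the standard-library `importlib.metadata` module is used.

```python
from importlib import metadata


def installed_versions(packages=('sdv', 'pandas', 'numpy')):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


# Print the versions so they can be pasted into a bug report.
print(installed_versions())
```

Running this in both the notebook that showed the warning and the one that did not would make it easier to see which dependency changed.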

Still, I wonder if we could address the caveats that pandas is showing us. The message indicates that this link has more info: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

If we aren't following the recommended practice, we should probably update the code anyway.
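For illustration, here is a minimal sketch of one pattern the pandas docs describe for this situation: take an explicit copy of the sliced frame before assigning a column to it. This is not the actual SDV code or the fix that was eventually merged. The helper `assign_foreign_key` and the toy table are hypothetical, and only the names `sampled_rows` and `foreign_key` are borrowed from the warning output.

```python
import pandas as pd


def assign_foreign_key(sampled_rows, foreign_key, parent_key_value):
    """Return a copy of sampled_rows with the foreign key column filled in.

    Hypothetical helper: copying first means the assignment never targets
    a slice of another DataFrame, which is what triggers the warning.
    """
    sampled_rows = sampled_rows.copy()
    sampled_rows[foreign_key] = parent_key_value
    return sampled_rows


# Toy data standing in for a sampled child table (assumed, not SDV output).
table = pd.DataFrame({'guest_id': [1, 2, 3, 4], 'nights': [1, 3, 2, 5]})

# Boolean filtering returns a slice; assigning to it directly may warn.
subset = table[table['nights'] > 1]

result = assign_foreign_key(subset, 'hotel_id', 'HID_000')
print(result)
```

The explicit copy costs some memory but makes the intent unambiguous: the original table is left untouched and the new column lives only on the returned frame.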
