Sampling should not create a file called ‘.sample.csv.temp’ by default #2029

Closed
cynddl opened this issue May 25, 2024 · 1 comment
Labels
bug Something isn't working feature:sampling Related to generating synthetic data after a model is built resolution:duplicate This issue or pull request already exists

Comments


cynddl commented May 25, 2024

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDV version: 1.13.1
  • Python version: 3.11.9
  • Operating System: macOS (arm)

Error Description

Calling model.sample on a single-table model (I haven't checked the other model types) always writes an output file to disk, even when output_file_path is set to None. The docstring says:

    output_file_path (str or None):
        The file to periodically write sampled rows to. If None, does not
        write rows anywhere.

so I'd assume no file would be created.

Should the signature of sample() default to output_file_path=DISABLE_TMP_FILE, or should passing None behave the same as DISABLE_TMP_FILE?
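
A minimal sketch of the second option, assuming a hypothetical helper resolve_output_path that sample() would call once before sampling: None and the DISABLE_TMP_FILE sentinel both mean "write no file", and only an explicit path enables writing. The constant values below are placeholders and may not match what SDV actually defines.

```python
import os

# Placeholder values for the names discussed in this issue (assumptions,
# not necessarily SDV's real definitions).
TMP_FILE_NAME = '.sample.csv.temp'
DISABLE_TMP_FILE = 'disable'


def resolve_output_path(output_file_path):
    """Hypothetical helper: return the path to write sampled rows to, or None.

    Matches the docstring contract: None means "does not write rows anywhere".
    """
    if output_file_path is None or output_file_path == DISABLE_TMP_FILE:
        return None  # no file is created at all

    output_path = os.path.abspath(output_file_path)
    if os.path.exists(output_path):
        # Refuse to overwrite an existing file the caller did not expect to lose.
        raise FileExistsError(f'{output_path} already exists.')
    return output_path
```

Under this sketch, sample() would skip the periodic CSV writes whenever the helper returns None, so a default call leaves nothing on disk.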

(In my case, the temporary file is a problem because I use sdv with ray, which serialises data across multiple processes and crashes when sdv creates this file called ".sample.csv.temp". I'd much rather not write to disk at all than tinker with unique temporary files just to keep Python from crashing.)
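
Until the default changes, one possible workaround (a sketch, not an SDV feature) is to run each sampling call inside its own temporary working directory, assuming the file is created relative to the current working directory, as its name suggests. That way .sample.csv.temp is private to the call and deleted afterwards. call_in_scratch_dir is a hypothetical helper. Because os.chdir changes the working directory for the whole process, this is only safe when a worker process runs one sampling call at a time.

```python
import os
import tempfile


def call_in_scratch_dir(fn, *args, **kwargs):
    """Hypothetical helper: run fn inside a fresh temporary working directory.

    Relative-path files fn creates (e.g. '.sample.csv.temp') land in a
    directory private to this call and are removed when it returns.
    Not thread-safe: os.chdir affects the whole process.
    """
    previous_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as scratch:
        os.chdir(scratch)
        try:
            return fn(*args, **kwargs)
        finally:
            # Leave the scratch dir before TemporaryDirectory deletes it.
            os.chdir(previous_cwd)
```

For example, a Ray task could return call_in_scratch_dir(synthesizer.sample, num_rows=1000) instead of calling synthesizer.sample directly. It still writes to disk, so it sidesteps the collision rather than fixing the default.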

Steps to reproduce

N/A

See also

#1310

cynddl added the labels bug (Something isn't working) and new (Automatic label applied to new issues) on May 25, 2024
Contributor

npatki commented May 31, 2024

Hi @cynddl, thanks for the details.

I've consolidated both this issue and the earlier #1310 into a new feature request with an updated API definition. You can follow any updates in #2042 as we track the fix. Thanks.

npatki closed this as completed on May 31, 2024
npatki added the labels resolution:duplicate (This issue or pull request already exists) and feature:sampling (Related to generating synthetic data after a model is built), and removed the label new (Automatic label applied to new issues) on May 31, 2024