What is the purpose of train and test set? #1

Open
matemato opened this issue Jan 3, 2025 · 2 comments
Comments


matemato commented Jan 3, 2025

Hi,

I want to measure the diversity of my synthetic dataset of generated faces. I am using the FFHQ dataset as the real dataset.
Could you please explain the purpose of the train and test set when running your script?

Thank you so much!

Owner

MischaD commented Jan 7, 2025

Hi,

If you only want to compare two different synthetic datasets, the train and test sets serve no purpose. The dataset with the higher unadjusted IRS score is the more diverse one.

There are two potential reasons to use it:

  1. You want to compare the diversity of real data to that of your synthetic dataset. In that case, you should add a test set.
  2. You want the diversity to be interpretable as a percentage of the diversity of the original dataset. In that case, you need a real reference dataset to adjust for the lack of diversity that comes from the feature extractor (e.g., SwAV). Refer to Section 3.4 for more information.
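
To make the second point concrete, here is a minimal sketch, not the repository's code. It assumes the adjustment can be read as dividing the synthetic score by the score of a real reference set; the actual adjustment is the one described in Section 3.4. The function `relative_diversity`, its parameter names, and the numbers are hypothetical.

```python
def relative_diversity(synthetic_score: float, reference_score: float) -> float:
    """Express a synthetic diversity score as a percentage of a reference score.

    Assumption: a plain ratio stands in for the real adjustment here.
    """
    if reference_score <= 0:
        raise ValueError("reference_score must be positive")
    return 100.0 * synthetic_score / reference_score


# Made-up scores, purely for illustration.
print(relative_diversity(0.25, 0.5))  # prints 50.0
```

Under this reading, the real reference score captures how much diversity the feature extractor can resolve at all, so the synthetic score is reported relative to that ceiling rather than as a raw number.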

Hope that helps.

Author

matemato commented Jan 9, 2025

Thank you for your answer!

Could you elaborate on what should be passed as a) the train set, b) the test set, and c) the synthetic data in each of the three cases you mentioned:

  1. Comparing the diversity of 2 different synthetic datasets
  2. Comparing the diversity of real data to a synthetic dataset
  3. Making the diversity interpretable as a percentage of the diversity of the original dataset.

i.e. what to pass down as arguments when running your script:

results = run("path/to/train", "path/to/test", "path/to/synth", "out/path", "results.json", config)


2 participants