Report fitting progress for the HMASynthesizer #1440

Closed
npatki opened this issue May 25, 2023 · 0 comments · Fixed by #1446
Labels
data:multi-table Related to multi-table, relational datasets feature request Request for a new feature

npatki (Contributor) commented May 25, 2023

Problem Description

The HMASynthesizer does not currently offer a verbosity option. With a large dataset, learning the model (i.e., fitting the data) can take a while, and I am left waiting at a blank screen. It would be better to see a progress report.

Expected behavior

Add a verbose parameter to the HMASynthesizer's initialization. If set to True, the synthesizer should print its progress during every fitting stage. Let's make True the default; users can always turn it off.

>>> synth = HMASynthesizer(metadata, verbose=True)
>>> synth.fit(data)

Preprocess Tables: 100%|█████████████████████████████| 3/3 [00:33<00:10, 229.00it/s]

Learning relationships:
(1/2) Tables 'A' and 'B' ('user_id'): 100%|█████████████████████████████| 1000/1000 [00:33<00:10, 229.00it/s]
(2/2) Tables 'B' and 'C' ('session_id'): 100%|█████████████████████████████| 1200/1200 [00:33<00:10, 229.00it/s]

Modeling Tables: 100%|█████████████████████████████| 1/1 [00:33<00:10, 229.00it/s]
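Each relationship bar above is labeled with its index out of the total, the two table names, and the foreign key column. A small sketch of building those labels, assuming relationships are available as hypothetical (parent_table, child_table, foreign_key) tuples; the real metadata format may differ, and relationship_labels is an illustrative name, not an SDV function:

```python
from typing import List, Tuple

# Hypothetical relationship representation: (parent_table, child_table, foreign_key)
Relationship = Tuple[str, str, str]


def relationship_labels(relationships: List[Relationship]) -> List[str]:
    """Build one progress-bar description per relationship, numbered 1..r."""
    total = len(relationships)
    return [
        f"({i}/{total}) Tables '{parent}' and '{child}' ('{fk}')"
        for i, (parent, child, fk) in enumerate(relationships, start=1)
    ]


labels = relationship_labels([("A", "B", "user_id"), ("B", "C", "session_id")])
for label in labels:
    print(label)  # e.g. (1/2) Tables 'A' and 'B' ('user_id')
```

Numbering the bars as (i/r) tells the user how many relationship stages remain, which matters because each stage gets its own bar rather than sharing one.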

Additional Context (fit)

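Let r be the total number of relationships in the multi-table schema. Then fit has r+2 steps: one for preprocessing the tables, one per relationship, and one for modeling the tables. Each step gets its own progress bar. For example, the schema above has r = 2 relationships, so fit shows 4 progress bars.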

Additional Context (preprocess)

When calling preprocess, only step 1 (preprocessing the tables) is performed.

>>> synth = HMASynthesizer(metadata, verbose=True)
>>> processed_data = synth.preprocess(data)

Preprocess Tables: 100%|█████████████████████████████| 3/3 [00:33<00:10, 229.00it/s]
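The bars in these mockups resemble tqdm's default output, and tqdm would likely be the natural choice in practice. To show only how a verbose flag (defaulting to True) can gate that output, here is a stdlib-only sketch; progress and ToySynthesizer are hypothetical names, not part of the SDV API, and the "transform" step is a placeholder:

```python
import sys
from typing import Dict, Iterator, Optional, Sequence, TextIO, TypeVar

T = TypeVar("T")


def progress(items: Sequence[T], desc: str, verbose: bool = True,
             stream: Optional[TextIO] = None) -> Iterator[T]:
    """Yield each item; after the caller finishes it, redraw a one-line counter."""
    out = stream if stream is not None else sys.stdout
    total = len(items)
    for count, item in enumerate(items, start=1):
        yield item  # the caller's loop body runs before we resume here
        if verbose:
            # '\r' returns to the start of the line so the counter redraws in place
            out.write(f"\r{desc}: {100 * count // total}% {count}/{total}")
            out.flush()
    if verbose:
        out.write("\n")


class ToySynthesizer:
    """Hypothetical stand-in (not SDV's HMASynthesizer) showing a verbose flag."""

    def __init__(self, metadata: Dict[str, object], verbose: bool = True):
        self.metadata = metadata
        self.verbose = verbose  # on by default, as the issue proposes

    def preprocess(self, data: Dict[str, object]) -> Dict[str, object]:
        processed: Dict[str, object] = {}
        for name in progress(sorted(data), "Preprocess Tables", self.verbose):
            processed[name] = data[name]  # placeholder for real per-table transforms
        return processed


synth = ToySynthesizer(metadata={}, verbose=True)
processed_data = synth.preprocess({"A": [], "B": [], "C": []})
# The final line on the terminal reads: Preprocess Tables: 100% 3/3
```

Passing verbose=False makes the same loop run silently, which is the "users can always turn this off" behavior requested above.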

Additional Context (fit_processed_data)

When fitting processed data, preprocessing has already been done, so only the remaining r+1 steps run: one per relationship, plus modeling the tables.

>>> synth.fit_processed_data(processed_data)
Learning relationships:
(1/2) Tables 'A' and 'B' ('user_id'): 100%|█████████████████████████████| 1000/1000 [00:33<00:10, 229.00it/s]
(2/2) Tables 'B' and 'C' ('session_id'): 100%|█████████████████████████████| 1200/1200 [00:33<00:10, 229.00it/s]

Modeling Tables: 100%|█████████████████████████████| 1/1 [00:33<00:10, 229.00it/s]
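The step counts in the three sections fit together if fit is viewed as preprocess followed by fit_processed_data. A minimal sketch, assuming that decomposition (the issue implies it but does not state it); StageRecorder and the tuple form of relationships are hypothetical, and the class only records which bars would be shown instead of fitting anything:

```python
from typing import Dict, List, Tuple

# Hypothetical relationship representation: (parent_table, child_table, foreign_key)
Relationship = Tuple[str, str, str]


class StageRecorder:
    """Hypothetical sketch: records the progress bars each method would display."""

    def __init__(self, relationships: List[Relationship]):
        self.relationships = relationships
        self.stages: List[str] = []

    def preprocess(self, data: Dict[str, object]) -> Dict[str, object]:
        self.stages.append("Preprocess Tables")  # step 1: one bar over all tables
        return dict(data)  # placeholder for the real per-table transforms

    def fit_processed_data(self, processed: Dict[str, object]) -> None:
        total = len(self.relationships)
        for i in range(1, total + 1):
            self.stages.append(f"Relationship {i}/{total}")  # one bar per relationship
        self.stages.append("Modeling Tables")  # last step: one bar for modeling

    def fit(self, data: Dict[str, object]) -> None:
        # fit = preprocess (1 bar) + fit_processed_data (r + 1 bars) = r + 2 bars
        self.fit_processed_data(self.preprocess(data))


synth = StageRecorder([("A", "B", "user_id"), ("B", "C", "session_id")])
synth.fit({"A": [], "B": [], "C": []})
print(len(synth.stages))  # prints 4, i.e. r + 2 with r = 2
```

Structuring fit this way means the verbose output of fit is exactly the concatenation of what preprocess and fit_processed_data print on their own.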