Report fitting progress for the `HMASynthesizer` #1440

npatki · 2023-05-25T14:13:27Z

Problem Description

The HMASynthesizer does not currently offer a verbosity option. With a large dataset, learning the model (i.e. fitting the data) can take some time, and in the meantime I am left waiting with a blank screen. It would be better if there were a progress report I could see.

Expected behavior

Add a verbose parameter to the initialization of the HMASynthesizer. If set to True, then it should print out the progress during all fitting stages. Let's set the parameter to True by default. Users can always turn this off.

>>> synth = HMASynthesizer(metadata, verbose=True)
>>> synth.fit(data)

Preprocess Tables: 100%|█████████████████████████████| 3/3 [00:33<00:10, 229.00it/s]

Learning relationships:
(1/2) Tables 'A' and 'B' ('user_id'): 100%|█████████████████████████████| 1000/1000 [00:33<00:10, 229.00it/s]
(2/2) Tables 'B' and 'C' ('session_id'): 100%|█████████████████████████████| 1200/1200 [00:33<00:10, 229.00it/s]

Modeling Tables: 100%|█████████████████████████████| 1/1 [00:33<00:10, 229.00it/s]
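
To illustrate how a `verbose` flag could gate this output, here is a minimal standard-library sketch. The helper name `report_progress` and its plain `desc: i/total` line format are assumptions made for illustration only; they are not SDV's API, and an actual implementation would more likely wrap a progress-bar library to get the bars shown above.

```python
import io
import sys


def report_progress(iterable, desc, total, verbose=True, stream=None):
    """Yield each item, then write a 'desc: i/total' line once it is consumed.

    Hypothetical helper: when verbose is False nothing is written and the
    items pass through unchanged.
    """
    stream = stream if stream is not None else sys.stderr
    for count, item in enumerate(iterable, start=1):
        yield item
        if verbose:
            stream.write(f"{desc}: {count}/{total}\n")


# Verbose run: one progress line per preprocessed table.
loud = io.StringIO()
tables = list(report_progress(["A", "B", "C"], "Preprocess Tables", 3, stream=loud))

# Quiet run: same items, no output.
quiet = io.StringIO()
list(report_progress(["A", "B", "C"], "Preprocess Tables", 3, verbose=False, stream=quiet))
```

Defaulting `verbose` to True mirrors the proposal above, so users would only need to pass `verbose=False` to silence the report.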

Additional context (fit)

Let r be the total number of relationships in the multi table schema. Then the total number of steps is r+2. Each step will have a new progress bar.

Step 1: The first step will be preprocessing. The progress bar should increment for each table that we preprocess. (https://github.com/sdv-dev/SDV/blob/master/sdv/multi_table/base.py#L307)
Next r steps: For each relationship, show a new progress bar. The bar should increment as we iterate over each parent row to compute the extension over a child’s foreign key (https://github.com/sdv-dev/SDV/blob/master/sdv/multi_table/hma.py#L63)
Final step: The final step is modeling each root table. The progress bar should increment as we iterate over each root table that is being fitted (https://github.com/sdv-dev/SDV/blob/master/sdv/multi_table/hma.py#L186)
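
The three kinds of steps above can be sketched as a plan of r+2 progress bars, each with a description and a total. This is only a sketch under stated assumptions: the function `plan_progress_bars`, its arguments, and the relationship dictionary keys (`parent`, `child`, `foreign_key`) are hypothetical names chosen for illustration and do not reflect how SDV stores its metadata.

```python
def plan_progress_bars(table_names, relationships, row_counts, root_tables):
    """Return one (description, total) pair per progress bar, in display order.

    Hypothetical helper: the result has r + 2 entries, where r is the
    number of relationships.
    """
    # Step 1: preprocessing increments once per table.
    bars = [("Preprocess Tables", len(table_names))]

    # Next r steps: one bar per relationship, incrementing once per parent row.
    num_relationships = len(relationships)
    for index, rel in enumerate(relationships, start=1):
        parent = rel["parent"]
        child = rel["child"]
        foreign_key = rel["foreign_key"]
        desc = f"({index}/{num_relationships}) Tables '{parent}' and '{child}' ('{foreign_key}')"
        bars.append((desc, row_counts[parent]))

    # Final step: modeling increments once per root table.
    bars.append(("Modeling Tables", len(root_tables)))
    return bars


# Schema taken from the example output in this issue: A -> B -> C.
bars = plan_progress_bars(
    table_names=["A", "B", "C"],
    relationships=[
        {"parent": "A", "child": "B", "foreign_key": "user_id"},
        {"parent": "B", "child": "C", "foreign_key": "session_id"},
    ],
    row_counts={"A": 1000, "B": 1200},
    root_tables=["A"],
)
for desc, total in bars:
    print(f"{desc}: 0/{total}")
```

Each pair could then be handed to whichever progress-bar library is used, as its description and total.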

Additional Context (preprocess)

When preprocessing, only step 1 is performed.

>>> synth = HMASynthesizer(metadata, verbose=True)
>>> processed_data = synth.preprocess(data)

Preprocess Tables: 100%|█████████████████████████████| 3/3 [00:33<00:10, 229.00it/s]

Additional Context (fit_processed_data)

When fitting processed data, the preprocessing is already done, so there are only r+1 remaining steps.

>>> synth.fit_processed_data(processed_data)
Learning relationships:
(1/2) Tables 'A' and 'B' ('user_id'): 100%|█████████████████████████████| 1000/1000 [00:33<00:10, 229.00it/s]
(2/2) Tables 'B' and 'C' ('session_id'): 100%|█████████████████████████████| 1200/1200 [00:33<00:10, 229.00it/s]

Modeling Tables: 100%|█████████████████████████████| 1/1 [00:33<00:10, 229.00it/s]
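
As a quick check of the step counts described in this issue, the number of progress bars per entry point can be written out as below. The function `expected_bar_count` is a hypothetical helper used only to make the arithmetic explicit; it is not part of SDV.

```python
def expected_bar_count(method, num_relationships):
    """Return how many progress bars each entry point should show.

    Hypothetical helper; num_relationships is r in the text above.
    """
    counts = {
        "preprocess": 1,                              # step 1 only
        "fit_processed_data": num_relationships + 1,  # r relationship bars + modeling
        "fit": num_relationships + 2,                 # preprocessing + r + modeling
    }
    return counts[method]


# With the two relationships from the example schema (r = 2):
fit_bars = expected_bar_count("fit", 2)
processed_bars = expected_bar_count("fit_processed_data", 2)
preprocess_bars = expected_bar_count("preprocess", 2)
```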


npatki added the `feature request` (Request for a new feature) and `data:multi-table` (Related to multi-table, relational datasets) labels May 25, 2023

pvk-developer mentioned this issue May 30, 2023

Report fitting progress for HMASynthesizer #1446

Merged

npatki mentioned this issue Jun 1, 2023

Communicate the training progress during fit #579

Closed

pvk-developer closed this as completed in #1446 Jun 1, 2023

amontanez24 assigned pvk-developer Jun 6, 2023

amontanez24 added this to the 1.2.0 milestone Jun 6, 2023
