Handle PARSynthesizer model if sequence_index is missing #114
```diff
@@ -181,7 +182,8 @@ def assemble_sequences(
     groupby_columns = entity_columns[0] if len(entity_columns) == 1 else entity_columns
     for _, sequence in data.groupby(groupby_columns):
         sequence.drop(entity_columns, axis=1, inplace=True)
-        if context_columns:
+        missing_columns = [col for col in context_columns if col not in sequence.columns]
+        if context_columns and not missing_columns:
```
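To illustrate what the guard changes, here is a minimal, self-contained sketch. It assumes a simplified, hypothetical `assemble_contexts` function that only groups by entity and pulls out a per-entity context vector; the real `assemble_sequences` does more (segmenting, handling the sequence index). The column name `dummy_context` stands in for the UUID-named dummy column and is also hypothetical.

```python
import pandas as pd


def assemble_contexts(data, entity_columns, context_columns):
    """Hypothetical, simplified version of the grouping loop in assemble_sequences.

    Returns one (context_values, remaining_columns) pair per entity.
    """
    results = []
    groupby_columns = entity_columns[0] if len(entity_columns) == 1 else entity_columns
    for _, sequence in data.groupby(groupby_columns):
        # Non-inplace drop so the groupby slice is never mutated.
        sequence = sequence.drop(columns=entity_columns)
        # Context columns that were registered but never added to the data,
        # such as a UUID-named dummy column.
        missing_columns = [col for col in context_columns if col not in sequence.columns]
        if context_columns and not missing_columns:
            context = sequence[context_columns].iloc[0].tolist()
            sequence = sequence.drop(columns=context_columns)
        else:
            # Without the guard, sequence[context_columns] would raise KeyError
            # whenever missing_columns is non-empty.
            context = []
        results.append((context, list(sequence.columns)))
    return results


data = pd.DataFrame({
    "entity": [1, 1, 2, 2],
    "country": ["US", "US", "FR", "FR"],
    "value": [10.0, 11.0, 20.0, 21.0],
})
print(assemble_contexts(data, ["entity"], ["country"]))
# [(['US'], ['value']), (['FR'], ['value'])]
print(assemble_contexts(data, ["entity"], ["dummy_context"]))
# [([], ['country', 'value']), ([], ['country', 'value'])]
```

Note the second call: once any context column is missing, the whole block is skipped, so a present column such as `country` stays in the sequence instead of being treated as context.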
Should we check the other columns that are in `sequence` instead of skipping the whole block? Or is the fake column the only one in `context_columns`?
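The alternative raised in this question, using whichever context columns are actually present rather than skipping them all, could look roughly like the sketch below. `split_context` is a hypothetical helper written for illustration; it is not what this PR merged.

```python
import pandas as pd


def split_context(sequence, context_columns):
    """Hypothetical helper: take context from the columns that exist, ignore the rest.

    Returns (context_values, sequence_without_context, ignored_columns).
    """
    present = [col for col in context_columns if col in sequence.columns]
    ignored = [col for col in context_columns if col not in sequence.columns]
    if not present:
        return [], sequence, ignored
    # Context values must be constant within one entity's sequence.
    if len(sequence[present].drop_duplicates()) > 1:
        raise ValueError("Context columns are not constant within each entity.")
    context = sequence[present].iloc[0].tolist()
    return context, sequence.drop(columns=present), ignored


sequence = pd.DataFrame({"country": ["US", "US"], "value": [10.0, 11.0]})
context, rest, ignored = split_context(sequence, ["country", "dummy_context"])
print(context)             # ['US']
print(list(rest.columns))  # ['value']
print(ignored)             # ['dummy_context']
```

The trade-off is that silently ignoring unknown names can also hide a genuine misconfiguration in `context_columns`, so a real implementation might prefer to ignore only the known dummy column.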
Why are missing context columns being passed to PAR in the first place? My understanding is that the UUID column gets added so that we have a dummy column for the context synthesizer. Presumably it should either (1) have been added to the data as a dummy constant column or (2) not have been passed along to PAR at all.
Looking into this more, I think the problem is actually that we're adding the UUID column to
Why does the UUID column need to be added for modeling purposes? When I remove the added UUID column, the issue seems to be resolved and all tests pass (except for a unit test that checks for the added column), so I am not sure it is still needed.
I think the problem is that we can't fit on an empty DataFrame, so when there are no context columns we have to add a dummy column to create the context model without erroring out.
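A minimal sketch of the dummy-column idea described in this reply, assuming the context table has one row per entity and that a table with rows but zero columns is what the context model cannot fit on. The helper name `build_context_table` and the fill value `0` are assumptions; the random UUID column name mirrors the UUID column discussed in this thread.

```python
import uuid

import pandas as pd


def build_context_table(data, entity_column, context_columns):
    """Hypothetical helper: one row per entity, plus a dummy column if needed.

    Returns (context_table, effective_context_columns).
    """
    context = data[[entity_column, *context_columns]].drop_duplicates(subset=entity_column)
    context = context.set_index(entity_column)
    columns = list(context_columns)
    if not columns:
        # A frame with rows but zero columns gives the context model nothing
        # to fit, so add a constant column under a random, collision-free name.
        dummy = str(uuid.uuid4())
        context[dummy] = 0
        columns.append(dummy)
    return context, columns


data = pd.DataFrame({"entity": [1, 1, 2], "value": [1.0, 2.0, 3.0]})
context, columns = build_context_table(data, "entity", [])
print(context.shape)                                 # (2, 1)
print(all(col in data.columns for col in columns))   # False
```

The last line is the crux of the bug: the returned column list names a column that exists only in the context table, so a later `sequence[context_columns]` lookup on the original data raises `KeyError`.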
resolves sdv-dev/SDV#1972
CU-86b08wr44
When `sequence_index` is missing, `par.py` adds a constant column to allow for modeling, as seen here. However, the added context column does not exist in the data, which causes `KeyError`s. Added a check to prevent these failures.