-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for JSON #107
Support for JSON #107
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this PR is dense :)
It looks great! I got through most of it, added some nits and I think it would be easier to continue by walking through this together. I'm going to set up some time next week.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Thanks for adding some logging, I think we could add more debug
over time (even around core relational), at least for me, it would make it easier to understand what's happening when I run it.
…ce to invented root
Co-authored-by: Piotr Mlocek <[email protected]>
This reverts commit 6494ba3.
How it works
Whenever a table is added (via
add_table
), we check to see if any columns in the table contain structured data—this can be actual dictionaries/lists or a valid JSON string. If it does, we invent 1-N tables that flatten the nested data by lifting dictionary keys to standalone columns and moving lists to new child tables. We train using all the flattened tables, and at the very end reassemble generated data to the original nested format.What's user-facing?
This is designed to be nearly completely transparent to users, but there are some public-facing things to note:
evaluations
dict property onMultiTable
.list_all_tables
method now takes an optional "scope" parameter to return different subsets of the known tables. By default we will expose the invented table names and not the original table name since this method is used so frequently byMultiTable
(which generally needs exactly that), but users can get their original table names by passingscope="public"
|scope=Scope.PUBLIC
.All the manual modification methods (e.g.
set_primary_key
,remove_foreign_key
, etc.) have been updated to work correctly when referencing a source-with-JSON table. The public interface is unchanged. (This accounts for the majority of the changes in therelational.core
module.)