Support for JSON #107

Merged: 62 commits, merged on May 16, 2023
62 commits
a3620f0
Add unflatten dep
mikeknep May 1, 2023
1dc2f4c
copy/paste Sami's JSON code
mikeknep May 3, 2023
169b4ab
Style formatting on Sami's code
mikeknep May 3, 2023
a729477
Adjust JSON code to fit into relational package
mikeknep May 3, 2023
fd025a7
Update RelationalData plus some tests
mikeknep May 3, 2023
edbccf8
Restore output tables shapes at conclusion of transforms run
mikeknep May 3, 2023
596a50f
wip: scope
mikeknep May 3, 2023
b82e60e
json: expose root table name as property
mikeknep May 5, 2023
5cd093c
core: set root table name on table metadata
mikeknep May 5, 2023
180d702
json: invented metadata stores root and original table names
mikeknep May 5, 2023
1a4159e
core: only invented tables exist in the graph
mikeknep May 5, 2023
e6d4127
Add db fixture that contains JSON
mikeknep May 5, 2023
c45cbb1
Add Evaluatable scope
mikeknep May 5, 2023
8df636e
json (111): only add PKID column for invented CHILD tables
mikeknep May 5, 2023
e414e48
tests (111): update tests so that PKID only expected on invented CHIL…
mikeknep May 5, 2023
570d76f
core: introduce invented scope + change debug summary format
mikeknep May 5, 2023
b80e3e7
multitable: scope evaluations to appropriate tables
mikeknep May 5, 2023
32bd2c1
json: handle source table with empty/None PK
mikeknep May 5, 2023
9e908ac
core: get_public_name, + when asking for foreign keys defer json sour…
mikeknep May 5, 2023
71924bd
Scope to public tables in report
mikeknep May 5, 2023
3744ef7
Use new public_name method in mt evaluations
mikeknep May 5, 2023
787aa48
Reshape the synthetic output at the end
mikeknep May 5, 2023
244b8ca
Extract json tests to their own test file
mikeknep May 5, 2023
7597c9d
We don't need to store name on the table metadata
mikeknep May 5, 2023
2fbb0fc
Set some scopes
mikeknep May 5, 2023
3a10d80
Add test for invented children foreign keys
mikeknep May 5, 2023
4a42838
Extract bball rel_data to fixture
mikeknep May 5, 2023
6dcfcaf
Add tests documenting JSON PKs
mikeknep May 5, 2023
49ce92d
Support removing FKs from tables with JSON
mikeknep May 8, 2023
912d32b
Move invented tables static suffix to autouse fixture
mikeknep May 8, 2023
fe214de
Add list value to bball jsonl fixture
mikeknep May 8, 2023
842af43
Support (re)setting PK on table with JSON data
mikeknep May 8, 2023
78c59f8
Extract method
mikeknep May 8, 2023
b0d7621
Ensure FKs are retained after resetting PK
mikeknep May 8, 2023
6bb8dcd
Extract method to remove reljson
mikeknep May 8, 2023
977c423
Support updating existing JSON-containing data
mikeknep May 8, 2023
14a5b25
Support update table data with/out JSON
mikeknep May 8, 2023
60c40ce
Fix bug with index/PK column
mikeknep May 8, 2023
f6d9b16
Support asking for FKs with user-supplied table names
mikeknep May 9, 2023
72cda43
Tweak table relationships in report
mikeknep May 9, 2023
b3727c1
Refactor RelJson initial parsing to classmethod
mikeknep May 10, 2023
c551210
Reduce surface area
mikeknep May 10, 2023
f303699
Use simpler attr for property
mikeknep May 10, 2023
5ec9626
Return add commands from ingest so we can carry less on the instance
mikeknep May 10, 2023
23ae6f7
RelJson data in backup/restore
mikeknep May 10, 2023
fef2442
Remove RelationalData to/from filesystem methods
mikeknep May 10, 2023
7a075a8
Expect dict, not internal type, in backup
mikeknep May 10, 2023
f4cba98
RelJson understands numpy arrays as lists
mikeknep May 10, 2023
23d823b
Handle case of restoring without all tables (e.g. model failed)
mikeknep May 11, 2023
442a393
Add type hints to json module
mikeknep May 11, 2023
92dacd9
Add some more JSON tests
mikeknep May 12, 2023
42b70e6
Refactor json to leverage RelData where possible
mikeknep May 12, 2023
8c2008d
Go back to table_name_mappings as a list to preserve order
mikeknep May 12, 2023
542836e
Typo fix
mikeknep May 15, 2023
70134bb
Tighten try/except range
mikeknep May 15, 2023
26a1136
Revert "Go back to table_name_mappings as a list to preserve order"
mikeknep May 15, 2023
b99a8e2
lint
mikeknep May 15, 2023
40297f1
Remove unused import
mikeknep May 15, 2023
79a899a
Add some logging to RelationalJson
mikeknep May 15, 2023
db54376
Add function to set relational log level
mikeknep May 15, 2023
41c5852
Add docstrings for Scope enum and its variants
mikeknep May 15, 2023
5aeb8a5
Update method names after rebasing
mikeknep May 16, 2023
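Several commits above (for example "core: only invented tables exist in the graph" and "json (111): only add PKID column for invented CHILD tables") describe decomposing a table that holds nested JSON into "invented" tables. The following is a minimal sketch of that idea, not the RelationalJson code. It assumes one level of nesting, turns each list-valued field into a child table that points back to its parent row through the source primary key, and uses hypothetical names throughout (`split_json_table`, the `_root` suffix, and the `pkid`, `parent_id`, `array_index`, `content` columns).

```python
from __future__ import annotations

from typing import Any


def split_json_table(
    table_name: str, primary_key: str, rows: list[dict[str, Any]]
) -> dict[str, list[dict[str, Any]]]:
    """Hypothetical helper: split rows into a root table plus one invented
    child table per list-valued field (one level of nesting only)."""
    root_name = f"{table_name}_root"  # hypothetical naming scheme
    tables: dict[str, list[dict[str, Any]]] = {root_name: []}
    for row in rows:
        root_row: dict[str, Any] = {}
        for key, value in row.items():
            if isinstance(value, list):
                child_rows = tables.setdefault(f"{table_name}_{key}", [])
                for index, item in enumerate(value):
                    child_rows.append(
                        {
                            "pkid": len(child_rows),  # PK column only on child tables
                            "parent_id": row[primary_key],  # FK to the root row
                            "array_index": index,  # preserves list order
                            "content": item,
                        }
                    )
            else:
                # Scalars stay on the root row, which keeps the source PK.
                root_row[key] = value
        tables[root_name].append(root_row)
    return tables
```

Keeping `array_index` on each child row is what would let a reverse step restore list order when reshaping the output back into the original nested records.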
1 change: 1 addition & 0 deletions requirements.txt
@@ -10,3 +10,4 @@ requests~=2.25
 scikit-learn~=1.0
 smart-open[s3]~=5.2
 sqlalchemy~=1.4
+unflatten==0.1.1
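This PR pins `unflatten==0.1.1`. The general idea behind such a dependency, rebuilding nested objects from flat dot-separated keys, can be sketched in plain Python. The helpers below (`flatten`, `unflatten_dotted`) are hypothetical, handle nested dicts only (no list indices, no dots inside keys), and are not that package's API.

```python
from __future__ import annotations

from typing import Any


def flatten(obj: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level with dot-separated keys."""
    flat: dict[str, Any] = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))  # recurse into non-empty dicts
        else:
            flat[path] = value  # scalars, lists and empty dicts are leaves
    return flat


def unflatten_dotted(flat: dict[str, Any]) -> dict[str, Any]:
    """Rebuild nested dicts from dot-separated keys (inverse of flatten)."""
    nested: dict[str, Any] = {}
    for path, value in flat.items():
        *parents, leaf = path.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate dicts
        node[leaf] = value
    return nested
```

For `{"id": 7, "team": {"name": "Hawks"}}`, `flatten` yields `{"id": 7, "team.name": "Hawks"}`, and `unflatten_dotted` turns that back into the original record.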
1 change: 1 addition & 0 deletions src/gretel_trainer/relational/__init__.py
@@ -8,4 +8,5 @@
     sqlite_conn,
 )
 from gretel_trainer.relational.core import RelationalData
+from gretel_trainer.relational.log import set_log_level
 from gretel_trainer.relational.multi_table import MultiTable
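The new `set_log_level` export corresponds to the commit "Add function to set relational log level"; its signature and body are not part of this diff. A minimal sketch of such a helper with the standard `logging` module might look like the following. It assumes a package-wide logger named `gretel_trainer.relational` and a level given as a name or an int; both are assumptions, not the PR's code.

```python
import logging
from typing import Union

# Assumed logger name. Module loggers created with logging.getLogger(__name__)
# under this package inherit its level unless they set their own.
PACKAGE_LOGGER_NAME = "gretel_trainer.relational"


def set_log_level(level: Union[str, int]) -> None:
    """Hypothetical helper: set the level on the package-wide logger."""
    if isinstance(level, str):
        level = level.upper()  # Logger.setLevel only accepts upper-case names
    logging.getLogger(PACKAGE_LOGGER_NAME).setLevel(level)
```

Setting the level once on the parent logger, rather than on each module's logger, means callers only need one call such as `set_log_level("debug")`.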
35 changes: 33 additions & 2 deletions src/gretel_trainer/relational/backup.py
@@ -10,6 +10,7 @@
 @dataclass
 class BackupRelationalDataTable:
     primary_key: List[str]
+    invented_table_metadata: Optional[dict[str, str]] = None


 @dataclass
@@ -29,26 +30,52 @@ def from_fk(cls, fk: ForeignKey) -> BackupForeignKey:
         )


+@dataclass
+class BackupRelationalJson:
+    original_table_name: str
+    original_primary_key: list[str]
+    original_columns: list[str]
+    table_name_mappings: dict[str, str]
+    invented_table_names: list[str]
+
+
 @dataclass
 class BackupRelationalData:
     tables: Dict[str, BackupRelationalDataTable]
     foreign_keys: List[BackupForeignKey]
+    relational_jsons: Dict[str, BackupRelationalJson]

     @classmethod
     def from_relational_data(cls, rel_data: RelationalData) -> BackupRelationalData:
         tables = {}
         foreign_keys = []
+        relational_jsons = {}
         for table in rel_data.list_all_tables():
-            tables[table] = BackupRelationalDataTable(
+            backup_table = BackupRelationalDataTable(
                 primary_key=rel_data.get_primary_key(table),
             )
+            if (
+                invented_table_metadata := rel_data.get_invented_table_metadata(table)
+            ) is not None:
+                backup_table.invented_table_metadata = asdict(invented_table_metadata)
+            tables[table] = backup_table
             foreign_keys.extend(
                 [
                     BackupForeignKey.from_fk(key)
                     for key in rel_data.get_foreign_keys(table)
                 ]
             )
-        return BackupRelationalData(tables=tables, foreign_keys=foreign_keys)
+        for key, rel_json in rel_data.relational_jsons.items():
+            relational_jsons[key] = BackupRelationalJson(
+                original_table_name=rel_json.original_table_name,
+                original_primary_key=rel_json.original_primary_key,
+                original_columns=rel_json.original_columns,
+                table_name_mappings=rel_json.table_name_mappings,
+                invented_table_names=rel_json.table_names,
+            )
+        return BackupRelationalData(
+            tables=tables, foreign_keys=foreign_keys, relational_jsons=relational_jsons
+        )


 @dataclass
@@ -114,6 +141,10 @@ def from_dict(cls, b: Dict[str, Any]):
                 )
                 for fk in relational_data.get("foreign_keys", [])
             ],
+            relational_jsons={
+                k: BackupRelationalJson(**v)
+                for k, v in relational_data.get("relational_jsons", {}).items()
+            },
         )

         backup = Backup(
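The backup changes follow one pattern: dataclasses are turned into plain dicts with `asdict` when a backup is written, and rebuilt with `BackupRelationalJson(**v)` when it is read, while `.get("relational_jsons", {})` keeps backups that lack the new key loadable. A self-contained sketch of that round trip, using a local copy of `BackupRelationalJson` from the diff and a hypothetical `BackupSketch` container (with a hypothetical `to_json` method) standing in for the full `BackupRelationalData`:

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class BackupRelationalJson:
    # Fields copied from the diff above.
    original_table_name: str
    original_primary_key: list[str]
    original_columns: list[str]
    table_name_mappings: dict[str, str]
    invented_table_names: list[str]


@dataclass
class BackupSketch:
    # Hypothetical stand-in for BackupRelationalData, reduced to one field.
    relational_jsons: dict[str, BackupRelationalJson] = field(default_factory=dict)

    def to_json(self) -> str:
        # asdict recurses into dict values that are dataclasses,
        # so the result contains only JSON-serializable types.
        return json.dumps(asdict(self))

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> BackupSketch:
        return cls(
            relational_jsons={
                k: BackupRelationalJson(**v)
                # Default to {} so backups written without this key still load.
                for k, v in d.get("relational_jsons", {}).items()
            }
        )
```

With this shape, `BackupSketch.from_dict(json.loads(backup.to_json()))` compares equal to `backup`, and `BackupSketch.from_dict({})` yields an empty `relational_jsons` mapping instead of raising `KeyError`.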