Write and drop #117

mikeknep · 2023-06-02T21:37:45Z

Currently, the preprocessing step in synthetics training creates a training DF for each table, keeps all of them in a dictionary, and passes the dictionary to the next method, which writes each DF to a CSV and submits a model.

With this change, each training DF is written to CSV as soon as it is created, so we don't have to hold on to more than one at a time.
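A minimal sketch of the write-and-drop pattern described above, not the code from this PR. To keep it dependency-free, plain lists of row dicts stand in for DataFrames; `write_and_drop`, `build_training_rows`, and `demo` are hypothetical names used only for illustration.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def write_and_drop(
    table_names: Iterable[str],
    build_training_rows: Callable[[str], List[Row]],
    out_dir: Path,
) -> Dict[str, Path]:
    """Write each table's training data to CSV as soon as it is built."""
    paths: Dict[str, Path] = {}
    for name in table_names:
        # Only this one table's training data is referenced here.
        rows = build_training_rows(name)
        path = out_dir / f"{name}.csv"
        with path.open("w", newline="") as f:
            fieldnames = list(rows[0]) if rows else []
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        # Keep the (small) path rather than the (large) data.
        paths[name] = path
        # Drop our reference before building the next table.
        del rows
    return paths


def demo() -> Dict[str, str]:
    """Run the sketch on two tiny made-up tables and return the CSV text."""
    fake_tables: Dict[str, List[Row]] = {
        "users": [{"id": 1, "name": "a"}],
        "orders": [{"id": 10, "user_id": 1}],
    }
    with tempfile.TemporaryDirectory() as tmp:
        paths = write_and_drop(fake_tables, lambda n: fake_tables[n], Path(tmp))
        return {name: p.read_text() for name, p in paths.items()}
```

The dictionary that gets passed along now maps table names to file paths, so the next step can submit models without any training data being held in memory.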

There's also a small change to the independent strategy specifically: instead of asking for the whole source dataframe and then dropping columns, we only ask for the columns we intend to keep. This doesn't make much of a difference now (since the whole source DF is already in memory in the graph), but we anticipate eventually storing source data on disk and loading it when needed, at which point this should provide a greater benefit.
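To illustrate why requesting only the needed columns could matter once source data lives on disk, here is a rough sketch under that assumption. It uses the standard `csv` module rather than the project's actual loading code, and `load_columns` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, List, TextIO


def load_columns(source: TextIO, keep: List[str]) -> List[Dict[str, str]]:
    """Read a CSV stream, retaining only the columns named in `keep`."""
    reader = csv.DictReader(source)
    available = reader.fieldnames or []
    missing = [col for col in keep if col not in available]
    if missing:
        raise KeyError(f"columns not found in source: {missing}")
    # Unwanted columns are discarded row by row, so the full-width
    # table is never materialized in memory.
    return [{col: row[col] for col in keep} for row in reader]


# Usage: keep the "name" column, never holding "id" or "user_id".
source = io.StringIO("id,name,user_id\n1,a,10\n2,b,20\n")
names_only = load_columns(source, ["name"])
```

With pandas, the analogous idea would be passing `usecols` to `pandas.read_csv` instead of reading everything and calling `drop` afterwards.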

mikeknep · 2023-06-05T16:50:20Z

I did some testing using tracemalloc and a set of tables ranging from about 40 MB to 670 MB. Before this change (on main), peak memory usage during synthetics training was 1.75 GB (1,756,860,969 bytes); that figure may even be artificially low, because I cut out some other things to save time. With this change, the peak only reached 0.5 GB (501,836,663 bytes).
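For anyone wanting to run a similar comparison, a minimal sketch of measuring peak memory with `tracemalloc` might look like the following. This is not the harness used for the numbers above: `peak_bytes`, `hold_all`, and `one_at_a_time` are hypothetical, and 1 MB `bytearray` buffers stand in for training DFs.

```python
import tracemalloc
from typing import Callable, List

CHUNK = 1_000_000  # size in bytes of each stand-in "training DF"
N_TABLES = 5


def peak_bytes(fn: Callable[[], None]) -> int:
    """Run fn under tracemalloc and return the peak traced size in bytes."""
    tracemalloc.start()
    try:
        fn()
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def hold_all() -> None:
    """Old behavior: build everything, keep it all, then 'write' each."""
    held: List[bytearray] = [bytearray(CHUNK) for _ in range(N_TABLES)]
    for buf in held:
        len(buf)  # stand-in for writing the buffer to CSV


def one_at_a_time() -> None:
    """New behavior: build one, 'write' it, drop it, then move on."""
    for _ in range(N_TABLES):
        buf = bytearray(CHUNK)
        len(buf)  # stand-in for writing the buffer to CSV
        del buf  # release before the next one is created


if __name__ == "__main__":
    print("hold all:     ", peak_bytes(hold_all))
    print("one at a time:", peak_bytes(one_at_a_time))
```

In this toy setup the hold-all peak scales with the number of tables, while the one-at-a-time peak stays near the size of a single buffer.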

pimlock

Nice!

mikeknep added 3 commits June 2, 2023 15:26

Don't hang on to training DFs longer than necessary

d6d5a4d

Only load the columns we need, instead of all followed by drop

6867476

Collapse private methods

56c1401

mikeknep requested review from pimlock, tylersbray and johntmyers June 2, 2023 21:37

pimlock approved these changes Jun 5, 2023


mikeknep merged commit 7e462e2 into main Jun 5, 2023

mikeknep deleted the write-and-drop branch June 5, 2023 17:50

mikeknep mentioned this pull request Jun 6, 2023

Bettertar #119

Merged
