Write and drop #117

Merged
merged 3 commits into from
Jun 5, 2023

@mikeknep (Contributor) commented Jun 2, 2023

Currently, the preprocessing in synthetics training creates a training DF for each table, keeps each of those in a dictionary, and passes the dictionary to the next method, which writes each DF to a CSV and submits a model.

With this change, each training DF is written to CSV as soon as it is created, so we don't have to hold on to more than one at a time.
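A minimal sketch of the write-and-drop loop, assuming pandas. `build_training_df` and `prepare_and_write` are hypothetical stand-ins, not the project's actual functions, and the real preprocessing is reduced here to a copy of the source table. Only the CSV paths are collected, so at most one training DF is referenced at any time:

```python
import tempfile
from pathlib import Path

import pandas as pd


def build_training_df(table: str, source: dict) -> pd.DataFrame:
    # Hypothetical preprocessing step; here it just copies the source table.
    return source[table].copy()


def prepare_and_write(tables, source: dict, work_dir: Path) -> dict:
    """Write each training DF to CSV as soon as it is built, then drop it."""
    paths = {}
    for table in tables:
        df = build_training_df(table, source)
        path = work_dir / f"synthetics_train_{table}.csv"
        df.to_csv(path, index=False)
        paths[table] = path  # keep only the path, not the DataFrame
        del df  # release the reference before building the next table
    return paths


source = {
    "users": pd.DataFrame({"id": [1, 2], "name": ["a", "b"]}),
    "orders": pd.DataFrame({"id": [10], "user_id": [1]}),
}
work_dir = Path(tempfile.mkdtemp())
paths = prepare_and_write(["users", "orders"], source, work_dir)
```

The step that submits models would then receive `paths` (a dictionary of file paths) instead of a dictionary of DataFrames.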

There's also a small change specific to the independent strategy: instead of asking for the whole source dataframe and then dropping columns, we ask only for the columns we intend to keep. This doesn't make much difference now (since the whole source DF is already in memory in the graph), but we anticipate eventually storing source data on disk and loading it only when needed, at which point this should provide a greater benefit.
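To illustrate why requesting only the kept columns should matter once source data lives on disk, here is a sketch using the `usecols` parameter of `pandas.read_csv`. `load_table` and `columns_to_keep` are hypothetical helpers, and treating the key columns as the ones to drop is an assumption for illustration, not necessarily what the independent strategy does:

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_table(path: Path, usecols=None) -> pd.DataFrame:
    # Hypothetical loader for on-disk source data. With `usecols`,
    # pandas only materializes the requested columns.
    return pd.read_csv(path, usecols=usecols)


def columns_to_keep(path: Path, drop: list) -> list:
    # Read only the header row (nrows=0) to learn the column names.
    header = pd.read_csv(path, nrows=0).columns
    return [c for c in header if c not in drop]


# Example source table written to a temporary CSV.
path = Path(tempfile.mkdtemp()) / "orders.csv"
pd.DataFrame(
    {"id": [1, 2], "user_id": [10, 20], "amount": [9.5, 3.0], "note": ["x", "y"]}
).to_csv(path, index=False)

key_columns = ["id", "user_id"]  # assumption: these are the columns to drop

# Before: load the whole table, then drop the unwanted columns.
before = load_table(path).drop(columns=key_columns)
# After: request only the columns we intend to keep.
after = load_table(path, usecols=columns_to_keep(path, key_columns))
```

Both approaches yield the same frame; the difference is that the second never holds the dropped columns in memory.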

@mikeknep (Contributor, Author) commented Jun 5, 2023

I did some testing using tracemalloc and a set of tables ranging from about 40–670 MB. Before this change (on main), the peak memory usage during synthetics training was about 1.76 GB (1,756,860,969 bytes), and that may even be slightly low because I cut out some other steps to save time. With this change, the peak reached only about 0.50 GB (501,836,663 bytes).
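For anyone reproducing this kind of measurement, here is a rough sketch of how `tracemalloc` can compare peak memory for the two patterns. It is a toy model, not the setup behind the numbers above: byte buffers stand in for training DFs, appending to `sink` stands in for writing a CSV, and all function names are hypothetical:

```python
import tracemalloc


def make_table(size: int) -> bytes:
    # Stand-in for a training DataFrame: a zero-filled buffer of `size` bytes.
    return bytes(size)


def accumulate_then_write(sizes, sink):
    # Old pattern: build every table first and hold them all in a dict.
    tables = {i: make_table(s) for i, s in enumerate(sizes)}
    for t in tables.values():
        sink.append(len(t))  # stand-in for writing a CSV


def write_and_drop(sizes, sink):
    # New pattern: build, write, and release one table at a time.
    for s in sizes:
        t = make_table(s)
        sink.append(len(t))
        del t


def peak_bytes(fn, sizes) -> int:
    sink = []
    tracemalloc.start()
    try:
        fn(sizes, sink)
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    finally:
        tracemalloc.stop()
    return peak


sizes = [10_000_000] * 3  # three ~10 MB "tables"
peak_all = peak_bytes(accumulate_then_write, sizes)
peak_one = peak_bytes(write_and_drop, sizes)
print(f"accumulate: {peak_all:,} bytes, write-and-drop: {peak_one:,} bytes")
```

Note that `tracemalloc` only counts memory allocated through Python's allocators, so native allocations made outside them are not included in its peak.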

@pimlock (Contributor) left a comment

Nice!

@mikeknep mikeknep merged commit 7e462e2 into main Jun 5, 2023
@mikeknep mikeknep deleted the write-and-drop branch June 5, 2023 17:50
@mikeknep mikeknep mentioned this pull request Jun 6, 2023