perf: cache parsed CSV file data and replace Array with Set to improve code gen performance #168

Merged
merged 3 commits into develop from todd/167-slow-genc
Nov 22, 2021

Conversation

ToddFincannon
Collaborator

@ToddFincannon ToddFincannon commented Nov 19, 2021

Fixes #167

@chrispcampbell chrispcampbell changed the title from "cache parsed CSV file data" to "perf: cache parsed CSV file data" Nov 19, 2021
@chrispcampbell chrispcampbell changed the title from "perf: cache parsed CSV file data" to "perf: cache parsed CSV file data to improve code gen performance" Nov 19, 2021
}
let data = B.read(pathname)
csv = parseCsv(data, CSV_PARSE_OPTS)
csvData.set(pathname, csv)
Contributor

I wonder whether caching many large parsed files in memory could become a problem for models that read many large CSV files but use only parts of them. I suspect the answer is "probably not", and if we ever do encounter such a beast, we can deal with it then.
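
For readers following along, here is a minimal sketch of the caching pattern under discussion, not the code from this PR. The names `parseCsvNaive`, `csvCache`, `readCsvCached`, and the injected `readText` function are hypothetical stand-ins for the `parseCsv`, `csvData`, and `B.read` calls shown in the diff, and the parser is deliberately naive.

```javascript
// Hypothetical stand-in for a real CSV parser: splits on newlines and commas only.
function parseCsvNaive(text) {
  return text
    .split('\n')
    .filter(line => line.length > 0)
    .map(line => line.split(','))
}

// Cache of parsed CSV data, keyed by pathname (assumed shape).
const csvCache = new Map()
// Counter used only to show how often parsing actually happens.
let parseCount = 0

// `readText` is an injected function (an assumption for this sketch)
// that returns the file contents for a pathname.
function readCsvCached(pathname, readText) {
  let csv = csvCache.get(pathname)
  if (csv === undefined) {
    // Cache miss: read and parse once, then remember the result.
    csv = parseCsvNaive(readText(pathname))
    parseCount++
    csvCache.set(pathname, csv)
  }
  return csv
}

// Usage with an in-memory "file" for illustration.
const files = { 'data/a.csv': 'x,y\n1,2\n' }
const readText = pathname => files[pathname]
const first = readCsvCached('data/a.csv', readText)
const second = readCsvCached('data/a.csv', readText)
```

The second call returns the same parsed object without re-reading or re-parsing. The trade-off raised in the comment above is that every parsed file stays in memory for the lifetime of the cache.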

Contributor

@chrispcampbell chrispcampbell left a comment

@ToddFincannon I was just about to approve and merge this but saw your latest comment in #167. I can either hold off on merging if you think you have other changes to make as part of this PR (i.e., make this PR a general "make EPS code gen faster" thing), or I can merge this without the "fixes" tag so that we leave the issue open and tackle it with multiple PRs. Let me know which you prefer.

@ToddFincannon
Collaborator Author

That optimization cut the full-output genc time to roughly a third (see #167 for the flame graph). I think it's ready to merge now.

I worried briefly about the memory footprint of caching the CSVs, but it doesn't seem to be a problem. In EPS we cache 700 mostly small CSVs, and computers have lots of memory these days anyway.
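
As a rough illustration of why replacing an Array with a Set can speed up this kind of work, here is a sketch that is not taken from the PR. The functions `uniqueWithArray` and `uniqueWithSet` are hypothetical, and this thread does not say where the Set was actually introduced. The sketch only shows the general difference between a linear `Array.prototype.includes` scan and a `Set` membership check.

```javascript
// Collect unique names with an Array: each `includes` call scans the
// whole array, so the loop is quadratic in the worst case.
function uniqueWithArray(names) {
  const seen = []
  for (const name of names) {
    if (!seen.includes(name)) {
      seen.push(name)
    }
  }
  return seen
}

// Collect unique names with a Set: `add` ignores duplicates, and
// membership checks take constant time on average.
function uniqueWithSet(names) {
  const seen = new Set()
  for (const name of names) {
    seen.add(name)
  }
  // A Set iterates in insertion order, so the result order matches the Array version.
  return [...seen]
}

// Hypothetical input for illustration.
const sampleNames = ['a', 'b', 'a', 'c', 'b']
const viaArray = uniqueWithArray(sampleNames)
const viaSet = uniqueWithSet(sampleNames)
```

Both functions return the same unique names in the same order. The difference only becomes noticeable when the collection grows large or the check runs in a hot loop.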

@chrispcampbell chrispcampbell changed the title from "perf: cache parsed CSV file data to improve code gen performance" to "perf: cache parsed CSV file data and replace Array with Set to improve code gen performance" Nov 22, 2021
Contributor

@chrispcampbell chrispcampbell left a comment

Nice improvement!

@chrispcampbell chrispcampbell merged commit 58a45ba into develop Nov 22, 2021
@chrispcampbell chrispcampbell deleted the todd/167-slow-genc branch November 22, 2021 18:39
Development

Successfully merging this pull request may close these issues.

Generating C code is slow for EPS
2 participants