Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excessive Memory Allocation Loading Worksheet Rows #111

Closed
zacheryph opened this issue Jan 6, 2023 · 4 comments · Fixed by #115
Closed

Excessive Memory Allocation Loading Worksheet Rows #111

zacheryph opened this issue Jan 6, 2023 · 4 comments · Fixed by #115

Comments

@zacheryph
Copy link

I am noticing excessive temporary memory allocation loading a worksheet. I cannot share said spreadsheet currently. This particular spreadsheet is 5.8MB, single sheet, 50K records. I cannot see anything obvious within rows causing this but it just seems a bit excessive for a 6MB file ;)

Reason I bring this up is it's killing kubernetes pods for us for memory usage when running. I am attempting to bump the limits to get around it.

Anything I can do to help I'm glad to.

Note, memory usage is from Docker stats

Base Rails Console Memory: 168.3MiB
Load Book (book = Creek::Book.new("report.xlsx")): 305.8MiB
Force a GC (GC.start): 210.6MiB
Lookup First Row (book.sheets.first.rows.first): 1.247GiB
Force a GC (GC.start): 222.9MiB

@zacheryph zacheryph changed the title Excessive Memory Allocation Loading File Excessive Memory Allocation Loading Worksheet Rows Jan 6, 2023
@zacheryph
Copy link
Author

This also appears to be new to 2.6.2. I do not see this happening in 2.5.3 so I have rolled back our update.

@pythonicrubyist
Copy link
Owner

How many records are on that sheet?

@zacheryph
Copy link
Author

50K.

With this particular spreadsheet,
with Creek 2.5.3 memory usage (watching docker stats) never exceeds ~ 450-500M
but with Creek 2.6.2 memory usage peaks at 1.1-1.2GB

@md5
Copy link

md5 commented May 3, 2023

We started seeing OOM killer issues in containerized workloads after upgrading to creek-2.6.2 as well and were able to alleviate the issue by downgrading to 2.5.3. Thanks for the report @zacheryph!

I believe the issue was introduced as part of #101 in this commit: 494ed05

Specifically, it's repeatedly building "#{prefix}row", "#{prefix}c", "#{prefix}v", and "#{prefix}t" strings for every node that is read. I think that instead of doing that, it needs to construct row_selector, cell_selector, value_selector, and text_selector strings only when prefix is first set to a non-'' value (with those new variables defaulting to the non-prefixed names like 'row').

While we're micro-optimizing, I think we probably don't want [value_selector, text_selector].include?(node.name) either, since that's constructing an array on every iteration. Directly checking node.name == value_selector || node.name == text_selector would be more performant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants