Excessive Memory Allocation Loading Worksheet Rows #111

zacheryph · 2023-01-06T04:18:30Z

I am noticing excessive temporary memory allocation when loading a worksheet. I cannot share the spreadsheet at the moment. It is 5.8MB, with a single sheet and 50K records. I cannot see anything obvious in the rows that would cause this, but it seems a bit excessive for a 6MB file ;)

The reason I bring this up is that our Kubernetes pods are being killed for memory usage while this runs. I am attempting to bump the limits to get around it.

Anything I can do to help I'm glad to.

Note, memory usage is from Docker stats

Base Rails Console Memory: 168.3MiB
Load Book (book = Creek::Book.new("report.xlsx")): 305.8MiB
Force a GC (GC.start): 210.6MiB
Lookup First Row (book.sheets.first.rows.first): 1.247GiB
Force a GC (GC.start): 222.9MiB


zacheryph · 2023-01-12T14:38:26Z

This also appears to be new in 2.6.2. I do not see it happening in 2.5.3, so I have rolled back our update.

pythonicrubyist · 2023-01-12T15:46:30Z

How many records are on that sheet?

zacheryph · 2023-01-13T04:04:35Z

50K.

With this particular spreadsheet,
with Creek 2.5.3 memory usage (watching docker stats) never exceeds ~450-500MB,
but with Creek 2.6.2 memory usage peaks at 1.1-1.2GB.

md5 · 2023-05-03T04:46:57Z

We started seeing OOM killer issues in containerized workloads after upgrading to creek-2.6.2 as well and were able to alleviate the issue by downgrading to 2.5.3. Thanks for the report @zacheryph!

I believe the issue was introduced as part of #101 in this commit: 494ed05

Specifically, it's repeatedly building "#{prefix}row", "#{prefix}c", "#{prefix}v", and "#{prefix}t" strings for every node that is read. I think that instead of doing that, it needs to construct row_selector, cell_selector, value_selector, and text_selector strings only when prefix is first set to a non-'' value (with those new variables defaulting to the non-prefixed names like 'row').
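A minimal sketch of that idea, assuming the selectors live on a small helper object. `SelectorCache` and its `prefix=` setter are hypothetical names for illustration, not Creek's actual code; only the selector names come from the description above.

```ruby
# Hypothetical helper: builds the selector strings once per prefix change
# instead of interpolating "#{prefix}row" etc. for every node that is read.
class SelectorCache
  attr_reader :row_selector, :cell_selector, :value_selector, :text_selector

  def initialize
    # Default to the non-prefixed element names.
    @prefix = ''
    @row_selector = 'row'.freeze
    @cell_selector = 'c'.freeze
    @value_selector = 'v'.freeze
    @text_selector = 't'.freeze
  end

  # Rebuild the four strings only when the prefix actually changes.
  def prefix=(prefix)
    return if prefix == @prefix

    @prefix = prefix
    @row_selector = "#{prefix}row".freeze
    @cell_selector = "#{prefix}c".freeze
    @value_selector = "#{prefix}v".freeze
    @text_selector = "#{prefix}t".freeze
  end
end
```

With something like this, the per-node loop would compare `node.name` against the cached strings, so the number of selector strings built depends on how often the prefix changes rather than on the number of nodes.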

While we're micro-optimizing, I think we probably don't want [value_selector, text_selector].include?(node.name) either, since that's constructing an array on every iteration. Directly checking node.name == value_selector || node.name == text_selector would be more performant.
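A rough way to compare the two checks is to count allocated objects around each one. This is only a sketch: the `allocations` helper and the sample node names are made up for illustration, and the exact counts depend on the Ruby version, since some interpreters may optimize the array-literal form.

```ruby
# Hypothetical helper: returns the number of objects allocated while the block runs.
def allocations
  GC.disable # keep a GC run from interfering while we measure
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

value_selector = 'v'
text_selector = 't'
# Stand-in for the stream of node names a sheet parser would see.
node_names = %w[row c v t].cycle.first(100_000)

with_array = allocations do
  node_names.each { |name| [value_selector, text_selector].include?(name) }
end

with_compare = allocations do
  node_names.each { |name| name == value_selector || name == text_selector }
end

puts "include?: #{with_array} objects, ==: #{with_compare} objects"
```

Both forms return the same answer for every name, so the direct comparison should be a drop-in replacement wherever the array form is used.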

zacheryph changed the title ~~Excessive Memory Allocation Loading File~~ Excessive Memory Allocation Loading Worksheet Rows Jan 6, 2023

ThomasSevestre mentioned this issue May 3, 2023

Limit object allocations #115

Merged

pythonicrubyist closed this as completed in #115 May 3, 2023

alexhornick-dt mentioned this issue Oct 26, 2023

NoMethodError: undefined method any? in creek/sheet.rb; related to Memory Usage #121

Open
