Speeding up lookup of inp sections and bracketed words #117

jackieff · 2020-12-05T01:37:45Z

Addresses #92 by speeding up the search for inp sections, and therefore dataframe_from_inp.
Changed the lookup to search the full text of the file as a single string instead of iterating over it line by line.
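
For readers skimming this thread, here is a minimal sketch of the general idea: read the whole file into one string and locate bracketed section headers with a multiline regex, rather than looping over lines. The helper names `find_bracketed_headers` and `extract_section` and the sample text are hypothetical illustrations, not the functions changed in this PR.

```python
import re

# Hypothetical pattern: a line consisting only of a bracketed word, e.g. [JUNCTIONS]
HEADER_RE = re.compile(r"^\[([^\]\n]+)\][ \t]*$", re.MULTILINE)


def find_bracketed_headers(text):
    """Return every bracketed section header found in the text, in file order."""
    return ["[{}]".format(m.group(1)) for m in HEADER_RE.finditer(text)]


def extract_section(text, header):
    """Return the body of one section as a string, or '' if the header is absent."""
    matches = list(HEADER_RE.finditer(text))
    for i, m in enumerate(matches):
        if m.group(0).strip() == header:
            # The section ends where the next header starts, or at end of file
            end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
            return text[m.end():end].strip("\n")
    return ""


# Made-up sample content standing in for the result of f.read()
sample = """[TITLE]
Example model

[JUNCTIONS]
;;Name  Elevation
J1      10.0
J2      9.5

[CONDUITS]
C1      J1  J2
"""

headers = find_bracketed_headers(sample)
junctions = extract_section(sample, "[JUNCTIONS]")
```

In this sketch a single regex pass finds all headers at once, so the start and end of any section can be computed from match offsets without a second scan of the text.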

aerispaha · 2021-05-05T16:50:10Z

@jackieff thanks for this!

I like your improved logic. That said, I have two comments:

1. I don't think this fully addresses #92 (improve speed/efficiency of dataframe_from_inp function), because with your changes we will still be scanning the file twice when we call dataframe_from_inp. Unless I'm misunderstanding, I think we could merge these changes but keep #92 open.
2. I wonder whether this approach will still be faster when we have very large inp files. Since your method reads the whole text file into memory with f.read(), it might use a lot of memory in some cases. Can we benchmark this, or have you already run some tests that you can share? Or maybe this doesn't matter, since most machines have a lot of RAM these days.

What do you think?

jackieff · 2021-05-05T20:34:23Z

@aerispaha Great points - I'm doing some benchmark tests for # 2 right now, and will look more into # 1. Stay tuned

jackieff · 2021-05-05T20:45:15Z

@aerispaha It seems that # 2 is not a problem. I tested the f.read() section with varying file sizes:

- File size 557 KB, with 1000 nodes and 1903 links: 0.01 seconds
- File size 5868 KB, with 8749 nodes and 17286 links: 0.07 seconds
- File size 28056 KB, with 43745 nodes and 103715 links: 0.35 seconds
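
A timing check of this kind could be sketched roughly as follows. This is not the script used for the numbers above; `time_full_read` and `make_synthetic_inp` are hypothetical helpers, and the synthetic file contains only a made-up [JUNCTIONS] section, so its timings will not match those of a real model.

```python
import os
import tempfile
import time


def time_full_read(path):
    """Read the whole file with f.read(); return (elapsed seconds, characters read)."""
    start = time.perf_counter()
    with open(path, "r") as f:
        text = f.read()
    return time.perf_counter() - start, len(text)


def make_synthetic_inp(n_nodes):
    """Write a made-up inp-like file with n_nodes junction rows; return its path."""
    lines = ["[JUNCTIONS]"]
    lines += ["J{}    10.0".format(i) for i in range(n_nodes)]
    fd, path = tempfile.mkstemp(suffix=".inp")
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(lines) + "\n")
    return path


if __name__ == "__main__":
    # Node counts chosen arbitrarily for illustration
    for n in (1000, 10000, 50000):
        path = make_synthetic_inp(n)
        try:
            elapsed, n_chars = time_full_read(path)
            size_kb = os.path.getsize(path) / 1024
            print("{:.0f} KB, {} chars: {:.4f} s".format(size_kb, n_chars, elapsed))
        finally:
            os.remove(path)
```

Note that this measures wall-clock read time only; peak memory use, which was the original concern, would need a separate measurement (for example with the standard library's `tracemalloc`).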

aerispaha · 2021-05-05T21:13:11Z

Regarding the performance of this change, another thing we can check is the duration of the CI unit tests, which gives an indirect sense of things. Overall, it looks like this makes swmmio faster, at least for the test cases.

| version | test job | duration | 🏆 |
| --- | --- | --- | --- |
| master | linux Python 3.8 | 10.49s | 😢 |
| `31766d6` | linux Python 3.8 | 8.25s | 🏆 |
| master | win Python 3.7 | 16.91s | 😢 |
| `31766d6` | win Python 3.7 | 12.64s | 🏆 |

aerispaha

looks great

jackieff added 2 commits December 4, 2020 20:36

Speeding up lookup of inp sections and bracketed words

4794d52

Merge remote-tracking branch 'upstream/master' into speedup_inp_reading

31766d6

aerispaha self-requested a review May 5, 2021 16:41

Eliminating need to scan inp file twice for dataframe_from_inp

d556e1f

jackieff and others added 3 commits May 5, 2021 17:21

Fixing headers error

339a29d

minor change to code style

1292b22

Reverting to 31766d6

3331597

aerispaha approved these changes May 6, 2021

aerispaha merged commit 2c1429a into pyswmm:master May 6, 2021
