Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speeding up lookup of inp sections and bracketed words #117

Merged
merged 6 commits into from
May 6, 2021

Conversation

jackieff
Copy link
Contributor

@jackieff jackieff commented Dec 5, 2020

Addresses #92 by speeding up the search for inp sections and therefore dataframe_from_inp
Changed to search full text file string instead of line iterations

@aerispaha aerispaha self-requested a review May 5, 2021 16:41
@aerispaha
Copy link
Member

@jackieff thanks for this!

I like your improved logic. That said, I have two comments:

  1. I think this doesn't totally address improve speed/efficiency of dataframe_from_inp function #92 because with your changes, we still will be scanning the file twice when we call dataframe_from_inp. Unless I'm misunderstanding, I think we could merge these changes but keep improve speed/efficiency of dataframe_from_inp function #92 open.
  2. I wonder if this approach may not be faster when we have very large inp files. Since your method reads the whole text file into memory with f.read(), this might lead to a lot of memory usage in some cases. I wonder if we can benchmark this, or maybe you already did some tests that you can share? Or maybe this doesn't matter since most machines have a lot of RAM these days.

What do you think?

@jackieff
Copy link
Contributor Author

jackieff commented May 5, 2021

@aerispaha Great points - I'm doing some benchmark tests for # 2 right now, and will look more into # 1. Stay tuned

@jackieff
Copy link
Contributor Author

jackieff commented May 5, 2021

@aerispaha Seems that # 2 is not a problem, I tested the f.read() section with varying file sizes:
File size 557 KB, with 1000 nodes and 1903 links took 0.01 seconds. File size 5868 KB, with 8749 nodes and 17286 links took 0.07 seconds. File size 28056 KB, with 43745 nodes and 103715 links took 0.35 seconds.

@aerispaha
Copy link
Member

Regarding the performance of this change, another thing we check is the duration of CI unit tests to get an indirect sense of things. Overall, it looks like this makes swmmio faster, at least for the test cases.

version test job duration 🏆
master linux Python 3.8 10.49s 😢
31766d6 linux Python 3.8 8.25s 🏆
master win Python 3.7 16.91s 😢
31766d6 win Python 3.7 12.64 🏆

Copy link
Member

@aerispaha aerispaha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great

@aerispaha aerispaha merged commit 2c1429a into pyswmm:master May 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants