Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve upon issues with session caching #2409

Merged
merged 6 commits into from
Aug 17, 2022

Conversation

dbfreem
Copy link
Contributor

@dbfreem dbfreem commented Mar 30, 2022

What was wrong?

@dbfreem:

@fselmo:

  • Multithreading was not being supported with the current implementation, nor with the initial commits here.

How was it fixed?

@dbfreem:

  • Moved the requests session cache to the new SessionCache class.

@fselmo:

  • Use a future thread with a bit more than the default call timeout to close evicted sessions so they aren't closed before a call is finished. This allows for multithreading support for both sync and async.
  • Add testing for multi threading
  • Combine cache_session and get_session methods for sync and async

Todo:

  • Add entry to the release notes
  • Fix newsfragment message to reflect all changes here

web3/_utils/request.py Outdated Show resolved Hide resolved
@kclowes
Copy link
Collaborator

kclowes commented Apr 4, 2022

Thanks @dbfreem! We're all neck deep in other things right now, but will review soon ™️

@fselmo
Copy link
Collaborator

fselmo commented Jul 6, 2022

@dbfreem I rebased this branch in attempts at getting it up-to-date and ready to merge in... can you take another look when you get a chance and make sure this is ready for review? I'm prioritizing this atm... thanks for the patience on this!

@dbfreem
Copy link
Contributor Author

dbfreem commented Jul 7, 2022

Hey @fselmo this looks good to go. This should get the blocking request in good shape in regards to threading but there is still an issue with the async request and threading. This is documented in issue #2446. At least, this refactor of request.py will make the fix for #2446 easier once a direction is decided.

Do you want me to fix that merge conflict? I would go ahead and fix it but wasn't sure where you stood on the rebase, let me know which way you want to go.

@fselmo fselmo force-pushed the feature/request_session_cache branch from f67ae11 to c3d00ed Compare July 7, 2022 21:16
@fselmo
Copy link
Collaborator

fselmo commented Jul 7, 2022

This should get the blocking request in good shape in regards to threading but there is still an issue with the async request and threading. This is documented in issue #2446. At least, this refactor of request.py will make the fix for #2446 easier once a direction is decided.

Sounds good. I just wanted to get this cleaned up and rebased so we can keep the conversation going. I am going to spend a bit more time across these PRs but just wanted to make sure it was rebased appropriately. Thanks 👌

Do you want me to fix that merge conflict? I would go ahead and fix it but wasn't sure where you stood on the rebase, let me know which way you want to go.

I took care of the merge conflict, thanks. We've been merging a lot of formatting changes this week.

@fselmo
Copy link
Collaborator

fselmo commented Aug 2, 2022

@dbfreem I added a commit to this PR that attempts to resolve some of what we have been discussing. Particularly, the race condition issue with many sessions running at once. If the cache filled too quickly then the first session, let's say, would be evicted before it ever has time to make a call. Issuing a task to run at DEFAULT_TIMEOUT + (some arbitrarily small amount of time) allows for ample time to make the call before that session is closed.

I haven't tinkered this much with async so I'm hoping we can chat about it here. I added a test that was failing up until before this commit that basically simulates the conditions I was seeing from #2446. This commit seems to resolve that even for small cache size (even for size=1 which is a pretty extreme case).

I ran with a similar approach as to one you'd posted in making the lock actually async by using the @asynccontextmanager and a ThreadPoolExecutor. Thoughts on this approach?


edit: I tested this against both #2446 and #2407 and it works in both conditions

@fselmo fselmo force-pushed the feature/request_session_cache branch 3 times, most recently from 05a474c to 806b513 Compare August 2, 2022 22:37
@dbfreem
Copy link
Contributor Author

dbfreem commented Aug 3, 2022

This is an interesting approach. I started reading this to see if anyone else has run into these types of issue in high concurrency situations. It made me think that fundamentally, if I get a session from the cache I want to make sure that session is not modified until I am done with it. The reason I bring this up is that even with the approaches already implemented in this PR my session could still get closed while I am using it on the sync and async side.

For instance image I call the get_session method. I should be assured that the session I get back from it will stay open until I am done with it. Right now that guarantee doesn't exist. It is better with this PR but still not perfect. I actually think after reading that article and and thinking through this that the lock should be moved to the get_response_from_post_request method. This would assure that the thread is done with the session before something else could close it.

So that is for sync but what about async? For Async I think there is a fundamental question of do we want to support async multithreaded code. Would someone be running async in multiple threads if not the the async lock may be appropriate shown here. Is there some forum where this question could be asked to consumers of the library? If we will have someone running async in multiple threads then I can't currently think of another way to make this work then what you have proposed. I still need to read through your commit in a little more detail.

I know that's a lot of info, thoughts??

@fselmo
Copy link
Collaborator

fselmo commented Aug 3, 2022

Yeah I think I'd be good with moving the lock to the whole request for sync. I was headed that direction with both but wanted to see if we could tackle multithreading for async first. I think this commit along with allowing the cache size to be customized might be a really good combination answer to our problems. If someone expects many threads for their application and they increase the cache size to match it, they should never run into any issues... and if they still exceed the cache size, this commit should take care of that situation. Thoughts on that?

Would someone be running async in multiple threads if not the the async lock may be appropriate shown here.

Curious what the difference is between asyncio.Lock and the one implemented in my commit (source).

The reason I bring this up is that even with the approaches already implemented in this PR my session could still get closed while I am using it on the sync and async side.

Yeah I think sync can be addressed by moving the lock to the request, as we mentioned above.

For async, can we try to follow a scenario where the async session would be closed before you're done with it? The way I'm seeing this change, and maybe we can even extend the time on the timer thread to make sure of this, is that if a session is evicted from the cache, it is held for at least the amount of time it takes to finish the original call (DEFAULT_TIMEOUT). If a new call is made to the same url and it isn't in the cache, it would create a new session in the cache for that URL. Am I missing something there?


edit: I suppose if you grab the session directly from the cache and try to use it indefinitely... then yeah I guess at that point it might eventually get evicted from the cache and closed which should be expected at some point since there is a limited size cache in the first place. But if you are just making calls from the provider, if you try to call that same URL, a new session would begin. I think ultimately giving the user control of the cache size along with evicting sessions in a future thread would give the most customizable experience that I've seen yet and should allow for pretty seamless multithreading.

Copy link
Collaborator

@kclowes kclowes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a bunch of nits (mostly around comments 😆 ), but don't see anything big!

web3/_utils/request.py Outdated Show resolved Hide resolved
web3/_utils/request.py Outdated Show resolved Hide resolved
web3/_utils/request.py Outdated Show resolved Hide resolved
web3/_utils/request.py Show resolved Hide resolved
tests/core/utilities/test_request.py Outdated Show resolved Hide resolved
tests/core/utilities/test_request.py Outdated Show resolved Hide resolved
web3/_utils/request.py Outdated Show resolved Hide resolved
web3/_utils/request.py Outdated Show resolved Hide resolved
web3/_utils/request.py Outdated Show resolved Hide resolved
fselmo added a commit to dbfreem/web3.py that referenced this pull request Aug 3, 2022
- Use dict.values() over dict.items() since keys are not being used
- Remove assert in favor of a ``logger.warning()``
- Fix some minor blips in comments
- Minor refactor with more descriptive method name for ``_close_evicted_async_sessions()``
fselmo added a commit to dbfreem/web3.py that referenced this pull request Aug 3, 2022
- Use dict.values() over dict.items() since keys are not being used
- Remove assert in favor of a ``logger.warning()``
- Fix some minor blips in comments
- Minor refactor with more descriptive method name for ``_close_evicted_async_sessions()``
@fselmo fselmo force-pushed the feature/request_session_cache branch from be58e81 to a4b36c3 Compare August 3, 2022 23:03
fselmo added a commit to dbfreem/web3.py that referenced this pull request Aug 3, 2022
- Use dict.values() over dict.items() since keys are not being used
- Remove assert in favor of a ``logger.warning()``
- Fix some minor blips in comments
- Minor refactor with more descriptive method name for ``_close_evicted_async_sessions()``
@fselmo fselmo force-pushed the feature/request_session_cache branch from a4b36c3 to 2532bd9 Compare August 3, 2022 23:08
@dbfreem
Copy link
Contributor Author

dbfreem commented Aug 4, 2022

For async, can we try to follow a scenario where the async session would be closed before you're done with it? The way I'm seeing this change, and maybe we can even extend the time on the timer thread to make sure of this, is that if a session is evicted from the cache, it is held for at least the amount of time it takes to finish the original call (DEFAULT_TIMEOUT). If a new call is made to the same url and it isn't in the cache, it would create a new session in the cache for that URL. Am I missing something there?

@fselmo after reading through what you have on the async side I "think" it might work. It is hard to say without running the code, and I haven't had much time lately. Sorry.

@fselmo
Copy link
Collaborator

fselmo commented Aug 4, 2022

@fselmo after reading through what you have on the async side I "think" it might work. It is hard to say without running the code, and I haven't had much time lately. Sorry.

Totally get it, no worries at all. Was just curious to pick your brain about it and run through all the possible scenarios. We can probably add a bit more to the timeout too just to make sure. I'll push up the sync changes we talked about too and try to think of some more tests. Then we can do one last pass through... but I think this is a big improvement on what we have. Thanks again for getting it going.

@fselmo fselmo force-pushed the feature/request_session_cache branch from 791955c to 1b59a30 Compare August 9, 2022 19:53
@fselmo fselmo requested a review from kclowes August 9, 2022 20:07
@fselmo
Copy link
Collaborator

fselmo commented Aug 9, 2022

@kclowes, I think this is ready for a full review now. Maybe checking the latest commits by themselves so you can see all that changed since the last review.

@fselmo fselmo requested a review from pacrob August 9, 2022 20:08
Copy link
Collaborator

@kclowes kclowes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! ⛵ Just to double check, we did verify that this works if there are more URLs than the cache size, right? Like described in #2446?

@fselmo fselmo changed the title moved requests session to SessionCache Improve upon issues with session caching Aug 16, 2022
dbfreem and others added 6 commits August 16, 2022 16:12
- moved requests session to SessionCache
- added lock around get_cache_entry
- adding the extra lock to async too
- rearranged the code to make locking more straight forward
- added newsfragment
- When evicting sessions from the async cache, once the cache has already been cleared, pop out of the lock and issue a ``threading.Timer`` to close the evicted sessions with a bit more time than the `DEFAULT_TIMEOUT` for a call. This ensures any evicted session has time to go through with its call before it is actually closed.
- Use dict.values() over dict.items() since keys are not being used
- Remove assert in favor of a ``logger.warning()``
- Fix some minor blips in comments
- Minor refactor with more descriptive method name for ``_close_evicted_async_sessions()``
- Support multithreading for sync cache
- Combine ``cache_async_session()`` with ``get_async_session()``
@fselmo fselmo force-pushed the feature/request_session_cache branch from 366abbe to 21c8064 Compare August 16, 2022 22:12
@fselmo fselmo merged commit b1fe3d5 into ethereum:master Aug 17, 2022
fselmo added a commit that referenced this pull request Aug 17, 2022
- Use dict.values() over dict.items() since keys are not being used
- Remove assert in favor of a ``logger.warning()``
- Fix some minor blips in comments
- Minor refactor with more descriptive method name for ``_close_evicted_async_sessions()``
pacrob pushed a commit to pacrob/web3.py that referenced this pull request Sep 8, 2022
- Use dict.values() over dict.items() since keys are not being used
- Remove assert in favor of a ``logger.warning()``
- Fix some minor blips in comments
- Minor refactor with more descriptive method name for ``_close_evicted_async_sessions()``
pacrob pushed a commit that referenced this pull request Sep 8, 2022
- Use dict.values() over dict.items() since keys are not being used
- Remove assert in favor of a ``logger.warning()``
- Fix some minor blips in comments
- Minor refactor with more descriptive method name for ``_close_evicted_async_sessions()``
@hotbroker
Copy link

hi,where i can see which version is solved

@pacrob
Copy link
Contributor

pacrob commented Sep 22, 2022

hi,where i can see which version is solved

Release notes are here

fselmo added a commit to fselmo/web3.py that referenced this pull request Oct 21, 2022
@fselmo fselmo mentioned this pull request Oct 21, 2022
1 task
fselmo added a commit to fselmo/web3.py that referenced this pull request Oct 21, 2022
fselmo added a commit to fselmo/web3.py that referenced this pull request Oct 21, 2022
fselmo added a commit to fselmo/web3.py that referenced this pull request Oct 24, 2022
- Use internal ``SessionCache`` over LRU cache for sync requests
- Increase cache size to ``100`` from ``20``
- Use unique cache key identifiers per thread / event pool for async
- Add related tests

Note: The `v6` version of these changes is more robust for async and provides better thread-safety. Due to needing to support python 3.6 in `v5`, these changes could not be back-ported unless they were significantly revised. Being that python 3.6 is deprecated, this did not seem like something worth refactoring. TL;DR - expect better request caching in `v6` :)
fselmo added a commit to fselmo/web3.py that referenced this pull request Oct 24, 2022
- Use internal ``SessionCache`` over LRU cache for sync requests
- Increase cache size to ``100`` from ``20``
- Use unique cache key identifiers per thread / event pool for async
- Add related tests

Note: The `v6` version of these changes is more robust for async and provides better thread-safety. Due to needing to support python 3.6 in `v5`, these changes could not be back-ported unless they were significantly revised. Being that python 3.6 is deprecated, this did not seem like something worth refactoring. TL;DR - expect better request caching in `v6` :)
fselmo added a commit that referenced this pull request Oct 24, 2022
- Use internal ``SessionCache`` over LRU cache for sync requests
- Increase cache size to ``100`` from ``20``
- Use unique cache key identifiers per thread / event pool for async
- Add related tests

Note: The `v6` version of these changes is more robust for async and provides better thread-safety. Due to needing to support python 3.6 in `v5`, these changes could not be back-ported unless they were significantly revised. Being that python 3.6 is deprecated, this did not seem like something worth refactoring. TL;DR - expect better request caching in `v6` :)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

script coredumps when using threads
6 participants