Feature: Cache for runtime workers #220

Merged: 9 commits merged into main on Feb 21, 2024

Conversation

mtweiden
Contributor

This PR adds the ability to interact with a "runtime cache" for workers. Objects that cannot be pickled (such as PyTorch modules) can instead be loaded into this cache, which ensures that each worker loads them only once. The cache is implemented as a simple Python dictionary.

Usage:
In a worker process, items in the worker cache can be accessed using the following general outline:

from bqskit.runtime import get_runtime

worker_cache = get_runtime().get_cache()

if 'object_name' not in worker_cache:
    # Load the object once and store it so later tasks on this worker can reuse it.
    obj = load_object()
    worker_cache['object_name'] = obj
else:
    obj = worker_cache['object_name']
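
As a concrete illustration, a sketch of caching an expensive, non-picklable object such as a PyTorch model might look as follows (the helper name, cache key, torch dependency, and checkpoint path are assumptions for illustration, not code from this PR):

from bqskit.runtime import get_runtime

import torch  # assumption: PyTorch is installed; used only to illustrate a non-picklable object


def get_noise_model():
    """Return the model, loading it at most once per worker (hypothetical helper)."""
    cache = get_runtime().get_cache()
    if 'noise_model' not in cache:
        # 'noise_model.pt' is a placeholder checkpoint path, not a file from this PR.
        cache['noise_model'] = torch.load('noise_model.pt')
    return cache['noise_model']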

@mtweiden requested a review from edyounis on January 29, 2024 at 19:44
@edyounis (Member) left a comment:

For the most part LGTM, thanks for adding this feature. Please just update the get_cache documentation.

A worker's local cache is implemented as a dictionary. This function can
be used to check for the presence of data in, or to place data directly
into, a worker's local cache.
"""
@edyounis (Member) commented:

What is a "worker's local cache"? The documentation here should explain how this can be used while abstracting details of how the runtime works. Consider a user reading this may not know (nor should they know) what a worker is. What is important is that this cache is transient. You can place things in here, to reduce reloading, but they may not be in here later. This informs the user that when using this cache, you should write something like:

def load_expensive_io_model_or_something_or_another(file):
    cache = get_runtime().get_cache()
    if some_file_name_key in cache:
        return cache[some_file_name_key]
    # load it; store it in the cache; return it
    ...

Not saying you can't or shouldn't put implementation details later in the docstring, but the first points should inform the user how to use this method/function/class/etc. In this case, definitely include somewhere that this cache is local to a specific worker and is never serialized/pickled.
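
A runnable version of that pattern might look like the following sketch (the function name, key, and loader are hypothetical stand-ins, not code from this PR):

from bqskit.runtime import get_runtime


def load_expensive_model(file: str):
    """Load a model from disk, reusing a previously loaded copy when possible."""
    cache = get_runtime().get_cache()
    if file in cache:
        # A prior task on this worker already loaded the model.
        return cache[file]
    # Load the object once, keep it in the transient cache, and return it.
    model = read_model_from_disk(file)  # hypothetical loader
    cache[file] = model
    return model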

@mtweiden (Contributor, Author) replied:

Removed the reference to workers here and added an example of how to use the cache.


Returns:
    (dict[str, Any]): The worker's local cache implemented
        with a dictionary.
@edyounis (Member) commented:

It's totally me being pedantic, feel free to ignore this completely, but I did want to mention that you do not need to say it's implemented as a dictionary when the type information is right there.

@mtweiden (Contributor, Author) replied:

Moved the references to the worker-specific cache here.

@edyounis merged commit 8e7aae5 into main on Feb 21, 2024
5 of 17 checks passed