#144: Extracted load_models into seperate class #161
Conversation
… changed test setup accordingly
def load_models(self, model_name: str,
                current_model_key,
                cache_dir,
                token_conn_obj) -> None:
Can we provide type annotations for all parameters?
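A hedged sketch of what a fully annotated signature could look like; the types of current_model_key, cache_dir, and token_conn_obj are assumptions based on how they appear to be used, not the project's actual definitions:

```python
from pathlib import Path
from typing import Any, Optional


class ModelLoader:
    # Sketch only: current_model_key, cache_dir and token_conn_obj types
    # are assumed from context and would need checking against the codebase.
    def load_models(self,
                    model_name: str,
                    current_model_key: Optional[str],
                    cache_dir: Path,
                    token_conn_obj: Optional[Any]) -> None:
        ...
```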
Regarding the documentation strings and such: this whole class will be replaced by the one that will be created in #145, so I would rather spend the time doing proper documentation for the new class and leave this one as is.
I would say the released code should be in a state of completeness. If the class morphs into something else so will the documentation. Plus it doesn't take long to get the docstrings sorted.
                cache_dir,
                token_conn_obj) -> None:
    """
    Load model and tokenizer model from the cached location in bucketfs.
Maybe "Load the language model and tokenizer model" or "Load the model and tokenizer"
    https://github.com/exasol/transformers-extension/issues/43.

    :param model_name: The model name to be loaded
    """
"The model name to be loaded" => "The name of the model to be loaded".
Other parameters' description?
The description of the function doesn't reflect everything that the function is doing. For example, it doesn't say it will create and return a pipeline.
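A hedged sketch of a fuller docstring along the lines suggested; the parameter descriptions and the pipeline remark are assumptions about what the function does, phrased for illustration only:

```python
def load_models(self, model_name, current_model_key, cache_dir, token_conn_obj):
    """
    Load the model and tokenizer from the cached location in BucketFS,
    and create a transformers pipeline from them.

    :param model_name:        The name of the model to be loaded
    :param current_model_key: Key identifying the currently loaded model (assumed)
    :param cache_dir:         BucketFS directory the model is cached in (assumed)
    :param token_conn_obj:    Connection object providing the auth token (assumed)
    """
```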
    :param model_name: The model name to be loaded
    """
    token = False
    if token_conn_obj:
Maybe

if token_conn_obj is not None:

The object may implement some boolean conversion rules that evaluate to False.
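A minimal illustration of the pitfall being pointed out here; TokenConnection is a hypothetical stand-in for the real connection object:

```python
class TokenConnection:
    """Hypothetical connection object whose truthiness depends on its contents."""

    def __init__(self, token=""):
        self.token = token

    def __bool__(self):
        # Evaluates to False when no token is set, even though the object exists.
        return bool(self.token)


conn = TokenConnection(token="")
assert not conn          # truthiness check: falsy, despite the object existing
assert conn is not None  # identity check: correctly detects the object
```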
new bug ticket added here: #163
             tokenizer,
             task_name,
             device
             ):
Adding type annotations and/or parameter descriptions would be helpful.
self.last_loaded_model = None
self.last_loaded_tokenizer = None
self.model_loader.last_loaded_model = None
self.model_loader.last_loaded_tokenizer = None
I would add a clear method to the ModelLoader setting these two variables to None. If last_loaded_model and last_loaded_tokenizer still need to be visible, I would make them properties. BTW, if last_created_pipeline keeps references to these objects, they won't be garbage-collected, so you probably need to set it to None too.
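A hedged sketch of the suggested design: a clear method that drops all references (including the pipeline) plus read-only properties. The attribute set is assumed from the discussion, not from the real class:

```python
class ModelLoader:
    """Sketch of the suggested clear() method; attribute names are assumed."""

    def __init__(self):
        self._last_loaded_model = None
        self._last_loaded_tokenizer = None
        self._last_created_pipeline = None

    @property
    def last_loaded_model(self):
        return self._last_loaded_model

    @property
    def last_loaded_tokenizer(self):
        return self._last_loaded_tokenizer

    def clear(self):
        # Drop all references so the objects can be garbage-collected.
        # The pipeline also holds references to the model and tokenizer,
        # so it must be released as well.
        self._last_loaded_model = None
        self._last_loaded_tokenizer = None
        self._last_created_pipeline = None
```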
Will also move the clear method from the base UDF.
    model_name, cache_dir=cache_dir, use_auth_token=token)
return None
Why can't you use the actual ModelLoader with a pipeline-like function that does nothing and returns None?
Because the model loader is only initialized at runtime of the UDF, since it gets input about the device that is only known then. I don't know of a way to change the function call at that point.
If all you need is to make a LoadModel without a pipeline, you could do something like this:

class DummyLoadModel(LoadModel):
    def __init__(self, base_model, tokenizer, task_name, device):
        super().__init__(lambda task_name, model, tokenizer, device, framework: None,
                         base_model, tokenizer, task_name, device)

But I am not sure you need it.
You are completely right, I misunderstood. Changed.
All Submissions:
[CodeBuild] to the commit message
Fixes #144