Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor artifact.py, fix an issue, identify two others #452

Merged
merged 3 commits into from
Jul 17, 2023
Merged

Refactor artifact.py, fix an issue, identify two others #452

merged 3 commits into from
Jul 17, 2023

Conversation

rec
Copy link
Contributor

@rec rec commented Jul 16, 2023

Description

core/artifact.py got the quality makeover, exposing three issues.

Related Issue(s)

See #408

Additional Notes

I'll do a review, pointing out the key changes.

@rec rec requested a review from blythed July 16, 2023 12:43
# It is likely that other parts of the code are using this cache in an
# undisciplined fashion. We should formalize this cache.
return {
# 'artifact': self.artifact, :-(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole cache is useless, because we weren't caching the actual underlying artifact.

More, if I try to add the artifact to the cache, I get exceptions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe cache is the wrong choice of words.

object_id = r.pop('object_id', None)
try:
# TODO: this won't be any use without 'artifact' being set
return Artifact(**cache[object_id])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was originally returning a dict which is not what the code intended to do.

except KeyError:
pass

# TODO: if `file_id` is not popped but passed to the constructor of Artifact,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also seems to mean we aren't keeping the file_id so we will believe we are replacing files we just "already know about.

@@ -105,7 +106,7 @@ def retriever(self) -> LangchainRetriever:
def chain(self) -> Chain:
assert hasattr(self, 'retriever')
return RetrievalQAWithSourcesChain.from_chain_type(
llm=self.object.a,
llm=t.cast(BaseLanguageModel, self.object.artifact),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't resist adding a cast.

All the changes after this point are uninteresting. :-)

return a

@property
def a(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This short-hand is used all over the code-base. Do the tests pass with that removed?

@@ -25,7 +25,7 @@ def TransformersTrainerConfiguration(identifier: str, *args, **kwargs):
class Pipeline(Model):
@property
def pipeline(self):
return self.object.a
return self.object.artifact
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see you've replaced all of those.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, .a is just too, too brief. And it isn't actually used in that many places.

@rec rec merged commit 9c57208 into superduper-io:main Jul 17, 2023
@rec rec deleted the artifact branch July 17, 2023 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants