Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add default repos for qtd #714

Merged
merged 1 commit into from
Aug 17, 2023
Merged

Add default repos for qtd #714

merged 1 commit into from
Aug 17, 2023

Conversation

thejumpman2323
Copy link
Contributor

@thejumpman2323 thejumpman2323 commented Aug 16, 2023

Description

This pr refactors the QTD app to support 3 default repositories and create index for respective repo during startup

and then user can select the repo during query api.

Related Issues

Checklist

  • Is this code covered by new or existing unit tests or integration tests?
  • Did you run make test successfully?
  • Do new classes, functions, methods and parameters all have docstrings?
  • Were existing docstrings updated, if necessary?
  • Was external documentation updated, if necessary?

Additional Notes or Comments

@thejumpman2323 thejumpman2323 requested review from blythed and nenb August 16, 2023 16:40
@thejumpman2323 thejumpman2323 marked this pull request as ready for review August 16, 2023 16:40
@codecov-commenter
Copy link

codecov-commenter commented Aug 16, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (e4aa8d0) 79.01% compared to head (902cf37) 79.01%.

❗ Current head 902cf37 differs from pull request most recent head 2ca17e0. Consider uploading reports for the commit 2ca17e0 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #714   +/-   ##
=======================================
  Coverage   79.01%   79.01%           
=======================================
  Files          79       79           
  Lines        5004     5004           
=======================================
  Hits         3954     3954           
  Misses       1050     1050           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -11,6 +12,8 @@
from superduperdb.db.mongodb.query import Collection


REPOS = {'fastapi': 'https://github.com/tiangolo/fastapi', 'superduperdb':'https://github.com/SuperDuperDB/superduperdb', 'langchain': 'https://github.com/langchain-ai/langchain'}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle logic when repository is not in this dictionary.

@@ -51,3 +55,29 @@ def save_github_md_files_locally(owner, name, documentation_location):
f.write(content)

return Path(f"docs/{name}").glob("*")


def clone_repo(git_url, target_dir):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or which code do we want to use to get the repo?

def save_document_from_git_url(path_to_repo, repo_dir):
if not os.path.exists(path_to_repo):
warn(f"Path to repo: {path_to_repo} does not exist, fetching...")
clone_repo(path_to_repo, os.path.join(repo_dir, path_to_repo))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it looks like we're using clone_repo.

@@ -2,26 +2,27 @@


class FastAPISettings(BaseSettings):
mongo_uri: str
mongo_uri: str = 'mongodb://localhost:27018/'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add port 27017 as default.

context_select = collection.like(
{settings.vector_embedding_key: query.query},
n=settings.nearest_to_query,
vector_index="documentation_index",
vector_index=query.document_index,
).find()

# Step 2: Execute your query
# INSERT INFORMATION HERE
db = request.app.superduperdb
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we want to use asyncio here for performance reasons.

Copy link
Contributor

@nenb nenb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a bunch of comments. I think they are mostly personal opinions, so probably we need @blythed to also give some input.

artifacts = _create_ai_text_artifacts()
documents = [Document({settings.vector_embedding_key: v}) for v in artifacts]
db.execute(Collection(settings.mongo_collection_name).insert_many(documents))
with tempfile.TemporaryDirectory() as tmpdir:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to reduce the code here (or else move it to another location). This is my reasoning:

  1. The idea is illustrate to users what a typical SuperDuperDB app will look like.
  2. For this purpose we have created a components module and an artifacts module. We are suggesting that every SuperDuperDB app should also contain these modules.
  3. To explain to users how they can use these modules for other apps, we should try to only put general patterns in these modules. This way, users can understand how these modules can be designed for their own apps.
  4. At the moment, we have a bunch of code referring to repos and temporary directories in this module. This code is specific to this app, and might confuse the user ie for a different app, a user does not need to have logic dealing with repos and temporary directories.
  5. I believe that we should try to replace this code with a simple abstraction, to make it very clear how to use this module in the future. That is what I tried to do (probably not very well) in the original code.

@@ -51,3 +55,29 @@ def save_github_md_files_locally(owner, name, documentation_location):
f.write(content)

return Path(f"docs/{name}").glob("*")


def clone_repo(git_url, target_dir):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, my personal preference is to use the web API, as I feel it looks 'cleaner'. Perhaps @blythed can give his opinion here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, looks more professional and the user might not have .git installed for some reason.

@@ -2,26 +2,27 @@


class FastAPISettings(BaseSettings):
mongo_uri: str
mongo_uri: str = 'mongodb://localhost:27018/'
Copy link
Contributor

@nenb nenb Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will we connect to the cloud DB here? Do we pass an .env file?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just change the URI.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could think about passing a password as an environment variable and formatting it into the mongo_uri

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably me not using pydantic correctly, but for these kinds of things I don't provide a default value and instead set an environment variable, and pydantic seems to detect it.

But this actually feels like a hack now that I think about it. Would be happy for anyone to tell me the best way to change these values at runtime with pydantic.

).find()

# Step 2: Execute your query
# INSERT INFORMATION HERE
db = request.app.superduperdb
db_response, _ = await db.apredict(
db_response, _ = db.predict(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be apredict.

Copy link
Contributor

@nenb nenb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@thejumpman2323 thejumpman2323 merged commit 8a78eb5 into superduper-io:main Aug 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants