-
Notifications
You must be signed in to change notification settings - Fork 482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add default repos for qtd #714
Add default repos for qtd #714
Conversation
Codecov ReportPatch and project coverage have no change.
Additional details and impacted files@@ Coverage Diff @@
## main #714 +/- ##
=======================================
Coverage 79.01% 79.01%
=======================================
Files 79 79
Lines 5004 5004
=======================================
Hits 3954 3954
Misses 1050 1050 ☔ View full report in Codecov by Sentry. |
@@ -11,6 +12,8 @@ | |||
from superduperdb.db.mongodb.query import Collection | |||
|
|||
|
|||
REPOS = {'fastapi': 'https://github.com/tiangolo/fastapi', 'superduperdb':'https://github.com/SuperDuperDB/superduperdb', 'langchain': 'https://github.com/langchain-ai/langchain'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handle logic when repository is not in this dictionary.
@@ -51,3 +55,29 @@ def save_github_md_files_locally(owner, name, documentation_location): | |||
f.write(content) | |||
|
|||
return Path(f"docs/{name}").glob("*") | |||
|
|||
|
|||
def clone_repo(git_url, target_dir): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or which code do we want to use to get the repo?
def save_document_from_git_url(path_to_repo, repo_dir): | ||
if not os.path.exists(path_to_repo): | ||
warn(f"Path to repo: {path_to_repo} does not exist, fetching...") | ||
clone_repo(path_to_repo, os.path.join(repo_dir, path_to_repo)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it looks like we're using clone_repo
.
@@ -2,26 +2,27 @@ | |||
|
|||
|
|||
class FastAPISettings(BaseSettings): | |||
mongo_uri: str | |||
mongo_uri: str = 'mongodb://localhost:27018/' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add port 27017 as default.
context_select = collection.like( | ||
{settings.vector_embedding_key: query.query}, | ||
n=settings.nearest_to_query, | ||
vector_index="documentation_index", | ||
vector_index=query.document_index, | ||
).find() | ||
|
||
# Step 2: Execute your query | ||
# INSERT INFORMATION HERE | ||
db = request.app.superduperdb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we want to use asyncio
here for performance reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a bunch of comments. I think they are mostly personal opinions, so probably we need @blythed to also give some input.
artifacts = _create_ai_text_artifacts() | ||
documents = [Document({settings.vector_embedding_key: v}) for v in artifacts] | ||
db.execute(Collection(settings.mongo_collection_name).insert_many(documents)) | ||
with tempfile.TemporaryDirectory() as tmpdir: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to reduce the code here (or else move it to another location). This is my reasoning:
- The idea is illustrate to users what a typical SuperDuperDB app will look like.
- For this purpose we have created a
components
module and anartifacts
module. We are suggesting that every SuperDuperDB app should also contain these modules. - To explain to users how they can use these modules for other apps, we should try to only put general patterns in these modules. This way, users can understand how these modules can be designed for their own apps.
- At the moment, we have a bunch of code referring to repos and temporary directories in this module. This code is specific to this app, and might confuse the user ie for a different app, a user does not need to have logic dealing with repos and temporary directories.
- I believe that we should try to replace this code with a simple abstraction, to make it very clear how to use this module in the future. That is what I tried to do (probably not very well) in the original code.
@@ -51,3 +55,29 @@ def save_github_md_files_locally(owner, name, documentation_location): | |||
f.write(content) | |||
|
|||
return Path(f"docs/{name}").glob("*") | |||
|
|||
|
|||
def clone_repo(git_url, target_dir): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, my personal preference is to use the web API, as I feel it looks 'cleaner'. Perhaps @blythed can give his opinion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, looks more professional and the user might not have .git
installed for some reason.
@@ -2,26 +2,27 @@ | |||
|
|||
|
|||
class FastAPISettings(BaseSettings): | |||
mongo_uri: str | |||
mongo_uri: str = 'mongodb://localhost:27018/' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will we connect to the cloud DB here? Do we pass an .env
file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You just change the URI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could think about passing a password as an environment variable and formatting it into the mongo_uri
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably me not using pydantic
correctly, but for these kinds of things I don't provide a default value and instead set an environment variable, and pydantic
seems to detect it.
But this actually feels like a hack now that I think about it. Would be happy for anyone to tell me the best way to change these values at runtime with pydantic
.
).find() | ||
|
||
# Step 2: Execute your query | ||
# INSERT INFORMATION HERE | ||
db = request.app.superduperdb | ||
db_response, _ = await db.apredict( | ||
db_response, _ = db.predict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be apredict
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Description
This pr refactors the QTD app to support 3 default repositories and create index for respective repo during startup
and then user can select the repo during query api.
Related Issues
Checklist
make test
successfully?Additional Notes or Comments