The "@codebase" context provider allows you to ask questions without explicitly specifying which files should be included as context. Instead, Continue will use embeddings to pull out the most important files to answer your question.
The current implementation uses a fairly simple setup with LanceDB, and there is plenty of room to improve the indexing and retrieval steps. Most of the code can be found in core/indexing.
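To make the retrieval step concrete, here is a minimal Python sketch of ranking chunks by cosine similarity to a query embedding. It is not Continue's implementation, which lives in core/indexing and uses LanceDB for the vector search. The three-dimensional vectors are made-up stand-ins for a real embeddings model, and `cosine_similarity` and `top_k_chunks` are hypothetical helpers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec: list[float],
                 chunk_vecs: dict[str, list[float]],
                 k: int) -> list[tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (hypothetical values, for illustration only).
chunks = {
    "auth.py:login": [0.9, 0.1, 0.0],
    "db.py:connect": [0.1, 0.9, 0.1],
    "auth.py:logout": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
print(top_k_chunks(query, chunks, k=2))
```

In practice the vector store performs this nearest-neighbor search over the indexed chunks; a brute-force loop like this is only reasonable for very small indexes.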
Here are some of the ideas for how the pipeline can be improved (and you can also contribute by adding your own ideas here!):
Chunking
Code-aware chunking, for example chunking by function or class (consider using tree-sitter)
Separating the text used for similarity search from the text actually returned (for example, you might add a short summary preamble to the text used for similarity search, or apply the reverse of the technique of converting the question into a potential answer before searching)
Convert the input into text that is more appropriate for search (e.g. a possible answer to the question), and then run similarity search on that
Custom embeddings model (currently using ada, or sentence transformers so that everything can run locally)
Re-ranking: retrieve many options, then prune them
Improve the re-ranking prompts (currently there is a "remove" prompt that chooses which files are irrelevant, and an "include" prompt that says which files are important)
Weight chunks by information like commit frequency/recency, file length, etc.
Use other retrieval methods like fuzzy search, ripgrep, etc. to expand the initial pool
Take into account metadata like filename or path
Use code graph to include files that are adjacent to multiple other selected files, or for other reasons
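For the code-aware chunking idea, here is a minimal sketch that splits a Python source file into one chunk per top-level function or class. It uses Python's built-in `ast` module purely as a stand-in for tree-sitter, which would be needed to cover many languages. `Chunk` and `chunk_by_definition` are hypothetical names, not part of Continue.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str        # function or class name
    start_line: int  # 1-based, inclusive (includes decorators)
    end_line: int    # 1-based, inclusive
    text: str


def chunk_by_definition(source: str) -> list[Chunk]:
    """Split Python source into one chunk per top-level function or class.

    Code outside any top-level definition (imports, module constants) is
    skipped in this sketch; a real chunker would need to keep it somewhere.
    """
    lines = source.splitlines()
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Start at the first decorator, if any, so it stays with its definition.
            start = min([d.lineno for d in node.decorator_list] + [node.lineno])
            end = node.end_lineno  # available on Python 3.8+
            text = "\n".join(lines[start - 1:end])
            chunks.append(Chunk(node.name, start, end, text))
    return chunks


example = '''import os

def read(path):
    return open(path).read()

@cache
def helper():
    pass

class Store:
    def get(self, key):
        return key
'''

for c in chunk_by_definition(example):
    print(c.name, c.start_line, c.end_line)
```

Keeping whole definitions together means a retrieved chunk never cuts a function in half; very long classes would still need a second split (for example by method), which this sketch does not attempt.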
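The idea of separating the text used for similarity search from the text actually returned can be sketched as an index entry with two fields. This is a minimal illustration under stated assumptions: `IndexedChunk`, `tokens`, and `search` are hypothetical, the summary preamble is assumed to have been written ahead of time (for example by an LLM), and Jaccard token overlap is a toy stand-in for embedding similarity.

```python
from dataclasses import dataclass


@dataclass
class IndexedChunk:
    search_text: str  # what gets matched: e.g. a short summary preamble plus the code
    content: str      # what is actually returned to the model as context


def tokens(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {t.strip(".,()[]:").lower() for t in text.split()} - {""}


def search(query: str, index: list[IndexedChunk], k: int = 1) -> list[str]:
    """Rank by overlap with `search_text`, but return `content`."""
    q = tokens(query)

    def overlap(chunk: IndexedChunk) -> float:
        s = tokens(chunk.search_text)
        union = q | s
        return len(q & s) / len(union) if union else 0.0

    ranked = sorted(index, key=overlap, reverse=True)
    return [chunk.content for chunk in ranked[:k]]


index = [
    IndexedChunk(
        "Summary: hashes user passwords with a salt.\ndef hp(p, s): ...",
        "def hp(p, s):\n    return sha256(s + p)",
    ),
    IndexedChunk(
        "Summary: opens a database connection.\ndef conn(url): ...",
        "def conn(url):\n    return connect(url)",
    ),
]
print(search("where are passwords hashed", index))
```

Here the query only matches the first entry through its preamble ("passwords"); the terse code itself shares no words with the question, yet the code is what gets returned.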
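The re-ranking ideas (retrieve many candidates, weight them by metadata such as commit recency and path, then prune) can be combined in a single scoring function. This is a minimal sketch: `Candidate` and `rerank` are hypothetical, the similarity scores are assumed to come from an earlier embedding search, and the weights and 30-day half-life are arbitrary placeholders rather than tuned values. Commit frequency and file length could be added as extra terms in the same way.

```python
import math
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    path: str
    similarity: float      # score from the initial embedding search
    last_commit_ts: float  # Unix timestamp of the file's latest commit


def rerank(candidates: list[Candidate], query: str, keep: int,
           now: Optional[float] = None, path_weight: float = 0.2,
           recency_weight: float = 0.1, half_life_days: float = 30.0) -> list[Candidate]:
    """Re-score a large candidate pool and keep only the best `keep` entries."""
    now = time.time() if now is None else now
    query_terms = {t.lower() for t in query.split()}

    def score(c: Candidate) -> float:
        # Naive substring check: bonus if any query word appears in the path.
        path_bonus = 1.0 if any(t in c.path.lower() for t in query_terms) else 0.0
        # Exponential decay: a file last committed half_life_days ago gets half the bonus.
        age_days = max(0.0, (now - c.last_commit_ts) / 86400)
        recency = math.exp(-math.log(2) * age_days / half_life_days)
        return c.similarity + path_weight * path_bonus + recency_weight * recency

    return sorted(candidates, key=score, reverse=True)[:keep]


now = 1_700_000_000.0
day = 86400
pool = [
    Candidate("src/db.py", 0.80, now - 400 * day),
    Candidate("src/auth.py", 0.75, now - 2 * day),
    Candidate("docs/notes.md", 0.78, now - 10 * day),
]
for c in rerank(pool, "fix auth bug", keep=2, now=now):
    print(c.path)
```

With these placeholder weights, src/auth.py overtakes a file with a higher raw similarity because its path matches the query and it was committed recently, while the stale src/db.py is pruned.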
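To expand the initial pool with other retrieval methods, one option is to run a cheap fuzzy filename match alongside the embedding search and merge the two rankings. The sketch below uses `difflib.SequenceMatcher` for the fuzzy match and reciprocal rank fusion (RRF, with the commonly used constant k = 60) for the merge; both are swapped-in techniques chosen for illustration, and `fuzzy_filename_hits` and `reciprocal_rank_fusion` are hypothetical helpers. A ripgrep search could contribute a third ranking in the same way.

```python
import difflib
from collections import defaultdict


def fuzzy_filename_hits(query: str, paths: list[str], cutoff: float = 0.6) -> list[str]:
    """Paths whose file stem fuzzily matches any query word, best match first."""
    words = [w.lower() for w in query.split()]
    if not words:
        return []
    scored = []
    for path in paths:
        stem = path.rsplit("/", 1)[-1].split(".", 1)[0].lower()
        best = max(difflib.SequenceMatcher(None, w, stem).ratio() for w in words)
        if best >= cutoff:
            scored.append((best, path))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [path for _, path in scored]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists; items ranked highly in many lists rise to the top."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


paths = ["src/auth.py", "src/db.py", "src/session.py", "README.md"]
embedding_ranking = ["src/db.py", "src/session.py"]  # e.g. from vector search
fuzzy = fuzzy_filename_hits("auth bug", paths)
print(reciprocal_rank_fusion([embedding_ranking, fuzzy]))
```

RRF only looks at ranks, so the embedding scores and fuzzy-match ratios never need to be put on a common scale; here src/auth.py enters the pool even though the embedding search missed it.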
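The code-graph idea of pulling in files that are adjacent to several already-selected files can be sketched over a simple import map. This assumes the graph has already been extracted (for example from import statements) into a dict; `graph_neighbors` and the `min_links` threshold are hypothetical, and links are treated as undirected.

```python
from collections import Counter


def graph_neighbors(selected: set[str], edges: dict[str, list[str]],
                    min_links: int = 2) -> list[str]:
    """Unselected files linked to at least `min_links` selected files."""
    # Build an undirected adjacency map so "imports" and "imported by" both count.
    adjacency: dict[str, set[str]] = {}
    for src, targets in edges.items():
        for dst in targets:
            adjacency.setdefault(src, set()).add(dst)
            adjacency.setdefault(dst, set()).add(src)
    counts: Counter = Counter()
    for f in selected:
        for neighbor in adjacency.get(f, ()):
            if neighbor not in selected:
                counts[neighbor] += 1
    return sorted(n for n, c in counts.items() if c >= min_links)


imports = {
    "api.py": ["models.py", "utils.py"],
    "views.py": ["models.py"],
    "cli.py": ["utils.py"],
}
selected = {"api.py", "views.py"}
print(graph_neighbors(selected, imports))
```

Requiring more than one link keeps generic utilities that touch only one selected file out of the context, while a file shared by several selected files (models.py here) is likely to be relevant to all of them.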
Possibility to insert documents from outside the repo. It would be good to have a separate channel for feeding things like tickets, conversations, and other docs into the database.