Fix python vectorizer, prompt and other little things #191

Merged
merged 10 commits into from
Jan 24, 2025

jfrverdasca
Contributor

@jfrverdasca jfrverdasca commented Jan 16, 2025

What this PR changes:

  • Add the file path to embeddings so that files can be retrieved by name in a prompt (like "Add feature X to this_file.py")
  • When using the Python vectorizer, files with syntax errors are now loaded into embeddings in chunks.
  • Removed unneeded information that was added to embeddings when using the Python vectorizer (it is no longer useful given the way embeddings are used now)
  • Add line numbers to files sent in context, in almost the same format as the output of the console command nl -ba file.txt
  • Replaced the prompt with a more complete one that adds a short project overview, general guidance, task guidelines, expected code style, testing and a resources description, which gives better LLM results.
  • Sort files and steps: regardless of the order in which the LLM returns them, steps are always sorted by file name and in reverse line order, so that applying one step does not shift the lines targeted by the remaining steps
  • Added tests for the file_handler.py functions to ensure that the file handling functions behave as expected
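
For reference, the line-numbering format mentioned above could look roughly like the sketch below. It is an illustration only, not the code from this PR: the function name add_line_numbers and its width parameter are hypothetical, and it mirrors the default nl -ba layout (number right-aligned in a 6-character field, followed by a tab), whereas the PR only says its format is almost the same.

```python
def add_line_numbers(text: str, width: int = 6) -> str:
    """Prefix every line, blank lines included, with its 1-based number.

    Hypothetical helper: mimics the default `nl -ba` layout, which may
    differ slightly from the format actually used in this PR.
    """
    lines = text.splitlines()
    # Right-align the number, then a tab, then the original line content.
    return "\n".join(
        f"{number:>{width}}\t{line}"
        for number, line in enumerate(lines, start=1)
    )


if __name__ == "__main__":
    print(add_line_numbers("class User:\n\n    name = None"))
```

Numbering blank lines as well (the -ba part) keeps the numbers aligned with the real file, which matters if the LLM answers with line-based edit steps.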

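The sorting of files and steps described in the list above can be sketched as follows. This is a minimal illustration under assumptions: the Step dataclass, its fields, and the functions sort_steps and apply_insertions are hypothetical names, not the ones used in this repository, and apply_insertions assumes every step targets the same file and inserts content before the given 1-based line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical edit step returned by the LLM."""
    file_path: str
    line: int      # 1-based line the step refers to
    content: str   # text to insert before that line


def sort_steps(steps):
    # File name ascending, line number descending within each file.
    return sorted(steps, key=lambda step: (step.file_path, -step.line))


def apply_insertions(lines, steps):
    """Apply insert steps for a single file, bottom of the file first."""
    result = list(lines)
    for step in sort_steps(steps):
        # Lines above step.line have not been touched yet, so the
        # original line number is still valid at this point.
        result.insert(step.line - 1, step.content)
    return result


if __name__ == "__main__":
    steps = [Step("models.py", 1, "X"), Step("models.py", 3, "Y")]
    print(apply_insertions(["a", "b", "c"], steps))
```

Applied top to bottom instead, the first insertion would push every later line down by one and the second step would land one line too early.
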
Reviewers:

Issues:

Screenshots showing the changes:

Screenshot 2025-01-16, at 12 01 09

Findings:

Using the Chunk Vectorizer and the GPT embeddings model (described in the image), these are the embeddings retrieved for the prompt Add created_at and updated_at fields to User model:
Screenshot 2025-01-16, at 12 03 12

These are the results for the same prompt using the nomic embeddings model (described in the image):
Screenshot 2025-01-16, at 12 04 37

As we can see, the models.py file isn't retrieved, and the files that are retrieved are irrelevant to what was asked.

@jfrverdasca jfrverdasca requested review from JdFSilva, cld-vasconcelos and kenvontucky and removed request for JdFSilva and cld-vasconcelos January 16, 2025 12:06
@jfrverdasca jfrverdasca merged commit 64816be into main Jan 24, 2025
1 check passed
@jfrverdasca jfrverdasca deleted the jv/178/valid_syntax_check branch January 24, 2025 11:18