Study and discoveries of embeddings and tools #118
Merged: 30 commits merged on Oct 24, 2024
@jfrverdasca (Contributor) commented on Oct 8, 2024

What this PR changes:

  • Add a Jupyter notebook with all findings about embedding optimization and tools
  • Add a Jupyter notebook for running different embedding types and documenting the findings, with examples and results
  • Add a Python parser that enables structured embeddings of Python code
  • Add the structured Python code embedding logic
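The structured approach for Python code can be illustrated with the standard-library `ast` module: rather than cutting a file into fixed-size text windows, split it at top-level function and class definitions so that each embedding covers one coherent unit. This is a minimal sketch under that assumption. `CodeChunk` and `parse_python_chunks` are hypothetical names, not the API of the parser added under /labs/parsers/python.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass


@dataclass
class CodeChunk:
    """One structural unit of a Python file, ready to be embedded (hypothetical type)."""
    kind: str        # "function" or "class"
    name: str
    start_line: int  # 1-based line of the def/class keyword
    end_line: int    # 1-based last line of the body
    source: str      # exact source text of the definition


def parse_python_chunks(source: str) -> list[CodeChunk]:
    """Split Python source into top-level function and class chunks (hypothetical helper)."""
    tree = ast.parse(source)
    chunks: list[CodeChunk] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue  # imports, assignments, etc. are skipped in this sketch
        # get_source_segment and end_lineno require Python 3.8+
        segment = ast.get_source_segment(source, node) or ""
        chunks.append(CodeChunk(kind, node.name, node.lineno, node.end_lineno, segment))
    return chunks
```

Each `CodeChunk.source` would then be passed to whatever embedding model is in use, and `name`, `kind` and the line range could be stored as metadata next to the vector so that search results point back to a specific definition. Module-level statements are ignored here for brevity; a fuller parser would probably keep them as a separate chunk.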

Reviewers:

Issues:

Screenshots showing the changes:

[Screenshot 2024-10-18 at 16.19.33]

@jfrverdasca self-assigned this on Oct 8, 2024
* Add /labs/parsers/python with a class to parse Python code files
* Add a chunk vectorizer
* Add a Python vectorizer
* Add a vectorizer factory to quickly switch between embedding methods
* Add an embeddings Jupyter notebook with results from both methods
* Add a Makefile zsh command to run docker compose commands (and others) with environment variables set
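The commits above pair a chunk vectorizer, a Python vectorizer and a factory for switching between them. A minimal sketch of how those pieces could fit together, assuming the factory selects a splitting strategy by name and the embedding model is injected as a plain callable. `ChunkVectorizer`, `PythonVectorizer`, `vectorizer_factory`, `vectorize`, the `chunk_size`/`overlap` defaults and the `embed` callable are all hypothetical; the actual code in labs/database/vectorize/factory.py may differ.

```python
from __future__ import annotations

import ast
from typing import Callable, Protocol

# Hypothetical embedding function: text in, vector out (Python 3.9+ for list[float]).
EmbedFn = Callable[[str], list[float]]


class Vectorizer(Protocol):
    def split(self, text: str) -> list[str]: ...


class ChunkVectorizer:
    """Fixed-size character windows that overlap by `overlap` characters."""

    def __init__(self, chunk_size: int = 500, overlap: int = 50) -> None:
        if not 0 <= overlap < chunk_size:
            raise ValueError("overlap must be >= 0 and smaller than chunk_size")
        self.chunk_size = chunk_size
        self.overlap = overlap

    def split(self, text: str) -> list[str]:
        if not text:
            return []
        step = self.chunk_size - self.overlap
        # Stop before a start whose window would lie entirely inside the previous one.
        stop = max(len(text) - self.overlap, 1)
        return [text[i : i + self.chunk_size] for i in range(0, stop, step)]


class PythonVectorizer:
    """One piece per top-level function or class, located with the ast module."""

    def split(self, text: str) -> list[str]:
        tree = ast.parse(text)
        kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
        return [
            ast.get_source_segment(text, node) or ""
            for node in tree.body
            if isinstance(node, kinds)
        ]


_REGISTRY: dict[str, type] = {"chunk": ChunkVectorizer, "python": PythonVectorizer}


def vectorizer_factory(method: str, **kwargs) -> Vectorizer:
    """Return a vectorizer by name so the embedding method is a one-setting change."""
    cls = _REGISTRY.get(method)
    if cls is None:
        raise ValueError(f"unknown method {method!r}; choose from {sorted(_REGISTRY)}")
    return cls(**kwargs)


def vectorize(text: str, method: str, embed: EmbedFn, **kwargs) -> list[list[float]]:
    """Split `text` with the chosen strategy, then embed every piece."""
    return [embed(piece) for piece in vectorizer_factory(method, **kwargs).split(text)]
```

For example, `vectorize(source, "python", embed)` and `vectorize(source, "chunk", embed, chunk_size=500)` produce embeddings for the same file under the two strategies, which is the kind of side-by-side comparison the notebook records. Keeping the registry as a plain dict means a new method only needs one extra entry.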
@jfrverdasca linked an issue on Oct 18, 2024 that may be closed by this pull request
@jfrverdasca marked this pull request as ready for review on October 18, 2024 15:41
Review threads on labs/database/vectorize/factory.py and README.md (outdated, resolved)
@jfrverdasca merged commit 12301d9 into main on Oct 24, 2024
1 check passed
@jfrverdasca deleted the 113/embeddings branch on October 24, 2024 10:41
Development

Successfully merging this pull request may close these issues.

Test the best approach for embeddings
5 participants