
chore: Improve package/binary size by remove jinja2 #1063

Closed
nguyenhoangthuan99 opened this issue Aug 23, 2024 · 3 comments · Fixed by #1289
Labels: P1: important (Important feature / fix)
Milestone: v1.0.0

Comments

@nguyenhoangthuan99
Contributor

  • jinja2 doubles the binary size because it pulls in too many dependencies from Boost, yet it is only used to parse the chat template from GGUF models. We can remove this part and let cortex.llamacpp handle it instead, to reduce the size.
@imtuyethan imtuyethan transferred this issue from another repository Sep 2, 2024
@freelerobot freelerobot changed the title Improve package/binary size by remove jinja2 chore: Improve package/binary size by remove jinja2 Sep 6, 2024
@freelerobot freelerobot added the P1: important Important feature / fix label Sep 6, 2024
@freelerobot
Contributor

@nguyenhoangthuan99 can you elaborate on this issue?
Are you proposing to include jinja2 only in cortex.llamacpp rather than in the overall cortexcpp package, or something else?

@dan-menlo dan-menlo moved this from Planning to Scheduled in Menlo Sep 8, 2024
@nguyenhoangthuan99
Contributor Author

Problem

  • cortex-cpp uses the Jinja2Cpp library to parse the chat template from GGUF files. This lets us run models from any source.
  • Jinja2Cpp is the only C++ library that can render jinja2 templates and be built for multiple platforms, but it pulls in many dependencies from Boost.
  • llama.cpp can also parse these jinja2 templates into a chat format internally (see the sketch after this list), but using that feature would require building llama.cpp along with cortex-cpp, which is also not recommended because the llama.cpp repository is too large.
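
For context, here is a minimal sketch of the llama.cpp facility mentioned above, assuming the llama.cpp C API of roughly this period (llama_chat_apply_template in llama.h); the wrapper function is hypothetical, not cortex-cpp code:

```cpp
// Minimal sketch of llama.cpp's built-in chat-template support.
// llama_chat_apply_template is a real llama.cpp API of this period;
// the wrapper below is hypothetical, for illustration only.
#include "llama.h"

#include <string>
#include <vector>

// Render a chat into a prompt string. Passing tmpl == nullptr tells
// llama.cpp to use the template embedded in the model's GGUF metadata.
std::string ApplyChatTemplate(const llama_model* model,
                              const std::vector<llama_chat_message>& msgs) {
  std::string buf(1024, '\0');
  int32_t n = llama_chat_apply_template(model, /*tmpl=*/nullptr, msgs.data(),
                                        msgs.size(), /*add_ass=*/true,
                                        buf.data(), buf.size());
  if (n < 0) return "";  // template missing or not supported by llama.cpp
  if (static_cast<size_t>(n) > buf.size()) {
    buf.resize(n);       // buffer too small: retry with the exact size
    n = llama_chat_apply_template(model, nullptr, msgs.data(), msgs.size(),
                                  true, buf.data(), buf.size());
  }
  buf.resize(n);
  return buf;
}
```

Calling this from cortex-cpp would mean linking llama.cpp into cortex-cpp, which is exactly what the bullet above argues against.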

Solution

  • Models in GGUF format only run with the cortex.llamacpp engine, so we will move the chat-template parsing into cortex.llamacpp. It will be executed at runtime: when a user starts a model with the cortex.llamacpp engine, the engine parses the chat template (a sketch follows this list).

  • This solution requires more effort, but it saves about 60 MB of binary size.
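
To illustrate the runtime step, here is a sketch of how the engine could read the raw template from GGUF metadata when a model starts. llama_model_meta_val_str and the "tokenizer.chat_template" key are real llama.cpp/GGUF conventions; the function name is hypothetical and this is not the actual cortex.llamacpp code:

```cpp
// Sketch: read the raw jinja2 chat template from GGUF metadata at model
// start. GGUF stores it under the "tokenizer.chat_template" key; the
// surrounding function is hypothetical, not actual cortex.llamacpp code.
#include "llama.h"

#include <string>

std::string ReadChatTemplate(const llama_model* model) {
  std::string buf(2048, '\0');
  int32_t n = llama_model_meta_val_str(model, "tokenizer.chat_template",
                                       buf.data(), buf.size());
  if (n < 0) return "";  // model ships no chat template
  if (static_cast<size_t>(n) >= buf.size()) {
    buf.resize(n + 1);   // value truncated: retry with the full size
    n = llama_model_meta_val_str(model, "tokenizer.chat_template",
                                 buf.data(), buf.size());
  }
  buf.resize(n);
  return buf;  // raw jinja2 template text, rendered by the engine at runtime
}
```

Because this runs inside cortex.llamacpp, which already links llama.cpp, cortex-cpp itself no longer needs Jinja2Cpp or its Boost dependencies.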

@nguyenhoangthuan99 nguyenhoangthuan99 moved this from Scheduled to In Progress in Menlo Sep 23, 2024
@nguyenhoangthuan99 nguyenhoangthuan99 moved this from In Progress to In Review in Menlo Sep 23, 2024
@github-project-automation github-project-automation bot moved this from In Review to Completed in Menlo Sep 23, 2024
@nguyenhoangthuan99 nguyenhoangthuan99 moved this from Completed to QA in Menlo Sep 23, 2024
@gabrielle-ong
Contributor

Closing issue, thanks @nguyenhoangthuan99

@gabrielle-ong gabrielle-ong moved this from Review + QA to Completed in Menlo Oct 3, 2024
@gabrielle-ong gabrielle-ong added this to the v1.0.0 milestone Oct 3, 2024