NativeChat is a simple, Win32-native frontend for llama-cli. It redirects the console input and output of llama-cli into a GUI. By using llama-cli, you can explore small models directly on an old laptop, with no need for an expensive graphics card.
- **Install C++ Runtime:** locate and install the C++ runtime in the `redist` directory (`VC_redist 2015-2022.x64.exe`).
- **Download Required Models:** download the necessary GGUF models from Hugging Face and place them in the `models` directory (e.g., `qwen2.5-0.5b-instruct-q5_k_m.gguf`); one way to fetch the sample model is sketched after this list.
- Run `NativeChat.exe`.
- Select the required model and prompt.
- When you change the model or prompt, the program restarts automatically.
- To run another instance, simply execute the program again.
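For example, the sample model can be fetched with the `huggingface-cli` tool; any other way of placing the GGUF file in the `models` directory works just as well. The repository name below is an assumption based on the sample file name, so verify it on Hugging Face:

```
huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct-GGUF qwen2.5-0.5b-instruct-q5_k_m.gguf --local-dir models
```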
- Once the model is loaded, the **Send** button becomes active.
- The lower text area allows you to send input to the model.
- The upper text area displays the model's response.
NativeChat supports a single model with various LoRA adapters, rather than needing a separately fine-tuned model for each task. To use adapters:
- Use the `convert_lora_to_gguf.py` script from llama.cpp to convert the adapter into a GGUF file (see the sketch after this list).
- Place this adapter GGUF file in the `adapters` directory.
- In the `adapters` directory, create a UTF-8 text file named after your model (e.g., `"your_model".txt`).
- In that text file, list the adapter file names intended for use with that model, separated by new lines (`\r\n`).
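As a sketch, converting an adapter might look like the following. The paths and adapter name are illustrative; check `python convert_lora_to_gguf.py --help` in your llama.cpp checkout for the exact interface of your version:

```
python convert_lora_to_gguf.py path\to\my-task-lora --base path\to\base_model --outfile adapters\my-task-lora.gguf
```

The model-named list file then names each adapter on its own line. For example, a hypothetical `adapters\qwen2.5-0.5b-instruct-q5_k_m.txt` might contain:

```
my-task-lora.gguf
my-other-task-lora.gguf
```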
- Download `nomic-embed-text-v1.5.Q8_0.gguf` and place it in the program directory.
- Create a UTF-8 text file containing the knowledge data, with each paragraph separated by a blank line (`\r\n\r\n`); a sketch of such a file follows this list.
- Import this text file into the vector database using the **Add to Vector Database** option.
- Enable the **Answer Using Vector Database** option to answer using the embedded text.
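A minimal knowledge file might look like this; the contents are hypothetical, and the only stated requirements are UTF-8 encoding and a blank line (`\r\n\r\n`) between paragraphs:

```
The support hotline is open Monday through Friday, 9:00 to 17:00.

Warranty claims require the original receipt and the device serial number.
```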
- **Show Initialization:** to view the system prompt on loading, enable the "Show Initialization" option in the settings.
- **Using CUDA for Improved Performance:** for larger models or faster inference, replace the following files with their CUDA-equivalent versions. By default, the supplied files use only the CPU, allowing simple models to run on older systems:
  - `ggml.dll`
  - `llama.dll`
  - `llama-cli.exe`
  - `llama-embedding.exe`
- If the program fails to load or crashes, delete the `settings.cfg` file.
- Use Task Manager to terminate any non-responsive `llama-cli.exe` processes (or use `taskkill` from a command prompt, as shown below).
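As an alternative to Task Manager, the standard Windows `taskkill` command force-terminates processes by image name:

```
taskkill /F /IM llama-cli.exe
```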
The NativeChat installation is portable; you can run multiple instances from different folders independently.