
All functionality has been consolidated into a single file for CLI/UI/Checkpointing and Added fix for issue 702 and added code for that as well, added instructions in local_inference README.md #757

Merged
17 commits merged into meta-llama:main on Nov 20, 2024

Conversation

@himanshushukla12 (Contributor) commented Oct 29, 2024

What does this PR do?

This PR adds detailed instructions for using the multi_modal_infer.py script to generate text from images after fine-tuning the Llama 3.2 vision model. The script supports merging PEFT adapter weights from a specified path. The changes include:

  • Adding a new section in the LLM_finetuning_overview.md file under the "Inference" heading.

  • Providing a usage example for running the inference script with the necessary parameters.

Fixes # (issue)

Feature/Issue validation/testing

Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

  • Test A: Verified that the code-merge-inference.py script runs successfully with the provided example command.
    Logs for Test A:
    python multi_modal_infer.py \
        --image_path "path/to/your/image.png" \
        --prompt_text "Your prompt text here" \
        --temperature 1 \
        --top_p 0.5 \
        --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
        --hf_token "your_hugging_face_token" \
        --finetuning_path "path/to/your/finetuned/model"

Output:

Loading checkpoint shards: 100%|██████████████████| 5/5 [00:03<00:00,  1.40it/s]
Loading adapter from 'PATH/to/save/PEFT/model'...
Adapter merged successfully with the pre-trained model.
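
For context, the adapter-merge step reported in the log above can be sketched roughly as follows with the transformers and peft libraries. This is a minimal illustration rather than the PR's exact code; the model class and merge_and_unload() are the standard Hugging Face/PEFT APIs, and the paths are placeholders.

    import torch
    from transformers import MllamaForConditionalGeneration
    from peft import PeftModel

    base_model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
    adapter_path = "path/to/your/finetuned/model"  # placeholder PEFT adapter path

    # Load the pre-trained base vision model.
    model = MllamaForConditionalGeneration.from_pretrained(
        base_model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Attach the LoRA/PEFT adapter weights and fold them into the base weights,
    # so that inference afterwards runs on a single merged model.
    model = PeftModel.from_pretrained(model, adapter_path)
    model = model.merge_and_unload()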

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Thanks for contributing 🎉!

@init27 (Contributor) commented Oct 29, 2024

@himanshushukla12 is our community legend! Thanks for another PR! :)

@himanshushukla12 (Contributor, Author)

@init27 Thank you for the recognition😊

@wukaixingxp wukaixingxp self-requested a review October 30, 2024 18:33
@wukaixingxp (Contributor) left a review comment

Thank you so much for another PR that is super helpful for our users to use their fine-tuned LoRA checkpoints for inference. I noticed that for the vision model we have two inference scripts: multi_modal_infer.py and multi_modal_infer_gradio_UI.py (thanks to your help!). I wonder if it would be better to add the LoRA ability on top of them instead of creating a new script; having three inference scripts for the vision model may be a little confusing for new users. Maybe later, when I have time, we can work together to merge all the inference scripts under local_inference into one script that handles both the text model and the vision model, with options for --gradio_ui and --lora_adaptor. Ideally it will be much easier for users to learn just one script that can handle everything. Let me know if you have any suggestions! Thank you again for this great PR.

recipes/quickstart/finetuning/LLM_finetuning_overview.md (review comment: outdated, resolved)
recipes/quickstart/finetuning/code-merge-inference.py (review comment: outdated, resolved)
@himanshushukla12 (Contributor, Author)

@wukaixingxp All changes have been implemented successfully and are described below.

Model Overview

  • Base model: meta-llama/Llama-3.2-11B-Vision-Instruct
  • Uses PEFT library (v0.13.1) for efficient fine-tuning
  • Supports vision-language tasks with instruction capabilities

Key Features in multi_modal_infer.py

All functionality has been consolidated into a single file with three main modes:

  1. Basic Inference
python multi_modal_infer.py \
    --image_path "path/to/image.jpg" \
    --prompt_text "Describe this image" \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token"
  2. Gradio UI Mode
python multi_modal_infer.py \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token" \
    --gradio_ui
  3. LoRA Fine-tuning Integration
python multi_modal_infer.py \
    --image_path "path/to/image.jpg" \
    --prompt_text "Describe this image" \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token" \
    --finetuning_path "path/to/lora/weights"

Key Improvements

  • Single file implementation instead of multiple scripts
  • Dynamic LoRA loading through UI toggle
  • Integrated model state management
  • Unified command line interface
  • Interactive web UI with parameter controls
  • Support for both CLI and UI-based workflows
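
As a rough sketch (not the exact implementation in this PR), the single-file layout described above boils down to one argument parser that routes between the CLI and Gradio modes, with the LoRA path as an optional flag. The helpers run_cli_inference and run_gradio_ui below are hypothetical placeholders.

    import argparse

    def run_cli_inference(args):
        """Placeholder: load the model (merging the adapter if args.finetuning_path
        is set), then generate text for args.image_path and args.prompt_text."""
        raise NotImplementedError

    def run_gradio_ui(args):
        """Placeholder: build and launch the interactive Gradio interface."""
        raise NotImplementedError

    def main():
        parser = argparse.ArgumentParser(description="Llama 3.2 vision inference")
        parser.add_argument("--image_path")
        parser.add_argument("--prompt_text")
        parser.add_argument("--model_name",
                            default="meta-llama/Llama-3.2-11B-Vision-Instruct")
        parser.add_argument("--finetuning_path",
                            help="optional path to LoRA/PEFT adapter weights")
        parser.add_argument("--gradio_ui", action="store_true",
                            help="launch the web UI instead of running once on the CLI")
        args = parser.parse_args()

        # Mode 2 (UI) vs. modes 1 and 3 (CLI, with or without a LoRA adapter).
        if args.gradio_ui:
            run_gradio_ui(args)
        else:
            run_cli_inference(args)

    if __name__ == "__main__":
        main()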

Kindly let me know if anything is still missing; we can figure it out.

@himanshushukla12 himanshushukla12 changed the title Added fix for issue 702 and added code for that as well, added instructions in LLM_finetuning_overview.md as well All functionality has been consolidated into a single file and Added fix for issue 702 and added code for that as well, added instructions in LLM_finetuning_overview.md as well Nov 2, 2024
@himanshushukla12 himanshushukla12 changed the title All functionality has been consolidated into a single file and Added fix for issue 702 and added code for that as well, added instructions in LLM_finetuning_overview.md as well All functionality has been consolidated into a single file for CLI/UI/Checkpointing and Added fix for issue 702 and added code for that as well, added instructions in local_inference /README.md as well Nov 2, 2024
@himanshushukla12 himanshushukla12 marked this pull request as draft November 2, 2024 21:25
@himanshushukla12 (Contributor, Author)

Test of all three modes

  1. Basic Inference
python multi_modal_infer.py \
    --image_path "path/to/image.jpg" \
    --prompt_text "Describe this image" \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token"
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.30it/s]
Generated Text: end_header_id|>

The image presents a complex network diagram comprising 10 nodes, each represented by a distinct colored square... (continued)
  2. Gradio UI Mode
python multi_modal_infer.py \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token" \
    --gradio_ui

Output:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.32it/s]
/home/llama-recipes/venvLlamaRecipes/lib/python3.10/site-packages/gradio/components/chatbot.py:222: UserWarning: You have not specified a value for the `type` parameter. Defaulting to the 'tuples' format for chatbot messages, but this is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style 'role' and 'content' keys.
  warnings.warn(
* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
  3. LoRA Fine-tuning Integration
python multi_modal_infer.py \
    --image_path "path/to/image.jpg" \
    --prompt_text "Describe this image" \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --hf_token "your_token" \
    --finetuning_path "path/to/lora/weights"

Output:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.31it/s]
Loading adapter from '/home/llama-recipes/PATH2/to/save/PEFT/model/'...
Adapter merged successfully
Generated Text: end_header_id|>
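
For reference, the generation step behind the "Generated Text" lines above can be sketched with the standard Hugging Face API for Llama 3.2 Vision. This is an illustrative snippet under the assumption that the script uses MllamaForConditionalGeneration and AutoProcessor (as documented by the transformers library for this model); the image path is a placeholder and the exact decoding parameters in the PR may differ.

    import torch
    from PIL import Image
    from transformers import AutoProcessor, MllamaForConditionalGeneration

    model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
    model = MllamaForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    image = Image.open("path/to/image.jpg")  # placeholder image
    messages = [{
        "role": "user",
        "content": [{"type": "image"},
                    {"type": "text", "text": "Describe this image"}],
    }]
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

    # Encode the image and prompt together, then decode the generated tokens.
    inputs = processor(image, prompt, add_special_tokens=False,
                       return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    print(processor.decode(output[0], skip_special_tokens=True))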

@himanshushukla12 himanshushukla12 marked this pull request as ready for review November 2, 2024 22:10
@himanshushukla12 (Contributor, Author)

@wukaixingxp please check the PR; I'm waiting for your response. Thank you.

@wukaixingxp (Contributor)

@himanshushukla12 Thanks for this PR, and sorry for the late reply. Can we remove the --hf_token argument and use an environment variable to control the HF login instead? Ideally, we should support every possible authorization method listed here; e.g. if a user gets their authorization from huggingface-cli login, they should be able to run our script as well.

@himanshushukla12 (Contributor, Author) commented Nov 19, 2024

HF login using the CLI

Login using huggingface-cli login:

Loading model: meta-llama/Llama-3.2-11B-Vision-Instruct

    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 

Test of all three modes

  1. Basic Inference
python multi_modal_infer.py \
    --image_path "path/to/image.jpg" \
    --prompt_text "Describe this image" \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct"
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.30it/s]
Generated Text: end_header_id|>

The image presents a complex network diagram comprising 10 nodes, each represented by a distinct colored square... (continued)
  2. Gradio UI Mode
python multi_modal_infer.py \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --gradio_ui

Output:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.32it/s]
/home/llama-recipes/venvLlamaRecipes/lib/python3.10/site-packages/gradio/components/chatbot.py:222: UserWarning: You have not specified a value for the `type` parameter. Defaulting to the 'tuples' format for chatbot messages, but this is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style 'role' and 'content' keys.
  warnings.warn(
* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
  3. LoRA Fine-tuning Integration
python multi_modal_infer.py \
    --image_path "path/to/image.jpg" \
    --prompt_text "Describe this image" \
    --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" \
    --finetuning_path "path/to/lora/weights"

Output:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.31it/s]
Loading adapter from '/home/llama-recipes/PATH/to/save/PEFT/model/'...
Adapter merged successfully
Generated Text: end_header_id|>

The changes ensure a seamless CLI login (a rough sketch follows the list below):

  • If the user is already logged in, the process runs normally.
  • If not, a prompt appears asking for the token.
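
A minimal sketch of that login check, assuming a recent version of the huggingface_hub library (the exact function names in the script may differ):

    from huggingface_hub import get_token, login

    # get_token() returns a token from the HF_TOKEN environment variable or from
    # the cache written by `huggingface-cli login`, if either is available.
    if get_token() is None:
        # No cached or environment token: fall back to an interactive prompt.
        login()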

All the requested changes are in; please check and let me know. @wukaixingxp

@wukaixingxp wukaixingxp self-requested a review November 20, 2024 03:49
@wukaixingxp (Contributor) left a review comment

Thank you so much for this great work! Everything looks good to me.

@wukaixingxp wukaixingxp merged commit 7579b61 into meta-llama:main Nov 20, 2024
4 checks passed
@himanshushukla12 himanshushukla12 changed the title All functionality has been consolidated into a single file for CLI/UI/Checkpointing and Added fix for issue 702 and added code for that as well, added instructions in local_inference /README.md as well All functionality has been consolidated into a single file for CLI/UI/Checkpointing and Added fix for issue 702 and added code for that as well, added instructions in local_inference README.md as well Nov 20, 2024
@himanshushukla12 himanshushukla12 changed the title All functionality has been consolidated into a single file for CLI/UI/Checkpointing and Added fix for issue 702 and added code for that as well, added instructions in local_inference README.md as well All functionality has been consolidated into a single file for CLI/UI/Checkpointing and Added fix for issue 702 and added code for that as well, added instructions in local_inference README.md Nov 20, 2024