This repository has been archived by the owner on Aug 30, 2024. It is now read-only.

Add support for large prompts that don't fit in cmd line #133

Merged
merged 6 commits into from
Mar 14, 2024

Conversation

aahouzi
Member

@aahouzi aahouzi commented Feb 20, 2024

Type of Change

  • Testing the prompt evaluation phase requires a large number of tokens (e.g., 1000+), which usually doesn't fit on the command line. This PR allows the user to provide large prompts via a txt file, just like llama.cpp.

Description

  • This PR adds support for large prompts via txt files, just like llama.cpp. It is a useful feature for testing Neural Speed during the prompt evaluation phase.

Expected Behavior & Potential Risk

  • N/A

How has this PR been tested?

# Run script
NEURAL_SPEED_VERBOSE=1 python scripts/run.py <huggingface-llama2> --weight_dtype int4 --compute_dtype int8 --group_size -1 -f summarize-keynote-2105-tokens.txt --ctx_size 2109

# Inference script
NEURAL_SPEED_VERBOSE=1 python scripts/inference.py --model_name llama2 -m llama_files/ne_llama_int4.bin -n 512 -f summarize-keynote-2105-tokens.txt --ctx_size 2109
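The `-f` option above mirrors llama.cpp's file-based prompt input. A minimal sketch of how such an option can be wired up with `argparse` is shown below; the flag names match the commands above, but the function names (`build_parser`, `resolve_prompt`) are illustrative, not Neural Speed's actual API.

```python
import argparse

def build_parser():
    # Accept the prompt either inline (-p) or from a text file (-f),
    # but not both at once.
    parser = argparse.ArgumentParser(description="Prompt input sketch")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-p", "--prompt", help="prompt given inline on the command line")
    group.add_argument("-f", "--file", help="path to a .txt file holding a large prompt")
    return parser

def resolve_prompt(args):
    # Prefer the file if given, since large prompts exceed command-line limits.
    if args.file:
        with open(args.file, encoding="utf-8") as fh:
            return fh.read()
    return args.prompt
```

Reading the prompt from a file sidesteps shell argument-length limits (e.g., `ARG_MAX` on Linux), which is why multi-thousand-token prompts are passed this way.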

Dependency Change?

  • N/A

@aahouzi
Member Author

aahouzi commented Mar 13, 2024

@zhenwei-intel I think this PR is ready to be merged. Can you please complete the review?

@hshen14 hshen14 merged commit e76a58e into intel:main Mar 14, 2024
6 checks passed
@aahouzi aahouzi deleted the large_prompt_feat branch March 17, 2024 12:42