fix!: update eval-tasks -> benchmarks #1032
Why not just move the code over to /eval? Also, isn't this a breaking change? We should probably deprecate the eval-task methods first.
The |
Please update CHANGELOG.md to reflect the breaking change, thanks!
Let's rename tasks to benchmarks as discussed offline.
We can update the PR title with a "fix!: xxx" prefix to indicate breaking changes.
Plan for deprecation:
Kinda related but not 100% sure if considered breaking: #1023. It might be a good idea to change |
I actually want to combine /vector-dbs and /vector-io into one API (same thing about shields and safety)… We had chatted about this unification a while ago @raghotham.
Sure. My PR #1023 only modifies the Python classes. We can probably work on merging those two APIs separately.
Let us please come up with a plan for all APIs (will inference and models be merged as well?). It would also be good to consider how we would handle things when we eventually add a notion of namespacing for resources (like projects) to help with access control etc.
Looks good. When do you want to update the eval notebook to use the new APIs?
@@ -28,7 +28,7 @@ class Api(Enum):
    vector_dbs = "vector_dbs"
    datasets = "datasets"
    scoring_functions = "scoring_functions"
    eval_tasks = "eval_tasks"
Shouldn't we leave this as is while we are in the deprecation period?
@@ -8,12 +8,12 @@
from modules.api import llama_stack_api


def eval_tasks():
nit: rename this file to benchmarks.py
eval_stack[Api.eval],
eval_stack[Api.eval_tasks],
eval_stack[Api.benchmarks],
Can we keep one test with the old eval_task APIs, so that we know they still work?
The Api enum is used for the server implementation only. I prefer that we (1) keep the server-side internal naming consistent by moving everything to "benchmarks"; (2) keep backward compatibility by still supporting the /eval-tasks endpoint but marking it deprecated, so that the client SDK keeps working without any updates; and (3) complete the deprecation as soon as possible, in the next release. Having these two conventions live in the codebase for long is confusing and error-prone, and people writing new eval providers would get confused and frustrated by future refactors.
The eval notebook with the old API works with no client update. I think we should update the notebook to use the new APIs after the client update and package release.
    type: Literal["benchmark"] = "benchmark"
    eval_candidate: EvalCandidate
    num_examples: Optional[int] = Field(
num_examples no longer needed?
No, this comes from "Remove differentiation between app vs. benchmark eval task config. Now we only have BenchmarkConfig". num_examples is still kept in BenchmarkConfig.
        self,
        task_id: str,
        input_rows: List[Dict[str, Any]],
        scoring_functions: List[str],
        task_config: EvalTaskConfig,
        task_config: BenchmarkConfig,
Suggested change:
        task_config: BenchmarkConfig,
        benchmark_config: BenchmarkConfig,
This is the deprecated function, so its parameter name must stay the same; otherwise we would break backward compatibility.
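To illustrate the constraint in this thread, here is a minimal sketch of a deprecated wrapper that keeps the old keyword names while forwarding to a renamed method. Only `task_id`, `task_config`, `benchmark_config`, and `BenchmarkConfig` come from the diff above; the class `EvalSketch`, the method names `evaluate_rows` and `evaluate_rows_deprecated`, and the `benchmark_id` parameter are assumptions made for illustration, not the project's actual API.

```python
import warnings
from typing import Any, Dict, List


class BenchmarkConfig:
    """Stand-in for the real config class; fields omitted for brevity."""


class EvalSketch:
    """Hypothetical eval implementation, used only to show the pattern."""

    def evaluate_rows(
        self,
        benchmark_id: str,  # assumed new name, not confirmed by the diff
        input_rows: List[Dict[str, Any]],
        scoring_functions: List[str],
        benchmark_config: BenchmarkConfig,
    ) -> Dict[str, Any]:
        # Placeholder result so the sketch is runnable.
        return {"benchmark_id": benchmark_id, "num_rows": len(input_rows)}

    def evaluate_rows_deprecated(
        self,
        task_id: str,  # old keyword kept so existing callers keep working
        input_rows: List[Dict[str, Any]],
        scoring_functions: List[str],
        task_config: BenchmarkConfig,  # old keyword name, new type
    ) -> Dict[str, Any]:
        warnings.warn(
            "evaluate_rows_deprecated is deprecated; use evaluate_rows",
            DeprecationWarning,
            stacklevel=2,
        )
        # Forward to the new method, mapping old names onto new ones.
        return self.evaluate_rows(
            benchmark_id=task_id,
            input_rows=input_rows,
            scoring_functions=scoring_functions,
            benchmark_config=task_config,
        )
```

A client that still passes `task_id=` and `task_config=` as keyword arguments keeps working and only sees a `DeprecationWarning`. Renaming the parameter on the deprecated method would turn those calls into a `TypeError`.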
# What does this PR do?

Support listing all for `llama stack list-providers`.

For ease of reading, sort the output rows by type.

Before the change.

```
llama stack list-providers
usage: llama stack list-providers [-h] {inference,safety,agents,vector_io,datasetio,scoring,eval,post_training,tool_runtime,telemetry}
llama stack list-providers: error: the following arguments are required: api
```

After the change.

```
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| API Type      | Provider Type                    | PIP Package Dependencies                                                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| agents        | inline::meta-reference           | matplotlib,pillow,pandas,scikit-learn,aiosqlite,psycopg2-binary,redis            |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| datasetio     | inline::localfs                  | pandas                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| datasetio     | remote::huggingface              | datasets                                                                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| eval          | inline::meta-reference           |                                                                                  |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | inline::meta-reference           | accelerate,blobfile,fairscale,torch,torchvision,transformers,zmq,lm-format-      |
|               |                                  | enforcer,sentence-transformers                                                   |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | inline::meta-reference-quantized | accelerate,blobfile,fairscale,torch,torchvision,transformers,zmq,lm-format-      |
|               |                                  | enforcer,sentence-transformers,fbgemm-gpu,torchao==0.5.0                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | inline::sentence-transformers    | sentence-transformers                                                            |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | inline::vllm                     | vllm                                                                             |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::bedrock                  | boto3                                                                            |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::cerebras                 | cerebras_cloud_sdk                                                               |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::databricks               | openai                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::fireworks                | fireworks-ai                                                                     |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::groq                     | groq                                                                             |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::hf::endpoint             | huggingface_hub,aiohttp                                                          |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::hf::serverless           | huggingface_hub,aiohttp                                                          |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::nvidia                   | openai                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::ollama                   | ollama,aiohttp                                                                   |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::runpod                   | openai                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::sambanova                | openai                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::tgi                      | huggingface_hub,aiohttp                                                          |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::together                 | together                                                                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| inference     | remote::vllm                     | openai                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| post_training | inline::torchtune                | torch,torchtune==0.5.0,torchao==0.8.0,numpy                                      |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| safety        | inline::code-scanner             | codeshield                                                                       |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| safety        | inline::llama-guard              |                                                                                  |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| safety        | inline::meta-reference           | transformers,torch --index-url https://download.pytorch.org/whl/cpu              |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| safety        | inline::prompt-guard             | transformers,torch --index-url https://download.pytorch.org/whl/cpu              |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| safety        | remote::bedrock                  | boto3                                                                            |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| scoring       | inline::basic                    |                                                                                  |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| scoring       | inline::braintrust               | autoevals,openai                                                                 |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| scoring       | inline::llm-as-judge             |                                                                                  |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| telemetry     | inline::meta-reference           | opentelemetry-sdk,opentelemetry-exporter-otlp-proto-http                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| tool_runtime  | inline::code-interpreter         |                                                                                  |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| tool_runtime  | inline::rag-runtime              |                                                                                  |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| tool_runtime  | remote::bing-search              | requests                                                                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| tool_runtime  | remote::brave-search             | requests                                                                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| tool_runtime  | remote::model-context-protocol   | mcp                                                                              |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| tool_runtime  | remote::tavily-search            | requests                                                                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| tool_runtime  | remote::wolfram-alpha            | requests                                                                         |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| vector_io     | inline::chromadb                 | blobfile,chardet,pypdf,tqdm,numpy,scikit-                                        |
|               |                                  | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url        |
|               |                                  | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,chromadb    |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| vector_io     | inline::faiss                    | blobfile,chardet,pypdf,tqdm,numpy,scikit-                                        |
|               |                                  | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url        |
|               |                                  | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,faiss-cpu   |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| vector_io     | inline::meta-reference           | blobfile,chardet,pypdf,tqdm,numpy,scikit-                                        |
|               |                                  | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url        |
|               |                                  | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,faiss-cpu   |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| vector_io     | remote::chromadb                 | blobfile,chardet,pypdf,tqdm,numpy,scikit-                                        |
|               |                                  | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url        |
|               |                                  | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,chromadb-   |
|               |                                  | client                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| vector_io     | remote::pgvector                 | blobfile,chardet,pypdf,tqdm,numpy,scikit-                                        |
|               |                                  | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url        |
|               |                                  | https://download.pytorch.org/whl/cpu,sentence-transformers --no-                 |
|               |                                  | deps,psycopg2-binary                                                             |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| vector_io     | remote::qdrant                   | blobfile,chardet,pypdf,tqdm,numpy,scikit-                                        |
|               |                                  | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url        |
|               |                                  | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,qdrant-     |
|               |                                  | client                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
| vector_io     | remote::weaviate                 | blobfile,chardet,pypdf,tqdm,numpy,scikit-                                        |
|               |                                  | learn,scipy,nltk,sentencepiece,transformers,torch torchvision --index-url        |
|               |                                  | https://download.pytorch.org/whl/cpu,sentence-transformers --no-deps,weaviate-   |
|               |                                  | client                                                                           |
+---------------+----------------------------------+----------------------------------------------------------------------------------+
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Manually.

[//]: # (## Documentation)

Signed-off-by: Ihar Hrachyshka <[email protected]>
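The behavior described in this commit message (the `api` argument becoming optional, with rows sorted by type when all providers are listed) can be sketched with the standard `argparse` module. This is an illustration under assumptions, not the actual `llama stack` CLI code: the function names `build_parser` and `select_rows` and the `SAMPLE_ROWS` data are hypothetical, and the choice list is copied from the usage string above.

```python
import argparse
from typing import List, Optional, Tuple

# API names copied from the usage string in the commit message.
API_CHOICES = [
    "inference", "safety", "agents", "vector_io", "datasetio",
    "scoring", "eval", "post_training", "tool_runtime", "telemetry",
]

# (API type, provider type, pip dependencies): a few rows from the table.
SAMPLE_ROWS: List[Tuple[str, str, str]] = [
    ("datasetio", "remote::huggingface", "datasets"),
    ("agents", "inline::meta-reference", "matplotlib,pillow,pandas"),
    ("datasetio", "inline::localfs", "pandas"),
]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="llama stack list-providers")
    # nargs="?" makes the positional optional; None then means "all APIs".
    parser.add_argument("api", nargs="?", choices=API_CHOICES, default=None)
    return parser


def select_rows(
    rows: List[Tuple[str, str, str]], api: Optional[str]
) -> List[Tuple[str, str, str]]:
    """Filter by API when one is given, then sort by API and provider type."""
    selected = [row for row in rows if api is None or row[0] == api]
    return sorted(selected, key=lambda row: (row[0], row[1]))


if __name__ == "__main__":
    args = build_parser().parse_args()
    for row in select_rows(SAMPLE_ROWS, args.api):
        print(" | ".join(row))
```

Sorting on the `(API type, provider type)` pair is an assumption chosen to match the ordering visible in the table. The commit message itself only says the rows are sorted by type.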
# What does this PR do?

This adds a note to ensure pull requests follow the conventional commits format, along with a link to that format, in CONTRIBUTING.md. One of the pull-request checks enforces PR titles that match this format, so it's good to be upfront about this expectation before a new developer opens a PR.

Signed-off-by: Ben Browning <[email protected]>
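For readers unfamiliar with the format, the following is a simplified sketch of what a title check might look for. The regular expression is an approximation of the conventional commits header (`type(scope)!: description`) written for illustration; it is not the check this repository actually runs, and `parse_title` is a hypothetical helper name.

```python
import re
from typing import Dict, Optional, Union

# Simplified header pattern: lowercase type, optional (scope),
# optional "!" marking a breaking change, then ": description".
_TITLE_RE = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)$"
)


def parse_title(title: str) -> Optional[Dict[str, Union[str, bool, None]]]:
    """Return the parsed parts of a title, or None if it does not match."""
    match = _TITLE_RE.match(title)
    if match is None:
        return None
    return {
        "type": match.group("type"),
        "scope": match.group("scope"),
        "breaking": match.group("breaking") is not None,
        "description": match.group("description"),
    }
```

Applied to the title of this pull request, `parse_title("fix!: update eval-tasks -> benchmarks")` reports type `fix` with the breaking-change marker set, while a free-form title such as `Update eval tasks` is rejected.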
# What does this PR do?

The remote-vllm provider was not passing logprobs options from CompletionRequest or ChatCompletionRequests through to the OpenAI client parameters. I manually verified this, as well as observed this provider failing `TestInference::test_completion_logprobs`. This was filed as issue #1073.

This fixes that by passing the `logprobs.top_k` value through to the parameters we pass into the OpenAI client.

Additionally, this fixes a bug in `test_text_inference.py` where it mistakenly assumed chunk.delta was of type `ContentDelta` for completion requests. The deltas are of type `ContentDelta` for chat completion requests, but for basic completion requests the deltas are of type string. This test was likely failing for other providers that did properly support logprobs because of this latter issue in the test, which was hit while fixing the above issue with the remote-vllm provider.

(Closes #1073)

## Test Plan

First, you need a vllm running. I ran one locally like this:

```
vllm serve meta-llama/Llama-3.2-3B-Instruct --port 8001 --enable-auto-tool-choice --tool-call-parser llama3_json
```

Next, run test_text_inference.py against this vllm using the remote vllm provider like this:

```
VLLM_URL="http://localhost:8001/v1" python -m pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py --providers "inference=vllm_remote"
```

Before my change, the test failed with this error:

```
llama_stack/providers/tests/inference/test_text_inference.py:155: in test_completion_logprobs
    assert 1 <= len(response.logprobs) <= 5
E   TypeError: object of type 'NoneType' has no len()
```

After my change, the test passes.

[//]: # (## Documentation)

Signed-off-by: Ben Browning <[email protected]>
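The two fixes described above can be sketched in a few lines. Everything here is a stand-in written for illustration: the dataclasses only mimic the shape implied by the commit message (`logprobs.top_k` on a request, string versus `ContentDelta`-style deltas), and `build_client_params`, `delta_text`, the `"logprobs"` parameter key, and the `.text` attribute are assumptions rather than the provider's real code.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Union


@dataclass
class LogProbConfig:
    """Stand-in for the request's logprobs options."""
    top_k: int = 0


@dataclass
class CompletionRequest:
    """Stand-in request type; the real class has more fields."""
    model: str
    content: str
    logprobs: Optional[LogProbConfig] = None


@dataclass
class TextDelta:
    """Stand-in for a chat-completion delta carrying text (assumed shape)."""
    text: str


def build_client_params(request: CompletionRequest) -> Dict[str, Any]:
    """Build client parameters, forwarding logprobs.top_k when it is set."""
    params: Dict[str, Any] = {"model": request.model, "prompt": request.content}
    if request.logprobs is not None and request.logprobs.top_k:
        # The key name "logprobs" is an assumption for this sketch.
        params["logprobs"] = request.logprobs.top_k
    return params


def delta_text(delta: Union[str, TextDelta]) -> str:
    """Completion deltas are plain strings; chat deltas are objects."""
    return delta if isinstance(delta, str) else delta.text
```

The first function shows the forwarding that was missing from the provider. The second shows the kind of type check the test needed so that it handles both streaming shapes.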
# What does this PR do?

This commit enhances the signal handling mechanism in the server by improving the `handle_signal` (previously handle_sigint) function. It now properly retrieves the signal name, ensuring clearer logging when a termination signal is received. Additionally, it cancels all running tasks and waits for their completion before stopping the event loop, allowing for a more graceful shutdown. Support for handling SIGTERM has also been added alongside SIGINT.

Before the changes, handle_sigint used asyncio.run(run_shutdown()). However, asyncio.run() is meant to start a new event loop, and calling it inside an existing one (like when running Uvicorn) raises an error. The fix replaces asyncio.run(run_shutdown()) with an async function scheduled on the existing loop using loop.create_task(shutdown()). This ensures that the shutdown coroutine runs within the current event loop instead of trying to create a new one.

Furthermore, this commit updates the project dependencies. `fastapi` and `uvicorn` have been added to the development dependencies in `pyproject.toml` and `uv.lock`, ensuring that the necessary packages are available for development and execution.
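The shutdown flow described above can be sketched as follows. This is a minimal illustration under assumptions, not the server's actual code: the `cleanup` callback, the `install_signal_handlers` helper, the use of `loop.add_signal_handler`, and running cleanup before cancelling tasks are all choices made for the sketch. The commit message only confirms `handle_signal`, `loop.create_task(shutdown())`, and the cancel-then-stop sequence.

```python
import asyncio
import signal
from typing import Awaitable, Callable


async def shutdown(
    loop: asyncio.AbstractEventLoop,
    cleanup: Callable[[], Awaitable[None]],
) -> None:
    """Run cleanup, cancel the remaining tasks, wait for them, stop the loop."""
    await cleanup()
    current = asyncio.current_task()
    pending = [task for task in asyncio.all_tasks(loop) if task is not current]
    for task in pending:
        task.cancel()
    # return_exceptions=True keeps CancelledError from propagating here.
    await asyncio.gather(*pending, return_exceptions=True)
    loop.stop()


def install_signal_handlers(
    loop: asyncio.AbstractEventLoop,
    cleanup: Callable[[], Awaitable[None]],
) -> None:
    """Register one handler for SIGINT and SIGTERM on an existing loop."""

    def handle_signal(signum: int) -> None:
        name = signal.Signals(signum).name
        print(f"Received signal {name} ({signum}). Exiting gracefully...")
        # Schedule on the existing loop; asyncio.run() would fail here
        # because a loop is already running.
        loop.create_task(shutdown(loop, cleanup))

    for sig in (signal.SIGINT, signal.SIGTERM):
        loop.add_signal_handler(sig, handle_signal, sig)
```

Note that `loop.add_signal_handler` is only available on Unix event loops. A server that must also run on Windows would need `signal.signal` or another mechanism instead.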
Closes: #1043 Signed-off-by: Sébastien Han <[email protected]> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan Run a server and send SIGINT: ``` INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml Using config file: llama_stack/templates/ollama/run.yaml Run configuration: apis: - agents - datasetio - eval - inference - safety - scoring - telemetry - tool_runtime - vector_io container_image: null datasets: [] eval_tasks: [] image_name: ollama metadata_store: db_path: /Users/leseb/.llama/distributions/ollama/registry.db namespace: null type: sqlite models: - metadata: {} model_id: meta-llama/Llama-3.2-3B-Instruct model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType - llm provider_id: ollama provider_model_id: null - metadata: embedding_dimension: 384 model_id: all-MiniLM-L6-v2 model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType - embedding provider_id: sentence-transformers provider_model_id: null providers: agents: - config: persistence_store: db_path: /Users/leseb/.llama/distributions/ollama/agents_store.db namespace: null type: sqlite provider_id: meta-reference provider_type: inline::meta-reference datasetio: - config: {} provider_id: huggingface provider_type: remote::huggingface - config: {} provider_id: localfs provider_type: inline::localfs eval: - config: {} provider_id: meta-reference provider_type: inline::meta-reference inference: - config: url: http://localhost:11434 provider_id: ollama provider_type: remote::ollama - config: {} provider_id: sentence-transformers provider_type: inline::sentence-transformers safety: - config: {} provider_id: llama-guard provider_type: inline::llama-guard scoring: - config: {} provider_id: basic provider_type: inline::basic - config: {} provider_id: llm-as-judge provider_type: inline::llm-as-judge - config: 
openai_api_key: '********' provider_id: braintrust provider_type: inline::braintrust telemetry: - config: service_name: llama-stack sinks: console,sqlite sqlite_db_path: /Users/leseb/.llama/distributions/ollama/trace_store.db provider_id: meta-reference provider_type: inline::meta-reference tool_runtime: - config: api_key: '********' max_results: 3 provider_id: brave-search provider_type: remote::brave-search - config: api_key: '********' max_results: 3 provider_id: tavily-search provider_type: remote::tavily-search - config: {} provider_id: code-interpreter provider_type: inline::code-interpreter - config: {} provider_id: rag-runtime provider_type: inline::rag-runtime vector_io: - config: kvstore: db_path: /Users/leseb/.llama/distributions/ollama/faiss_store.db namespace: null type: sqlite provider_id: faiss provider_type: inline::faiss scoring_fns: [] server: port: 8321 tls_certfile: null tls_keyfile: null shields: [] tool_groups: - args: null mcp_endpoint: null provider_id: tavily-search toolgroup_id: builtin::websearch - args: null mcp_endpoint: null provider_id: rag-runtime toolgroup_id: builtin::rag - args: null mcp_endpoint: null provider_id: code-interpreter toolgroup_id: builtin::code_interpreter vector_dbs: [] version: '2' INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:213: Resolved 31 providers INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-inference => ollama INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-inference => sentence-transformers INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: models => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inference => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-vector_io => faiss INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-safety => llama-guard INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: 
shields => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: safety => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: vector_dbs => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: vector_io => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => brave-search INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => tavily-search INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => code-interpreter INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => rag-runtime INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: tool_groups => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: tool_runtime => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: agents => meta-reference INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-datasetio => huggingface INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-datasetio => localfs INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: datasets => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: datasetio => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: telemetry => meta-reference INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => basic INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => llm-as-judge INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => braintrust INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: scoring_functions => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: scoring => __autorouted__ 
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-eval => meta-reference INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: eval_tasks => __routing_table__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: eval => __autorouted__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inspect => __builtin__ INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:216: INFO 2025-02-12 10:21:03,723 llama_stack.providers.remote.inference.ollama.ollama:148: checking connectivity to Ollama at `http://localhost:11434`... INFO 2025-02-12 10:21:03,734 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK" INFO 2025-02-12 10:21:03,843 faiss.loader:148: Loading faiss. INFO 2025-02-12 10:21:03,865 faiss.loader:150: Successfully loaded faiss. INFO 2025-02-12 10:21:03,868 faiss:173: Failed to load GPU Faiss: name 'GpuIndexIVFFlat' is not defined. Will not load constructor refs for GPU indexes. Warning: `bwrap` is not available. Code interpreter tool will not work correctly. INFO 2025-02-12 10:21:04,315 datasets:54: PyTorch version 2.6.0 available. INFO 2025-02-12 10:21:04,556 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK" INFO 2025-02-12 10:21:04,557 llama_stack.providers.utils.inference.embedding_mixin:42: Loading sentence transformer for all-MiniLM-L6-v2... 
INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:210: Use pytorch device_name: mps INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:218: Load pretrained SentenceTransformer: all-MiniLM-L6-v2 INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: all-MiniLM-L6-v2 served by sentence-transformers INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: meta-llama/Llama-3.2-3B-Instruct served by ollama INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::equality served by basic INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::regex_parser_multiple_choice_answer served by basic INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::subset_of served by basic INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-correctness served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-relevancy served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-similarity served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-entity-recall served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-precision served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-recall served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-relevancy served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::factuality served by braintrust INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::faithfulness served by braintrust INFO 
2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::405b-simpleqa served by llm-as-judge INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::base served by llm-as-judge INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::code_interpreter served by code-interpreter INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::rag served by rag-runtime INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::websearch served by tavily-search INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:106: Serving API eval POST /v1/eval/tasks/{task_id}/evaluations DELETE /v1/eval/tasks/{task_id}/jobs/{job_id} GET /v1/eval/tasks/{task_id}/jobs/{job_id}/result GET /v1/eval/tasks/{task_id}/jobs/{job_id} POST /v1/eval/tasks/{task_id}/jobs Serving API agents POST /v1/agents POST /v1/agents/{agent_id}/session POST /v1/agents/{agent_id}/session/{session_id}/turn DELETE /v1/agents/{agent_id} DELETE /v1/agents/{agent_id}/session/{session_id} GET /v1/agents/{agent_id}/session/{session_id} GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id} GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id} Serving API scoring_functions GET /v1/scoring-functions/{scoring_fn_id} GET /v1/scoring-functions POST /v1/scoring-functions Serving API safety POST /v1/safety/run-shield Serving API inspect GET /v1/health GET /v1/inspect/providers GET /v1/inspect/routes GET /v1/version Serving API tool_runtime POST /v1/tool-runtime/invoke GET /v1/tool-runtime/list-tools POST /v1/tool-runtime/rag-tool/insert POST /v1/tool-runtime/rag-tool/query Serving API datasetio POST /v1/datasetio/rows GET /v1/datasetio/rows Serving API shields GET /v1/shields/{identifier} GET /v1/shields POST /v1/shields Serving API eval_tasks GET /v1/eval-tasks/{eval_task_id} GET /v1/eval-tasks POST /v1/eval-tasks Serving API models GET 
/v1/models/{model_id} GET /v1/models POST /v1/models DELETE /v1/models/{model_id} Serving API datasets GET /v1/datasets/{dataset_id} GET /v1/datasets POST /v1/datasets DELETE /v1/datasets/{dataset_id} Serving API vector_io POST /v1/vector-io/insert POST /v1/vector-io/query Serving API inference POST /v1/inference/chat-completion POST /v1/inference/completion POST /v1/inference/embeddings Serving API tool_groups GET /v1/tools/{tool_name} GET /v1/toolgroups/{toolgroup_id} GET /v1/toolgroups GET /v1/tools POST /v1/toolgroups DELETE /v1/toolgroups/{toolgroup_id} Serving API vector_dbs GET /v1/vector-dbs/{vector_db_id} GET /v1/vector-dbs POST /v1/vector-dbs DELETE /v1/vector-dbs/{vector_db_id} Serving API scoring POST /v1/scoring/score POST /v1/scoring/score-batch Serving API telemetry GET /v1/telemetry/traces/{trace_id}/spans/{span_id} GET /v1/telemetry/spans/{span_id}/tree GET /v1/telemetry/traces/{trace_id} POST /v1/telemetry/events GET /v1/telemetry/spans GET /v1/telemetry/traces POST /v1/telemetry/spans/export Listening on ['::', '0.0.0.0']:5001 INFO: Started server process [65372] INFO: Waiting for application startup. INFO: ASGI 'lifespan' protocol appears unsupported. INFO: Application startup complete. INFO: Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit) ^CINFO: Shutting down INFO: Finished server process [65372] Received signal SIGINT (2). Exiting gracefully... 
INFO 2025-02-12 10:21:11,215 __main__:151: Shutting down ModelsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down InferenceRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ShieldsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down SafetyRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorDBsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorIORouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolGroupsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolRuntimeRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down MetaReferenceAgentsImpl
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetIORouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down TelemetryAdapter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringFunctionsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalTasksRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DistributionInspectImpl
```

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <[email protected]>
# What does this PR do?

[Provide a short summary of what this PR does and why. Link to relevant issues if applicable.]

Since the subcommands take `MODEL_ID`, it is better to show it in `llama model list` so that it is easy to find.

```
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID <<

$ llama model describe --help
usage: llama model describe [-h] -m MODEL_ID <<

$ llama download --help
--model-id MODEL_ID See `llama model list` or `llama model list --show-all` for the list of available models

before:
$ llama model list
+-----------------------------------------+-----------------------------------------------------+----------------+
| Model Descriptor                        | Hugging Face Repo                                   | Context Length |
+-----------------------------------------+-----------------------------------------------------+----------------+

after:
$ llama model list
+-----------------------------------------+-----------------------------------------------------+----------------+
| Model Descriptor                        | Model ID                                            | Context Length |
+-----------------------------------------+-----------------------------------------------------+----------------+
| Llama3.1-8B                             | meta-llama/Llama-3.1-8B                             | 128K           |
+-----------------------------------------+-----------------------------------------------------+----------------+
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

[Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <[email protected]>
Co-authored-by: reidliu <[email protected]>
Can you make sure both notebooks work e2e before merging? Just to double-confirm, since a bunch of changes have come in after the initial test.
# What does this PR do?

- Update `/eval-tasks` to `/benchmarks`
- ⚠️ Remove the differentiation between `app` vs. `benchmark` eval task config. Now we only have `BenchmarkConfig`. The overloaded `benchmark` is confusing and does not add any value. Backward compatibility is kept, as the "type" is not being used anywhere.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

- This change is backward compatible
- Run notebook test with

```
pytest -v -s --nbval-lax ./docs/getting_started.ipynb
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```

<img width="846" alt="image" src="https://github.com/user-attachments/assets/d2fc06a7-593a-444f-bc1f-10ab9b0c843d" />

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

---------

Signed-off-by: Ihar Hrachyshka <[email protected]>
Signed-off-by: Ben Browning <[email protected]>
Signed-off-by: Sébastien Han <[email protected]>
Signed-off-by: reidliu <[email protected]>
Co-authored-by: Ihar Hrachyshka <[email protected]>
Co-authored-by: Ben Browning <[email protected]>
Co-authored-by: Sébastien Han <[email protected]>
Co-authored-by: Reid <[email protected]>
Co-authored-by: reidliu <[email protected]>
Co-authored-by: Yuan Tang <[email protected]>
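To illustrate the backward-compatibility point in the description above, here is a minimal sketch, assuming a legacy eval task config arrives as a plain dict carrying a `type` discriminator of `app` or `benchmark`. The helper name `normalize_eval_config` and the `num_examples` field are hypothetical and are not part of the llama-stack API; the sketch only shows why dropping an unused `type` field leaves old configs usable.

```python
from typing import Any, Dict

# Legacy discriminator values named in the PR description.
LEGACY_TYPES = {"app", "benchmark"}


def normalize_eval_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a legacy eval task config without its `type` field.

    Hypothetical helper: since nothing reads `type`, removing it keeps the
    remaining fields valid for a single, unified config shape.
    """
    config = dict(raw)  # shallow copy so the caller's dict is not mutated
    legacy_type = config.pop("type", None)
    if legacy_type is not None and legacy_type not in LEGACY_TYPES:
        raise ValueError(f"unexpected config type: {legacy_type!r}")
    return config


# Old-style configs of either flavor collapse to the same shape.
old_app = {"type": "app", "num_examples": 5}
old_benchmark = {"type": "benchmark", "num_examples": 5}
print(normalize_eval_config(old_app) == normalize_eval_config(old_benchmark))
```

A config that never carried `type` passes through unchanged, which is the case a caller already using the unified shape would hit.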
What does this PR do?
- Update `/eval-tasks` to `/benchmarks`
- ⚠️ Remove the differentiation between `app` vs. `benchmark` eval task config. Now we only have `BenchmarkConfig`. The overloaded `benchmark` is confusing and does not add any value. Backward compatibility is kept, as the "type" is not being used anywhere.

Test Plan