RFC-0001-The-Llama-Stack #8

Merged
merged 5 commits into from Aug 21, 2024

Conversation

@raghotham (Contributor) commented Jul 23, 2024

As part of the Llama 3.1 release, Meta is releasing an RFC for ‘Llama Stack’, a comprehensive set of interfaces / APIs for ML developers building on top of Llama foundation models. We are looking for feedback on where the API can be improved, any corner cases we may have missed, and your general thoughts on how useful this will be. Ultimately, our hope is to create a standard for working with Llama models that simplifies the developer experience and fosters innovation across the Llama ecosystem.

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) Jul 23, 2024
@raghotham linked an issue (RFC-0001 - Llama Stack) on Jul 23, 2024 that may be closed by this pull request
@joshcarp commented Jul 23, 2024

One question I've got about the lifecycle is around the monitoring/human-feedback portion of the lifecycle diagram, and whether this proposal could look into adding a more complete observability standard for development and deployment.

I think it would be a good idea to include some standard for how observability is taken into account in this lifecycle, and there have already been first efforts to address this in the gen-ai semantic conventions in OpenTelemetry.

Issues with the existing OpenTelemetry semantic conventions:

  • They are only designed for black-box LLM APIs (Anthropic, OpenAI, Gemini, Cohere).
  • They currently expose only superficial attributes (model, max_tokens, temperature, response, request); a sketch of this is below.
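
For concreteness, here is a minimal sketch of roughly what those conventions capture today, using the OpenTelemetry Python API (the attribute names follow my reading of the still-incubating gen-ai conventions, and the values are placeholders):

```python
# Roughly what today's gen-ai semantic conventions let you record:
# surface-level request/response parameters, nothing about model internals.
from opentelemetry import trace

tracer = trace.get_tracer("llm-app")

with tracer.start_as_current_span("chat") as span:
    span.set_attribute("gen_ai.system", "openai")  # vendor of the black-box API
    span.set_attribute("gen_ai.request.model", "gpt-4o")
    span.set_attribute("gen_ai.request.max_tokens", 256)
    span.set_attribute("gen_ai.request.temperature", 0.7)
    # ... make the model call here ...
    span.set_attribute("gen_ai.usage.input_tokens", 42)   # placeholder values
    span.set_attribute("gen_ai.usage.output_tokens", 128)
```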

With work on activation steering, in papers like Activation Addition and the sparse-autoencoder work done by Anthropic and OpenAI, it's only a matter of time before we get better information about the internals of these models. Last month Anthropic also opened a waitlist for their beta Steering API, which will hopefully expose feature clamping and monitoring.

I think recently there’s been a lot of work showing that the single shot we give models when answering a prompt is a fundamentally simplistic way of using them, and I imagine that a strong agent framework would allow for monitoring and observability of internal states, which are a richer, continuous, and differentiable representation of the model. Unfortunately, API models are fundamentally stuck in the “Chat API” era: they can’t give any access to the underlying activation space, because that is their multi-billion-dollar intellectual property. This is where I see open-source models completely leapfrogging in capability, since having this access would allow for so much more than we have today.
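
For example, with open weights you can already read the full activation space in a few lines. A minimal sketch using the Hugging Face transformers API (the model id is illustrative and assumes you have local access to the weights):

```python
# Reading per-layer activations from a local model: exactly the internals
# that no hosted chat API will expose.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B"  # illustrative; any local causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# One tensor per layer, shaped (batch, seq_len, hidden_dim); this is the
# representation an observability layer could monitor in deployment.
for i, acts in enumerate(outputs.hidden_states):
    print(f"layer {i}: {tuple(acts.shape)}")
```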

What I would like:

It would be nice to have an extensible tracing setup that could incorporate the useful information we know about models today, while also allowing for future improvements:

  • feature.<feature-name> attributes providing some form of activation monitoring, possibly from dictionary-learning approaches building on work from OpenAI and Anthropic.
    • In the Anthropic paper, the interesting features that could be extracted included “code correctness” and “internal conflict”. These would obviously be invaluable for deployment monitoring and guardrailing.
  • Perplexity: the ability for the monitoring and observability layer to report the likelihood of the output sequence or the input prompt. This would make it possible to correlate a quantitative measure with downstream LLM performance (a rough sketch follows this list).
    • I’m not exactly sure how to go about this, but it would also be nice to have something around “pretraining perplexity” vs. “fine-tuning perplexity”. In high-stakes applications you would want to monitor the KL divergence between the fine-tuning set and the deployed behaviour, to see how far out of distribution incoming requests are. If the KL divergence is high but downstream performance is still good, it might indicate either that your model is being used for something unintended (see the memes about using Amazon chat for generating JS code), or that your fine-tuning set isn’t diverse enough (in which case your evaluation sets are also incomplete).
  • Hallucination detection, as in To Believe or Not to Believe Your LLM. I’m not sure how this would be implemented, but hallucination detection in agentic systems would be extremely important, especially around the “in context” vs. “in weights” distinction that paper introduced.
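
To make the wishlist concrete, here is a hypothetical sketch of what richer trace attributes might look like. The feature.* attribute names and their values are invented for illustration (no such convention exists); only the perplexity computation from per-token log-probabilities is standard:

```python
# Hypothetical richer LLM trace attributes: sequence perplexity plus
# activation-derived feature scores. Attribute names are invented.
import math
from opentelemetry import trace

tracer = trace.get_tracer("llm-observability")

def sequence_perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(-mean per-token log-probability)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

with tracer.start_as_current_span("generate") as span:
    # ... run inference, collecting per-token log-probs and activations ...
    token_logprobs = [-0.21, -1.35, -0.08, -2.90]  # placeholder values
    span.set_attribute("llm.response.perplexity",
                       sequence_perplexity(token_logprobs))
    # Scalar scores from a dictionary-learning / sparse-autoencoder probe,
    # in the spirit of the Anthropic features mentioned above (hypothetical):
    span.set_attribute("feature.code_correctness", 0.91)
    span.set_attribute("feature.internal_conflict", 0.07)
```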

I’ve been thinking about this a lot, and whilst there are tools like transformer_lens for development and research, there don’t seem to be many tools or efforts that enable this for deployed systems. This might be a bunch of scope creep for this spec, but observability/monitoring solutions are an essential part of any complex software system, and it feels like the LLM-agent world hasn’t caught up yet; the lack of these capabilities has been limiting some of the development I want to do on other downstream tasks.

@prassanna-ravishankar commented

A couple of things I'm interested in:

  • Asynchronous post-training: a high-level protocol combining the post-training API with the inference API to granularly update the model as inputs come in and serve a new model.
  • Model monitoring: I like the idea @joshcarp mentioned; an entire observability stack would be helpful, but also with additional hooks to capture ML-specific metrics such as drift during inference.
    • Inference hardware monitoring (GPU/CPU) would also be super useful.
  • Prompt management: mechanisms to store prompts and evaluate them across model checkpoints.
  • Experiment-tracking callbacks: callbacks or hooks to plug into rich logging services for ML experimentation (Aim, MLflow, wandb); relevant only for training and post-training.
  • Export interface: mechanisms to export the model to various target hardware (almost like a real toolchain 😉).
  • Pipe interface: allowing various model instances to pipe their output to each other (as in DeepStream), enabling simple agentic applications and more complex systems (a sketch follows this list).
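
Roughly what I mean by the pipe interface, as a hypothetical sketch (none of these names come from the RFC; in practice each stage would wrap an inference-API call):

```python
# Hypothetical pipe interface: compose model stages so that one model's
# output feeds the next, DeepStream-style.
from typing import Callable

Stage = Callable[[str], str]

def pipe(*stages: Stage) -> Stage:
    """Compose stages left to right into a single callable."""
    def run(text: str) -> str:
        for stage in stages:
            text = stage(text)
        return text
    return run

# Stand-in stages; in practice each would call a model instance.
summarize = lambda text: f"[summary of: {text}]"
translate = lambda text: f"[translation of: {text}]"

pipeline = pipe(summarize, translate)
print(pipeline("a long source document"))
```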

@ashwinb (Contributor) commented Aug 21, 2024

@joshcarp

Thanks for the pointer, this is a great addition! As an update to this PR itself, we have added an Observability API, which can also be used from fine-tuning. Please take a look and let us know what you think.

@ashwinb (Contributor) commented Aug 21, 2024

@prassanna-ravishankar

We are just getting started :) and we'd like to make sure we cover the basics well and provide enough value that the ecosystem finds this worth building on and adopting. If things go well, everything you suggest could arrive.

@ashwinb ashwinb merged commit 2232bfa into main Aug 21, 2024
3 checks passed
@ashwinb ashwinb deleted the RFC-0001-The-Llama-Stack branch August 21, 2024 02:01
@cubxxw commented Aug 23, 2024

At present I am working on putting LLMs into production. I think many production-level standards are missing, which makes it difficult for me to know what is right. I hope Llama can become a standard through open source.

heyjustinai pushed a commit that referenced this pull request Nov 19, 2024