https://learn.deeplearning.ai/courses/efficiently-serving-llms/
Gain a ground-up understanding of how to serve LLM applications in production.
- Learn how Large Language Models (LLMs) repeatedly predict the next token, and how techniques like KV caching can greatly speed up text generation.
- Write code to efficiently serve LLM applications to a large number of users, and examine the tradeoff between returning the model's output quickly and serving many users at once.
- Explore the fundamentals of Low-Rank Adapters (LoRA) and see how Predibase built its LoRAX inference framework to serve multiple fine-tuned models at once.
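As a taste of the KV-caching idea in the first bullet, here is a minimal sketch of a decode loop over a toy single-head attention layer. Everything here (weights, dimensions, embeddings) is invented for illustration; a real LLM has many layers and heads, but the caching pattern is the same: at each step, only the new token's key and value are computed, and past entries are reused from the cache.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy model/head dimension (hypothetical)

# Toy projection weights standing in for one attention head.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(q, K, V):
    """Scaled dot-product attention for one query over all cached keys/values."""
    scores = (K @ q) / np.sqrt(d)           # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                      # (d,)

def decode_step(x, cache):
    """One generation step: project only the NEW token, reuse cached K/V."""
    q, k, v = Wq @ x, Wk @ x, Wv @ x
    cache["K"].append(k)                    # cache grows by one entry per step
    cache["V"].append(v)
    return attend(q, np.stack(cache["K"]), np.stack(cache["V"]))

cache = {"K": [], "V": []}
tokens = [rng.normal(size=d) for _ in range(5)]   # stand-ins for token embeddings
outputs = [decode_step(x, cache) for x in tokens]
print(len(cache["K"]))  # 5 cached key vectors; none were ever recomputed
```

Without the cache, step *n* would re-project all *n* previous tokens; with it, each step does O(1) new projection work, which is why caching dominates in production serving.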
- Level: Intermediate
- Instructor: Travis Addair
- Prerequisite recommendation: Intermediate Python knowledge.
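The latency/throughput tradeoff from the second bullet can be illustrated with a toy cost model. The millisecond numbers below are invented, not measured; the point is the shape of the tradeoff: each decode step has a large fixed cost (loading weights, launching kernels) plus a small per-sequence cost, so bigger batches raise per-request latency slightly while multiplying total throughput.

```python
# Hypothetical cost model for one decode step over a batch of sequences.
FIXED_MS = 20.0    # per-step overhead, paid once regardless of batch size
PER_SEQ_MS = 1.0   # marginal cost of one more sequence in the batch

def step_latency_ms(batch_size):
    return FIXED_MS + PER_SEQ_MS * batch_size

for batch_size in (1, 8, 64):
    latency = step_latency_ms(batch_size)
    throughput = batch_size / latency * 1000  # tokens generated per second
    print(f"batch={batch_size:3d}  latency/step={latency:5.1f} ms  "
          f"throughput={throughput:7.1f} tokens/s")
```

Under this model, going from batch 1 to batch 64 quadruples per-step latency but multiplies throughput by roughly sixteen, which is the core tension the course's batching lessons explore.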
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_01/video/predibase_c1_01_720p/predibase_c1_01_720p.m3u8
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_02/video/predibase_c1_02_720p/predibase_c1_02_720p.m3u8
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_03/video/predibase_c1_03_720p/predibase_c1_03_720p.m3u8
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_04/video/predibase_c1_04_720p/predibase_c1_04_720p.m3u8
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_05/video/predibase_c1_05_720p/predibase_c1_05_720p.m3u8
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_06/video/predibase_c1_06_720p/predibase_c1_06_720p.m3u8
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_07/video/predibase_c1_07_720p/predibase_c1_07_720p.m3u8
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_08/video/predibase_c1_08_720p/predibase_c1_08_720p.m3u8
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_09/video/predibase_c1_09_720p/predibase_c1_09_720p.m3u8
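For the LoRA topic in the course description, the core idea is to leave a large weight matrix W frozen and add a trainable low-rank update B·A on top of it. A minimal NumPy sketch, with dimensions, rank, and scaling chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8  # hidden size and LoRA rank (r << d); values are illustrative

W = rng.normal(size=(d, d))   # frozen base weight (never updated)
A = rng.normal(size=(r, d))   # trainable low-rank factor
B = np.zeros((d, r))          # B starts at zero, so the adapter begins as a no-op
alpha = 16.0                  # LoRA scaling hyperparameter

def forward(x, B, A):
    # Base path plus scaled low-rank update: W x + (alpha / r) * B (A x).
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
assert np.allclose(forward(x, B, A), W @ x)  # untrained adapter changes nothing

# Each adapter costs two thin matrices instead of a second full d x d matrix.
print(2 * d * r, "adapter params vs", d * d, "full params")  # 8192 vs 262144
```

Because each adapter is so small relative to the base model, a server can keep one copy of W in GPU memory and swap many per-customer (A, B) pairs in and out cheaply, which is the property LoRAX-style multi-adapter serving builds on.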