Efficiently Serving LLMs

https://learn.deeplearning.ai/courses/efficiently-serving-llms/

Gain a ground-up understanding of how to serve LLM applications in production.

  • Learn how Large Language Models (LLMs) repeatedly predict the next token, and how techniques like KV caching can greatly speed up text generation.

  • Write code to efficiently serve LLM applications to a large number of users, and examine the tradeoffs between quickly returning the output of the model and serving many users at once.

  • Explore the fundamentals of Low-Rank Adaptation (LoRA) and see how Predibase built its LoRAX inference server to serve many fine-tuned models at once.

  • Level: Intermediate

  • Instructor: Travis Addair

  • Prerequisite recommendation: Intermediate Python knowledge.
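The first bullet above, on next-token prediction and KV caching, can be illustrated with a minimal pure-Python sketch of single-head attention decoding. This is not the course's code: the `KVCache` class, the tiny 2-dimensional vectors, and the hand-picked query/key/value triples are all illustrative assumptions. The point it demonstrates is real, though: caching keys and values lets each decode step append one `(k, v)` pair instead of reprojecting the whole prefix, while producing exactly the same attention output.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(q, K, V):
    # Scaled dot-product attention for one query over a list of keys/values.
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
    w = softmax(scores)
    return [sum(wi * v[j] for wi, v in zip(w, V)) for j in range(d)]

class KVCache:
    """Append-only key/value cache: each decode step adds one (k, v) pair
    instead of recomputing projections for the entire prefix."""
    def __init__(self):
        self.K, self.V = [], []

    def step(self, q, k, v):
        self.K.append(k)
        self.V.append(v)
        return attend(q, self.K, self.V)

# Toy decode loop: each tuple is the (query, key, value) for one new token.
cache = KVCache()
qkvs = [([1.0, 0.0], [1.0, 0.0], [2.0, 0.0]),
        ([0.0, 1.0], [0.0, 1.0], [0.0, 3.0])]
outs = [cache.step(q, k, v) for q, k, v in qkvs]

# Cached decoding matches attention recomputed from scratch at every step;
# the cache only removes redundant work, never changes the result.
full = [attend(qkvs[t][0],
               [x[1] for x in qkvs[:t + 1]],
               [x[2] for x in qkvs[:t + 1]])
        for t in range(len(qkvs))]
assert all(abs(a - b) < 1e-12 for o, f in zip(outs, full) for a, b in zip(o, f))
```

In a real transformer the cache holds per-layer, per-head key/value tensors, but the shape of the argument is the same: without it, generating token t redoes the projection work for all t previous tokens.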
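The latency-versus-throughput tradeoff in the second bullet can be sketched with a toy cost model. The function names and the numbers (`base_ms`, `per_seq_ms`, `tokens_per_request`) are made-up assumptions, not measurements from the course: the sketch only shows the qualitative effect that batching amortizes per-step fixed costs, raising aggregate throughput while making each individual request slower.

```python
def decode_step_ms(batch_size, base_ms=20.0, per_seq_ms=1.0):
    # Hypothetical cost model: one batched decode step pays a fixed cost
    # plus a small per-sequence cost, so larger batches amortize the
    # fixed cost across more users.
    return base_ms + per_seq_ms * batch_size

def serving_stats(batch_size, tokens_per_request=100):
    step = decode_step_ms(batch_size)
    latency_ms = step * tokens_per_request      # time to finish one request
    throughput = batch_size * 1000.0 / step     # tokens/second across all users
    return latency_ms, throughput

for bs in (1, 8, 32):
    lat, tps = serving_stats(bs)
    print(f"batch={bs:2d}  latency={lat / 1000:.1f}s  throughput={tps:.0f} tok/s")
```

Under this model, batch size 32 serves roughly 13x the tokens per second of batch size 1, but each request takes over twice as long to complete, which is the tradeoff the course examines when deciding how aggressively to batch concurrent users.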
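The LoRA idea in the third bullet can also be sketched in a few lines: the base weight matrix stays frozen and shared, while each fine-tuned model contributes only a small low-rank update, which is what allows a multi-adapter server such as LoRAX to keep a single copy of the base model in memory. Everything here is a tiny illustrative example, not the course's or Predibase's code; the adapter names and matrices are invented.

```python
def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, x, A, B, scale=1.0):
    # y = W x + scale * B (A x): W is the frozen, shared base weight;
    # only the small rank-r factors A (r x d_in) and B (d_out x r)
    # differ per fine-tuned model.
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]

W = [[1.0, 0.0],
     [0.0, 1.0]]                                # shared 2x2 base weight

# Two hypothetical rank-1 adapters sharing the same base model.
adapters = {
    "task_a": ([[1.0, 0.0]], [[0.5], [0.0]]),   # A is 1x2, B is 2x1
    "task_b": ([[0.0, 1.0]], [[0.0], [2.0]]),
}

x = [1.0, 1.0]
for name, (A, B) in adapters.items():
    print(name, lora_forward(W, x, A, B))
```

Because the per-adapter factors are tiny relative to `W`, swapping which adapter's `(A, B)` pair is applied per request is cheap, and that is the core trick behind serving many fine-tuned variants from one deployment.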

Videos