https://learn.deeplearning.ai/courses/efficiently-serving-llms/
Gain a ground-up understanding of how to serve LLM applications in production.
- Learn how Large Language Models (LLMs) repeatedly predict the next token, and how techniques like KV caching can greatly speed up text generation.
- Write code to efficiently serve LLM applications to a large number of users, and examine the tradeoff between returning the model's output quickly and serving many users at once.
- Explore the fundamentals of Low-Rank Adapters (LoRA) and see how Predibase built its LoRAX inference framework to serve multiple fine-tuned models at once.
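As a taste of the KV-caching idea in the first bullet, here is a minimal sketch of a decode loop over a toy single-head attention layer. Everything here (weights, dimensions, embeddings) is invented for illustration; a real LLM has many layers and heads, but the caching pattern is the same: at each step, only the new token's key and value are computed, and past entries are reused from the cache.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy model/head dimension (hypothetical)

# Toy projection weights standing in for one attention head.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(q, K, V):
    """Scaled dot-product attention for one query over all cached keys/values."""
    scores = (K @ q) / np.sqrt(d)           # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                      # (d,)

def decode_step(x, cache):
    """One generation step: project only the NEW token, reuse cached K/V."""
    q, k, v = Wq @ x, Wk @ x, Wv @ x
    cache["K"].append(k)                    # cache grows by one entry per step
    cache["V"].append(v)
    return attend(q, np.stack(cache["K"]), np.stack(cache["V"]))

cache = {"K": [], "V": []}
tokens = [rng.normal(size=d) for _ in range(5)]   # stand-ins for token embeddings
outputs = [decode_step(x, cache) for x in tokens]
print(len(cache["K"]))  # 5 cached key vectors; none were ever recomputed
```

Without the cache, step *n* would re-project all *n* previous tokens; with it, each step does O(1) new projection work, which is why caching dominates in production serving.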
- Level: Intermediate
- Instructor: Travis Addair
- Prerequisite recommendation: Intermediate Python knowledge.
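The latency/throughput tradeoff from the second bullet can be illustrated with a toy cost model. The millisecond numbers below are invented, not measured; the point is the shape of the tradeoff: each decode step has a large fixed cost (loading weights, launching kernels) plus a small per-sequence cost, so bigger batches raise per-request latency slightly while multiplying total throughput.

```python
# Hypothetical cost model for one decode step over a batch of sequences.
FIXED_MS = 20.0    # per-step overhead, paid once regardless of batch size
PER_SEQ_MS = 1.0   # marginal cost of one more sequence in the batch

def step_latency_ms(batch_size):
    return FIXED_MS + PER_SEQ_MS * batch_size

for batch_size in (1, 8, 64):
    latency = step_latency_ms(batch_size)
    throughput = batch_size / latency * 1000  # tokens generated per second
    print(f"batch={batch_size:3d}  latency/step={latency:5.1f} ms  "
          f"throughput={throughput:7.1f} tokens/s")
```

Under this model, going from batch 1 to batch 64 quadruples per-step latency but multiplies throughput by roughly sixteen, which is the core tension the course's batching lessons explore.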
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_01/video/predibase_c1_01_720p/predibase_c1_01_720p.m3u8
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_02/video/predibase_c1_02_720p/predibase_c1_02_720p.m3u8
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_03/video/predibase_c1_03_720p/predibase_c1_03_720p.m3u8
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_04/video/predibase_c1_04_720p/predibase_c1_04_720p.m3u8
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_05/video/predibase_c1_05_720p/predibase_c1_05_720p.m3u8
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_06/video/predibase_c1_06_720p/predibase_c1_06_720p.m3u8
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_07/video/predibase_c1_07_720p/predibase_c1_07_720p.m3u8
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_08/video/predibase_c1_08_720p/predibase_c1_08_720p.m3u8
- https://dyckms5inbsqq.cloudfront.net/Predibase/predibase-c1/predibase_c1_09/video/predibase_c1_09_720p/predibase_c1_09_720p.m3u8
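For the LoRA topic in the course description, the core idea is to leave a large weight matrix W frozen and add a trainable low-rank update B·A on top of it. A minimal NumPy sketch, with dimensions, rank, and scaling chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8  # hidden size and LoRA rank (r << d); values are illustrative

W = rng.normal(size=(d, d))   # frozen base weight (never updated)
A = rng.normal(size=(r, d))   # trainable low-rank factor
B = np.zeros((d, r))          # B starts at zero, so the adapter begins as a no-op
alpha = 16.0                  # LoRA scaling hyperparameter

def forward(x, B, A):
    # Base path plus scaled low-rank update: W x + (alpha / r) * B (A x).
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
assert np.allclose(forward(x, B, A), W @ x)  # untrained adapter changes nothing

# Each adapter costs two thin matrices instead of a second full d x d matrix.
print(2 * d * r, "adapter params vs", d * d, "full params")  # 8192 vs 262144
```

Because each adapter is so small relative to the base model, a server can keep one copy of W in GPU memory and swap many per-customer (A, B) pairs in and out cheaply, which is the property LoRAX-style multi-adapter serving builds on.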