Databricks ML in Action

This is the code repository for Databricks ML in Action, published by Packt.

Learn how Databricks supports the entire ML lifecycle end to end, from data ingestion to model deployment

What is this book about?

Discover what makes the Databricks Data Intelligence Platform the go-to choice for top-tier machine learning solutions. Databricks ML in Action presents cloud-agnostic, end-to-end examples with hands-on illustrations of executing data science, machine learning, and generative AI projects on the Databricks Platform.

This book covers the following exciting features:

  • Set up a workspace for a data team planning to perform data science
  • Monitor data quality and detect drift
  • Use autogenerated code for ML modeling and data exploration
  • Operationalize ML with the Feature Engineering client, AutoML, Vector Search, Delta Live Tables, Auto Loader, and Workflows
  • Integrate open-source and third-party applications, such as OpenAI’s ChatGPT, into your AI projects
  • Communicate insights through Databricks SQL dashboards and Delta Sharing
  • Explore data and models through the Databricks Marketplace

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders. For example, Chapter02.

The code will look like the following:

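import opendatasets as od

# raw_data_path must already be defined; it is the directory the data is downloaded into
od.download("https://www.kaggle.com/competitions/store-sales-time-series-forecasting/data", raw_data_path)

# List the downloaded files (dbutils is available in Databricks notebooks)
dbutils.fs.ls(raw_data_path + "/store-sales-time-series-forecasting/")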
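The snippet above lists a folder named after the Kaggle competition in the download URL. As a minimal sketch, assuming that naming holds for other competition URLs too, the folder path could be derived instead of typed by hand. The helper `competition_folder` and the example path `/tmp/raw` are hypothetical and are not part of the book's code or of `opendatasets`.

```python
import posixpath
from urllib.parse import urlparse


def competition_folder(url: str, raw_data_path: str) -> str:
    """Return the folder where a competition's files are expected to land.

    Hypothetical helper: it assumes the folder is named after the
    competition slug in the URL, as in the snippet above.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    # Kaggle competition URLs look like /competitions/<slug>[/data]
    if len(parts) < 2 or parts[0] != "competitions":
        raise ValueError(f"not a Kaggle competition URL: {url}")
    return posixpath.join(raw_data_path, parts[1]) + "/"


url = "https://www.kaggle.com/competitions/store-sales-time-series-forecasting/data"
folder = competition_folder(url, "/tmp/raw")  # "/tmp/raw" is an example value
print(folder)
```

In a Databricks notebook, the result could then be passed to `dbutils.fs.ls(folder)` in place of the hand-built string.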

Following is what you need for this book: This book is for machine learning engineers, data scientists, and technical managers seeking hands-on expertise in implementing and leveraging the Databricks Data Intelligence Platform and its Lakehouse architecture to create data products.

With the following software and hardware list, you can run all the code files in the book (Chapters 2-8).

Software and Hardware List

| Chapter | Software required | OS required |
| --- | --- | --- |
| 2-8 | Databricks | Windows, macOS, or Linux |
| 2-8 | Python and its associated libraries | Windows, macOS, or Linux |

Code Samples and Chapter Links

Each chapter folder contains the code examples shared in the book, each using one of our data sources, along with the links shared in the README file.

What do you need to run the examples?

You will need a Databricks environment and permission to run a cluster in order to follow along. You can use Databricks Community Edition to run the provided notebooks and code. However, the examples use Unity Catalog, which is available only on paid versions, so some features will not work in Community Edition.
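Because the notebooks expect a Databricks environment, a quick guard at the top of a script can make the failure mode clearer when the code is run elsewhere. The following is a minimal sketch, not part of the book's code. It assumes the Databricks Runtime sets the `DATABRICKS_RUNTIME_VERSION` environment variable on cluster nodes, and the helper name `running_on_databricks` is hypothetical. It does not check whether Unity Catalog is enabled.

```python
import os


def running_on_databricks(env=None) -> bool:
    """Best-effort check for a Databricks cluster.

    Assumption: the Databricks Runtime exports DATABRICKS_RUNTIME_VERSION.
    Pass a dict as `env` to test the logic without touching os.environ.
    """
    env = os.environ if env is None else env
    return bool(env.get("DATABRICKS_RUNTIME_VERSION"))


if not running_on_databricks():
    print("Not on a Databricks cluster; dbutils and Unity Catalog calls will fail.")
```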

Disclaimer

The authors will do their best to keep the code and examples as up to date as possible, but we understand that you may encounter outdated snippets or other issues. Please post your enquiries on the Issues page should you require further assistance.

About the authors

Stephanie Rivera has worked in big data and machine learning since 2011. She collaborates with teams and companies as they design their data intelligence platform as a Sr. Solutions Architect for Databricks.

Previously, Stephanie was the VP, Data Intelligence for a global company, ingesting 20+ terabytes of data daily. She led the data science, data engineering, and business intelligence teams.

Her data career has also included contributing to and leading a team in creating software that teaches people to explore fictional planets using data science algorithms. Stephanie authored numerous sections of Booz Allen Hamilton’s publication, The Field Guide to Data Science.

I want to thank my loving partner, Rami Alba Lucio, Databricks coworkers, family, and friends for their unwavering support.

Mandy Baker began her career in data 8 years ago. She loves leveraging her skills as a data scientist to orchestrate transformative journeys for companies across diverse industries as a Solutions Architect for Databricks. Her experiences have brought her from large corporations to small startups and everything in between. Mandy is a graduate of Carnegie Mellon University and the University of Washington.

Thank you to my partner Emmanuel, my parents, sisters, and friends for their enduring love and support.

Hayley Horn started her data career 15 years ago as a data quality consultant on enterprise data integration projects. As a data scientist, she specialized in customer insights and strategy, and presented at data science and AI conferences in the US and Europe. She is currently a Sr. Solutions Architect for Databricks, with expertise in data science and technology modernization.

A graduate of the MS Data Science program at Southern Methodist University in Dallas, Texas, USA, she is now a capstone advisor to students in their final semesters of the program.

I’d like to thank my husband, Kevin, and my sons Dyson and Dalton for their encouragement and enthusiastic support.

Anastasia Prokaieva began her career 9 years ago, as a research scientist at CEA (France), focusing on large data analysis and satellite data assimilation, treating terabytes of data. She has been working within the big data analysis and machine learning domain since then. In 2021, she joined Databricks and became the regional AI subject matter expert.
On a daily basis, Anastasia advises Databricks users on best practices for implementing AI projects end to end, and she delivers training sessions and workshops to democratize AI. Anastasia holds two MSc degrees in theoretical physics and energy science.

I would like to thank my partner, Julien, and my family for their tremendous support. My gratitude to my talented teammates all around the globe, as you inspire me every day!

Thanks for purchasing the book!

Hope you enjoy it!