Join our Meetup group for more events! https://www.meetup.com/data-umbrella
- Transcript: https://github.com/data-umbrella/event-transcripts/blob/main/2022/63-eric-testing.md
- Meetup Event: https://www.meetup.com/data-umbrella/events/287467730/
- Video: https://youtu.be/bJGgVoV4GTc
- Transcriber: [needs a transcriber]
- Slides: https://tinyurl.com/test-sdm
- Mariatta's video on continuous integration and unit testing: https://youtu.be/vLBr_AfomUY
- GitHub Actions Tutorial: https://youtu.be/d48WGkePFq0
This presentation covers:
- The importance of software testing in open-source software (OSS): correctness, reliability, and contracts that guard against future breakages.
- Why software testing matters for data scientists' work too, with a case study from my daily work.
- Where to get practice with software testing.
- Navigating the tradeoff between immediate velocity and long-term productivity when deciding how much to test.
Eric is a Principal Data Scientist at Moderna supporting research data science. Prior to Moderna, he was at the Novartis Institutes for Biomedical Research conducting biomedical data science research with a focus on using Bayesian statistical methods in the service of making medicines for patients. Prior to Novartis, he was an Insight Health Data Fellow in the summer of 2017 and defended his doctoral thesis in the Department of Biological Engineering at MIT in the spring of 2017.
Eric is also an open-source software developer and has led the development of pyjanitor, a clean API for cleaning data in Python, and nxviz, a visualization package for NetworkX. In addition, he gives back to the open-source community through code contributions to multiple projects.
His personal life motto is found in the Gospel of Luke 12:48.
- LinkedIn: https://www.linkedin.com/in/ericmjl/
- Twitter: https://twitter.com/ericmjl
- GitHub: https://github.com/ericmjl/
00:00 Introduction by @BerylKanali - Data Umbrella team
08:33 Test your work! Opening, goals, introduction, and agenda by Eric Ma
12:21 Testing in Software
12:56 Why do testing?
15:33 What does a test look like?
18:34 How do I make tests automated?
20:45 What benefits do I get?
23:48 What kinds of tests exist? Unit tests, execution tests, integration tests
27:26 Testing in data science
27:43 Testing machine learning model code
32:29 Testing data a.k.a. data validation
34:36 Testing pipeline code
36:13 Mock up realistic fake data
37:57 Philosophy - integrating testing into your work
40:06 Resources - YouTube videos
41:09 Summary
59:41 Q&A session
#python #testing #softwaretesting