This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

Team Data Science Process from Microsoft

NOTE: This page is deprecated.

Please visit the new site for Team Data Science Process (TDSP) at: https://aka.ms/tdsp


Overview | Lifecycle | Roles & Tasks | Project Template | Project Execution | Data Science Utilities


This repository contains the Team Data Science Process (TDSP) from Microsoft. TDSP is an agile, iterative data science process for executing and delivering advanced analytics solutions. It is designed to improve the collaboration and efficiency of data science teams in enterprise organizations. It is supported through four key components:

  • a data science lifecycle definition
  • a standardized project structure (project documentation and reporting templates)
  • infrastructure for project execution (compute and storage infrastructure, code repositories, etc.)
  • tools for data science project tasks (version control, data exploration and modeling, work planning, etc.)

For project execution, TDSP provides guidelines on how to structure collaborative teams and tasks, and on how to run data science projects using Agile planning and version control.

To help teams carry out certain stages of a data science project efficiently and in a semi-automated manner, TDSP also provides data exploration and (semi-)automated modeling tools in R and Python. These tools also produce standardized reports or artifacts.

TDSP resources on Azure

We provide documentation, end-to-end data science process walkthroughs, and templates for different platforms and tools on Azure, such as Azure ML, HDInsight, Microsoft R Server, SQL Server, and Azure Data Lake.

In particular, here are instructions on how to execute data science life cycle steps in Azure ML.

Contributing to TDSP

We believe that with the help of the data science community we can make TDSP even better and help more enterprises and individual data scientists work more efficiently. We welcome contributions to TDSP, whether to the documentation, to the workflow, or to implementations of TDSP on different tools for versioning or work item management. Feel free to contribute pages at TDSP/wiki.

If you have useful data science tools and utilities to share, we encourage you to contribute to the TDSP-Utilities GitHub repository.

Release Notes

This is version 0.1.2 of TDSP. Version 0.1.1 was released in September 2016. We are continuously improving TDSP based on our accumulated experience and customer feedback.

Questions or suggestions

If you have any questions or suggestions, please create a new discussion thread on the Issues tab.


TDSP_LIFECYCLE


Last updated: Aug 15, 2017