diff --git a/content/blog/2020-05-04-dvc-3-years-and-1-0-release.md b/content/blog/2020-05-04-dvc-3-years-and-1-0-release.md
new file mode 100644
index 0000000000..1c70be9e19
--- /dev/null
+++ b/content/blog/2020-05-04-dvc-3-years-and-1-0-release.md
@@ -0,0 +1,162 @@
+---
+title: DVC 3-year anniversary and 1.0 pre-release
+date: 2020-05-04
+description: |
+  Today, we’ve got three big things to announce: 🎉 3rd-year anniversary of DVC,
+  🚀 DVC 1.0 pre-release is ready and ⭐ 5000 GitHub stars.
+descriptionLong: |
+  Today, we’ve got three big things to announce.
+  - 🎉 3rd-year anniversary of DVC
+  - 🚀 DVC 1.0 pre-release is ready
+  - ⭐ DVC has reached 5K GitHub stars (coincidentally on the same day)
+picture: static/uploads/images/2020-05-04/5k_stars.png
+pictureComment:
+author: ../authors/dmitry_petrov.md
+commentsUrl: https://discuss.dvc.org/t/dvc-3-years-and-1-0-release/
+tags:
+  - birthday
+  - users
+  - MLOps
+  - CI/CD
+---
+
+## 3-year anniversary!
+
+Three years ago, on **May 4th, 2017**, the first DVC blog post was published:
+[Data Version Control beta release: iterative machine learning](https://blog.dataversioncontrol.com/data-version-control-beta-release-iterative-machine-learning-a7faf7c8be67).
+It was the first DVC tutorial; the link now redirects to the new tutorial page
+so as not to confuse users.
+
+[The first DVC discussion on Reddit](https://www.reddit.com/r/Python/comments/698ian/dvc_data_scientists_collaboration_and_iterative/)
+followed. A few days later, the post was
+[republished on other sites](https://www.kdnuggets.com/2017/05/data-version-control-iterative-machine-learning.html).
+
+Today, DVC gets recognized at professional conferences: people spot our logo,
+and sometimes even our faces, and want to chat. There's much more content about
+DVC coming from bloggers than from inside our organization.
+We're seeing more and more job postings that list DVC as a requirement, and we're showing up in
+[data science textbooks](https://www.amazon.com/Learn-Python-Building-Science-Applications/dp/1789535360).
+When we find a new place DVC is mentioned, we celebrate in our Slack - we've
+come a long way!
+
+The data science and ML space is fast-paced and vibrant, and we're proud that
+DVC is making an impact on discussions about best practices for healthy,
+sustainable ML. Every week, we chat with companies and research groups using DVC
+to make their teams more productive. We're proud to be part of the growing MLOps
+movement: so far, a majority of CI/CD for ML projects are implemented with DVC
+under the hood.
+
+I can confidently say that DVC wouldn't have been possible without a lot of help
+from our community. Thank you to everyone who has supported us:
+
+**DVC core team.** The core team handles the majority of development
+activities, constantly brings new ideas, documents the product, and is always on
+the front line of user support. Many users know that great user support is one
+of the "killer features" of DVC. Today the core team consists of 6 brave
+engineers.
+
+**DVC contributors.** As of today, the DVC code base has
+[126 individual contributors](https://github.com/iterative/dvc/graphs/contributors).
+Many of these folks put hours into their PRs. We're grateful for their tenacity
+and generosity.
+
+**Documentation contributors.** Another
+[124 people contributed](https://github.com/iterative/dvc.org/graphs/contributors)
+to the DVC documentation and website https://dvc.org/doc. Every time a new
+person tries out DVC, they benefit from the hard work that's gone into our docs.
+
+**Active community members.** Active DVC users help our team understand and
+better anticipate their needs and identify priorities for development. They
+share bright ideas for new features, locate and investigate bugs in code, and
+welcome and support new users.
+
+**People who give DVC a shot.** Today, there are thousands of data scientists,
+ML engineers, and developers using DVC on a regular basis. The number of users
+is growing every week. Our [Discord channel](http://dvc.org/chat) has almost two
+thousand users. Hundreds more connect with us through email and Twitter. To
+everyone willing to try out DVC, thank you for the opportunity.
+
+## DVC 1.0 is the result of 3 years of learning
+
+All these contributions, big and small, have a collective impact on DVC's
+development. I'm happy (and a bit nervous) to announce that a pre-release of a
+brand new DVC 1.0 is ready for public beta testing.
+
+The new DVC is inspired by discussions and contributions from our community -
+both fresh ideas and bug reports 😅.
+
+Here are some of the features we’re excited to be rolling out soon:
+
+[**Run-cache (a.k.a. build-cache)**](https://github.com/iterative/dvc/issues/1234)
+(the issue was created 1.5 years ago). DVC 1.0 has a "long memory" of DVC
+command runs. This means it can identify if a `dvc repro` has already been run
+and save compute time by returning the cached result - even if you didn't Git
+commit that past run. We added the run-cache with CI/CD systems and other MLOps
+automation tools in mind. No more auto-commits are needed after `dvc repro` on
+the CI/CD side.
+
+[**Multi-stage DVC files.**](https://github.com/iterative/dvc/issues/1871)
+(the issue was created a year ago). We redesigned the DVC-metafile format to make saved
+pipelines more interpretable and editable. Pipeline stages are now saved in a
+single metafile, with all stages stored together instead of in separate files.
+We removed checksums from the pipeline metafile, which improves its
+human-readability.
+
+[**Plots.**](https://github.com/iterative/dvc/issues/3409) Countless users asked
+us when we'd support metrics visualizations. Now it's here: DVC 1.0 introduces a
+metrics file visualization command, `dvc plot diff`.
+DVC plots are powered by the [Vega-Lite](https://vega.github.io/vega-lite/) graphics library. This
+command not only shows visualizations based on the current state of your
+project, but can also combine multiple plots from your Git
+history in a single chart so you can compare results across commits. Users can
+visualize how, for example, their model accuracy in the latest commit differs
+from another commit (or even multiple commits).
+
+```
+$ dvc plot diff -d logs.csv HEAD HEAD^ d1e4d848 baseline-march
+file:///Users/dmitry/src/plot/logs.html
+$ open logs.html
+```
+
+![](/static/uploads/images/2020-05-04/dvc-plot.png)
+
+```
+$ dvc plot diff -d logs.csv HEAD HEAD^ d1e4d848 baseline-march \
+    -x loss --template scatter
+file:///Users/dmitry/src/plot/logs.html
+$ open logs.html
+```
+
+![](/static/uploads/images/2020-05-04/dvc-plot-scatter.png)
+
+**Data transfer optimizations.** We've done substantial work on optimizing data
+management commands, such as `dvc pull`, `dvc push`, `dvc status -c`, and
+`dvc gc -c`. Now, based on the amount of data, DVC can choose an optimal
+strategy for traversing the data remote. We've introduced mini-indexes to help
+DVC instantly check data directories instead of iterating over millions of
+files. This also speeds up adding and removing files in large directories. More
+optimizations are included in the release based on performance bottlenecks we profiled.
+
+[**Hyperparameter tracking.**](https://github.com/iterative/dvc/issues/3393) This
+feature was actually released in the most recent DVC version, 0.93 (see
+[params docs](https://dvc.org/doc/command-reference/params)). However, it is an
+important step to support configuration files and ML experiments in a more
+holistic way.
+
+I hope our most active users will find time to try the DVC pre-release
+version and provide their feedback. The installation instructions are
+[on our website](https://dvc.org/doc/install/pre-release).
+
+## 5000 GitHub stars
+
+Activity on our GitHub page has grown organically since the DVC repo went public
+on May 4th, 2017. Coincidentally, today, on our 3rd anniversary, we have
+reached 5000 stars:
+
+![](/static/uploads/images/2020-05-04/5k_github.png)
+
+## Thank you!
+
+Thank you again to all DVC contributors, community members, and users. Every
+piece of your help is highly appreciated and will bring huge benefits to the
+entire ecosystem of data and ML projects.
diff --git a/static/uploads/images/2020-05-04/5k_github.png b/static/uploads/images/2020-05-04/5k_github.png
new file mode 100644
index 0000000000..1e55f3fad0
Binary files /dev/null and b/static/uploads/images/2020-05-04/5k_github.png differ
diff --git a/static/uploads/images/2020-05-04/5k_stars.png b/static/uploads/images/2020-05-04/5k_stars.png
new file mode 100644
index 0000000000..91ff7d5b5a
Binary files /dev/null and b/static/uploads/images/2020-05-04/5k_stars.png differ
diff --git a/static/uploads/images/2020-05-04/dvc-plot-scatter.png b/static/uploads/images/2020-05-04/dvc-plot-scatter.png
new file mode 100644
index 0000000000..132eedc90a
Binary files /dev/null and b/static/uploads/images/2020-05-04/dvc-plot-scatter.png differ
diff --git a/static/uploads/images/2020-05-04/dvc-plot.png b/static/uploads/images/2020-05-04/dvc-plot.png
new file mode 100644
index 0000000000..0dc2541e00
Binary files /dev/null and b/static/uploads/images/2020-05-04/dvc-plot.png differ