3 years and 1.0 pre release
dmpetrov committed May 3, 2020
1 parent f0c4473 commit 484af63
Showing 5 changed files with 162 additions and 0 deletions.
162 changes: 162 additions & 0 deletions content/blog/2020-05-04-dvc-3-years-and-1-0-release.md
@@ -0,0 +1,162 @@
---
title: DVC 3-year anniversary and 1.0 pre-release
date: 2020-05-04
description: |
  Today, we’ve got three big things to announce: 🎉 3rd-year anniversary of DVC,
  🚀 DVC 1.0 pre-release is ready and ⭐ 5000 GitHub stars.
descriptionLong: |
  Today, we’ve got three big things to announce.
  - 🎉 3rd-year anniversary of DVC
  - 🚀 DVC 1.0 pre-release is ready
  - ⭐ DVC has reached 5K GitHub stars (coincidentally on the same day)
picture: static/uploads/images/2020-05-04/5k_stars.png
pictureComment:
author: ../authors/dmitry_petrov.md
commentsUrl: https://discuss.dvc.org/t/dvc-3-years-and-1-0-release/
tags:
- birthday
- users
- MLOps
- CI/CD
---

## 3-year anniversary!

Three years ago, on **May 4th, 2017**, the first DVC blog post was published:
[Data Version Control beta release: iterative machine learning](https://blog.dataversioncontrol.com/data-version-control-beta-release-iterative-machine-learning-a7faf7c8be67).
It was the first DVC tutorial. The link now redirects to the new tutorial page
so that users are not confused by outdated instructions.

There was also
[the first DVC discussion on Reddit](https://www.reddit.com/r/Python/comments/698ian/dvc_data_scientists_collaboration_and_iterative/).
A few days later, the post was
[republished on other sites](https://www.kdnuggets.com/2017/05/data-version-control-iterative-machine-learning.html).

Today, DVC gets recognized at professional conferences: people spot our logo,
and sometimes even our faces, and want to chat. There's much more content about
DVC coming from bloggers than from inside our organization. We're seeing more
and more job postings that list DVC as a requirement, and we're showing up in
[data science textbooks](https://www.amazon.com/Learn-Python-Building-Science-Applications/dp/1789535360).
When we find a new place DVC is mentioned, we celebrate in our Slack - we've
come a long way!

The data science and ML space is fast-paced and vibrant, and we're proud that
DVC is making an impact on discussions about best practices for healthy,
sustainable ML. Every week, we chat with companies and research groups using DVC
to make their teams more productive. We're proud to be part of the growing MLOps
movement: so far, a majority of CI/CD for ML projects are implemented with DVC
under the hood.

I can confidently say that DVC wouldn't have been possible without a lot of help
from our community. Thank you to everyone who has supported us:

**DVC core team.** The core team takes on the majority of the development work,
constantly brings new ideas, documents the product, and is always on the front
line of user support. Many users know that great user support is one of the
"killer features" of DVC. Today the core team consists of 6 brave engineers.

**DVC contributors.** As of today, the DVC code base has
[126 individual contributors](https://github.com/iterative/dvc/graphs/contributors).
Many of these folks put hours into their PRs. We're grateful for their tenacity
and generosity.

**Documentation contributors.** Another
[124 people contributed](https://github.com/iterative/dvc.org/graphs/contributors)
to the DVC documentation and website https://dvc.org/doc. Every time a new
person tries out DVC, they benefit from the hard work that's gone into our docs.

**Active community members.** Active DVC users help our team understand and
better anticipate their needs and identify priorities for development. They
share bright ideas for new features, locate and investigate bugs in code, and
welcome and support new users.

**People who give DVC a shot.** Today, there are thousands of data scientists,
ML engineers, and developers using DVC on a regular basis. The number of users
is growing every week. Our [Discord channel](http://dvc.org/chat) has almost two
thousand users. Hundreds more connect with us through email and Twitter. To
everyone willing to try out DVC, thank you for the opportunity.

## DVC 1.0 is the result of 3 years of learning

All these contributions, big and small, have a collective impact on DVC's
development. I'm happy (and a bit nervous) to announce that a pre-release of a
brand new DVC 1.0 is ready for public beta testing.

The new DVC is inspired by discussions and contributions from our community -
both fresh ideas and bug reports 😅.

Here are some of the features we’re excited to be rolling out soon:

[**Run-cache (a.k.a. build-cache)**](https://github.com/iterative/dvc/issues/1234)
(the issue was created 1.5 years ago). DVC 1.0 has a "long memory" of DVC
command runs. This means it can identify if a `dvc repro` has already been run
and save compute time by returning the cached result - even if you didn't Git
commit that past run. We added the run-cache with CI/CD systems and other MLOps
automation tools in mind. No more auto-commits are needed after `dvc repro` on
the CI/CD system side.
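
To make the idea concrete, here is a minimal Python sketch of a run-cache. It is
an illustration only, not DVC's actual implementation: the `RunCache` class, the
`run_key` function, and the checksum values are hypothetical names made up for
this example. The assumption is that a run can be identified by its command plus
the checksums of its dependencies, so a repeated run with identical inputs can
return the stored outputs instead of executing again.

```python
import hashlib
import json


def run_key(command, dep_checksums):
    """Build a deterministic key from a command and its dependency checksums."""
    payload = json.dumps({"cmd": command, "deps": dep_checksums}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class RunCache:
    """Hypothetical in-memory run-cache (illustration, not DVC's real code)."""

    def __init__(self):
        self._results = {}

    def run(self, command, dep_checksums, execute):
        """Return (outputs, cache_hit); call execute() only on a cache miss."""
        key = run_key(command, dep_checksums)
        if key in self._results:
            # Same command and same inputs were seen before: reuse the outputs.
            return self._results[key], True
        outputs = execute()
        self._results[key] = outputs
        return outputs, False


# Usage: the second identical run is served from the cache.
executions = []


def train():
    executions.append("train")
    return {"model.pkl": "abc123"}  # hypothetical output checksum


cache = RunCache()
first = cache.run("python train.py", {"data.csv": "d41d8c"}, train)
second = cache.run("python train.py", {"data.csv": "d41d8c"}, train)
changed = cache.run("python train.py", {"data.csv": "ffffff"}, train)
```

In this sketch the cache lives in memory. A real tool would have to persist it
somewhere outside of Git history, which is what makes results reusable even when
the earlier run was never committed.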

[**Multi-stage DVC files.**](https://github.com/iterative/dvc/issues/1871)
(the issue was created a year ago). We redesigned the DVC-metafile format to make saved
pipelines more interpretable and editable. Pipeline stages are now saved in a
single metafile, with all stages stored together instead of in separate files.
We removed checksums from the pipeline metafile, which improves its
human-readability.

[**Plots.**](https://github.com/iterative/dvc/issues/3409) Countless users asked
us when we'd support metrics visualizations. Now it's here: DVC 1.0 introduces a
metrics file visualization command, `dvc plot diff`. DVC plots are powered by
the [Vega-Lite](https://vega.github.io/vega-lite/) visualization library. The
command is designed not only to show visualizations based on the current state
of your project, but also to combine multiple plots from your Git history in a
single chart so you can compare results across commits. Users can visualize how,
for example, their model accuracy in the latest commit differs from another
commit (or even multiple commits).

```
$ dvc plot diff -d logs.csv HEAD HEAD^ d1e4d848 baseline-march
file:///Users/dmitry/src/plot/logs.html
$ open logs.html
```

![](/static/uploads/images/2020-05-04/dvc-plot.png)

```
$ dvc plot diff -d logs.csv HEAD HEAD^ d1e4d848 baseline-march \
-x loss --template scatter
file:///Users/dmitry/src/plot/logs.html
$ open logs.html
```

![](/static/uploads/images/2020-05-04/dvc-plot-scatter.png)

**Data transfer optimizations.** We've done substantial work on optimizing data
management commands, such as `dvc pull`, `dvc push`, `dvc status -c`, and
`dvc gc -c`. Now, based on the amount of data, DVC can choose an optimal
strategy for traversing the data remote. We've introduced mini-indexes to help
DVC instantly check data directories instead of iterating over millions of
files. This also speeds up adding files to and removing files from large
directories. More optimizations are included in the release, based on the
performance bottlenecks we profiled.
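
As a rough illustration of why an index helps, here is a small Python sketch.
It is not DVC's real data structure: `dir_checksum` and `RemoteIndex` are
hypothetical names, and the assumption is that a directory can be summarized by
one checksum computed from its files' checksums. If that directory checksum is
already recorded as fully uploaded, a single lookup replaces a check of every
file.

```python
import hashlib


def dir_checksum(file_checksums):
    """Combine per-file checksums (path -> checksum) into one directory checksum."""
    digest = hashlib.sha256()
    for path in sorted(file_checksums):
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(file_checksums[path].encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


class RemoteIndex:
    """Hypothetical index of directory checksums known to be fully uploaded."""

    def __init__(self):
        self._complete_dirs = set()

    def mark_complete(self, checksum):
        self._complete_dirs.add(checksum)

    def files_to_check(self, file_checksums):
        """Return the files that still need a per-file check on the remote."""
        if dir_checksum(file_checksums) in self._complete_dirs:
            return []  # one lookup instead of a walk over every file
        return sorted(file_checksums)


# Usage: once a directory is marked complete, nothing needs to be re-checked
# until its contents change.
index = RemoteIndex()
files = {"a.csv": "1", "b.csv": "2"}  # made-up checksums
before = index.files_to_check(files)
index.mark_complete(dir_checksum(files))
after = index.files_to_check(files)
modified = index.files_to_check({**files, "c.csv": "3"})
```

Any change to a file, or any added or removed file, produces a different
directory checksum, so the sketch falls back to the per-file check in that case.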

[**Hyperparameter tracking.**](https://github.com/iterative/dvc/issues/3393)
This feature was actually released in the latest DVC version, 0.93 (see the
[params docs](https://dvc.org/doc/command-reference/params)). However, it is an
important step towards supporting configuration files and ML experiments in a
more holistic way.
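
One thing that tracking parameters makes possible is comparing them between two
versions of a project. The Python sketch below shows that idea under stated
assumptions: the nested dictionaries stand in for two versions of a parameters
file, and `flatten` and `diff_params` are hypothetical helpers written for this
post, not part of DVC.

```python
def flatten(params, prefix=""):
    """Turn nested parameters into dotted names, e.g. {'train.lr': 0.1}."""
    flat = {}
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def diff_params(old, new):
    """Map each changed parameter name to its (old value, new value) pair."""
    old_flat, new_flat = flatten(old), flatten(new)
    changes = {}
    for name in sorted(set(old_flat) | set(new_flat)):
        before, after = old_flat.get(name), new_flat.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


# Usage with made-up values: the learning rate changed and a seed was added.
baseline = {"train": {"lr": 0.1, "epochs": 10}}
experiment = {"train": {"lr": 0.01, "epochs": 10}, "seed": 42}
changes = diff_params(baseline, experiment)
```

A missing parameter is reported as `None` in this sketch, which keeps it short
but cannot tell "absent" apart from an explicit null value.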

I hope our most active users will find time to try the DVC pre-release version
and share their feedback. The installation instructions are
[on our website](https://dvc.org/doc/install/pre-release).

## 5000 GitHub stars

Activity on our GitHub page has grown organically since the DVC repo went public
on May 4th, 2017. Coincidentally, today, on our 3rd anniversary, we reached
5000 stars:

![](/static/uploads/images/2020-05-04/5k_github.png)

## Thank you!

Thank you again to all DVC contributors, community members, and users. Every
bit of your help is highly appreciated and will bring huge benefits to the
entire ecosystem of data and ML projects.
Binary file added static/uploads/images/2020-05-04/5k_github.png
Binary file added static/uploads/images/2020-05-04/5k_stars.png
Binary file added static/uploads/images/2020-05-04/dvc-plot.png
