Formatting and change github star image
dmpetrov committed May 3, 2020
1 parent 484af63 commit 2967790
Showing 2 changed files with 59 additions and 45 deletions.
104 changes: 59 additions & 45 deletions content/blog/2020-05-04-dvc-3-years-and-1-0-release.md
@@ -4,30 +4,31 @@ date: 2020-05-04
description: |
  Today, we’ve got three big things to announce: 🎉 the 3rd anniversary of DVC,
  🚀 the DVC 1.0 pre-release, and ⭐ 5000 GitHub stars.
descriptionLong: |
  Today, we’ve got three big things to announce:
  - 🎉 3rd-year anniversary of DVC
  - 🚀 DVC 1.0 pre-release is ready
  - ⭐ DVC has reached 5K GitHub stars (coincidentally on the same day)
picture: /static/uploads/images/2020-05-04/5k_stars.png
pictureComment: 5000 GitHub stars
author: dmitry_petrov
commentsUrl: https://discuss.dvc.org/t/april-20-heartbeat/347
tags:
- birthday
- release
- users
- MLOps
- DataOps
- CI/CD
---

## 3-year anniversary!

3 years ago, on **May 4th, 2017**, the first DVC blog post was published:
[Data Version Control beta release: iterative machine learning](https://blog.dataversioncontrol.com/data-version-control-beta-release-iterative-machine-learning-a7faf7c8be67).
It was the first DVC tutorial; later, we redirected it to the new tutorial
page so as not to confuse users.

Here is
[the first DVC discussion on Reddit](https://www.reddit.com/r/Python/comments/698ian/dvc_data_scientists_collaboration_and_iterative/).
A few days later, the post was
[republished on other sites](https://www.kdnuggets.com/2017/05/data-version-control-iterative-machine-learning.html).

@@ -85,60 +86,69 @@ brand new DVC 1.0 is ready for public beta testing.
The new DVC is inspired by discussions and contributions from our community:
both fresh ideas and bug reports 😅.

Here are the most significant features we’re excited to be rolling out soon:

### [Run cache](https://github.com/iterative/dvc/issues/1234)

DVC 1.0 has a "long memory" of DVC command runs. It can identify whether a
`dvc repro` has already been run with the same inputs and save compute time by
returning the cached result, even if you never committed that past run to Git.

We added the run-cache with CI/CD systems and other MLOps automation tools in
mind. Auto-commits after `dvc repro` on the CI/CD side are no longer needed.
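
The run-cache idea can be illustrated with a small bash sketch. It is a minimal illustration under stated assumptions, not DVC's implementation. The helper `cached_run`, the `.run-cache` directory, and the keying scheme are all hypothetical. The key is a SHA-256 of the command string plus the content hashes of its dependencies. The sketch also assumes `sha256sum` from GNU coreutils is available.

```bash
#!/usr/bin/env bash
# Minimal sketch of a run-cache (hypothetical; NOT DVC's implementation).
# A run is keyed by the command string plus the content hashes of its
# dependencies. If the key was seen before, the saved output is restored
# instead of re-running the command.

RUN_CACHE_DIR="${RUN_CACHE_DIR:-.run-cache}"  # hypothetical cache location

# Usage: cached_run OUTPUT COMMAND DEPENDENCY...
cached_run() {
  local out="$1" cmd="$2"
  shift 2
  mkdir -p "$RUN_CACHE_DIR"

  # Key = SHA-256 over the command and each dependency's "hash  path" line.
  local key
  key=$({ printf '%s\n' "$cmd"; sha256sum "$@"; } | sha256sum | cut -d' ' -f1)

  if [[ -f "$RUN_CACHE_DIR/$key" ]]; then
    # Cache hit: skip the command and restore the remembered output.
    cp "$RUN_CACHE_DIR/$key" "$out"
    echo "restored $out from run-cache"
  else
    # Cache miss: run the command, then store its output under the key.
    bash -c "$cmd"
    cp "$out" "$RUN_CACHE_DIR/$key"
    echo "ran: $cmd"
  fi
}
```

Calling `cached_run sorted.txt "sort data.txt > sorted.txt" data.txt` twice runs `sort` only the first time. Editing `data.txt` changes the key, so the next call runs the command for real again.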

### [Multi-stage DVC files](https://github.com/iterative/dvc/issues/1871)

We redesigned the DVC metafile format to make saved pipelines more
interpretable and editable. All pipeline stages are now stored together in a
single metafile instead of in separate files.

Another significant step was removing checksums from the pipeline metafile,
which improves its human readability.
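
To make the new layout concrete, the snippet below writes a hypothetical two-stage pipeline into one metafile. It assumes the `dvc.yaml` layout that shipped with the final DVC 1.0 release, so the pre-release file name and fields may differ. The script names and data paths are made up for illustration. The file lists each stage's command, dependencies, and outputs, but contains no checksums.

```bash
# Hypothetical two-stage pipeline in a single metafile, assuming the dvc.yaml
# layout of the final DVC 1.0 release. Scripts and paths are illustrative only.
# Note: no checksums are stored in this file.
cat > dvc.yaml <<'EOF'
stages:
  prepare:
    cmd: python prepare.py data/raw.csv data/prepared.csv
    deps:
      - prepare.py
      - data/raw.csv
    outs:
      - data/prepared.csv
  train:
    cmd: python train.py data/prepared.csv model.pkl
    deps:
      - train.py
      - data/prepared.csv
    outs:
      - model.pkl
EOF
```

In the final DVC 1.0 release, the checksums are recorded in a separate `dvc.lock` file, which DVC updates when it reproduces the pipeline.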

### [Plots](https://github.com/iterative/dvc/issues/3409)

Countless users asked us when we'd support metrics visualizations. Now it's
here: DVC 1.0 introduces a metrics file visualization command, `dvc plot`.

DVC plots are powered by the [Vega-Lite](https://vega.github.io/vega-lite/)
graphics library. Plots can show visualizations based on the current state of
your project, and they can also combine plots from multiple commits in your Git
history into a single chart so you can compare results across commits. For
example, users can visualize how model accuracy in the latest commit differs
from another commit (or even several commits).

```
$ dvc plot diff -d logs.csv HEAD HEAD^ d1e4d848 baseline_march
file:///Users/dmitry/src/plot/logs.html
$ open logs.html
```

![](/static/uploads/images/2020-05-04/dvc-plot.png)

```
$ dvc plot diff -d logs.csv HEAD HEAD^ d1e4d848 baseline_march \
-x loss --template scatter
file:///Users/dmitry/src/plot/logs.html
$ open logs.html
```

![](/static/uploads/images/2020-05-04/dvc-plot-scatter.png)

### [Data transfer optimizations](https://github.com/iterative/dvc/issues/3488)

We've done substantial work on optimizing data management commands, such as
`dvc pull`, `dvc push`, `dvc status -c`, and `dvc gc -c`. Based on the amount of
data, DVC can now choose an optimal strategy for traversing the data remote.
We've introduced mini-indexes that let DVC check data directories instantly
instead of iterating over millions of files. This also speeds up adding and
removing files in large directories. The release includes further
optimizations targeting performance bottlenecks we found through profiling.
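
Both optimizations can be illustrated with a small bash sketch. The cost model compares the number of per-object existence checks with the number of paged list requests. That model, the `PAGE_SIZE` of 1000, the `.remote-index` file, and the helpers `choose_strategy`, `index_add`, and `dir_on_remote` are all hypothetical simplifications. They are not DVC's actual heuristics or storage format.

```bash
# Sketch of two data-transfer ideas (hypothetical; NOT DVC's code).

PAGE_SIZE=1000                             # hypothetical objects per list request
INDEX_FILE="${INDEX_FILE:-.remote-index}"  # hypothetical local mini-index

# Usage: choose_strategy N_TO_CHECK ESTIMATED_REMOTE_SIZE
# Compare one request per object against ceil(remote / PAGE_SIZE) list requests.
choose_strategy() {
  local n="$1" remote="$2"
  local list_requests=$(( (remote + PAGE_SIZE - 1) / PAGE_SIZE ))
  if (( n <= list_requests )); then
    echo "query"      # checking objects one by one needs fewer requests
  else
    echo "traverse"   # listing the whole remote needs fewer requests
  fi
}

# Usage: index_add DIR_HASH
# Record that every file of this directory version is already on the remote.
index_add() {
  echo "$1" >> "$INDEX_FILE"
}

# Usage: dir_on_remote DIR_HASH
# One lookup replaces per-file existence checks for the whole directory.
dir_on_remote() {
  [[ -f "$INDEX_FILE" ]] && grep -qxF "$1" "$INDEX_FILE"
}
```

Under this toy model, checking 10 objects against a remote of about a million objects favors individual queries. Checking 50,000 objects favors listing the remote.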

### [Hyperparameter tracking](https://github.com/iterative/dvc/issues/3393)

This feature was actually released in the previous version, DVC 0.93 (see the
[params docs](https://dvc.org/doc/command-reference/params)). Still, it is an
important step toward supporting configuration files and ML experiments in a
more holistic way.
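
Tracking individual parameters, rather than a whole configuration file, means a stage that depends only on `lr` and `epochs` is not invalidated when an unrelated value such as `seed` changes. A bash sketch can illustrate this. The helper `params_fingerprint` and the parameter names are hypothetical. The sketch also assumes a flat `key: value` file instead of parsing YAML properly.

```bash
# Hypothetical helper: fingerprint only the parameters a stage depends on.
# Assumes a flat "key: value" params file; real YAML parsing is out of scope.
# Usage: params_fingerprint FILE KEY...
params_fingerprint() {
  local file="$1" key
  shift
  for key in "$@"; do
    # Keep only the line for this top-level key.
    grep -E "^${key}:" "$file"
  done | sha256sum | cut -d' ' -f1
}
```

Changing `seed` leaves `params_fingerprint params.yaml lr epochs` unchanged, while changing `lr` produces a new value. In DVC itself, the equivalent is declaring which parameters a stage depends on; the params docs linked above describe the actual syntax.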
@@ -160,3 +170,7 @@ reached 5000 stars:
Thank you again to all DVC contributors, community members, and users. Every
bit of your help is highly appreciated and brings huge benefits to the entire
ecosystem of data and ML projects.

Stay healthy and safe in your neck of the woods, and keep in touch on
[Twitter](https://twitter.com/DVCorg) and our
[Discord channel](https://dvc.org/chat).
Binary file modified static/uploads/images/2020-05-04/5k_github.png