diff --git a/content/blog/2020-05-04-dvc-3-years-and-1-0-release.md b/content/blog/2020-05-04-dvc-3-years-and-1-0-release.md index 1c70be9e19..42e31f2f56 100644 --- a/content/blog/2020-05-04-dvc-3-years-and-1-0-release.md +++ b/content/blog/2020-05-04-dvc-3-years-and-1-0-release.md @@ -4,19 +4,21 @@ date: 2020-05-04 description: | Today, we’ve got three big things to announce: 🎉 3rd-year anniversary of DVC, 🚀 DVC 1.0 pre-release is ready and ⭐ 5000 GitHub starts. + descriptionLong: | Today, we’ve got three big things to announce. - 🎉 3rd-year anniversary of DVC - 🚀 DVC 1.0 pre-release is ready - ⭐ DVC has reached 5K GitHub starts (coincidently on the same day) -picture: static/uploads/images/2020-05-04/5k_stars.png -pictureComment: -author: ../authors/dmitry_petrov.md -commentsUrl: https://discuss.dvc.org/t/dvc-3-years-and-1-0-release/ +picture: /static/uploads/images/2020-05-04/5k_stars.png +pictureComment: 5000 GitHub stars +author: dmitry_petrov +commentsUrl: https://discuss.dvc.org/t/april-20-heartbeat/347 tags: - - birthday + - release - users - MLOps + - DataOps - CI/CD --- @@ -24,10 +26,9 @@ tags: 3 years ago on **May 4th, 2017** the first DVC blog post was published [Data Version Control beta release: iterative machine learning](https://blog.dataversioncontrol.com/data-version-control-beta-release-iterative-machine-learning-a7faf7c8be67). -It was the first DVC tutorial and now we make the redirect to the new tutorial +It was the first DVC tutorial and later we made a redirect to the new tutorial page to not to confuse users. - -[The first DVC discussion on Reddit](https://www.reddit.com/r/Python/comments/698ian/dvc_data_scientists_collaboration_and_iterative/) +[The first DVC discussion on Reddit](https://www.reddit.com/r/Python/comments/698ian/dvc_data_scientists_collaboration_and_iterative/).
A few days later it was [republished on other sources](https://www.kdnuggets.com/2017/05/data-version-control-iterative-machine-learning.html) @@ -85,35 +86,41 @@ brand new DVC 1.0 is ready for public beta testing. The new DVC is inspired by discussions and contributions from our community - both fresh ideas and bug reports 😅. -Here are some of the features we’re excited to be rolling out soon: - -[**Run-cache (a.k.a. build-cache)**](https://github.com/iterative/dvc/issues/1234) -(the issue was created 1.5 years ago). DVC 1.0 has a "long memory" of DVC -commands runs. This means it can identify if a `dvc repro` has already been run -and save compute time by returning the cached result - even if you didn't Git -commit that past run. We added the run-cache with CI/CD systems and other MLOps -automation tools in mind. No more auto-commits needed after `dvc repro` in the -CI/CD system side. - -[\*\*Multi-stage DVC files.](https://github.com/iterative/dvc/issues/1871) -(created a year ago). We redesigned the DVC-metafile format to make saved -pipelines more interpretable and editable. Pipeline stages are now saved in a -single metafile, with all stages stored together instead of in separate files. -We removed checksums from the pipeline metafile, which improves its -human-readability. - -[**Plots.**](https://github.com/iterative/dvc/issues/3409) Countless users asked -us when we'd support metrics visualizations. Now it's here: DVC 1.0 introduces a -metrics file visualization command, `dvc metrics diff`. DVC plots are powered by -the [Vega-Lite](https://vega.github.io/vega-lite/) graphic library. This -function is designed not only for showing visualizations based on the current -state of your project, but it can also combine multiple plots from your Git -history in a single chart so you can compare results across commits. Users can -visualize how, for example, their model accuracy in the latest commit differs -from another commit (or even multiple commits).
+Here are the most significant features we’re excited to be rolling out soon: + +### [Run cache](https://github.com/iterative/dvc/issues/1234) + +DVC 1.0 has a "long memory" of DVC commands runs. This means it can identify if +a `dvc repro` has already been run and save compute time by returning the cached +result - even if you didn't Git commit that past run. + +We added the run-cache with CI/CD systems and other MLOps automation tools in +mind. No more auto-commits needed after `dvc repro` in the CI/CD system side. + +### [Multi-stage DVC files](https://github.com/iterative/dvc/issues/1871) + +We redesigned the DVC-metafile format to make saved pipelines more interpretable +and editable. Pipeline stages are now saved in a single metafile, with all +stages stored together instead of in separate files. + +Another significan step was the removed checksums from the pipeline metafile. +This improves its human-readability. + +### [Plots](https://github.com/iterative/dvc/issues/3409) + +Countless users asked us when we'd support metrics visualizations. Now it's +here: DVC 1.0 introduces a metrics file visualization command, +`dvc metrics diff`. + +DVC plots are powered by the [Vega-Lite](https://vega.github.io/vega-lite/) +graphic library. This function is designed not only for showing visualizations +based on the current state of your project, but it can also combine multiple +plots from your Git history in a single chart so you can compare results across +commits. Users can visualize how, for example, their model accuracy in the +latest commit differs from another commit (or even multiple commits).
``` -$ dvc plot diff -d logs.csv HEAD HEAD^ d1e4d848 baseline-march +$ dvc plot diff -d logs.csv HEAD HEAD^ d1e4d848 baseline_march file:///Users/dmitry/src/plot/logs.html $ open logs.html ``` @@ -121,7 +128,7 @@ $ open logs.html ![](/static/uploads/images/2020-05-04/dvc-plot.png) ``` -$ dvc plot diff -d logs.csv HEAD HEAD^ d1e4d848 baseline-march \ +$ dvc plot diff -d logs.csv HEAD HEAD^ d1e4d848 baseline_march \ -x loss --template scatter file:///Users/dmitry/src/plot/logs.html $ open logs.html @@ -129,16 +136,19 @@ $ open logs.html ![](/static/uploads/images/2020-05-04/dvc-plot-scatter.png) -**Data transfer optimizations.** We've done substantial work on optimizing data -management commands, such as `dvc pull \ push \ status -c \ gc -c`. Now, based -on the amount of data, DVC can choose an optimal data remote traversing -strategy. We've introduced mini-indexes to help DVC instantly check data -directories instead of iterating over millions of files. This also speeds up -file adding\removing to large directories. More optimizations are included in -the release based on performance bottlenecks we profiled. +### [Data transfer optimizations](https://github.com/iterative/dvc/issues/3488) -[**Hyperparameter tracking.**(https://github.com/iterative/dvc/issues/3393) This -feature was actually released in the last DVC 0.93 version (see +We've done substantial work on optimizing data management commands, such as +`dvc pull \ push \ status -c \ gc -c`. Now, based on the amount of data, DVC can +choose an optimal data remote traversing strategy. We've introduced mini-indexes +to help DVC instantly check data directories instead of iterating over millions +of files. This also speeds up file adding\removing to large directories. More +optimizations are included in the release based on performance bottlenecks we +profiled. 
+ +### [Hyperparameter tracking](https://github.com/iterative/dvc/issues/3393) + +This feature was actually released in the last DVC 0.93 version (see [params docs](https://dvc.org/doc/command-reference/params). However, it is an important step to support configuration files and ML experiments in a more holistic way. @@ -160,3 +170,7 @@ reached 5000 starts: Thank you again for all DVC contributions, community members, and users. Every piece of your help is highly appreciated and will bring huge benefits to the entire ecosystem of data and ML projects. + +Stay healthy and safe over in your neck of the woods and be in touch on +[Twitter](https://twitter.com/DVCorg), and our +[Discord channel](https://dvc.org/chat). diff --git a/static/uploads/images/2020-05-04/5k_github.png b/static/uploads/images/2020-05-04/5k_github.png index 1e55f3fad0..5a51c2e99f 100644 Binary files a/static/uploads/images/2020-05-04/5k_github.png and b/static/uploads/images/2020-05-04/5k_github.png differ