3 years and 1.0 pre release #1213

Merged
merged 14 commits into from
May 4, 2020
198 changes: 198 additions & 0 deletions content/blog/2020-05-04-dvc-3-years-and-1-0-release.md
@@ -0,0 +1,198 @@
---
title: DVC 3 Years Anniversary and 1.0 Pre-release
Member

can we omit Anniversary - to make it fit into a single line

Contributor

Maybe "DVC Year 3 and 1.0 Pre-release"

Member Author

DVC 3 Years and 1.0 Pre-release ?

Member Author

DVC 3 Years 🎉 and 1.0 Pre-release 🚀 ?

Member

What about 3 years of DVC and 1.0 Pre-release?

date: 2020-05-04
description: |
Today, we’ve got three big things to announce: 🎉 the third anniversary of DVC,
🚀 the DVC 1.0 pre-release and ⭐ 5000 GitHub stars.

descriptionLong: |
Today, we’ve got three big things to announce.
Contributor

Today, we've got three big announcements.


🎉 3rd-year anniversary of DVC

🚀 DVC 1.0 pre-release is ready

⭐ DVC has reached 5K GitHub stars (coincidentally on the same day)

We are sharing our learnings from this journey and how they affected the new
Contributor

The last sentence is a bit awkward. Maybe, We'll share what we've learned from our journey and how DVC is growing.

Member Author

I'm trying to make a connection between the learning and the features we release. How about We'll share what we've learned from our journey, how it helped for the release and how DVC is growing.?

DVC 1.0 release.
picture: 2020-05-04/5k_stars.png
Member

we need a 2x size image to make it sharp

pictureComment: 5000 GitHub stars
author: dmitry_petrov
commentsUrl: https://discuss.dvc.org/t/dvc-3-years-anniversary-and-1-0-pre-release/374
tags:
- Release
- MLOps
- DataOps
- CI/CD
---

## 3-year anniversary!

Three years ago, on **May 4th, 2017**, the first DVC blog post was published:
[Data Version Control beta release: iterative machine learning](https://blog.dataversioncontrol.com/data-version-control-beta-release-iterative-machine-learning-a7faf7c8be67).
It was the first DVC tutorial; we later redirected it to the new tutorial page
to avoid confusing users. The first DVC discussion took place
[on Reddit](https://www.reddit.com/r/Python/comments/698ian/dvc_data_scientists_collaboration_and_iterative/).
A few days later, the post was
[republished elsewhere](https://www.kdnuggets.com/2017/05/data-version-control-iterative-machine-learning.html).
Contributor

I don't think showing these old posts makes a very strong point. I would condense it a lot:

Three years ago on **May 4th, 2017**, I published the [first blog post about DVC](https://www.kdnuggets.com/2017/05/data-version-control-iterative-machine-learning.html). Until that point, DVC was a private project between myself and two others, my cofounder Ivan and our collaborator, Ruslan. Today, things look very different.

or some variant of this


Today, DVC gets recognized at professional conferences: people spot our logo,
and sometimes even our faces, and want to chat. There's much more content about
DVC coming from bloggers than from inside our organization. We're seeing more
and more job postings that list DVC as a requirement, and we're showing up in
[data science textbooks](https://www.amazon.com/Learn-Python-Building-Science-Applications/dp/1789535360).
When we find a new place DVC is mentioned, we celebrate in our Slack - we've
come a long way!

The data science and ML space is fast-paced and vibrant, and we're proud that
DVC is making an impact on discussions about best practices for healthy,
sustainable ML. Every week, we chat with companies and research groups using DVC
to make their teams more productive. We're proud to be part of the growing MLOps
movement: so far, a majority of CI/CD for ML projects are implemented with DVC
under the hood.

I can confidently say that DVC wouldn't have been possible without a lot of help
from our community. Thank you to everyone who has supported us:

**DVC core team.** The core team of the project takes the majority of the
Contributor

I think take a bigger scope- mention that the team is now 12 full-time people or however large we technically are (even though they're not all "core DVC engineering"; this is more about Iterative).

development activities, constantly brings new ideas, documents the product and
is always on the front line of user support. Many users know that great user
Contributor

@elleobrien elleobrien May 3, 2020

Grammar rewrite:
The DVC team has been the force driving our project's evolution- we've grown from 2 to 12 full-time engineers, developers, and data scientists. We often get feedback about how fast our team answers user questions- we've been told our user support is one of DVC's "killer features". It's all thanks to this amazing team.

support is one of the "killer features" of DVC. Today the core team consists of
6 brave engineers.

**DVC contributors.** As of today, the DVC code base has
[126 individual contributors](https://github.com/iterative/dvc/graphs/contributors).
Many of these folks put hours into their PRs. We're grateful for their tenacity
and generosity.

**Documentation contributors.** Another
[124 people contributed](https://github.com/iterative/dvc.org/graphs/contributors)
to the [DVC documentation](https://dvc.org/doc) and
[the website](https://dvc.org/). Every time a new person tries out DVC, they
benefit from the hard work that's gone into our docs.

**Active community members.** Active DVC users help our team understand and
better anticipate their needs and identify priorities for development. They
share bright ideas for new features, locate and investigate bugs in code, and
welcome and support new users.

**People who give DVC a shot.** Today, there are thousands of data scientists,
ML engineers, and developers using DVC on a regular basis. The number of users
is growing every week. Our [Discord channel](http://dvc.org/chat) has almost two
thousand users. Hundreds more connect with us through email and Twitter. To
everyone willing to try out DVC, thank you for the opportunity.

## DVC 1.0 is the result of 3 years of learning

All these contributions, big and small, have a collective impact on DVC's
development. I'm happy (and a bit nervous) to announce that a pre-release of a
brand new DVC 1.0 is ready for public beta testing.
Contributor

Suggested change
brand new DVC 1.0 is ready for public beta testing.
brand new DVC 1.0 is ready for public alpha testing.

Contributor

"Alpha testing is carried out in a lab environment and usually, the testers are internal employees of the organization" https://www.guru99.com/alpha-beta-testing-demystified.html


The new DVC is inspired by discussions and contributions from our community -
both fresh ideas and bug reports 😅.

Here are the most significant features we’re excited to be rolling out soon:

### [Run cache](https://github.com/iterative/dvc/issues/1234)

_Learnings:_ Forcing users to make Git commits for each ML experiment creates
too much overhead.

DVC 1.0 has a "long memory" of DVC command runs. This means it can identify if
a `dvc repro` has already been run and save compute time by returning the cached
result - _even if you didn't Git commit that past run_.

We added the run-cache with CI/CD systems and other MLOps and DataOps automation
tools in mind. Auto-commits are no longer needed after `dvc repro` on the CI/CD
side.
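
To make the idea concrete, here is a minimal Python sketch of a run cache. It is an illustration only, not DVC's implementation: the `RunCache` class, the `run_key` helper, and the choice to key runs by a SHA-256 hash of the command plus its dependency contents are all assumptions made for this example.

```python
import hashlib


def run_key(cmd, deps):
    """Hash the command together with the contents of its dependencies.

    `deps` maps a (hypothetical) file name to its content as bytes.
    """
    digest = hashlib.sha256()
    digest.update(cmd.encode())
    for name in sorted(deps):  # sort so the key does not depend on ordering
        digest.update(name.encode())
        digest.update(hashlib.sha256(deps[name]).digest())
    return digest.hexdigest()


class RunCache:
    """Hypothetical in-memory cache: run key -> outputs of that run."""

    def __init__(self):
        self._runs = {}
        self.executions = 0  # how many times a command really ran

    def repro(self, cmd, deps, execute):
        key = run_key(cmd, deps)
        if key not in self._runs:  # unseen command/dependency combination
            self.executions += 1
            self._runs[key] = execute(deps)
        return self._runs[key]


cache = RunCache()
train = lambda deps: {"model.pkl": deps["data.csv"][::-1]}  # stand-in for training

cache.repro("python train.py", {"data.csv": b"1,2,3"}, train)
cache.repro("python train.py", {"data.csv": b"1,2,3"}, train)  # served from the cache
print(cache.executions)  # 1

cache.repro("python train.py", {"data.csv": b"4,5,6"}, train)  # data changed, runs again
print(cache.executions)  # 2
```

Nothing in the sketch refers to Git: the lookup depends only on the command and its inputs, which is why a past run can be reused even if it was never committed.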

### [Multi-stage DVC files](https://github.com/iterative/dvc/issues/1871)

_Learnings:_ ML pipelines evolve much faster than data engineering pipelines.

We redesigned the DVC-metafile format to make saved pipelines more interpretable
and editable. Pipeline stages are now saved in a single metafile, with all
stages stored together instead of in separate files.
Comment on lines +117 to +119
Contributor

@jorgeorpinel jorgeorpinel May 4, 2020

redesigned the DVC-metafile format

But regular .dvc files will continue to exist independently for some commands right? Like dvc add/import @dmpetrov

This comment was marked as resolved.


Another significant step was removing checksums from the pipeline metafile.
This improves its human-readability.

### [Plots](https://github.com/iterative/dvc/issues/3409)

_Learnings:_ Versioning metrics and plots is no less important than versioning
data.

Countless users asked us when we'd support metrics visualizations. Now it's
here: DVC 1.0 introduces a metrics file visualization command,
`dvc plot diff`. DVC plots are powered by the
[Vega-Lite](https://vega.github.io/vega-lite/) graphics library.

This feature not only shows visualizations based on the current state of your
project; it can also combine multiple plots from your Git history in a single
chart so you can compare results across commits. Users can visualize how, for
example, their model accuracy in the latest commit differs from another commit
(or even multiple commits).

```dvc
$ dvc plot diff -d logs.csv HEAD HEAD^ d1e4d848 baseline_march
file:///Users/dmitry/src/plot/logs.html
$ open logs.html
```

![](/uploads/images/2020-05-04/dvc-plot.png)
Member

can we make them a bit bigger?


```dvc
$ dvc plot diff -d logs.csv HEAD HEAD^ d1e4d848 baseline_march \
-x loss --template scatter
file:///Users/dmitry/src/plot/logs.html
$ open logs.html
```

![](/uploads/images/2020-05-04/dvc-plot-scatter.png)

### [Data transfer optimizations](https://github.com/iterative/dvc/issues/3488)

_Learnings:_ In ML projects, data transfer optimization is still king.

We've done substantial work on optimizing data management commands, such as
`dvc pull`, `dvc push`, `dvc status -c`, and `dvc gc -c`. Now, based on the
amount of data, DVC can choose an optimal strategy for traversing the data
remote.
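
One way to picture a size-based choice like this is a simple request-count comparison. The Python sketch below is a simplified illustration, not DVC's actual heuristic: the `choose_strategy` function, the strategy names, and the default page size of 1000 are assumptions made for this example.

```python
import math


def choose_strategy(n_to_check, remote_size, page_size=1000):
    """Return the cheaper way to learn which objects already exist remotely.

    n_to_check  -- number of local objects whose remote status is needed
    remote_size -- (estimated) number of objects stored in the remote
    page_size   -- objects returned per listing request (assumed value)
    """
    per_object_requests = n_to_check                       # one existence check each
    listing_requests = math.ceil(remote_size / page_size)  # paginated full listing
    if per_object_requests <= listing_requests:
        return "check-each"
    return "list-all"


print(choose_strategy(n_to_check=20, remote_size=1_000_000))      # check-each
print(choose_strategy(n_to_check=50_000, remote_size=1_000_000))  # list-all
```

With only a handful of files to check, querying them one by one is cheaper than listing a huge remote; with many files, a single full listing wins.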

[Mini-indexes](https://github.com/iterative/dvc/issues/2147) were introduced to
Member

better to put link to @pmrowla report

Member Author

Provided the link and the plot. Looks very solid 😎

help DVC instantly check data directories instead of iterating over millions of
files. This also speeds up adding files to and removing files from large
directories.

More optimizations are included in the release based on performance bottlenecks
we profiled.

### [Hyperparameter tracking](https://github.com/iterative/dvc/issues/3393)

_Learnings:_ ML pipeline steps depend on only a subset of the config file.

This feature was actually released in the latest DVC version, 0.93 (see the
[params docs](https://dvc.org/doc/command-reference/params)). However, it is an
important step to support configuration files and ML experiments in a more
holistic way.
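
The observation that a stage depends on only some of the values in a config file can be sketched in a few lines of Python. This is a conceptual illustration rather than DVC's code: the `select_params` and `stage_changed` helpers, the dotted-key convention, and the sample config values are assumptions made for this example.

```python
def select_params(config, keys):
    """Pick only the (possibly nested, dot-separated) keys a stage declares."""
    selected = {}
    for key in keys:
        value = config
        for part in key.split("."):  # walk into nested sections
            value = value[part]
        selected[key] = value
    return selected


def stage_changed(old_config, new_config, keys):
    """A stage needs a rerun only if one of its own parameters changed."""
    return select_params(old_config, keys) != select_params(new_config, keys)


# Hypothetical config before and after an edit to the featurization settings.
old = {"train": {"lr": 0.01, "epochs": 10}, "featurize": {"ngrams": 2}}
new = {"train": {"lr": 0.01, "epochs": 10}, "featurize": {"ngrams": 3}}

print(stage_changed(old, new, ["train.lr", "train.epochs"]))  # False
print(stage_changed(old, new, ["featurize.ngrams"]))          # True
```

Editing one section of the config invalidates only the stages that declared those values, so the training stage here would not need to be rerun.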

I hope our most active users will find time to check out the DVC pre-release
version and provide their feedback. The installation instructions are
[on our website](https://dvc.org/doc/install/pre-release).

## 5000 GitHub stars

Activity on our GitHub page has grown organically since the DVC repo went public
on May 4th, 2017. Coincidentally, today, on our third anniversary, we have
reached 5000 stars:

![](/uploads/images/2020-05-04/5k_github.png)
Contributor

I think you should use my owl gif https://gph.is/g/Z5BgyGD

Member Author

@dmpetrov dmpetrov May 4, 2020

@andronovhopf do you have the same gif without 5K starts on the top by any chance? :)

Member Author

@dmpetrov dmpetrov May 4, 2020

It looks great but does not work as the front image. The issue is the resolution I guess or gif might not be supported as the front image. Keeping the old one for now. And looking for other options.


## Thank you!

Thank you again to all DVC contributors, community members, and users. Every
piece of your help is highly appreciated and will bring huge benefits to the
entire ecosystem of data and ML projects.

Stay healthy and safe over in your neck of the woods and be in touch on
Contributor

Stay healthy and safe, wherever you are in the world. And be in touch on our Twitter and Discord channel!

[Twitter](https://twitter.com/DVCorg) and our
[Discord channel](https://dvc.org/chat).
Binary file added static/uploads/images/2020-05-04/5k_github.png
Binary file added static/uploads/images/2020-05-04/5k_stars.png
Binary file added static/uploads/images/2020-05-04/dvc-plot.png