Remote files not removed when checking out earlier commits. #3739
Labels: feature request (requesting a new feature), good first issue, help wanted, p2-medium (medium priority, should be done, but less important)
I ran into what looks like a bug while using DVC to track changes to files in a remote S3 folder, and the kind folks on Discord thought it was worth posting here. Once a remote file has been added and committed, checking out a commit from before the file was first added does not remove the file. Local files behave differently: they do get removed. I would have expected local and remote file tracking to behave identically, but I'm posting here to make sure.

My overarching goal is to put part of a data lake under version control by tracking S3 files that stay on S3 and are never copied locally. I received some advice on Discord about how to start on a fix; do you think this feature is worth a pull request from me?
Steps to Reproduce: (Tracked Data File Not Deleted Remotely)
git init
dvc init
dvc remote add s3cache s3://cache_folder
dvc config cache.s3 s3cache
dvc push -r s3cache
git commit -m "first commit"
(add data_file.txt to s3://data_folder/)
dvc add s3://data_folder/data_file.txt
git add data_file.txt.dvc
git commit -m "second commit"
dvc push -r s3cache
git checkout HEAD~1
dvc checkout
Expected Behavior:
data_file.txt should disappear from S3 once I check out a commit from before the file existed.
Observed Behavior:
The file remains untouched. This behavior is inconsistent with how tracked files behave locally.
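To make the expectation concrete, here is a minimal Python sketch of the behavior I had in mind. It is not DVC's actual code: `stale_outputs`, `scheme_of`, `checkout`, and the in-memory `storage` dict are all hypothetical names used for illustration. The assumption is that anything tracked before the checkout but not after it gets removed, whether it is a local path or an s3:// URL.

```python
from urllib.parse import urlparse


def stale_outputs(previous, target):
    """Return outputs tracked before the checkout but not after it, sorted."""
    return sorted(set(previous) - set(target))


def scheme_of(path):
    """Return the URL scheme (e.g. 's3'), or 'local' for plain paths."""
    return urlparse(path).scheme or "local"


def checkout(storage, previous, target):
    """Delete stale outputs from `storage`, a dict standing in for disk and S3.

    Returns a list of (scheme, path) pairs for everything that was removed.
    """
    removed = []
    for path in stale_outputs(previous, target):
        if path in storage:
            del storage[path]
            removed.append((scheme_of(path), path))
    return removed


# Hypothetical state: one local and one S3 file tracked in the second commit.
storage = {
    "data_file.txt": "local contents",
    "s3://data_folder/data_file.txt": "remote contents",
}
second_commit = ["data_file.txt", "s3://data_folder/data_file.txt"]
first_commit = []  # nothing was tracked yet

# Equivalent of `git checkout HEAD~1` followed by `dvc checkout`.
removed = checkout(storage, second_commit, first_commit)
```

In this sketch both files are removed in the same way. What I actually observe is that only the local one is.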
Steps to Reproduce: (Tracked Data File is Deleted Locally)
git init
dvc init
git commit -m "first commit"
touch data_file.txt
dvc add data_file.txt
git add data_file.txt.dvc .gitignore
git commit -m "second commit"
git checkout HEAD~1
dvc checkout
DVC Version: 0.93.0
Platform: Ubuntu 18.04 LTS (snap install)