Remote files not removed when checking out earlier commits. #3739

Closed
KaggleShmaggle opened this issue May 4, 2020 · 4 comments
Labels
feature request Requesting a new feature good first issue help wanted p2-medium Medium priority, should be done, but less important

Comments

@KaggleShmaggle

I ran into what looks like a bug while using DVC to track changes to files in a remote S3 folder, and the kind folk in Discord thought it was worth posting here. Once a remote file has been added and committed, checking out a commit from before the file was first added does not remove the file from S3. Local files, by contrast, do get removed. I would have expected local and remote file tracking to behave identically, but I'm posting here to make sure. My overarching goal is to put part of a data lake under version control by tracking S3 files that stay on S3 and are never copied locally. I received some advice in Discord on how to start rolling a fix; do you guys think this feature is worth me writing a pull request for?

Steps to Reproduce: (Tracked Data File Not Deleted Remotely)
git init
dvc init
dvc remote add s3cache s3://cache_folder
dvc config cache.s3 s3cache
dvc push -r s3cache
git commit -m "first commit"
(add data_file.txt to s3://data_folder/)
dvc add s3://data_folder/data_file.txt
git add data_file.txt.dvc
git commit -m "second commit"
dvc push -r s3cache
git checkout HEAD~1
dvc checkout

Expected Behavior:
That data_file.txt would disappear from S3 once I checked out a commit prior to the file's existence.

Observed Behavior:
The file remains untouched. This behavior is inconsistent with how tracked files behave locally.
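The expected behavior described above can be sketched as a set difference: anything tracked before the checkout but not after it should be scheduled for removal, whether it lives on local disk or on S3. This is only an illustrative model, not DVC's actual implementation; the function names `stale_outputs`, `is_remote`, and `plan_checkout` are hypothetical, and nothing is deleted here.

```python
from urllib.parse import urlparse


def stale_outputs(tracked_before, tracked_after):
    """Return outputs tracked before the checkout but not after it, sorted."""
    return sorted(set(tracked_before) - set(tracked_after))


def is_remote(path):
    """True for URLs such as s3://bucket/key, False for plain relative paths."""
    return urlparse(path).scheme not in ("", "file")


def plan_checkout(tracked_before, tracked_after):
    """Group stale outputs by location so both kinds are scheduled for removal.

    Hypothetical helper: it only builds a removal plan and touches no files.
    """
    plan = {"local": [], "remote": []}
    for path in stale_outputs(tracked_before, tracked_after):
        plan["remote" if is_remote(path) else "local"].append(path)
    return plan


# Outputs tracked at "second commit" versus "first commit" (HEAD~1).
second_commit = ["data_file.txt", "s3://data_folder/data_file.txt"]
first_commit = []  # HEAD~1 tracks no data files

plan = plan_checkout(second_commit, first_commit)
```

Under this model the S3 file ends up in the removal plan alongside the local one, which is the consistency the report asks for.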

Steps to Reproduce: (Tracked Data File is Deleted Locally)
git init
dvc init
git commit -m "first commit"
touch data_file.txt
dvc add data_file.txt
git add data_file.txt.dvc .gitignore
git commit -m "second commit"
git checkout HEAD~1
dvc checkout

DVC Version: 0.93.0
Platform: Ubuntu 18.04 LTS (snap install)

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label May 4, 2020
@efiop
Contributor

efiop commented May 4, 2020

@efiop
Contributor

efiop commented May 4, 2020

The solution is pretty straightforward though: https://discordapp.com/channels/485586884165107732/485596304961962003/705883052060049537 🙂

@efiop efiop added feature request Requesting a new feature good first issue help wanted p2-medium Medium priority, should be done, but less important labels May 4, 2020
@triage-new-issues triage-new-issues bot removed triage Needs to be triaged labels May 4, 2020
@KaggleShmaggle
Author

Yeah, since I flagged it, and since I already have an S3 bucket set up to test a fix, I can take a crack at it tonight.

@efiop
Contributor

efiop commented May 3, 2021

We'll be reconsidering this scenario in #3920, as it is currently highly experimental and we don't recommend that people use it. Closing.

@efiop efiop closed this as completed May 3, 2021