Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiple instances of DVC can repro the same stage file in the same git repo at the same time. is it expected ? #3370

Closed
Ykid opened this issue Feb 20, 2020 · 14 comments
Labels
bug Did we break something? p2-medium Medium priority, should be done, but less important research

Comments

@Ykid
Copy link

Ykid commented Feb 20, 2020

Please provide information about your setup
DVC version(i.e. dvc --version), Platform and method of installation (pip, homebrew, pkg Mac, exe (Windows), DEB(Linux), RPM(Linux))

DVC version: 0.86.4
Python version: 3.7.6
Platform: Linux-5.0.0-37-generic-x86_64-with-debian-buster-sid
Binary: False
Package: pip

to reproduce

mkdir testrepo && cd testrepo
git init
dvc init
git add .
git commit -m "init"
echo "a" >> dep.txt
echo -e "From python\nRUN pip install dvc\nWORKDIR /workspace\nCMD dvc repro 1.dvc" > Dockerfile
docker build -t testdvc .
dvc run --no-exec -f 1.dvc -d dep.txt -o test.txt 'sleep 10; echo "a">> test.txt'

docker run --rm -v $(pwd):/workspace --name testdvcdvc1 testdvc:latest

# in another window, same directory
docker run --rm -v $(pwd):/workspace --name testdvcdvc2 testdvc:latest
@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label Feb 20, 2020
@efiop
Copy link
Contributor

efiop commented Feb 20, 2020

@Ykid Local version for that works as expected, errorring out:

(3.7.0-dvc) ➜  dvc git:(master) ✗ dvc repro 1.dvc
ERROR: failed to reproduce '1.dvc': 'test.txt' is busy, it is being blocked by:
  (PID 5662): /home/efiop/.pyenv/versions/3.7.0-dvc/bin/dvc repro 1.dvc

If there are no processes with such PIDs, you can manually remove'.dvc/tmp/rwlock' and try again.

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

with your particular dockerfile, the tolerances are quite lower, because pip install dvc might take a while by itself, so I'm not sure if your dvc repros actually run at the same time or if one starts running after the first one is done. Could you try pre-installing dvc into your container first and then try again? Also, logs would be appreciated as well 🙂

@efiop efiop added the awaiting response we are waiting for your reply, please respond! :) label Feb 20, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label Feb 20, 2020
@Ykid
Copy link
Author

Ykid commented Feb 21, 2020

Could you try pre-installing dvc into your container first and then try again?

this is already the case
echo -e "From python\nRUN pip install dvc\nWORKDIR /workspace\nCMD dvc repro 1.dvc" > Dockerfile
docker build -t testdvc .

when I enable verbose mode, the logs are

2020-02-21 01:13:34,911 DEBUG: PRAGMA user_version;
2020-02-21 01:13:34,911 DEBUG: fetched: [(3,)]
2020-02-21 01:13:34,912 DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
2020-02-21 01:13:34,912 DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
2020-02-21 01:13:34,912 DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
2020-02-21 01:13:34,912 DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
2020-02-21 01:13:34,913 DEBUG: PRAGMA user_version = 3;
2020-02-21 01:13:35,051 DEBUG: Computed stage '1.dvc' md5: 'be5d0955705e41bef367398faf0f9038'
2020-02-21 01:13:35,051 WARNING: DVC-file '1.dvc' changed.
2020-02-21 01:13:35,051 WARNING: Stage '1.dvc' changed.
2020-02-21 01:13:35,061 DEBUG: Removing output 'test.txt' of '1.dvc'.
Running command:
	sleep 10; echo "a">> test.txt
2020-02-21 01:13:35,065 DEBUG: Computed stage '1.dvc' md5: 'be5d0955705e41bef367398faf0f9038'
2020-02-21 01:13:35,065 DEBUG: SELECT count from state_info WHERE rowid=?
2020-02-21 01:13:35,065 DEBUG: fetched: [(0,)]
2020-02-21 01:13:35,065 DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
2020-02-21 01:13:45,078 DEBUG: PRAGMA user_version;
2020-02-21 01:13:45,079 DEBUG: fetched: [(3,)]
2020-02-21 01:13:45,079 DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
2020-02-21 01:13:45,080 DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
2020-02-21 01:13:45,080 DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
2020-02-21 01:13:45,080 DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
2020-02-21 01:13:45,081 DEBUG: PRAGMA user_version = 3;
2020-02-21 01:13:45,082 DEBUG: Path dep.txt inode 72881240
2020-02-21 01:13:45,082 DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
2020-02-21 01:13:45,082 DEBUG: fetched: []
2020-02-21 01:13:45,083 DEBUG: Path dep.txt inode 72881240
2020-02-21 01:13:45,084 DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
2020-02-21 01:13:45,084 DEBUG: fetched: []
2020-02-21 01:13:45,084 DEBUG: INSERT INTO state(inode, mtime, size, md5, timestamp) VALUES (?, ?, ?, ?, ?)
2020-02-21 01:13:45,092 DEBUG: Adding 'test.txt' to '.gitignore'.
2020-02-21 01:13:45,094 DEBUG: Path test.txt inode 72881248
2020-02-21 01:13:45,094 DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
2020-02-21 01:13:45,094 DEBUG: fetched: []
2020-02-21 01:13:45,095 DEBUG: Path test.txt inode 72881248
2020-02-21 01:13:45,096 DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
2020-02-21 01:13:45,096 DEBUG: fetched: []
2020-02-21 01:13:45,096 DEBUG: INSERT INTO state(inode, mtime, size, md5, timestamp) VALUES (?, ?, ?, ?, ?)
2020-02-21 01:13:45,096 DEBUG: {'test.txt': 'modified'}
2020-02-21 01:13:45,097 DEBUG: Path test.txt inode 72881248
2020-02-21 01:13:45,097 DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
2020-02-21 01:13:45,097 DEBUG: fetched: [('1582247625072605696', '2', '60b725f10c9c85c70d97880dfe8191b3', '1582247625096445952')]
2020-02-21 01:13:45,097 DEBUG: UPDATE state SET timestamp = ? WHERE inode = ?
2020-02-21 01:13:45,098 DEBUG: Computed stage '1.dvc' md5: 'a51d242f1aae57d97ba4cfaa3ec436af'
2020-02-21 01:13:45,106 DEBUG: Saving 'test.txt' to '.dvc/cache/60/b725f10c9c85c70d97880dfe8191b3'.
2020-02-21 01:13:45,106 DEBUG: cache '.dvc/cache/60/b725f10c9c85c70d97880dfe8191b3' expected '60b725f10c9c85c70d97880dfe8191b3' actual 'None'
2020-02-21 01:13:45,108 DEBUG: Cache type 'reflink' is not supported: reflink is not supported
2020-02-21 01:13:45,108 DEBUG: Created 'copy': .dvc/cache/60/b725f10c9c85c70d97880dfe8191b3 -> test.txt
2020-02-21 01:13:45,109 DEBUG: Path test.txt inode 72881253
2020-02-21 01:13:45,109 DEBUG: REPLACE INTO link_state(path, inode, mtime) VALUES (?, ?, ?)
2020-02-21 01:13:45,110 DEBUG: Path test.txt inode 72881253
2020-02-21 01:13:45,110 DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
2020-02-21 01:13:45,111 DEBUG: fetched: []
2020-02-21 01:13:45,111 DEBUG: INSERT INTO state(inode, mtime, size, md5, timestamp) VALUES (?, ?, ?, ?, ?)
2020-02-21 01:13:45,111 DEBUG: Path .dvc/cache/60/b725f10c9c85c70d97880dfe8191b3 inode 72881248
2020-02-21 01:13:45,111 DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
2020-02-21 01:13:45,112 DEBUG: fetched: [('1582247625072605696', '2', '60b725f10c9c85c70d97880dfe8191b3', '1582247625097821184')]
2020-02-21 01:13:45,112 DEBUG: UPDATE state SET mtime = ?, size = ?, md5 = ?, timestamp = ? WHERE inode = ?
2020-02-21 01:13:45,120 DEBUG: '1.dvc' was reproduced
2020-02-21 01:13:45,124 DEBUG: Saving information to '1.dvc'.

To track the changes with git, run:

	git add 1.dvc .gitignore
2020-02-21 01:13:45,145 DEBUG: SELECT count from state_info WHERE rowid=?
2020-02-21 01:13:45,145 DEBUG: fetched: [(0,)]
2020-02-21 01:13:45,145 DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
2020-02-21 01:13:36,732 DEBUG: PRAGMA user_version;
2020-02-21 01:13:36,733 DEBUG: fetched: [(3,)]
2020-02-21 01:13:36,733 DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
2020-02-21 01:13:36,734 DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
2020-02-21 01:13:36,734 DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
2020-02-21 01:13:36,734 DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
2020-02-21 01:13:36,735 DEBUG: PRAGMA user_version = 3;
2020-02-21 01:13:36,877 DEBUG: Computed stage '1.dvc' md5: 'be5d0955705e41bef367398faf0f9038'
2020-02-21 01:13:36,878 WARNING: DVC-file '1.dvc' changed.
2020-02-21 01:13:36,878 WARNING: Stage '1.dvc' changed.
2020-02-21 01:13:36,888 DEBUG: Removing output 'test.txt' of '1.dvc'.
Running command:
	sleep 10; echo "a">> test.txt
2020-02-21 01:13:36,892 DEBUG: Computed stage '1.dvc' md5: 'be5d0955705e41bef367398faf0f9038'
2020-02-21 01:13:36,892 DEBUG: SELECT count from state_info WHERE rowid=?
2020-02-21 01:13:36,892 DEBUG: fetched: [(0,)]
2020-02-21 01:13:36,893 DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
2020-02-21 01:13:46,905 DEBUG: PRAGMA user_version;
2020-02-21 01:13:46,905 DEBUG: fetched: [(3,)]
2020-02-21 01:13:46,906 DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
2020-02-21 01:13:46,906 DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
2020-02-21 01:13:46,907 DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
2020-02-21 01:13:46,907 DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
2020-02-21 01:13:46,908 DEBUG: PRAGMA user_version = 3;
2020-02-21 01:13:46,909 DEBUG: Path dep.txt inode 72881240
2020-02-21 01:13:46,909 DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
2020-02-21 01:13:46,910 DEBUG: fetched: [('1582247599108759040', '2', '60b725f10c9c85c70d97880dfe8191b3', '1582247625084637952')]
2020-02-21 01:13:46,910 DEBUG: UPDATE state SET timestamp = ? WHERE inode = ?
2020-02-21 01:13:46,918 DEBUG: Path test.txt inode 72881253
2020-02-21 01:13:46,919 DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
2020-02-21 01:13:46,919 DEBUG: fetched: [('1582247625104605440', '2', '60b725f10c9c85c70d97880dfe8191b3', '1582247625111299840')]
2020-02-21 01:13:46,920 DEBUG: Path test.txt inode 72881253
2020-02-21 01:13:46,920 DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
2020-02-21 01:13:46,920 DEBUG: fetched: [('1582247625104605440', '2', '60b725f10c9c85c70d97880dfe8191b3', '1582247625111299840')]
2020-02-21 01:13:46,921 DEBUG: UPDATE state SET mtime = ?, size = ?, md5 = ?, timestamp = ? WHERE inode = ?
2020-02-21 01:13:46,921 DEBUG: {'test.txt': 'modified'}
2020-02-21 01:13:46,922 DEBUG: Path test.txt inode 72881253
2020-02-21 01:13:46,922 DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
2020-02-21 01:13:46,922 DEBUG: fetched: [('1582247626900594688', '4', '0d227f1abf8c2932d342e9b99cc957eb', '1582247626921193216')]
2020-02-21 01:13:46,922 DEBUG: UPDATE state SET timestamp = ? WHERE inode = ?
2020-02-21 01:13:46,923 DEBUG: Computed stage '1.dvc' md5: '7ae3f072240bf8be60fbadf49404d5a9'
2020-02-21 01:13:46,931 DEBUG: Saving 'test.txt' to '.dvc/cache/0d/227f1abf8c2932d342e9b99cc957eb'.
2020-02-21 01:13:46,931 DEBUG: cache '.dvc/cache/0d/227f1abf8c2932d342e9b99cc957eb' expected '0d227f1abf8c2932d342e9b99cc957eb' actual 'None'
2020-02-21 01:13:46,933 DEBUG: Cache type 'reflink' is not supported: reflink is not supported
2020-02-21 01:13:46,933 DEBUG: Created 'copy': .dvc/cache/0d/227f1abf8c2932d342e9b99cc957eb -> test.txt
2020-02-21 01:13:46,934 DEBUG: Path test.txt inode 72881254
2020-02-21 01:13:46,934 DEBUG: REPLACE INTO link_state(path, inode, mtime) VALUES (?, ?, ?)
2020-02-21 01:13:46,934 DEBUG: Path test.txt inode 72881254
2020-02-21 01:13:46,935 DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
2020-02-21 01:13:46,935 DEBUG: fetched: []
2020-02-21 01:13:46,935 DEBUG: INSERT INTO state(inode, mtime, size, md5, timestamp) VALUES (?, ?, ?, ?, ?)
2020-02-21 01:13:46,935 DEBUG: Path .dvc/cache/0d/227f1abf8c2932d342e9b99cc957eb inode 72881253
2020-02-21 01:13:46,936 DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
2020-02-21 01:13:46,936 DEBUG: fetched: [('1582247626900594688', '4', '0d227f1abf8c2932d342e9b99cc957eb', '1582247626922825472')]
2020-02-21 01:13:46,936 DEBUG: UPDATE state SET mtime = ?, size = ?, md5 = ?, timestamp = ? WHERE inode = ?
2020-02-21 01:13:46,943 DEBUG: '1.dvc' was reproduced
2020-02-21 01:13:46,947 DEBUG: Saving information to '1.dvc'.

To track the changes with git, run:

	git add 1.dvc
2020-02-21 01:13:46,969 DEBUG: SELECT count from state_info WHERE rowid=?
2020-02-21 01:13:46,969 DEBUG: fetched: [(3,)]
2020-02-21 01:13:46,969 DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?

@efiop
Copy link
Contributor

efiop commented Feb 21, 2020

@Ykid Oops, I'm blind 🙁 Sorry, didn't notice that right away.

Thanks for the logs! Very interesting, looks like there is indeed some issue there. Could you try running dvc repros without docker, please? Just as a sanity check.

@Ykid
Copy link
Author

Ykid commented Feb 21, 2020

@efiop without docker dvc it looks like what you said above #3370 (comment)

@efiop
Copy link
Contributor

efiop commented Feb 21, 2020

@Ykid Ok, so I guess we are missing something docker-specific here. Might be some isolation aspect that we are not catching right away. Need to investigate closer. There main mechanism that should've kicked in there is .dvc/tmp/rwlock. Could you please adjust your commands to do cat .dvc/tmp/rwlock and show the output? Just for some context: when some command is running .dvc/tmp/rwlock has deps and outs locked for read and/or write, so that other commands will hit it.

@skshetry
Copy link
Member

@Ykid, @efiop, the docker is running as PID 1 (root). That could also be a problem.

@Ykid
Copy link
Author

Ykid commented Feb 21, 2020

@efiop i am not with my computer now. Would you mind trying to reproduce ? My description should Applicable to Linux platform

@efiop
Copy link
Contributor

efiop commented Feb 21, 2020

@Ykid, @efiop, the docker is running as PID 1 (root). That could also be a problem.

@skshetry You are absolutely right! That is precisely what is happening! Pid and cmd match, so rwlock thinks that it is his own process. Might consider adding another parameter to info to distinguish that (not sure which one, but we could also use a static counter of sorts to ensure that it was us who has the previous lock) or might disable reentrance (might require a significant refactor and is generally a bad idea).

@efiop efiop added bug Did we break something? p2-medium Medium priority, should be done, but less important research labels Feb 21, 2020
@skshetry

This comment has been minimized.

@Ykid
Copy link
Author

Ykid commented Feb 24, 2020

how about associate a random number for execution of dvc ?

@efiop
Copy link
Contributor

efiop commented Feb 26, 2020

@Ykid Yep, some variation of that would work. Maybe you would like to submit a PR? 🙂

@Ykid
Copy link
Author

Ykid commented Feb 26, 2020

interested to give a try but can't guarantee a delivery time due to work factors. 😅 . also a few pointers to corresponding files would be nice

@efiop
Copy link
Contributor

efiop commented Feb 28, 2020

@Ykid That's understandable. The file in question is https://github.com/iterative/dvc/blob/master/dvc/rwlock.py . In particlular this line https://github.com/iterative/dvc/blob/0.87.0/dvc/rwlock.py#L161 . We need to generate a run-unique id and put it in addition to the existing info. That way things won't overlap even if pids do.

@efiop efiop removed the awaiting response we are waiting for your reply, please respond! :) label May 20, 2020
@efiop
Copy link
Contributor

efiop commented Oct 8, 2021

Closing as stale

@efiop efiop closed this as completed Oct 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Did we break something? p2-medium Medium priority, should be done, but less important research
Projects
None yet
Development

No branches or pull requests

3 participants