datalad.core.distributed.push #4206
Codecov Report
@@            Coverage Diff             @@
##           master    #4206      +/-   ##
==========================================
+ Coverage   88.75%   88.80%   +0.05%
==========================================
  Files         283      285       +2
  Lines       36942    37417     +475
==========================================
+ Hits        32788    33230     +442
- Misses       4154     4187      +33
==========================================
(force-pushed from bd11983 to a2a67cc)
Some comments from an initial read of push.py. I haven't experimented with it enough to have a good understanding of some of the logic/decisions, but the overall approach looks fine to me.
I have played around with it a bit to find bugs and see how it feels:

Subjective user experience
(handbook) ╭─adina@muninn /tmp/dataset1 on master
╰─➤ datalad publish --to wrongname 1 ↵
publish(error): . (dataset) [Unknown target sibling 'wrongname' for publication]
(handbook) ╭─adina@muninn /tmp/dataset1 on master
╰─➤ datalad push --to wrongname 1 ↵
[ERROR ] Dataset <Dataset path=/tmp/dataset1> does not know of a sibling 'wrongname' to push to. [publish(/tmp/dataset1)]
publish(error): . (dataset) [Dataset <Dataset path=/tmp/dataset1> does not know of a sibling 'wrongname' to push to.]

What I assume to be bugs

I've found some things that do not work, or do not work as I expected/hoped, and compared them with publish.

Example: Here is how it looks with HTTPS:
(handbook) ╭─adina@muninn /tmp
╰─➤ datalad create dataset && cd dataset
[INFO ] Creating a new annex repo at /tmp/dataset
create(ok): /tmp/dataset (dataset)
(handbook) ╭─adina@muninn /tmp/dataset on master
╰─➤ echo 2356 > file3 && echo 12345 > file
(handbook) ╭─adina@muninn /tmp/dataset on master
╰─➤ datalad save file3 && datalad save --to-git file -m "save to git"
add(ok): file3 (file)
save(ok): . (dataset)
action summary:
add (ok: 1)
save (ok: 1)
add(ok): file (file)
save(ok): . (dataset)
action summary:
add (ok: 1)
save (ok: 1)
(handbook) ╭─adina@muninn /tmp/dataset on master
╰─➤ datalad siblings add -d . -s gin --url https://gin.g-node.org/adswa/test_dataset.git
[INFO ] Failed to enable annex remote gin, could be a pure git or not accessible
[WARNING] Failed to determine if gin carries annex. Remote was marked by annex as annex-ignore. Edit .git/config to reset if you think that was done by mistake due to absent connection etc
.: gin(-) [https://gin.g-node.org/adswa/test_dataset.git (git)]
(handbook) ╭─adina@muninn /tmp/dataset on master
╰─➤ datalad siblings
.: here(+) [git]
.: gin(-) [https://gin.g-node.org/adswa/test_dataset.git (git)]
(handbook) ╭─adina@muninn /tmp/dataset on master
╰─➤ datalad push --to gin 1 ↵
# SO MUCH AUTHENTICATION
Username for 'https://gin.g-node.org': adswa
Password for 'https://[email protected]':
Username for 'https://gin.g-node.org': adswa
Password for 'https://[email protected]':
publish(ok): . (dataset) [refs/heads/master->gin:refs/heads/master [new branch]]
Username for 'https://gin.g-node.org': adswa
Password for 'https://[email protected]':
CommandError: 'git fetch gin git-annex' failed with exitcode 128 under /tmp/dataset
fatal: Couldn't find remote ref git-annex
# trying again, just to be sure:
(handbook) ╭─adina@muninn /tmp/dataset on master
╰─➤ datalad push --to gin 128 ↵
Username for 'https://gin.g-node.org': adswa
Password for 'https://[email protected]':
Username for 'https://gin.g-node.org': adswa
Password for 'https://[email protected]':
Username for 'https://gin.g-node.org': adswa
Password for 'https://[email protected]':
CommandError: 'git fetch gin git-annex' failed with exitcode 128 under /tmp/dataset
fatal: Couldn't find remote ref git-annex
(handbook) ╭─adina@muninn /tmp/dataset on master

Works with datalad publish, though.
Once the annex branch is published, push works:
Side note: Here is the same with a sibling added via SSH
Again, works with publish:
Example: Here is push:
(handbook) ╭─adina@muninn ~/repos/datalad-handbook on master!
╰─➤ git co -b tmp
(handbook) ╭─adina@muninn ~/repos/datalad-handbook on tmp!
╰─➤ datalad save docs/intro/narrative.rst -m "some commit"
add(ok): docs/intro/narrative.rst (file)
save(ok): . (dataset)
action summary:
add (ok: 1)
save (ok: 1)
(handbook) ╭─adina@muninn ~/repos/datalad-handbook on tmp!
╰─➤ datalad push --to upstream 1 ↵
publish(ok): . (dataset) [refs/heads/search->upstream:refs/heads/search c3a33f3..62999c6]

Doing it with publish -- here is how it looks:
(handbook) ╭─adina@muninn ~/repos/datalad-handbook on tmp!
╰─➤ git co -b tmp2
Switched to a new branch 'tmp2'
(handbook) ╭─adina@muninn ~/repos/datalad-handbook on tmp2!
╰─➤ datalad save docs/intro/narrative.rst -m "some commit"
add(ok): docs/intro/narrative.rst (file)
save(ok): . (dataset)
action summary:
add (ok: 1)
save (ok: 1)
(handbook) ╭─adina@muninn ~/repos/datalad-handbook on tmp2!
╰─➤ datalad publish --to upstream 1 ↵
[INFO ] Publishing <Dataset path=/home/adina/repos/datalad-handbook> to upstream
publish(ok): . (dataset) [pushed to upstream: ['[new branch]']]

- Probably not yet intended/adjusted to work (or I'm not up to date with RIA developments): I can't use `push` to push dataset contents to a RIA store.
Error is now the same as before.

Error is now even leaner! Thanks for both observations -- very helpful!
Tried with a local setup, but failed to replicate:

# dest is a bare git repo
% datalad siblings add -d . -s gin --url ../dest
[INFO ] Failed to enable annex remote gin, could be a pure git or not accessible
[WARNING] Failed to determine if gin carries annex.
.: gin(-) [../dest (git)]
(datalad3-dev) mih@meiner /tmp/src (git)-[master] % datalad push --to gin
publish(ok): . (dataset) [refs/heads/master->gin:refs/heads/master [new branch]]
publish(ok): . (dataset) [refs/heads/git-annex->gin:refs/heads/git-annex [new branch]]
action summary:
copy (notneeded: 1)
publish (ok: 2) Same with GIN (datalad3-dev) mih@meiner /tmp/src (git)-[master] % dl siblings add -s realgin --url [email protected]:/mih/datalad-test3.git
[INFO ] Failed to enable annex remote realgin, could be a pure git or not accessible
[WARNING] Failed to determine if realgin carries annex.
.: realgin(-) [[email protected]:/mih/datalad-test3.git (git)]
(datalad3-dev) mih@meiner /tmp/src (git)-[master] % datalad push --to realgin
publish(ok): . (dataset) [refs/heads/master->realgin:refs/heads/master [new branch]]
publish(ok): . (dataset) [refs/heads/git-annex->realgin:refs/heads/git-annex [new branch]]
action summary:
copy (notneeded: 1)
publish (ok: 2)

Maybe some other fix that was applied since you reviewed dealt with this?
Seems to work as desired (now?).
This is a bug!
Huh... Good to know that it worked for you. Not sure why it does not work for me; I'm doing it on the branch of this PR. I'm attaching a WTF below. Maybe the annex version?

WTF:
datalad wtf 128 ↵
# WTF
## configuration <SENSITIVE, report disabled by configuration>
## datalad
- full_version: 0.12.2.dev379-gd340
- version: 0.12.2.dev379
## dataset
- id: 52cc3e2a-6095-11ea-8652-27eb4f93a5dc
- metadata: <SENSITIVE, report disabled by configuration>
- path: /tmp/some
- repo: AnnexRepo
## dependencies
- appdirs: 1.4.3
- boto: 2.49.0
- cmd:7z: 16.02
- cmd:annex: 7.20190819+git2-g908476a9b-1~ndall+1
- cmd:bundled-git: 2.20.1
- cmd:git: 2.20.1
- cmd:system-git: 2.25.0
- cmd:system-ssh: 8.1p1
- exifread: 2.1.2
- git: 3.0.5
- gitdb: 2.0.5
- humanize: 0.5.1
- iso8601: 0.1.12
- keyring: 19.0.2
- keyrings.alt: 3.1.1
- msgpack: 0.6.1
- mutagen: 1.43.0
- requests: 2.22.0
- scrapy: 1.7.4
- tqdm: 4.32.2
- wrapt: 1.11.2
## environment
- GIT_PYTHON_GIT_EXECUTABLE: /usr/lib/git-annex.linux/git
- LANG: en_US.UTF-8
- LANGUAGE: en_US:en
- PATH: /home/adina/env/handbook/bin:/home/adina/Documents/freesurfer/bin:/home/adina/Documents/freesurfer/fsfast/bin:/home/adina/Documents/freesurfer/tktools:/usr/share/fsl/5.0/bin:/usr/lib/fsl/5.0:/home/adina/Documents/freesurfer/mni/bin:/usr/share/fsl/5.0/5.0/bin:/home/adina/.local/bin:/home/adina/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adina/.dotfiles/bin:/usr/local/go/bin:/home/adina/work/bin
- PYTHONPATH: /usr/bin/python
## extensions
- container:
- description: Containerized environments
- entrypoints:
- datalad_container.containers_add.ContainersAdd:
- class: ContainersAdd
- load_error: None
- module: datalad_container.containers_add
- names:
- containers-add
- containers_add
- datalad_container.containers_list.ContainersList:
- class: ContainersList
- load_error: None
- module: datalad_container.containers_list
- names:
- containers-list
- containers_list
- datalad_container.containers_remove.ContainersRemove:
- class: ContainersRemove
- load_error: None
- module: datalad_container.containers_remove
- names:
- containers-remove
- containers_remove
- datalad_container.containers_run.ContainersRun:
- class: ContainersRun
- load_error: None
- module: datalad_container.containers_run
- names:
- containers-run
- containers_run
- load_error: None
- module: datalad_container
- version: 0.5.0
- crawler:
- description: Crawl web resources
- entrypoints:
- datalad_crawler.crawl.Crawl:
- class: Crawl
- load_error: None
- module: datalad_crawler.crawl
- names:
- crawl
- datalad_crawler.crawl_init.CrawlInit:
- class: CrawlInit
- load_error: None
- module: datalad_crawler.crawl_init
- names:
- crawl-init
- crawl_init
- load_error: None
- module: datalad_crawler
- version: 0.4.4
- hirni:
- description: HIRNI workflows
- entrypoints:
- datalad_hirni.commands.dicom2spec.Dicom2Spec:
- class: Dicom2Spec
- load_error: None
- module: datalad_hirni.commands.dicom2spec
- names:
- hirni-dicom2spec
- hirni_dicom2spec
- datalad_hirni.commands.import_dicoms.ImportDicoms:
- class: ImportDicoms
- load_error: None
- module: datalad_hirni.commands.import_dicoms
- names:
- hirni-import-dcm
- hirni_import_dcm
- datalad_hirni.commands.spec2bids.Spec2Bids:
- class: Spec2Bids
- load_error: None
- module: datalad_hirni.commands.spec2bids
- names:
- hirni-spec2bids
- hirni_spec2bids
- datalad_hirni.commands.spec4anything.Spec4Anything:
- class: Spec4Anything
- load_error: None
- module: datalad_hirni.commands.spec4anything
- names:
- hirni-spec4anything
- hirni_spec4anything
- load_error: None
- module: datalad_hirni
- version: 0.0.4
- metalad:
- description: DataLad semantic metadata command suite
- entrypoints:
- datalad_metalad.aggregate.Aggregate:
- class: Aggregate
- load_error: None
- module: datalad_metalad.aggregate
- names:
- meta-aggregate
- meta_aggregate
- datalad_metalad.dump.Dump:
- class: Dump
- load_error: None
- module: datalad_metalad.dump
- names:
- meta-dump
- meta_dump
- datalad_metalad.extract.Extract:
- class: Extract
- load_error: None
- module: datalad_metalad.extract
- names:
- meta-extract
- meta_extract
- load_error: None
- module: datalad_metalad
- version: 0.2.0
- neuroimaging:
- description: Neuroimaging tools
- entrypoints:
- datalad_neuroimaging.bids2scidata.BIDS2Scidata:
- class: BIDS2Scidata
- load_error: None
- module: datalad_neuroimaging.bids2scidata
- names:
- bids2scidata
- load_error: None
- module: datalad_neuroimaging
- version: 0.2.3
- ria:
- description: Helper for the remote indexed archive (RIA) special remote
- entrypoints:
- ria_remote.create_sibling_ria.CreateSiblingRia:
- class: CreateSiblingRia
- load_error: None
- module: ria_remote.create_sibling_ria
- names:
- create-sibling-ria
- create_sibling_ria
- ria_remote.export_archive.ExportArchive:
- class: ExportArchive
- load_error: None
- module: ria_remote.export_archive
- names:
- ria-export-archive
- ria_export_archive
- load_error: None
- module: ria_remote
- version: 0.7+90.g31bcf57
- webapp:
- description: Generic web app support
- entrypoints:
- datalad_webapp.WebApp:
- class: WebApp
- load_error: None
- module: datalad_webapp
- names:
- webapp
- webapp
- load_error: None
- module: datalad_webapp
- version: 0.2
## git-annex
- build flags:
- Assistant
- Webapp
- Pairing
- S3
- WebDAV
- Inotify
- DBus
- DesktopNotify
- TorrentParser
- MagicMime
- Feeds
- Testsuite
- dependency versions:
- aws-0.20
- bloomfilter-2.0.1.0
- cryptonite-0.25
- DAV-1.3.3
- feed-1.0.0.0
- ghc-8.4.4
- http-client-0.5.13.1
- persistent-sqlite-2.8.2
- torrent-10000.1.1
- uuid-1.3.13
- yesod-1.6.0
- key/value backends:
- SHA256E
- SHA256
- SHA512E
- SHA512
- SHA224E
- SHA224
- SHA384E
- SHA384
- SHA3_256E
- SHA3_256
- SHA3_512E
- SHA3_512
- SHA3_224E
- SHA3_224
- SHA3_384E
- SHA3_384
- SKEIN256E
- SKEIN256
- SKEIN512E
- SKEIN512
- BLAKE2B256E
- BLAKE2B256
- BLAKE2B512E
- BLAKE2B512
- BLAKE2B160E
- BLAKE2B160
- BLAKE2B224E
- BLAKE2B224
- BLAKE2B384E
- BLAKE2B384
- BLAKE2BP512E
- BLAKE2BP512
- BLAKE2S256E
- BLAKE2S256
- BLAKE2S160E
- BLAKE2S160
- BLAKE2S224E
- BLAKE2S224
- BLAKE2SP256E
- BLAKE2SP256
- BLAKE2SP224E
- BLAKE2SP224
- SHA1E
- SHA1
- MD5E
- MD5
- WORM
- URL
- local repository version: 5
- operating system: linux x86_64
- remote types:
- git
- gcrypt
- p2p
- S3
- bup
- directory
- rsync
- web
- bittorrent
- webdav
- adb
- tahoe
- glacier
- ddar
- git-lfs
- hook
- external
- supported repository versions:
- 5
- 7
- upgrade supported from repository versions:
- 0
- 1
- 2
- 3
- 4
- 5
- 6
- version: 7.20190819+git2-g908476a9b-1~ndall+1
## location
- path: /tmp/some
- type: dataset
## metadata_extractors
- annex:
- load_error: None
- module: datalad.metadata.extractors.annex
- version: None
- audio:
- load_error: None
- module: datalad.metadata.extractors.audio
- version: None
- bids:
- load_error: None
- module: datalad_neuroimaging.extractors.bids
- version: None
- datacite:
- load_error: None
- module: datalad.metadata.extractors.datacite
- version: None
- datalad_core:
- load_error: None
- module: datalad.metadata.extractors.datalad_core
- version: None
- datalad_rfc822:
- load_error: None
- module: datalad.metadata.extractors.datalad_rfc822
- version: None
- dicom:
- load_error: None
- module: datalad_neuroimaging.extractors.dicom
- version: None
- exif:
- load_error: None
- module: datalad.metadata.extractors.exif
- version: None
- frictionless_datapackage:
- load_error: None
- module: datalad.metadata.extractors.frictionless_datapackage
- version: None
- image:
- load_error: None
- module: datalad.metadata.extractors.image
- version: None
- metalad_annex:
- load_error: None
- module: datalad_metalad.extractors.annex
- version: None
- metalad_core:
- load_error: None
- module: datalad_metalad.extractors.core
- version: None
- metalad_custom:
- load_error: None
- module: datalad_metalad.extractors.custom
- version: None
- metalad_runprov:
- load_error: None
- module: datalad_metalad.extractors.runprov
- version: None
- nidm:
- load_error: None
- module: datalad_neuroimaging.extractors.nidm
- version: None
- nifti1:
- load_error: None
- module: datalad_neuroimaging.extractors.nifti1
- version: None
- xmp:
- load_error: None
- module: datalad.metadata.extractors.xmp
- version: None
## python
- implementation: CPython
- version: 3.7.3
## system
- distribution: debian/bullseye/sid
- encoding:
- default: utf-8
- filesystem: utf-8
- locale.prefered: UTF-8
- max_path_length: 265
- name: Linux
- release: 5.4.0-3-amd64
- type: posix
- version: #1 SMP Debian 5.4.13-1 (2020-01-19)
That could be. I have Git 2.24.1 and Git-annex 8.20200226.
(force-pushed from b0939f8 to 8fc8b20)
- General approach is: push main branch -> copy -> push git-annex branch. This will expose any history issues (missing pieces, conflicts) that could possibly invalidate local decision making. push() will fail early, allowing for fixes (e.g. update(merge=True)), and then a reattempt. The annex branch is pushed last, after file transfer is completed. It is the least critical part, because annex will update availability info on the remote end on its own, as part of the transfer.
- push != sync: generally, changes will only go from local to remote. However, in corner cases it is necessary to use `annex sync` internally to consolidate the git-annex branch or corresponding branches.
- Perform data transfer via an async call to `annex copy`, not via AnnexRepo.copy_to(), which performs too many inspections and reporting decisions.
- The current approach can pass many paths to `annex copy`, so I opted for a temp file that is used as stdin for a batch-mode process of `annex copy`. This saves result merges across the alternative 'file chunk' runs.
- Support push to empty repos (fixes gh-4074)
- Implement tests largely without `create_sibling`, because it doesn't work on Windows
- Support for managed branches
- Pass --jobs to git-annex copy (fixes gh-3732)
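The temp-file/stdin batching described above can be sketched generically. This is a sketch, not DataLad's actual implementation; `sort` stands in for a batch-mode consumer such as the `annex copy` process, and the helper name is made up:

```python
import subprocess

def run_with_path_batch(cmd, paths):
    """Feed many paths to a command via stdin (one path per line)
    instead of argv, avoiding OS argument-length limits and the
    result merging that multiple 'file chunk' invocations require."""
    proc = subprocess.run(
        cmd,
        input="\n".join(paths) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout

# 'sort' stands in here for a batch-mode consumer like `annex copy`.
out = run_with_path_batch(["sort"], ["b.txt", "a.txt", "c.txt"])
```

The point of the design is that one long-lived process consumes all paths, so there is a single result stream to interpret rather than one per chunk.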
Must publish the subdatasets, but no further subdatasets of their own.
The remaining test failure is
No idea why that would be Travis-specific....

Edit: found a real machine that also has the behavior!!

Edit: WTF!?!?! It is a capitalization difference in the error message...
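A case-insensitive match sidesteps this class of Git-version-dependent failure. A minimal sketch (assuming the message in question is the remote-ref one from the transcripts above; this is not the actual test code):

```python
import re

# Assumed example: Git changed the capitalization of some messages
# between versions, so matching must ignore case.
old_git = "fatal: Couldn't find remote ref git-annex"
new_git = "fatal: couldn't find remote ref git-annex"

pattern = re.compile(r"couldn't find remote ref", re.IGNORECASE)

matches = [bool(pattern.search(msg)) for msg in (old_git, new_git)]
```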
Older Gits used to be different
OK, finally! Everything that should be green, is green. I will give it a few min of rest, and then merge this.
Approach is similar to that of publish before.
Note that the tests here are affected (probabilistically) by #4279.

Status quo:
- Uses diff, not having --report-filetype raw capabilities ATM. If this is an issue, RF _diff_cmd() into a proper public function that we can use; it already exposes this parameter (status and diff). Fixes "directory is still too special for publish -- does not copy data if I publish '.'" (#1726) and fixes "Observations on inconsistent handling of relative paths across python commands" (#4098).
- An action=publish result for each pushed branch. Fixes "publish should report which branches are pushed, not just that 'new branch'" (#2000).
- Not AnnexRepo.copy_to(), but a dedicated async call with a protocol class implementing annex JSON communication. Results are directly reported after normalization, without intermediate inspection and decision making. Fixes "AnnexRepo.copy_to() should report errors not do fancy detection" (#3412).
- Passes --jobs on to git annex copy. Fixes gh-3732.
- An impossible result when content should be transferred, but isn't present locally. Fixes "publish: should report that not all requested data was published since not present locally" (#3424).
- datatransfer and nodatatransfer (among others) replace the clunky --transfer-data parameter of publish. Fixes "publish --force should adopt 'mode' specification model" (#3414).
- annex ... --json commands (#4228) ... be altered
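The annex JSON communication mentioned above is line-delimited: each line of git-annex `--json` output is one JSON object, with fields such as `command`, `file`, and `success`. A minimal normalization sketch (the helper and the result-record shape are hypothetical, not DataLad's actual protocol class):

```python
import json

def parse_annex_json(lines):
    """Turn git-annex --json line output into simple result records."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the stream
        rec = json.loads(line)
        results.append({
            "action": rec.get("command"),
            "path": rec.get("file"),
            "status": "ok" if rec.get("success") else "error",
        })
    return results

# Simulated output of a `git annex copy --json` run:
sample = [
    '{"command": "copy", "file": "file3", "success": true}',
    '{"command": "copy", "file": "big.dat", "success": false}',
]
records = parse_annex_json(sample)
```

Because every line is a complete JSON object, results can be reported incrementally as they arrive, instead of after post-hoc inspection of the whole run.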