datalad.core.distributed.push #4206

Merged
merged 17 commits into from
Mar 11, 2020

Conversation

mih
Member

@mih mih commented Feb 26, 2020

The approach is similar to before:

  • do not attempt to build a 1:1 replacement of publish
  • see what makes sense for the scope of a "plumbing" command
  • focus on correct operation, not feature-wealth
  • keep parallelization in mind
  • hide as little as possible, as shallowly as possible

Note that the tests here are affected (probabilistically) by #4279

Status quo:

@codecov

codecov bot commented Feb 26, 2020

Codecov Report

Merging #4206 into master will increase coverage by 0.05%.
The diff coverage is 95.28%.

@@            Coverage Diff             @@
##           master    #4206      +/-   ##
==========================================
+ Coverage   88.75%   88.80%   +0.05%     
==========================================
  Files         283      285       +2     
  Lines       36942    37417     +475     
==========================================
+ Hits        32788    33230     +442     
- Misses       4154     4187      +33     
Impacted Files Coverage Δ
datalad/interface/__init__.py 100.00% <ø> (ø)
datalad/core/distributed/push.py 89.55% <89.55%> (ø)
datalad/core/distributed/tests/test_push.py 99.62% <99.62%> (ø)
datalad/support/annexrepo.py 86.02% <100.00%> (+0.48%) ⬆️
datalad/downloaders/http.py 72.11% <0.00%> (-2.79%) ⬇️
datalad/downloaders/tests/test_http.py 58.39% <0.00%> (-2.19%) ⬇️
datalad/core/local/tests/test_save.py 96.75% <0.00%> (-0.73%) ⬇️
datalad/core/local/tests/test_run.py 98.25% <0.00%> (-0.44%) ⬇️
datalad/core/distributed/tests/test_clone.py 92.48% <0.00%> (-0.23%) ⬇️
datalad/core/local/tests/test_diff.py 99.50% <0.00%> (ø)
... and 8 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ed0faaa...188f46d. Read the comment docs.

@yarikoptic
Member

Are pull, commit, tag, etc. also in the queue? ;-)

@mih mih force-pushed the nf-corepush branch 6 times, most recently from bd11983 to a2a67cc Compare February 29, 2020 09:57
@mih mih marked this pull request as ready for review March 2, 2020 14:28
@mih mih changed the title Only for the brave: datalad.core.distributed.push datalad.core.distributed.push Mar 2, 2020
@mih mih requested a review from kyleam March 3, 2020 14:07
Contributor

@kyleam kyleam left a comment

Some comments from an initial read of push.py. I haven't experimented with it enough to have a good understanding of some of the logic/decisions, but the overall approach looks fine to me.

@adswa
Member

adswa commented Mar 6, 2020

I have played around with it a bit to find bugs and see how it feels:

Subjective user experience

  • publish returns an "impossible" if the target sibling is not specified and helps by suggesting the --to option (publish(impossible): . (dataset) [No target sibling configured for default publication, please specific via --to]). push reports this as an error and does not suggest --to (datalad push [ERROR ] No push target given, and none could be auto-detected [publish(/tmp/dataset)] ). Personally, I find it helpful to hint at --to as a possible solution. Also, my personal reaction to "Error" in the terminal is "Oh no, I need to start over" whereas my reaction to "Impossible" is "Huh, let me read this error message in detail", so I kinda favor this situation to be an "impossible" rather than "error" outcome.

  • The error when an unknown sibling is specified is leaner with publish; I find the message of publish subjectively more readable (also because it is shorter)

(handbook) ╭─adina@muninn /tmp/dataset1 on master
╰─➤ datalad publish --to wrongname                                                1 ↵
publish(error): . (dataset) [Unknown target sibling 'wrongname' for publication]
(handbook) ╭─adina@muninn /tmp/dataset1 on master
╰─➤ datalad push --to wrongname                                                   1 ↵
[ERROR  ] Dataset <Dataset path=/tmp/dataset1> does not know of a sibling 'wrongname' to push to. [publish(/tmp/dataset1)] 
publish(error): . (dataset) [Dataset <Dataset path=/tmp/dataset1> does not know of a sibling 'wrongname' to push to.]
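The difference between an "impossible" and an "error" outcome, plus a hint at --to, can be sketched as a result record. This is a minimal illustration only: the helper names `no_target_result` and `render` are hypothetical, the dictionary keys simply mirror the `action(status): path (type) [message]` layout of the rendered lines above, and the message wording is an assumption rather than DataLad's actual text.

```python
def no_target_result(dataset_path, action="publish"):
    """Build a result record for a push that has no resolvable target.

    Hypothetical helper: the keys mirror the rendered
    ``action(status): path (type) [message]`` lines shown above.
    """
    return {
        "action": action,
        # "impossible" signals "nothing could be done with this input",
        # as opposed to "error", which suggests that something broke
        "status": "impossible",
        "path": dataset_path,
        "type": "dataset",
        # assumed wording; the point is that it names the --to option
        "message": "No push target given, and none could be "
                   "auto-detected, please specify via --to",
    }


def render(res):
    """Format a result record like the one-line summaries above."""
    return "{action}({status}): {path} ({type}) [{message}]".format(**res)


print(render(no_target_result(".")))
```

Keeping the status and the message as separate fields lets a caller decide how loudly to report the outcome without parsing the text.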

What I assume to be bugs

I've found some things that do not work, or do not work as I expected/hoped, and compared them with publish.

  • Pushing a new repository to Gin fails for annex branch with push but not publish
    I created a dataset with some files locally, created an empty gin repository, added it as a sibling (via HTTPS and SSH), and tried to push. Pushing annexed files with datalad push fails, but succeeds with datalad publish:
example

Here is how it looks with HTTPS

(handbook) ╭─adina@muninn /tmp
╰─➤ datalad create dataset && cd dataset
[INFO   ] Creating a new annex repo at /tmp/dataset 
create(ok): /tmp/dataset (dataset)                                                    
(handbook) ╭─adina@muninn /tmp/dataset on master
╰─➤ echo 2356 > file3 && echo 12345 > file
(handbook) ╭─adina@muninn /tmp/dataset on master
╰─➤ datalad save file3 && datalad save --to-git file -m "save to git"
add(ok): file3 (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)
add(ok): file (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)
(handbook) ╭─adina@muninn /tmp/dataset on master
╰─➤ datalad siblings add -d . -s gin --url https://gin.g-node.org/adswa/test_dataset.git
[INFO   ] Failed to enable annex remote gin, could be a pure git or not accessible 
[WARNING] Failed to determine if gin carries annex. Remote was marked by annex as annex-ignore.  Edit .git/config to reset if you think that was done by mistake due to absent connection etc 
.: gin(-) [https://gin.g-node.org/adswa/test_dataset.git (git)]
(handbook) ╭─adina@muninn /tmp/dataset on master
╰─➤ datalad siblings
.: here(+) [git]
.: gin(-) [https://gin.g-node.org/adswa/test_dataset.git (git)]
(handbook) ╭─adina@muninn /tmp/dataset on master
╰─➤ datalad push --to gin                                                         1 ↵
# SO MUCH AUTHENTICATION
Username for 'https://gin.g-node.org': adswa
Password for 'https://adswa@gin.g-node.org': 
Username for 'https://gin.g-node.org': adswa
Password for 'https://adswa@gin.g-node.org': 
publish(ok): . (dataset) [refs/heads/master->gin:refs/heads/master [new branch]]      
Username for 'https://gin.g-node.org': adswa
Password for 'https://adswa@gin.g-node.org': 
CommandError: 'git fetch gin git-annex' failed with exitcode 128 under /tmp/dataset
fatal: Couldn't find remote ref git-annex
# trying again, just to be sure:
(handbook) ╭─adina@muninn /tmp/dataset on master
╰─➤ datalad push --to gin                                                       128 ↵
Username for 'https://gin.g-node.org': adswa
Password for 'https://adswa@gin.g-node.org': 
Username for 'https://gin.g-node.org': adswa
Password for 'https://adswa@gin.g-node.org': 
Username for 'https://gin.g-node.org': adswa
Password for 'https://adswa@gin.g-node.org': 
CommandError: 'git fetch gin git-annex' failed with exitcode 128 under /tmp/dataset
fatal: Couldn't find remote ref git-annex
(handbook) ╭─adina@muninn /tmp/dataset on master

works with datalad publish, though

╰─➤ datalad publish --to gin                                                    128 ↵
Username for 'https://gin.g-node.org': adswa
Password for 'https://adswa@gin.g-node.org': 
[INFO   ] Will publish updated git-annex 
[INFO   ] Publishing <Dataset path=/tmp/dataset> to gin 
Username for 'https://gin.g-node.org': adswa
Password for 'https://adswa@gin.g-node.org': 
Username for 'https://gin.g-node.org': adswa
Password for 'https://adswa@gin.g-node.org': 
publish(ok): . (dataset) [pushed to gin: ['[up to date]', '[new branch]']]

Once the annex branch is published, push works:

(handbook) ╭─adina@muninn /tmp/dataset on master
╰─➤ echo 123456 > file2
(handbook) ╭─adina@muninn /tmp/dataset on master
╰─➤ datalad save
add(ok): file2 (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)
(handbook) ╭─adina@muninn /tmp/dataset on master
╰─➤ datalad push --to gin
# (oh god, so much authentication...)
Username for 'https://gin.g-node.org': adswa
Password for 'https://adswa@gin.g-node.org': 
Username for 'https://gin.g-node.org': adswa
Password for 'https://adswa@gin.g-node.org': 
publish(ok): . (dataset) [refs/heads/master->gin:refs/heads/master 73ee5d6..23861ee]  
Username for 'https://gin.g-node.org': adswa
Password for 'https://adswa@gin.g-node.org': 
Username for 'https://gin.g-node.org': adswa
Password for 'https://adswa@gin.g-node.org': 
publish(ok): . (dataset) [refs/heads/git-annex->gin:refs/heads/git-annex d23e17d..c752534]
action summary:
  publish (ok: 2)

Side note:
I think it's a git-annex / protocol-related thing, but I wanted to mention it in case it should work: with an HTTPS sibling I could not transfer annexed file contents to Gin, neither with publish nor with push (only with an SSH sibling).

Here is the same with a sibling added via SSH

(handbook) ╭─adina@muninn /tmp
╰─➤ datalad create dataset1 && cd dataset1
[INFO   ] Creating a new annex repo at /tmp/dataset1 
(handbook) ╭─adina@muninn /tmp/dataset1 on master
╰─➤ echo 123456 > file1 && echo 1234567 > file2
(handbook) ╭─adina@muninn /tmp/dataset1 on master
╰─➤ datalad save --to-git file1 -m "to git" && datalad save -m "to annex" file2
add(ok): file1 (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)
add(ok): file2 (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)
(handbook) ╭─adina@muninn /tmp/dataset1 on master
╰─➤ ls -l
total 8
-rw-r--r-- 1 adina adina   7 Mar  6 15:17 file1
lrwxrwxrwx 1 adina adina 108 Mar  6 15:17 file2 -> .git/annex/objects/wm/XZ/MD5E-s8--1b504d3328e16fdf281d1fb9516dd90b/MD5E-s8--1b504d3328e16fdf281d1fb9516dd90b
(handbook) ╭─adina@muninn /tmp/dataset1 on master
╰─➤ datalad siblings add -d . -s gin --url git@gin.g-node.org:/adswa/test_dataset1.git
[INFO   ] Failed to enable annex remote gin, could be a pure git or not accessible 
[WARNING] Failed to determine if gin carries annex. 
.: gin(-) [git@gin.g-node.org:/adswa/test_dataset1.git (git)]
(handbook) ╭─adina@muninn /tmp/dataset1 on master
╰─➤ datalad push --to gin
publish(ok): . (dataset) [refs/heads/master->gin:refs/heads/master [new branch]]      
CommandError: 'git fetch gin git-annex' failed with exitcode 128 under /tmp/dataset1
fatal: Couldn't find remote ref git-annex
fatal: the remote end hung up unexpectedly
CommandError: 'ssh -o ControlPath=/home/adina/.cache/datalad/sockets/fd903614 git@gin.g-node.org 'git-upload-pack '"'"'/adswa/test_dataset1.git'"'"''' failed with exitcode 128

Again, works with publish:

(handbook) ╭─adina@muninn /tmp/dataset1 on master
╰─➤ datalad publish --to gin                                                    130 ↵
[INFO   ] Will publish updated git-annex 
[INFO   ] Publishing <Dataset path=/tmp/dataset1> to gin 
publish(ok): . (dataset) [pushed to gin: ['[up to date]', 'd148e72..e1ae4ca']] 
  • Checkout new branch and blindly push changes works as expected with publish but not push
    I created a new branch in the handbook repo, committed something, and did a datalad push --to upstream, hoping it would push my changes to a new branch. I'm not 100% sure what it did, but it concerned a wrong ref (branch search instead of tmp), and did not lead to any change upstream. If I did the same with publish, it worked (pushing my new branch upstream)
example

Here is push:

(handbook) ╭─adina@muninn ~/repos/datalad-handbook on master!
╰─➤ git co -b tmp
(handbook) ╭─adina@muninn ~/repos/datalad-handbook on tmp!
╰─➤ datalad save docs/intro/narrative.rst -m "some commit" 
add(ok): docs/intro/narrative.rst (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)
(handbook) ╭─adina@muninn ~/repos/datalad-handbook on tmp!
╰─➤ datalad push --to upstream                                                    1 ↵
publish(ok): . (dataset) [refs/heads/search->upstream:refs/heads/search c3a33f3..62999c6]

Doing it with -f gitpush worked:

(handbook) ╭─adina@muninn ~/repos/datalad-handbook on tmp!
╰─➤ datalad push --to upstream -f gitpush
publish(ok): . (dataset) [refs/heads/tmp->upstream:refs/heads/tmp [new branch]]

Here is how it looks with publish

(handbook) ╭─adina@muninn ~/repos/datalad-handbook on tmp!
╰─➤ git co -b tmp2
Switched to a new branch 'tmp2'
(handbook) ╭─adina@muninn ~/repos/datalad-handbook on tmp2!
╰─➤ datalad save docs/intro/narrative.rst -m "some commit"
add(ok): docs/intro/narrative.rst (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)
(handbook) ╭─adina@muninn ~/repos/datalad-handbook on tmp2!
╰─➤ datalad publish --to upstream                                                 1 ↵
[INFO   ] Publishing <Dataset path=/home/adina/repos/datalad-handbook> to upstream 
publish(ok): . (dataset) [pushed to upstream: ['[new branch]']]     
  • Probably not yet intended/adjusted to work (or I'm not up to date with RIA developments): I can't use push for pushing dataset contents to a RIA store:
(handbook) ╭─adina@muninn /tmp/dataset1 on master
╰─➤ datalad create-sibling-ria -s backup ria+file:///tmp/myriastore               1 ↵
[INFO   ] create siblings 'backup' and 'backup-ria' ... 
[INFO   ] Fetching updates for <Dataset path=/tmp/dataset1> 
[INFO   ] Configure additional publication dependency on "backup-ria" 
create-sibling-ria(ok): /tmp/dataset1 (dataset)
(handbook) ╭─adina@muninn /tmp/dataset1 on master
╰─➤ datalad siblings                                                              1 ↵
.: here(+) [git]
.: gin(+) [git@gin.g-node.org:/adswa/test_dataset1.git (git)]
.: backup-ria(+) [ria]
.: backup(-) [/tmp/myriastore/266/dcfa0-5fb5-11ea-8652-27eb4f93a5dc (git)]
(handbook) ╭─adina@muninn /tmp/dataset1 on master
╰─➤ datalad push --to backup
CommandError: 'git push --progress --porcelain backup-ria master:master' failed with exitcode 128 under /tmp/dataset1
fatal: 'backup-ria' does not appear to be a git repository
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
(handbook) ╭─adina@muninn /tmp/dataset1 on master
╰─➤ datalad publish --to backup                                                 128 ↵
[INFO   ] Transferring data to configured publication dependency: 'backup-ria' 
[INFO   ] Publishing to configured dependency: 'backup-ria' 
[INFO   ] Publishing <Dataset path=/tmp/dataset1> to backup 
publish(ok): . (dataset) [pushed to backup: ['[new branch]', '[new branch]']] 

@mih
Member Author

mih commented Mar 6, 2020

  • publish returns an "impossible" if the target sibling is not specified and helps by suggesting the --to option (publish(impossible): . (dataset) [No target sibling configured for default publication, please specific via --to]). push reports this as an error and does not suggest --to (datalad push [ERROR ] No push target given, and none could be auto-detected [publish(/tmp/dataset)] ). Personally, I find it helpful to hint at --to as a possible solution. Also, my personal reaction to "Error" in the terminal is "Oh no, I need to start over" whereas my reaction to "Impossible" is "Huh, let me read this error message in detail", so I kinda favor this situation to be an "impossible" rather than "error" outcome.

Error is now the same as before.

  • The error when an unknown sibling is specified is leaner with publish; I find the message of publish subjectively more readable (also because it is shorter)

Error is now even leaner!

Thanks for both observations -- very helpful!

@mih
Member Author

mih commented Mar 6, 2020

Pushing a new repository to Gin fails for annex branch with push but not publish

Tried with a local setup, but failed to replicate:

# dest is a bare git repo
% datalad siblings add -d . -s gin --url ../dest                                      
[INFO   ] Failed to enable annex remote gin, could be a pure git or not accessible 
[WARNING] Failed to determine if gin carries annex. 
.: gin(-) [../dest (git)]
(datalad3-dev) mih@meiner /tmp/src (git)-[master] % datalad push --to gin 
publish(ok): . (dataset) [refs/heads/master->gin:refs/heads/master [new branch]]
publish(ok): . (dataset) [refs/heads/git-annex->gin:refs/heads/git-annex [new branch]]
action summary:
  copy (notneeded: 1)
  publish (ok: 2)

Same with GIN

(datalad3-dev) mih@meiner /tmp/src (git)-[master] % dl siblings add -s realgin --url git@gin.g-node.org:/mih/datalad-test3.git
[INFO   ] Failed to enable annex remote realgin, could be a pure git or not accessible 
[WARNING] Failed to determine if realgin carries annex. 
.: realgin(-) [git@gin.g-node.org:/mih/datalad-test3.git (git)]
(datalad3-dev) mih@meiner /tmp/src (git)-[master] % datalad push --to realgin 
publish(ok): . (dataset) [refs/heads/master->realgin:refs/heads/master [new branch]]
publish(ok): . (dataset) [refs/heads/git-annex->realgin:refs/heads/git-annex [new branch]]
action summary:
  copy (notneeded: 1)
  publish (ok: 2)

Maybe some other fix that was applied since you reviewed dealt with this?

@mih
Member Author

mih commented Mar 7, 2020

Checkout new branch and blindly push changes works as expected with publish but not push

Seems to work as desired (now?).

(datalad3-dev) mih@meiner /tmp/src (git)-[master] % git checkout -b new
Switched to a new branch 'new'
(datalad3-dev) mih@meiner /tmp/src (git)-[new] % datalad push --to realgin
publish(ok): . (dataset) [refs/heads/new->realgin:refs/heads/new [new branch]]
copy(ok): /tmp/src/mike (file) [to realgin...]
publish(ok): . (dataset) [refs/heads/git-annex->realgin:refs/heads/git-annex 5b969cf..c1b5ba8]
action summary:
  copy (ok: 1)
  publish (ok: 2)

@mih
Member Author

mih commented Mar 7, 2020

Probably not yet intended/adjusted to work (or I'm not up to date with RIA developments): I can't use push for pushing dataset contents to a RIA store

This is a bug! backup-ria is a special remote and push must anticipate this and not attempt a git push. Will fix.
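A minimal sketch of the kind of check this calls for, not the actual fix: before pushing, decide per sibling whether Git refs can be pushed at all. The assumption made here is that a sibling without a `remote.<name>.url` entry in the Git configuration is an annex special remote (content only), and that a `remote.<name>.annex-uuid` entry marks a sibling that can hold annexed content. The function name `plan_remote_ops` and the UUID value are hypothetical.

```python
def plan_remote_ops(remote, config):
    """Return the operations that make sense for a sibling.

    Assumption: no ``remote.<name>.url`` means "annex special remote",
    which can receive file content but cannot be the target of
    ``git push``.
    """
    has_url = "remote.{}.url".format(remote) in config
    has_annex_uuid = "remote.{}.annex-uuid".format(remote) in config
    ops = []
    if has_url:
        ops.append("git-push")
    if has_annex_uuid:
        ops.append("annex-copy")
    return ops


# Configuration resembling the siblings listing above (UUID is made up)
config = {
    "remote.backup.url": "/tmp/myriastore/266/dcfa0-5fb5-11ea-8652-27eb4f93a5dc",
    "remote.backup-ria.annex-uuid": "00000000-0000-0000-0000-000000000000",
}
print(plan_remote_ops("backup", config))      # Git refs only
print(plan_remote_ops("backup-ria", config))  # annexed content only
```

With such a split, the failing `git push ... backup-ria master:master` call would not be attempted, while file content could still be copied to the special remote.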

@adswa
Member

adswa commented Mar 7, 2020

Maybe some other fix that was applied since you reviewed dealt with this?

Huh... Good to know that it worked for you. Not sure why it does not work for me; I'm doing it on the branch of this PR. I'm attaching a WTF below. Maybe annex version?

WTF
datalad wtf                                                           128 ↵
# WTF
## configuration <SENSITIVE, report disabled by configuration>
## datalad 
  - full_version: 0.12.2.dev379-gd340
  - version: 0.12.2.dev379
## dataset 
  - id: 52cc3e2a-6095-11ea-8652-27eb4f93a5dc
  - metadata: <SENSITIVE, report disabled by configuration>
  - path: /tmp/some
  - repo: AnnexRepo
## dependencies 
  - appdirs: 1.4.3
  - boto: 2.49.0
  - cmd:7z: 16.02
  - cmd:annex: 7.20190819+git2-g908476a9b-1~ndall+1
  - cmd:bundled-git: 2.20.1
  - cmd:git: 2.20.1
  - cmd:system-git: 2.25.0
  - cmd:system-ssh: 8.1p1
  - exifread: 2.1.2
  - git: 3.0.5
  - gitdb: 2.0.5
  - humanize: 0.5.1
  - iso8601: 0.1.12
  - keyring: 19.0.2
  - keyrings.alt: 3.1.1
  - msgpack: 0.6.1
  - mutagen: 1.43.0
  - requests: 2.22.0
  - scrapy: 1.7.4
  - tqdm: 4.32.2
  - wrapt: 1.11.2
## environment 
  - GIT_PYTHON_GIT_EXECUTABLE: /usr/lib/git-annex.linux/git
  - LANG: en_US.UTF-8
  - LANGUAGE: en_US:en
  - PATH: /home/adina/env/handbook/bin:/home/adina/Documents/freesurfer/bin:/home/adina/Documents/freesurfer/fsfast/bin:/home/adina/Documents/freesurfer/tktools:/usr/share/fsl/5.0/bin:/usr/lib/fsl/5.0:/home/adina/Documents/freesurfer/mni/bin:/usr/share/fsl/5.0/5.0/bin:/home/adina/.local/bin:/home/adina/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/home/adina/.dotfiles/bin:/usr/local/go/bin:/home/adina/work/bin
  - PYTHONPATH: /usr/bin/python
## extensions 
  - container: 
    - description: Containerized environments
    - entrypoints: 
      - datalad_container.containers_add.ContainersAdd: 
        - class: ContainersAdd
        - load_error: None
        - module: datalad_container.containers_add
        - names: 
          - containers-add
          - containers_add
      - datalad_container.containers_list.ContainersList: 
        - class: ContainersList
        - load_error: None
        - module: datalad_container.containers_list
        - names: 
          - containers-list
          - containers_list
      - datalad_container.containers_remove.ContainersRemove: 
        - class: ContainersRemove
        - load_error: None
        - module: datalad_container.containers_remove
        - names: 
          - containers-remove
          - containers_remove
      - datalad_container.containers_run.ContainersRun: 
        - class: ContainersRun
        - load_error: None
        - module: datalad_container.containers_run
        - names: 
          - containers-run
          - containers_run
    - load_error: None
    - module: datalad_container
    - version: 0.5.0
  - crawler: 
    - description: Crawl web resources
    - entrypoints: 
      - datalad_crawler.crawl.Crawl: 
        - class: Crawl
        - load_error: None
        - module: datalad_crawler.crawl
        - names: 
          - crawl
      - datalad_crawler.crawl_init.CrawlInit: 
        - class: CrawlInit
        - load_error: None
        - module: datalad_crawler.crawl_init
        - names: 
          - crawl-init
          - crawl_init
    - load_error: None
    - module: datalad_crawler
    - version: 0.4.4
  - hirni: 
    - description: HIRNI workflows
    - entrypoints: 
      - datalad_hirni.commands.dicom2spec.Dicom2Spec: 
        - class: Dicom2Spec
        - load_error: None
        - module: datalad_hirni.commands.dicom2spec
        - names: 
          - hirni-dicom2spec
          - hirni_dicom2spec
      - datalad_hirni.commands.import_dicoms.ImportDicoms: 
        - class: ImportDicoms
        - load_error: None
        - module: datalad_hirni.commands.import_dicoms
        - names: 
          - hirni-import-dcm
          - hirni_import_dcm
      - datalad_hirni.commands.spec2bids.Spec2Bids: 
        - class: Spec2Bids
        - load_error: None
        - module: datalad_hirni.commands.spec2bids
        - names: 
          - hirni-spec2bids
          - hirni_spec2bids
      - datalad_hirni.commands.spec4anything.Spec4Anything: 
        - class: Spec4Anything
        - load_error: None
        - module: datalad_hirni.commands.spec4anything
        - names: 
          - hirni-spec4anything
          - hirni_spec4anything
    - load_error: None
    - module: datalad_hirni
    - version: 0.0.4
  - metalad: 
    - description: DataLad semantic metadata command suite
    - entrypoints: 
      - datalad_metalad.aggregate.Aggregate: 
        - class: Aggregate
        - load_error: None
        - module: datalad_metalad.aggregate
        - names: 
          - meta-aggregate
          - meta_aggregate
      - datalad_metalad.dump.Dump: 
        - class: Dump
        - load_error: None
        - module: datalad_metalad.dump
        - names: 
          - meta-dump
          - meta_dump
      - datalad_metalad.extract.Extract: 
        - class: Extract
        - load_error: None
        - module: datalad_metalad.extract
        - names: 
          - meta-extract
          - meta_extract
    - load_error: None
    - module: datalad_metalad
    - version: 0.2.0
  - neuroimaging: 
    - description: Neuroimaging tools
    - entrypoints: 
      - datalad_neuroimaging.bids2scidata.BIDS2Scidata: 
        - class: BIDS2Scidata
        - load_error: None
        - module: datalad_neuroimaging.bids2scidata
        - names: 
          - bids2scidata
    - load_error: None
    - module: datalad_neuroimaging
    - version: 0.2.3
  - ria: 
    - description: Helper for the remote indexed archive (RIA) special remote
    - entrypoints: 
      - ria_remote.create_sibling_ria.CreateSiblingRia: 
        - class: CreateSiblingRia
        - load_error: None
        - module: ria_remote.create_sibling_ria
        - names: 
          - create-sibling-ria
          - create_sibling_ria
      - ria_remote.export_archive.ExportArchive: 
        - class: ExportArchive
        - load_error: None
        - module: ria_remote.export_archive
        - names: 
          - ria-export-archive
          - ria_export_archive
    - load_error: None
    - module: ria_remote
    - version: 0.7+90.g31bcf57
  - webapp: 
    - description: Generic web app support
    - entrypoints: 
      - datalad_webapp.WebApp: 
        - class: WebApp
        - load_error: None
        - module: datalad_webapp
        - names: 
          - webapp
          - webapp
    - load_error: None
    - module: datalad_webapp
    - version: 0.2
## git-annex 
  - build flags: 
    - Assistant
    - Webapp
    - Pairing
    - S3
    - WebDAV
    - Inotify
    - DBus
    - DesktopNotify
    - TorrentParser
    - MagicMime
    - Feeds
    - Testsuite
  - dependency versions: 
    - aws-0.20
    - bloomfilter-2.0.1.0
    - cryptonite-0.25
    - DAV-1.3.3
    - feed-1.0.0.0
    - ghc-8.4.4
    - http-client-0.5.13.1
    - persistent-sqlite-2.8.2
    - torrent-10000.1.1
    - uuid-1.3.13
    - yesod-1.6.0
  - key/value backends: 
    - SHA256E
    - SHA256
    - SHA512E
    - SHA512
    - SHA224E
    - SHA224
    - SHA384E
    - SHA384
    - SHA3_256E
    - SHA3_256
    - SHA3_512E
    - SHA3_512
    - SHA3_224E
    - SHA3_224
    - SHA3_384E
    - SHA3_384
    - SKEIN256E
    - SKEIN256
    - SKEIN512E
    - SKEIN512
    - BLAKE2B256E
    - BLAKE2B256
    - BLAKE2B512E
    - BLAKE2B512
    - BLAKE2B160E
    - BLAKE2B160
    - BLAKE2B224E
    - BLAKE2B224
    - BLAKE2B384E
    - BLAKE2B384
    - BLAKE2BP512E
    - BLAKE2BP512
    - BLAKE2S256E
    - BLAKE2S256
    - BLAKE2S160E
    - BLAKE2S160
    - BLAKE2S224E
    - BLAKE2S224
    - BLAKE2SP256E
    - BLAKE2SP256
    - BLAKE2SP224E
    - BLAKE2SP224
    - SHA1E
    - SHA1
    - MD5E
    - MD5
    - WORM
    - URL
  - local repository version: 5
  - operating system: linux x86_64
  - remote types: 
    - git
    - gcrypt
    - p2p
    - S3
    - bup
    - directory
    - rsync
    - web
    - bittorrent
    - webdav
    - adb
    - tahoe
    - glacier
    - ddar
    - git-lfs
    - hook
    - external
  - supported repository versions: 
    - 5
    - 7
  - upgrade supported from repository versions: 
    - 0
    - 1
    - 2
    - 3
    - 4
    - 5
    - 6
  - version: 7.20190819+git2-g908476a9b-1~ndall+1
## location 
  - path: /tmp/some
  - type: dataset
## metadata_extractors 
  - annex: 
    - load_error: None
    - module: datalad.metadata.extractors.annex
    - version: None
  - audio: 
    - load_error: None
    - module: datalad.metadata.extractors.audio
    - version: None
  - bids: 
    - load_error: None
    - module: datalad_neuroimaging.extractors.bids
    - version: None
  - datacite: 
    - load_error: None
    - module: datalad.metadata.extractors.datacite
    - version: None
  - datalad_core: 
    - load_error: None
    - module: datalad.metadata.extractors.datalad_core
    - version: None
  - datalad_rfc822: 
    - load_error: None
    - module: datalad.metadata.extractors.datalad_rfc822
    - version: None
  - dicom: 
    - load_error: None
    - module: datalad_neuroimaging.extractors.dicom
    - version: None
  - exif: 
    - load_error: None
    - module: datalad.metadata.extractors.exif
    - version: None
  - frictionless_datapackage: 
    - load_error: None
    - module: datalad.metadata.extractors.frictionless_datapackage
    - version: None
  - image: 
    - load_error: None
    - module: datalad.metadata.extractors.image
    - version: None
  - metalad_annex: 
    - load_error: None
    - module: datalad_metalad.extractors.annex
    - version: None
  - metalad_core: 
    - load_error: None
    - module: datalad_metalad.extractors.core
    - version: None
  - metalad_custom: 
    - load_error: None
    - module: datalad_metalad.extractors.custom
    - version: None
  - metalad_runprov: 
    - load_error: None
    - module: datalad_metalad.extractors.runprov
    - version: None
  - nidm: 
    - load_error: None
    - module: datalad_neuroimaging.extractors.nidm
    - version: None
  - nifti1: 
    - load_error: None
    - module: datalad_neuroimaging.extractors.nifti1
    - version: None
  - xmp: 
    - load_error: None
    - module: datalad.metadata.extractors.xmp
    - version: None
## python 
  - implementation: CPython
  - version: 3.7.3
## system 
  - distribution: debian/bullseye/sid
  - encoding: 
    - default: utf-8
    - filesystem: utf-8
    - locale.prefered: UTF-8
  - max_path_length: 265
  - name: Linux
  - release: 5.4.0-3-amd64
  - type: posix
  - version: #1 SMP Debian 5.4.13-1 (2020-01-19)

@mih
Member Author

mih commented Mar 7, 2020

Huh... Good to know that it worked for you. Not sure why it does not work for me; I'm doing it on the branch of this PR. I'm attaching a WTF below. Maybe annex version?

That could be. I have Git 2.24.1 and Git-annex 8.20200226.

@mih mih force-pushed the nf-corepush branch 3 times, most recently from b0939f8 to 8fc8b20 Compare March 9, 2020 14:35
mih added 15 commits March 10, 2020 20:44
- general approach is: push main branch -> copy -> push git-annex branch

  This will expose any history issues (missing pieces, conflicts) that
  could possibly invalidate local decision making. push() will fail early,
  allowing for fixes (e.g. update(merge=True)), and then reattempt.
  The annex branch is pushed last, after file transfer is completed.
  It is the least critical part, because annex will update availability
  info on the remote end on its own, as part of the transfer.

- push != sync: generally, changes will only go from local to remote.
  However, in corner cases it is necessary to use `annex sync`
  internally to consolidate the git-annex branch or corresponding
  branches.

- perform data transfer via async-call to `annex copy`, not via
  AnnexRepo.copy_to() which performs too many inspections and reporting
  decisions.

- current approach can pass many paths to `annex copy`, so I opted for
  a temp file that is used as stdin for a batch-mode process of `annex copy`.
  This saves result merges across the alternative 'file chunk' runs.

- support push to empty repos (fixes dataladgh-4074)

- implement tests largely without `create_sibling`, because it doesn't work
  on Windows

- support for managed branches

- pass --jobs to git-annex copy (fixes gh-3732)
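The ordering described in this commit message (push main branch, copy content, push git-annex branch) can be sketched as a plan of commands. This is an illustration under assumptions, not the code in push.py: `plan_push` is a hypothetical helper, the `git push --progress --porcelain` form is taken from the error output earlier in this thread, and combining `--batch`, `--to` and `--jobs` for `git annex copy` is assumed from the description above.

```python
def plan_push(remote, branch, paths, jobs=None):
    """Return the ordered steps for pushing `branch` and `paths` to `remote`.

    Each step is a dict with a label, an argv list, and optional stdin
    text. Nothing is executed here; this only models the ordering.
    """
    git_push = ["git", "push", "--progress", "--porcelain", remote]
    steps = [
        # 1. The main branch goes first, so history problems surface early
        {"step": "push-branch",
         "cmd": git_push + ["{0}:{0}".format(branch)],
         "stdin": None},
    ]
    # 2. File content: one batch-mode process, paths supplied via stdin
    copy_cmd = ["git", "annex", "copy", "--batch", "--to", remote]
    if jobs:
        copy_cmd += ["--jobs", str(jobs)]
    steps.append({"step": "copy-content",
                  "cmd": copy_cmd,
                  "stdin": "".join(p + "\n" for p in paths)})
    # 3. The git-annex branch goes last, after the transfer has finished
    steps.append({"step": "push-annex-branch",
                  "cmd": git_push + ["git-annex:git-annex"],
                  "stdin": None})
    return steps


for s in plan_push("gin", "master", ["file2", "file3"], jobs=2):
    print(s["step"], " ".join(s["cmd"]))
```

Feeding all paths to a single batch process avoids having to merge results across several chunked invocations, which is the motivation given above.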
Must publish the subdatasets, but no further subdatasets of their own.
@mih
Member Author

mih commented Mar 11, 2020

The remaining test failure is

datalad.support.exceptions.CommandError: CommandError: 'git fetch datastore git-annex' failed with exitcode 128 under /tmp/datalad_temp_test_ria_push6g__lhgl [err: 'fatal: Couldn't find remote ref git-annex

No idea why that would be Travis specific.... Edit: found a real machine that also has the behavior!!

Edit: WTF!?!?! It is a capitalization difference in the error message...
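A capitalization difference like this can be sidestepped by comparing the error text case-insensitively. The following is a minimal sketch, not the fix that went into this PR; `is_missing_remote_ref` is a hypothetical helper name, and the matched phrase is taken from the error message quoted above.

```python
def is_missing_remote_ref(stderr, ref="git-annex"):
    """Tell whether Git reported that `ref` does not exist on the remote.

    The comparison ignores case, so it matches both "Couldn't find ..."
    and "couldn't find ..." variants of the message.
    """
    needle = "couldn't find remote ref {}".format(ref).lower()
    return needle in stderr.lower()


print(is_missing_remote_ref("fatal: Couldn't find remote ref git-annex"))
print(is_missing_remote_ref("fatal: couldn't find remote ref git-annex"))
```

Matching on message text stays fragile across Git versions and locales, so a check like this is best treated as a fallback.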

@mih
Member Author

mih commented Mar 11, 2020

OK, finally! Everything that should be green, is green. I will give it a few min of rest, and then merge this.

@mih mih merged commit 1dd15b3 into datalad:master Mar 11, 2020
@mih mih deleted the nf-corepush branch March 11, 2020 14:46