Publication dependency is lost on cloning, pushing from clone fails with mode export #201

Closed
jsheunis opened this issue Jun 4, 2024 · 5 comments · Fixed by #204

Comments

@jsheunis
Member

jsheunis commented Jun 4, 2024

I am not sure whether this is a bug or the intended behaviour when cloning a dataset from OSF. It was encountered on Linux during a DataLad workshop by @charlottemock (thanks!)

Environment

  • encountered first on Linux, reproduced on macOS
  • datalad 1.0.2
  • datalad-next 1.4.1
  • datalad-osf 0.3.0

Create dataset, add file, save to git

> datalad create osfbla
create(ok): /Users/jsheunis/Documents/psyinf/Data/osfbla (dataset)

> cd osfbla
> echo 'kaas' > k.txt
> datalad save --to-git
add(ok): k.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)

Create osf sibling, check .git/config, push to sibling

> datalad create-sibling-osf --title YODA --mode export -s osf --public
create-sibling-osf(ok): https://osf.io/gfp9r/
[INFO   ] Configure additional publication dependency on "osf-storage"
configure-sibling(ok): . (sibling)

> cat .git/config
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[annex]
	uuid = 9d551f14-e086-452b-9ec9-1fb96923836d
	version = 10
[filter "annex"]
	smudge = git-annex smudge -- %f
	clean = git-annex smudge --clean -- %f
	process = git-annex filter-process
[remote "osf-storage"]
	annex-externaltype = osf
	annex-uuid = b1ae33ec-5144-4625-af3e-36dfc4174b1c
	skipFetchAll = true
	annex-cost = 200.0
	annex-availability = GloballyAvailable
[remote "osf"]
	annex-ignore = true
	url = osf://gfp9r
	fetch = +refs/heads/*:refs/remotes/osf/*
	datalad-publish-depends = osf-storage

> datalad push --to osf
copy(ok): .datalad/.gitattributes (dataset)
copy(ok): .datalad/config (dataset)
copy(ok): .gitattributes (dataset)
copy(ok): k.txt (dataset)
publish(ok): . (dataset) [refs/heads/main->osf:refs/heads/main [new branch]]
publish(ok): . (dataset) [refs/heads/git-annex->osf:refs/heads/git-annex [new branch]]

Check online if the result is as expected

https://osf.io/gfp9r/

Yes it is

Clone from published OSF url into different location of the same system

> datalad clone osf://gfp9r cloned_osfbla
[INFO   ] Remote origin uses a protocol not supported by git-annex; setting annex-ignore
install(ok): /Users/jsheunis/Documents/psyinf/Data/cloned_osfbla (dataset)

> cd cloned_osfbla

> cat .git/config
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true
[remote "origin"]
	url = osf://gfp9r
	fetch = +refs/heads/*:refs/remotes/origin/*
	annex-ignore = true
[branch "main"]
	remote = origin
	merge = refs/heads/main
[annex]
	uuid = d4c036e4-6fad-46e9-8066-cb13f25bcfc5
	version = 10
[filter "annex"]
	smudge = git-annex smudge -- %f
	clean = git-annex smudge --clean -- %f
	process = git-annex filter-process
[remote "osf-storage"]
	annex-externaltype = osf
	annex-uuid = b1ae33ec-5144-4625-af3e-36dfc4174b1c

Here we can see that the publication dependency is missing in the clone. Also, the sibling is named origin rather than osf (see the datalad siblings call below). I don't know whether either of these (the missing publication dependency and the changed sibling name) is intentional or a problem; I couldn't find informative docs about this.

> datalad siblings
.: here(+) [git]
.: osf-storage(+) [osf]
.: origin(-) [osf://gfp9r (git)]
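
A quick way to compare the two repositories is to query the datalad-publish-depends key (the one visible in the original dataset's .git/config above) for each sibling. The following is a minimal sketch using plain git config; the function names publish_depends and check_sibling are made up for illustration and are not part of DataLad.

```shell
# Print the publication dependencies configured for a sibling, if any.
# $1: path to a git config file, $2: sibling (remote) name.
# git config exits non-zero when the key is not set.
publish_depends() {
    git config --file "$1" --get-all "remote.$2.datalad-publish-depends"
}

# Report in one line whether a sibling has a publication dependency.
check_sibling() {
    if deps=$(publish_depends "$1" "$2"); then
        echo "$2: depends on $deps"
    else
        echo "$2: no publication dependency configured"
    fi
}

# Example calls (run from the dataset root):
#   check_sibling .git/config osf      # in the original dataset
#   check_sibling .git/config origin   # in the clone
```

With the two config files shown in this report, the first call should report the dependency on osf-storage and the second should report that none is configured.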

Add the publication dependency explicitly

> datalad siblings -s origin --publish-depends osf-storage configure
[INFO   ] Configure additional publication dependency on "osf-storage"
.: origin(-) [osf://gfp9r (git)]

This works fine, and was confirmed by inspecting .git/config

Add changes in the clone, push to origin

> echo 'kaaskoek' > kk.txt

> datalad save --to-git
add(ok): kk.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)

> datalad push --to origin
publish(error): . (dataset) [refuse to export to osf-storage, because the last known export came from another repo (9d551f14-e086-452b-9ec9-1fb96923836d). Use --force=export to enforce the export anyway.]
publish(ok): . (dataset) [refs/heads/git-annex->origin:refs/heads/git-annex b5a85aa..8250143]
publish(ok): . (dataset) [refs/heads/main->origin:refs/heads/main 3142c1a..1f70404]
action summary:
  publish (error: 1, ok: 2)

This publish(error) is the second part of the issue. If I use the --force=export flag with the push, the push of the additional change succeeds.
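
To see which repository git-annex considers the last exporter, the UUID can be pulled out of the refusal message. This is only a small sketch, assuming the message wording shown above stays the same; last_exporter_uuid is a hypothetical helper name, not an existing command.

```shell
# Read push output on stdin and print the annex UUID of the repository
# that made the last known export. Prints nothing if the refusal
# message does not appear in the input.
last_exporter_uuid() {
    sed -n 's/.*last known export came from another repo (\([0-9a-f-]*\)).*/\1/p'
}

# Example call:
#   datalad push --to origin 2>&1 | last_exporter_uuid
```

For the output above this yields 9d551f14-e086-452b-9ec9-1fb96923836d, which is the annex uuid listed in the .git/config of the original osfbla dataset, not that of the clone.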

Another note: if I don't configure the publication dependency in the clone, then save a change and push it to origin, the git refs/history are pushed, but not the actual file. I could not find this behaviour, or the need for the --force=export flag (and the reason for it), documented anywhere.

@jsheunis
Member Author

jsheunis commented Jun 5, 2024

@datalad/developers is this all expected behaviour? If so, I think it makes sense to improve documentation (docs and docstrings) to make the use of --publish-depends and --force=export after cloning clearer for users. If not, where should we be looking to improve this?

@adswa
Member

adswa commented Jul 2, 2024

Just leaving a few quick notes from the office hour:

  • Not updating the clone's .git/config file with the dependency is expected, as this is local configuration
  • the behavior is confusing though, and the workaround could be added to the export mode use case, together with the --force parameter as you suggested
  • the update of git refs but not the actual file is expected given the difference between annexed files and the git parts of the dataset; in your description only git history is pushed (we think)

@mslw

mslw commented Jul 2, 2024

And one more:

  • "refuse to export to osf-storage, because the last known export came from another repo" is probably the standard behavior of git-annex in export mode, given that you are pushing from a repo with a different annex uuid than the one that was first pushed from. The "export conflicts" section at the end of the git annex export docs suggests (although not explicitly) that this may be the case
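
The UUID mismatch described here can be checked directly, since each repository records its own ID under annex.uuid in .git/config. A minimal sketch follows; is_last_exporter is a hypothetical name, not a git-annex or DataLad command, and the UUID to compare against is assumed to be the one quoted in the refusal message.

```shell
# Succeed (exit 0) if the repository described by the given config file
# is the one named as the last exporter.
# $1: path to a git config file, $2: UUID quoted in the refusal message.
is_last_exporter() {
    [ "$(git config --file "$1" --get annex.uuid)" = "$2" ]
}

# Example call from the dataset root:
#   is_last_exporter .git/config 9d551f14-e086-452b-9ec9-1fb96923836d \
#       && echo "same repo" || echo "different repo"
```

In the clone from this report, annex.uuid is d4c036e4-6fad-46e9-8066-cb13f25bcfc5, so the check fails there and succeeds only in the original dataset.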

@jsheunis
Member Author

jsheunis commented Sep 4, 2024

Thanks for the investigation and info, @adswa and @mslw. So it seems like everything is expected behaviour technically, but not necessarily intuitive for a new user. I agree that updating the export mode use case docs would be useful.

@adswa
Member

adswa commented Sep 23, 2024

PR with a doc update to get this out of the BORG queue: #204

@adswa adswa closed this as completed in #204 Oct 2, 2024