Support bagging to a destination other than the source #35

Open
finoradin opened this issue Oct 15, 2014 · 12 comments

Comments

@finoradin

Currently the module only supports what the LOC Java library calls "bag in place". It would be very useful to have built-in support for specifying one or more payloads as the "source" and a "destination" where the bag containing those payloads will be created.

Minor but important note – the hashes in the manifest should be generated from the source payloads, not the copied files in the bag.

@finoradin
Author

bump

@acdha
Member

acdha commented Mar 11, 2016

@finoradin Thanks – I'll work on a pull request for this. Strong +1 on checksumming the source files. The way I typically handle that is with streaming reads at a large block size, since that's also good for performance on network filesystems, and it's a natural extension to feed each block into the hashers as well.
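A minimal sketch of the streaming approach described above, assuming nothing about bagit.py's internals: `copy_and_hash` is a hypothetical helper (not part of bagit.py's API) that reads the source file in fixed-size blocks, writes each block to the destination, and feeds the same block to every hasher. Because the digests are computed from the bytes read from the source, they reflect the source payload rather than the copy. The 1 MiB block size and the default algorithm list are illustrative assumptions.

```python
import hashlib

# Hypothetical default block size; tune for the filesystem in use.
BLOCK_SIZE = 1024 * 1024  # 1 MiB


def copy_and_hash(src_path, dst_path, algorithms=("sha256", "sha512"), block_size=BLOCK_SIZE):
    """Copy src_path to dst_path, hashing the bytes as they are read from the source.

    Hypothetical helper for illustration. Returns {algorithm: hex digest}.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            # Every hasher sees exactly the bytes read from the source file...
            for hasher in hashers.values():
                hasher.update(block)
            # ...and the same block is then written to the destination.
            dst.write(block)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Reading each block once and sharing it between the copy and all hashers means the source is only read a single time, which matters most on slow or network filesystems.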

@acdha acdha self-assigned this Mar 11, 2016
@acdha acdha modified the milestone: 1.6 Mar 15, 2016
@acdha acdha changed the title support bag source / destination Support bagging to a destination other than the source Mar 18, 2016
@acdha acdha modified the milestone: 1.6 Feb 24, 2017
@runderwood
Contributor

I'm willing to take this on. Would it be agreeable to add a --destination option, which, when present, tells bagit.py to draw on the bag_dir args as sources and create the new bag in the destination path?

I would, of course, checksum the files as described above.
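One way the proposed option could be wired up, as a hypothetical sketch only (this is not bagit.py's actual argument parser): `--destination` is optional, and when it is present the positional `bag_dir` arguments are treated as sources for a single new bag instead of being bagged in place. The function names `build_parser` and `plan` are illustrative assumptions.

```python
import argparse


def build_parser():
    # Hypothetical CLI sketch; not bagit.py's real parser.
    parser = argparse.ArgumentParser(prog="bagit.py")
    parser.add_argument(
        "--destination",
        metavar="DEST_DIR",
        help="create a new bag in DEST_DIR from the given sources instead of bagging in place",
    )
    parser.add_argument(
        "bag_dir",
        nargs="+",
        help="directories to bag (treated as sources when --destination is given)",
    )
    return parser


def plan(argv):
    """Describe what the hypothetical CLI would do for the given arguments."""
    args = build_parser().parse_args(argv)
    if args.destination is None:
        # Existing behaviour: each directory is turned into a bag in place.
        return {"mode": "in-place", "sources": args.bag_dir, "destination": None}
    # Proposed behaviour: all directories become sources for one new bag.
    return {"mode": "destination", "sources": args.bag_dir, "destination": args.destination}
```

For example, `plan(["--destination", "out", "foo", "bar"])` describes one new bag in `out` built from `foo` and `bar`, while omitting `--destination` keeps today's in-place behaviour.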

@acdha
Member

acdha commented Jun 19, 2017

@runderwood +1 – that should cover the most common use-cases

@runderwood
Contributor

runderwood commented Jun 20, 2017

@acdha Clarification: With multiple sources, should:

  1. ...multiple bags be generated?
  2. ...one bag be generated with the sources merged?
  3. ...one bag be generated with directories for each source?

@johnscancella
Copy link
Contributor

I would vote for option 3: one bag, with a directory for each source.

@runderwood
Copy link
Contributor

runderwood commented Jun 20, 2017

@johnscancella I can see that being useful, but I also find it counter-intuitive, since in the BagIt world a directory is usually turned into a payload at the root of data/.

EDIT: I should also mention that merging directories seems consistent with some of the use cases that have come up for me. But that's also a bit weird, I suppose, though maybe no weirder than in-place bagging.

@runderwood
Copy link
Contributor

FYI, I have a branch where bagging to a destination (with one source only) seems to work alright.

@johnscancella
Contributor

I would propose that, given the source directories foo, bar, and ham, you would end up with something that looks like this:

├── bag-info.txt
├── bagit.txt
├── data
│   ├── bar
│   ├── foo
│   └── ham
├── manifest-md5.txt
└── tagmanifest-md5.txt
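The layout above can be expressed as a simple path mapping. This is a minimal sketch under stated assumptions: `plan_payload` is a hypothetical helper, each source directory becomes a subdirectory of `data/` named after the source's basename, and two sources with the same basename are rejected (an assumption, since the thread does not say how collisions should be handled).

```python
import os


def plan_payload(sources):
    """Map each file under the given source directories to its path inside the bag.

    Hypothetical sketch of the proposed layout: every source directory becomes
    a subdirectory of data/ named after the source's basename.
    Returns {absolute source file path: bag-relative path}.
    """
    mapping = {}
    seen = set()
    for source in sources:
        # abspath also normalises away a trailing separator such as "ham/".
        source = os.path.abspath(source)
        name = os.path.basename(source)
        if name in seen:
            # Assumption: sources sharing a basename would collide under data/.
            raise ValueError("duplicate source directory name: %s" % name)
        seen.add(name)
        for dirpath, _dirnames, filenames in os.walk(source):
            for filename in filenames:
                full = os.path.join(dirpath, filename)
                rel = os.path.relpath(full, source)
                # Bag-relative paths use forward slashes regardless of platform.
                mapping[full] = "/".join(["data", name] + rel.split(os.sep))
    return mapping
```

Applying the same rule to a single source (for example `ham/`) yields `data/ham/...`, so the one-source and multi-source cases behave consistently.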

@runderwood
Copy link
Contributor

OK. Sounds good to me. I'll give it a shot.

@runderwood
Copy link
Contributor

@johnscancella Sorry, one more question: does it make more sense to you for the behavior to stay the same with a single source, so that even when only one bag_dir argument (ham/) is given, the data directory contains a single directory with that name? Something like this:

├── bagit.txt
├── data
│   └── ham
├── manifest-md5.txt
└── tagmanifest-md5.txt

...or should, in that case, the contents of ham be dropped into the root of data?

@johnscancella
Copy link
Contributor

I would go with ham/ being placed inside the data directory, since that is how it currently behaves, i.e.

.
└── ham
    ├── bar
    └── foo

becomes

.
├── bag-info.txt
├── bagit.txt
├── data
│   └── ham
│       ├── bar
│       └── foo
├── manifest-sha256.txt
├── manifest-sha512.txt
├── tagmanifest-sha256.txt
└── tagmanifest-sha512.txt
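To tie this back to the original request that manifest hashes come from the source payloads, here is a hedged sketch of writing the `manifest-<algorithm>.txt` files shown above from digests computed while reading the sources. `write_manifests` and `encode_filepath` are hypothetical helpers, not bagit.py functions. The line format (`checksum`, whitespace, bag-relative path) and the percent-encoding of `%`, CR, and LF in file paths follow my reading of the BagIt specification (RFC 8493); the two-space separator is an assumption. Tag manifests are not covered.

```python
import os


def encode_filepath(path):
    # Per RFC 8493, "%", CR and LF in manifest file paths are percent-encoded.
    # Encode "%" first so the escapes added afterwards are not double-encoded.
    return path.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def write_manifests(bag_dir, digests):
    """Write one manifest-<algorithm>.txt per algorithm into bag_dir.

    digests maps a bag-relative path (e.g. "data/ham/foo") to
    {algorithm: hex digest}, computed while reading the *source* files.
    Assumes every file has a digest for every algorithm.
    Returns the names of the manifest files written.
    """
    algorithms = sorted({alg for per_file in digests.values() for alg in per_file})
    written = []
    for alg in algorithms:
        name = "manifest-%s.txt" % alg
        with open(os.path.join(bag_dir, name), "w", encoding="utf-8", newline="\n") as f:
            for path in sorted(digests):
                f.write("%s  %s\n" % (digests[path][alg], encode_filepath(path)))
        written.append(name)
    return written
```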
