Feature: send a BagIt bag to Archivematica for preservation #805
Comments
@jraddaoui I agree it would be better to allow different transfer types to be sent to Archivematica, but in the current processing workflow the bundle activity will convert an incoming Bag transfer into a standard transfer which is then zipped and sent to AM (or a3m). Allowing a Bag transfer to be sent to Archivematica will require removing the bundle activity from the AM workflow or updating it to support multiple output transfer types.
Note: the conversion of Bags -> standard transfer is a decision that was made for the a3m preservation engine, and I decided to retain this convention when adding Archivematica as a preservation engine option.
I'll create another issue about that bundle activity. This is all looking ahead to an extensible pre-processing option, and it will help if we later have a child workflow for those activities. Then we should discuss where the bundle activity should be located (if it's needed at all); looking at the conceptual design, bundling seems like a responsibility for pre-processing. In the SFA fork we are skipping the bundle activity right now.
@jraddaoui okay, but I don't see any point in making the AM transfer type configurable without addressing the bundle activity: Enduro will always deliver a zipped standard transfer to AM. In the SFA case you've already modified the Enduro code, so just changing the transfer type in the code is a simpler solution than adding a config variable.
Note from today's meeting: @djjuhasz, @jraddaoui, and @sallain to review this issue and decide what pieces of work need to be completed to support SFA and MoMA.
I have a proposal for how to handle the SIP format delivered to the preservation system by Enduro. My proposal is based on the supposition that a BagIt Bag is the best SIP format for Enduro to send to the preservation system, but recognizes that a3m currently can't process Bagged SIPs. I believe a BagIt Bag should be the preferred SIP format because:
My proposed solution for the Enduro SIP type
@sallain @jraddaoui what do you think? If you have a counter-proposal or any suggested modifications to my proposal, I'd love to hear your ideas.
I think that this is a good idea for the following reasons:
I also completely agree that this should all occur in pre-processing. Here are a few things to consider:
I'm sure that there are other considerations as well, but for the most part I think that this is a solid proposal.
I would like to outline one consideration that is missing here: our current way of validating bags uses a very early and not well-tested BagIt library in Go. See https://github.com/nyudlts/go-bagit and nyudlts/go-bagit#7 (comment). It would require some work to make this a fully featured bag validator that is compliant with the spec.
@sallain I agree that we should avoid rebagging a transfer that is submitted as a Bag and that adding Bag processing to a3m ASAP would avoid having to unbag the bag we just bagged. :P @Diogenesoftoronto yes, good points about the https://github.com/nyudlts/go-bagit library. I was assuming we would use https://github.com/LibraryOfCongress/bagit-python for Bag validation, but it being a Python tool definitely makes it more challenging to integrate than a native Go library. It also looks like bagit-python is not being actively maintained, and requires Python 2 which was sunset in January 2020.
I was discussing this with @fiver-watson and he pointed out that there may be circumstances where a user submits a bag, but other activities in the pre-processing application invalidate the original bag (e.g. transforming or adding metadata files), meaning that the bag WOULD have to be rebagged. Just something to consider.
@sallain the workflow diagram looks good to me. 👍
Fixes #805 - Change the package type to "zipped bag" when starting a transfer via the Archivematica API - Bag the PIP before sending it to Archivematica, if it's not already a bag - Add "TransferSourcePath" config value to specify the API path to the Transfer Source directory where PIPs are uploaded
Fixes #805 - Change the package type to "zipped bag" when starting a transfer via the Archivematica API - Bag the PIP before sending it to Archivematica (if it's not already a bag) - Add a "TransferSourcePath" config value to specify the API path to the Transfer Source directory where PIPs are uploaded
Fixes #805 - Move the bundle activity to the a3m branch of the processing workflow - Change the package type to "zipped bag" when starting a transfer via the Archivematica API - Bag the PIP before sending it to Archivematica (if it's not already a bag) - Add a "TransferSourcePath" config value to specify the API path to the Transfer Source directory where PIPs are uploaded
Is your feature request related to a problem? Please describe.
Currently, all transfers started in Archivematica use the "zipfile" transfer type: https://github.com/artefactual-sdps/enduro/blob/main/internal/am/start_transfer.go#L49
This is not an issue in the current implementation, where the transfer is always bundled as a ZIP file. However, it limits the extensibility of the workflow. Consider the particular case of the SFA fork, where the transfer is transformed into a zipped bag in the pre-processing activities:
https://github.com/artefactual-sdps/enduro-sfa/pull/4/files#diff-ae98fc39bbc9e053ec8d1d2ed56184cd9ba7ea280d3e72975617da81c3cfadd3
Describe the solution you'd like
Provide a configuration setting like the one used for the processing configuration:
https://github.com/artefactual-sdps/enduro/blob/main/enduro.toml#L99
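As a rough sketch of what such a setting might look like on the Go side: the Config type, the TransferType field, and the SetDefaults and Validate methods below are all assumptions for illustration, not Enduro's actual configuration code. The accepted values are limited to the two transfer types mentioned in this issue ("zipfile" and "zipped bag"); Archivematica may accept others.

```go
package main

import "fmt"

// Config is a hypothetical slice of the Archivematica settings; the field
// name TransferType is an assumption, not an existing Enduro option.
type Config struct {
	TransferType string
}

// defaultTransferType mirrors the value described as currently hard-coded.
const defaultTransferType = "zipfile"

// knownTransferTypes is deliberately incomplete: it lists only the two
// values mentioned in this issue.
var knownTransferTypes = map[string]bool{
	"zipfile":    true,
	"zipped bag": true,
}

// SetDefaults keeps today's behaviour when the setting is left empty.
func (c *Config) SetDefaults() {
	if c.TransferType == "" {
		c.TransferType = defaultTransferType
	}
}

// Validate rejects values outside the illustrative set above.
func (c Config) Validate() error {
	if !knownTransferTypes[c.TransferType] {
		return fmt.Errorf("unsupported transfer type %q", c.TransferType)
	}
	return nil
}

func main() {
	cfg := Config{TransferType: "zipped bag"}
	cfg.SetDefaults()
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("transfer type:", cfg.TransferType)
}
```

Defaulting to "zipfile" would keep existing deployments unchanged, while a fork such as SFA could opt in to "zipped bag" without patching the code.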
Describe alternatives you've considered
Allow changing the transfer type value in the workflow. Given the possibility of using child workflows to manage that extensibility, another option could be to indicate the transfer type in the child workflow result.
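The child workflow alternative might look something like the sketch below. The PreprocessingResult type, its fields, and transferTypeFor are hypothetical names invented for this example; no workflow engine API is used here, only the decision logic.

```go
package main

import "fmt"

// PreprocessingResult is a hypothetical child workflow result. The
// TransferType field is an assumption: it lets pre-processing report the
// format of the package it produced.
type PreprocessingResult struct {
	Path         string // location of the pre-processed package
	TransferType string // empty when pre-processing has no opinion
}

// transferTypeFor prefers the type reported by the child workflow and
// falls back to the value used by the parent workflow otherwise.
func transferTypeFor(res PreprocessingResult, fallback string) string {
	if res.TransferType != "" {
		return res.TransferType
	}
	return fallback
}

func main() {
	// A pre-processing run that produced a zipped bag (illustrative path).
	res := PreprocessingResult{Path: "transfer.zip", TransferType: "zipped bag"}
	fmt.Println(transferTypeFor(res, "zipfile"))

	// A run that reports nothing keeps the current behaviour.
	fmt.Println(transferTypeFor(PreprocessingResult{Path: "transfer.zip"}, "zipfile"))
}
```

With this shape, the parent workflow would not need to know how pre-processing packaged the transfer; it would simply pass along whatever type the child reported.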