Export fails on 410 response caused by tombstone #172

Open
gdelisle opened this issue Jan 22, 2025 · 11 comments · May be fixed by #173

Comments

@gdelisle

gdelisle commented Jan 22, 2025

I am attempting to export a fcrepo repository and the job fails with

java.lang.RuntimeException: Export operation failed: unexpected status 410 for http://<url-redacted>:8080/fedora/rest/prod/zs/26/03/66/zs2603669
	at org.fcrepo.importexport.common.TransferProcess.checkValidResponse(TransferProcess.java:175)
	at org.fcrepo.importexport.exporter.Exporter.filterBinaryReferences(Exporter.java:575)
	at org.fcrepo.importexport.exporter.Exporter.exportRdf(Exporter.java:460)
	at org.fcrepo.importexport.exporter.Exporter.export(Exporter.java:361)
	at org.fcrepo.importexport.exporter.Exporter.run(Exporter.java:300)
	at org.fcrepo.importexport.ImportExportDriver.run(ImportExportDriver.java:58)
	at org.fcrepo.importexport.ImportExportDriver.main(ImportExportDriver.java:47)

The resource it points to is a tombstone for a deleted item, which is a normal thing to encounter, and fcrepo gives an HTTP 410 Gone response, which is appropriate. However, the export utility quits at that point and provides no output. How can I get the export to stop treating a 410 response as a fatal error, so that it does not abort the whole export?

@whikloj
Contributor

whikloj commented Jan 22, 2025

It does appear that we don't expect to encounter resources that are 410 Gone. Fedora doesn't generate links to tombstones so I am assuming you have a manually added predicate pointing to this resource?

@gdelisle
Author

Possibly? If so, I am not able to remove it, and there may be others that I haven't encountered yet. I just need the export job not to fail when it hits one. Is there some "ignore errors" option I can pass to it?

@whikloj
Contributor

whikloj commented Jan 22, 2025

No, as we figured you would want to ensure the export is complete. I'm remembering that this tool only follows the ldp:contains predicate so once the resource is deleted the ldp:contains should not return tombstones. Are you passing in a --resourcesFile which might have the URI of the deleted resource in it?

@gdelisle
Author

I am not passing in a --resourcesFile at all. Here is my generated .yml file:

legacyMode: false
acls: false
overwriteTombstones: false
auditLog: false
resource: http://<url-redacted>:8080/fedora/rest/prod
inbound: false
bag-profile: default
membership: false
dir: /cul/backup/fcrepo-import-export/esmis-prod
rdfLang: text/turtle
mode: export
external: false
predicates: http://www.w3.org/ns/ldp#contains
versions: false
bag-serialization: tar
bag-config: /cul/backup/bagit-config.yml
binaries: false

My command looks like this:

java -Dfcrepo.log.importexport=WARN -Xms4096M -Xmx4096M -jar target/fcrepo-import-export-1.1.0-SNAPSHOT.jar --mode export --resource http://<url-redacted>:8080/fedora/rest/prod --dir /cul/backup/fcrepo-import-export/esmis-prod --repositoryRoot http://<url-redacted>:8080/fedora/rest/prod --bag-profile default --bag-serialization tar --bag-config /cul/backup/bagit-config.yml -w ./export-prod.yml

A complete export would be great, of course, but I'd rather have an export that was 99.9999% complete than nothing at all. And it seems strange that no one has ever tried to export a repository containing tombstones before me. Aren't they a fairly common occurrence?

@whikloj
Contributor

whikloj commented Jan 22, 2025

They are somewhat common depending on how much you delete resources.

The concern I have is that Fedora resources do not show an ldp:contains property pointing to a tombstone resource.

So if I have http://localhost:8080/fcrepo/rest/parent with a child http://localhost:8080/fcrepo/rest/parent/child, I would see this triple:

<http://localhost:8080/fcrepo/rest/parent> <http://www.w3.org/ns/ldp#contains> <http://localhost:8080/fcrepo/rest/parent/child>

But if I delete http://localhost:8080/fcrepo/rest/parent/child then that triple disappears and so an export would not try to retrieve that object.
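
To make the reasoning above concrete, here is a minimal in-memory sketch of a traversal that only follows ldp:contains links. It is a toy model, not the export tool's or Fedora's actual code: the class `ContainmentWalk` and its methods are hypothetical names, and it assumes that deleting a child removes the containment triple from its parent, as described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of containment (hypothetical; not Fedora or fcrepo-import-export code). */
public class ContainmentWalk {
    // Parent URI -> child URIs currently listed through ldp:contains.
    private final Map<String, List<String>> contains = new HashMap<>();

    /** Records a containment triple: parent ldp:contains child. */
    public void addChild(String parent, String child) {
        contains.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    /** Deleting a child removes its containment triple from the parent. */
    public void delete(String parent, String child) {
        List<String> children = contains.get(parent);
        if (children != null) {
            children.remove(child);
        }
    }

    /** Depth-first walk from root that follows only ldp:contains links. */
    public List<String> walk(String root) {
        List<String> visited = new ArrayList<>();
        visit(root, visited);
        return visited;
    }

    private void visit(String uri, List<String> visited) {
        visited.add(uri);
        for (String child : contains.getOrDefault(uri, Collections.<String>emptyList())) {
            visit(child, visited);
        }
    }
}
```

In this model, a walk started after the delete never reaches the child, because nothing links to it any more.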

What version of Fedora are you running this export against?

@whikloj
Contributor

whikloj commented Jan 22, 2025

I just did a simple test of a parent resource with a child that was deleted, like above, using your command line args. It did not try to retrieve the tombstone, and the export completed successfully.

I ran that with version 1.1.0 of the fcrepo-import-export tool, but as there is now a 1.2.0 version maybe try that one just to see if there is any difference.

https://github.com/fcrepo-exts/fcrepo-import-export/releases/tag/fcrepo-import-export-1.2.0

Also, I noticed that you had set --repositoryRoot http://<url-redacted>:8080/fedora/rest/prod. I think this should be --repositoryRoot http://<url-redacted>:8080/fedora/rest; specifying --resource http://<url-redacted>:8080/fedora/rest/prod should already mean that only this resource and its children are exported. This is probably not related, but just in case.

@gdelisle
Author

First, thanks for taking your time to help. I do appreciate it.

I've tried this export job twice now, once with --repositoryRoot set and once without. Both were using version 1.1.0 of the fcrepo-import-export tool, and the repository is version 4.7.5 of Fedora. The first run quit after 792 minutes with the error in my original post, and the second quit with the same error but referencing a different object, after 251 minutes. I will try again tomorrow with version 1.2.0 but I have to say, I don't see anything in the diff there that would handle a 410 response differently than in version 1.1.0. Is there some way to tell the tool to continue through 410 errors, or to tell my repository to handle tombstones with a different response code?

@whikloj
Contributor

whikloj commented Jan 23, 2025

No worries @gdelisle.

Are you running an export of a repository that is undergoing writes?

@gdelisle
Author

gdelisle commented Jan 23, 2025

Yes. This is a production system and new data is getting written. Could that be the issue?
Update: I ran the job again swapping the values of --resource and --repositoryRoot as you suggested, and this time it ran into the 410 error after 1052 minutes with yet another different object. I should be able to run this job over the weekend with less of a chance that it will be written to during the export.

@whikloj whikloj linked a pull request Jan 24, 2025 that will close this issue
@whikloj
Contributor

whikloj commented Jan 24, 2025

I have created a PR (linked above) with some work to allow you to continue over a 410 error. Even if you don't enable it, it should still print the URI that is giving the 410 at least. That might help figure out how this tombstone is appearing at all. Give it a try and let me know if it works for you.
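
As a rough illustration of the behaviour described in this comment (always report the URI that returned 410, and continue only when asked to), here is a minimal sketch. It is not the code from the linked PR: the class `GoneAwareCheck`, the `skipGone` switch, and the `skipped()` accessor are hypothetical names chosen for this example, and the real option name in the tool may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; these names are not taken from fcrepo-import-export. */
public class GoneAwareCheck {
    private final boolean skipGone;                 // hypothetical opt-in switch
    private final List<String> skipped = new ArrayList<>();

    public GoneAwareCheck(boolean skipGone) {
        this.skipGone = skipGone;
    }

    /**
     * Returns true if the resource should be exported, false if it was skipped.
     * Throws for any other non-2xx status, and for 410 when skipping is off.
     */
    public boolean check(String uri, int status) {
        if (status >= 200 && status < 300) {
            return true;
        }
        if (status == 410) {
            // Always report the tombstone URI, whether or not we continue.
            System.err.println("410 Gone (tombstone): " + uri);
            if (skipGone) {
                skipped.add(uri);
                return false;
            }
        }
        throw new RuntimeException(
            "Export operation failed: unexpected status " + status + " for " + uri);
    }

    /** URIs that were skipped because they returned 410. */
    public List<String> skipped() {
        return skipped;
    }
}
```

Keeping the list of skipped URIs means the export can finish while still leaving a record of exactly which resources are missing from it.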

@whikloj
Contributor

whikloj commented Jan 28, 2025

@gdelisle yes, if the repository is changing while the export runs, a parent resource can list a child that is then deleted before the export gets to it. That would explain why this is happening.
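
The timing problem in this comment can be reproduced with a small deterministic model. This is only a sketch under stated assumptions: the class `ExportRace` and its methods are hypothetical, and the repository is reduced to a map in which a deleted child becomes a tombstone that answers 410.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of an export racing with deletes (hypothetical names throughout). */
public class ExportRace {
    // Child URI -> true while live, false once replaced by a tombstone.
    private final Map<String, Boolean> children = new LinkedHashMap<>();

    public void create(String uri) {
        children.put(uri, true);
    }

    public void delete(String uri) {
        children.put(uri, false);
    }

    /** What the parent's ldp:contains would list right now: live children only. */
    public List<String> listContained() {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : children.entrySet()) {
            if (e.getValue()) {
                live.add(e.getKey());
            }
        }
        return live;
    }

    /** Status a GET would return: 200 if live, 410 if tombstoned, 404 if unknown. */
    public int get(String uri) {
        Boolean live = children.get(uri);
        if (live == null) {
            return 404;
        }
        return live ? 200 : 410;
    }
}
```

If the exporter calls `listContained()`, a writer then calls `delete(...)` on one of the listed children, and the exporter fetches from its now stale list, `get(...)` returns 410 for that child even though a fresh listing would no longer include it. On a long-running export of a live repository, that window can be hours wide.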
