-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged images are being incorrectly included #2460
Comments
The Sierra-based record https://wellcomecollection.org/works/kqy8b2yh merges as expected with the corresponding METS and Miro records. both h6nmh8c2 and tmc968hm redirect to kqy8b2yh. |
The same pattern is true of bvxsrk7s, (Miro: efdghac2, Mets: zy2xuu9x) |
The Selected Images - "View n Images" section is driven by the content of the images property, which in bvxsrk7s is
Similarly, this list is one entry long in kqy8b2yh.
This list is derived from the content of imageData after the merger, in both cases, the content of
So I suspect the problem is (in part) that the merger is incorrectly adding Images to the Works. For example when the merger encounters some images that are already in essence dealt with by being in the IIIF Manifest, or is otherwise set to be ignored. In the case of Miro L0016704 (efdghac2)(vzc9ptad), this image is not expected to be seen at all, but is still being added as an Image in the Work. |
A related issue is that images in the pipeline do not have State in the same way Works do. However, I do not think they are the same. If we resolve 5470 before this issue, I think we'll still end up with the wrong number in the "View n Images" link, but it will link to an empty search (or at least one that returns a different number of results to n). If we resolve this issue first, then users might still be able to find "suppressed" images with a search (e.g. q=bvxsrk7s might still return 4 records), but the link from the Work will be correct (or correctly absent) |
Initially, I wondered if this might have been related to my change that fixed licence mismatches, but it doesn't look like it. The licences involved in kqy8b2yh are all the same, so that wouldn't have made a difference. |
Oho! ImageDataRule is not being tested in the same manner in which it is used. In merger, it fetches the |
The only Which only covers the scenario where the Sierra item has digmiro, but does not have anything to say about what the resulting Sierra target ends up looking like. |
However, what is happening in kqy8b2yh is that imageData contains kje5zhrd (the METS image), which is not incorrect according to that rule. So I suspect there may be a missing rule. |
When creating a merged Work from a target and sources, the same sourceImageData is used to populate the Images for redirection and the imageData on the target Work. I think this is correct, an entry in imageDataWithSources does require that the corresponding Work has that image, in order for the redirection to actually work. So again, this points to the rule generating it being wrong. I fear this might connect this issue even more tightly with wellcomecollection/platform#5470, If we fix this one without being able to change the state of an image, then what happens to the existing image that currently redirects? |
Oh. I was working on creating an appropriate set of tests that demonstrate this issue in isolation. It reveals an inconsistency depending on the number of items in the Sierra Work. Given a Sierra work with 0, 1 or more Physical items, merging with a Mets and a Miro item with one item each, this table shows the counts of the items in the target work on output.
|
Actually, it looks like that's not a problem, when there is one item in the output, it gets two locations (the physical from Sierra and the digital from METS) |
I'm coming to the conclusion that this might not necessarily be a digmiro issue, but actually that the merger should not insert both imageData and items for the same source into the target. To find other examples, I ran this query:
which yielded plenty of results, including a98wfchg, which "contains 1 image", and allows you to view 1 image, which eventually reaches xpzujv3x. No digmiro involved, but still two routes to reach the same picture. There are a number of tests that explicitly check that a merged source does appear in both |
This also suggests that the two problematic records that started this whole investigation might not be demonstrating the same issue. The bvxsrk7s should not be showing the monochrome image at all. |
Yes! It is two separate issues. (This also explains why I had such trouble coming up with a minimal example to test with. Every time I tried just including the bits I thought relevant, it all worked correctly!) In bvxsrk7s, there are two Excitingly, the fix for this (not including new tests) has a Levenshtein distance of 4. |
The problem evident in kqy8b2yh (and a98wfchg) is also evident in bvxsrk7s. METS images that are present in items are also present in images, and this is unwanted. I think a bit of a discussion is required to ensure that fixing that doesn't break anything, but I don't think it will. |
I note that the same behaviour is present when merging into CALM. Here is a query that returns CALM records with items from METS and images
https://wellcomecollection.org/works/kh8f3nzt contains two images and has 29 selected images. Oddly, this record in Elasticsearch has 42 locations (33 Digital) |
I suspect that CALM thing is actually a completely different, but related issue, but that is also supplemented by this one. |
https://wellcome.slack.com/archives/C8X9YKM5X/p1695906279857879?thread_ts=1695906112.578439&cid=C8X9YKM5X
https://wellcomecollection.org/works/kqy8b2yh is correctly showing "Contains 1 image", but incorrectly showing "View 1 image" in the Selected Images section.
Similarly, https://wellcomecollection.org/works/bvxsrk7s is correctly showing "Contains 3 images" but incorrectly showing "View 4 images".
There should be no images in the Selected Images - View X Images for either of these.
The text was updated successfully, but these errors were encountered: