Add EXIF properties for media objects #498

Open
wants to merge 3 commits into
base: develop

Contributor

@marlip marlip commented Sep 23, 2024

This adds terms for values extracted from the EXIF tags of image files, for use in (image) file descriptions.

Added terms are:

bitsPerSample - The number of bits per image component.
colorSpace - The color space information tag.
dateTimeDigitized - The date and time when the image was stored as digital data.

Terms taken from https://www.w3.org/2003/12/exif/

@marlip marlip requested a review from sarwik September 23, 2024 11:57
Contributor

@sarwik sarwik left a comment

I think it looks good! It would be good if @niklasl also takes a look.

@sarwik sarwik requested a review from niklasl September 23, 2024 13:33
@marlip marlip marked this pull request as ready for review September 25, 2024 12:55
@marlip
Contributor Author

marlip commented Sep 26, 2024

ping @niklasl feel free to take a look at this when you have time, it is a small pull request

sdo:domainIncludes :MediaObject ;
owl:equivalentProperty exif:colorSpace .

:dateTimeDigitized a owl:DatatypeProperty ;
Member

Will this be the same as the value of :date on a :DigitalReproduction event? I think we need to look at how to harmonize the model around these details. (We may define an owl:propertyChainAxiom for this shorthand. The important thing is to ensure that we don't have different ways of expressing the same information; to avoid making it hard to know where to put and look for the details.)

Contributor Author

I think it might be expressing the same value. Meaning, the datetime is the moment in which the digitized image was created and is unchanged by modifications to the image. Whether it is the moment in which the camera shot the picture or the moment in which the digital file was created, I cannot say for sure. It seems hard to differentiate the two in the modern age. But if a DigitalReproduction event is the moment a digital file is made, then we are talking about the same thing.

Is the term 'dateTimeDigitized' superfluous in that case?

Member

Good reasoning. In principle yes. Although I can see the event on a more general level of abstraction, lacking the timestamp precision expected for files. They should correlate (which is expensive if one is not logically derived from the other), but may be for different uses.

Also, strictly, the Representation is "immutable"; an edited copy would be a new one (derived from or a version of the previous). If that discipline was possible to maintain, kbc:created would be enough (though we'd need to widen its domain a little).

(OTOH, dateTimeDigitized leaves no room for doubt. It also depends on what consumers would look for and assume given the presence of either; or worse, both.)

Contributor Author

Thanks for the input. It took a while to come back to this, but we have now checked with our consumers. Since they are specifically interested in the date of digitisation per image file, I think it is safer to introduce the new term than to introduce possible confusion in the existing one which, as you say, is not as precise in its name.
Would you agree with that?

Contributor Author

I have found ImageBitDepth in vocab/enums.ttl

"""
:ImageBitDepth a owl:Class;
rdfs:label "Image bit depth"@en, "Bildens bitdjup"@sv;
rdfs:subClassOf :DigitalCharacteristic;
owl:equivalentClass bflc:ImageBitDepth .
"""

I am unsure about the domain of the term and whether I need to add something in order for it to also describe MediaObjects now. Help is appreciated @niklasl

Contributor Author

ping @niklasl :)

@niklasl niklasl requested a review from ebengtsson September 26, 2024 14:15
@niklasl
Member

niklasl commented Sep 26, 2024

Looks valuable. Do you have example descriptions where we will use these?

As noted for :dateTimeDigitized, we need to look at if these details have been added using different forms of expression, so we can harmonize it (and ensure we follow the normalized form going forward).

There are some forms from BIBFRAME which may have been used to capture this (e.g. :ImageBitDepth), but we'll have to look at what's actually used in Libris.

(Regardless of the normalized form we need to follow, we may still define these as shorthands (using owl:propertyChainAxiom, and marking the terms as shorthands) for use in mapping to simpler vocabularies (as is done here); so we can project the data as such for consumers who cannot handle all details.)

@marlip
Contributor Author

marlip commented Sep 26, 2024

An example of this description would be in a FilePackage where the "includes" List for image files could look as follows:

    "includes": [
        {
            "@id": "https://data.kb.se/dark-package/image1.jp2",
            "@type": "File",
            "fileName": "image1.jp2",
            "encodingFormat": {
                "@id": "https://id.kb.se/encodingFormat/image/jp2",
                "@type": "EncodingFormat",
                "code": "image/jp2"
            },
            "describedBy": {"@id": "https://data.kb.se/dark-package"},
            "contentSize": 10786746,
            "checksum": {"@type": "MD5", "value": "62ea86dcf1a6b8f1d62b2a96bf2e6bc9"},
            "height": "9223",
            "width": "2442",
            "colorSpace": "RBG",
            "bitsPerSample": "8",
            "hasNote": "Detail 5",
            "dateTimeDigitized": "2018-12-11T12:10:05.018+01:00"
        },
        {
            "@id": "https://data.kb.se/dark-package/image2.jp2",
            "@type": "File",
            "fileName": "image2.jp2",
            "encodingFormat": {
                "@id": "https://id.kb.se/encodingFormat/image/jp2",
                "@type": "EncodingFormat",
                "code": "image/jp2"
            },
            "describedBy": {"@id": "https://data.kb.se/dark-package"},
            "contentSize": 10786746,
            "checksum": {"@type": "MD5", "value": "62ea86dcf1a6b8f1d62b2a96bf2e6bc9"},
            "height": "9223",
            "width": "2442",
            "colorSpace": "RBG",
            "bitsPerSample": "8",
            "dateTimeDigitized": "2018-12-11T12:10:05.018+01:00"
        }
    ]

If :ImageBitDepth is already used in Libris, I have nothing against conforming to that. We will do as we are told :)

@niklasl
Member

niklasl commented Oct 23, 2024

If the data is extracted from the embedded EXIF data as is, that bitsPerSample looks correct (and should be a rational number?), as opposed to ImageBitDepth (which is commonly 8 or 24 bit, but many variants exist).

A quick reading of EXIF seems to indicate one of three common values for colorSpace (1 = "sRGB", 65535 = "uncalibrated"; 2 possibly for (Adobe?) RGB; and "Undefined" reasonably left unstated).

Is dateTimeDigitized also the exact one from the files (and/or sometimes dateTimeOriginal)?

Given that, I'm leaning towards defining all three, as you proposed! (And if there is more EXIF data, just using that namespace (declaring the prefix) to embed what we get as it stands is an option. It's probably never expected to be typed, and is provided for potential machine processing and technical display or filtering; right?)

@marlip
Contributor Author

marlip commented Oct 24, 2024

@niklasl
The problem is that EXIF tags follow a schema for TIFF and JPEG images, but for JP2 images there is no schema, which means that the data can look a bit different there, and even the fields we take the data from can differ. So while dateTimeDigitized is the one we take for TIFF and JPEG files, in JP2 the field is ".//photoshop:DateCreated" or ".//xmp:CreateDate". The same goes for colorSpace in JP2 images.
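
The JP2 lookup described above could be sketched roughly as follows. This is only an illustration using Python's standard library, not the actual extraction code: the helper name `find_digitized_date` and the sample XMP packet are made up, and the sketch assumes the values are stored as XML elements rather than attributes.

```python
import xml.etree.ElementTree as ET

# Standard XMP namespace URIs for the two prefixes mentioned above.
NS = {
    "photoshop": "http://ns.adobe.com/photoshop/1.0/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
}

# Lookup order for JP2 files, following the two paths named above.
JP2_DATE_PATHS = [".//photoshop:DateCreated", ".//xmp:CreateDate"]


def find_digitized_date(xmp_xml):
    """Return the first date found in an XMP packet, or None (hypothetical helper)."""
    root = ET.fromstring(xmp_xml.strip())
    for path in JP2_DATE_PATHS:
        element = root.find(path, NS)
        if element is not None and element.text:
            return element.text.strip()
    return None


# Made-up sample packet, for illustration only.
SAMPLE_XMP = """
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/">
      <xmp:CreateDate>2018-12-11T12:10:05.018+01:00</xmp:CreateDate>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
"""

print(find_digitized_date(SAMPLE_XMP))
```

Returning None when neither field is present keeps a missing date visible instead of silently substituting another value.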

Glymur is a library we will use for metadata extraction from JP2 files, and it internally defines codes for the following color spaces:

CMYK = 12
SRGB = 16
GREYSCALE = 17
YCC = 18
E_SRGB = 20
ROMM_RGB = 21
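
A minimal sketch of how those codes might be turned into labels during extraction. The dictionary simply repeats the values listed above; the function name `color_space_label` is hypothetical, and how unknown codes should be handled is an assumption.

```python
# Color space codes as listed above.
COLOR_SPACE_LABELS = {
    12: "CMYK",
    16: "SRGB",
    17: "GREYSCALE",
    18: "YCC",
    20: "E_SRGB",
    21: "ROMM_RGB",
}


def color_space_label(code):
    """Return the label for a color space code, or None if the code is not listed."""
    return COLOR_SPACE_LABELS.get(code)


print(color_space_label(16))  # SRGB
print(color_space_label(99))  # None
```

Returning None for an unlisted code makes it easy to spot values that are missing from the list.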

I sent this info to the photographer and am waiting for a confirmation from their side so we do not miss a value.

After yesterday's meeting, we think we have landed in the following model, which makes my changes to the definitions repo unnecessary, but do correct me if we misunderstood.

"""
{
    "@id": "https://data.kb.se/dark-package/image1.jp2",
    "@type": "File",
    "fileName": "image1.jp2",
    "encodingFormat": {
        "@id": "https://id.kb.se/encodingFormat/image/jp2",
        "@type": "EncodingFormat",
        "code": "image/jp2"
    },
    "describedBy": {"@id": "https://data.kb.se/dark-package"},
    "contentSize": 10786746,
    "checksum": {"@type": "MD5", "value": "62ea86dcf1a6b8f1d62b2a96bf2e6bc9"},
    "height": "9223",
    "width": "2442",
    "colorSpace": {"@id": "https://id.kb.se/something-else/RBG"},
    "imageBitDepth": {"@id": "https://id.kb.se/something-else/8"},
    "hasNote": {
        "@type": "Note",
        "label": "Detail 5"
    },
    "created": "2018-12-11T12:10:05.018+01:00"
}
"""

@niklasl
Member

niklasl commented Oct 25, 2024

A quick check indicates (usage aspects aside) that the list of needed color spaces should be finite. They are relevant in determining bitsPerSample (per pixel or channel in the case of images). Image bit depth can be ambiguous unless we formally define what they mean (24 bit color and 8 bit greyscale may all be 8 bits per channel; 32 bit may be a 24 bit plus 8 bit alpha, etc). I don't know if it's needed to specify each channel depth differently. I suspect we can make simplified normalized identifiers here, perhaps to guide searchers towards a reasonable quality?
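
To illustrate the ambiguity, here is a small sketch of the arithmetic relating bits per channel to total bit depth. The channel counts and the names `CHANNELS` and `bits_per_pixel` are assumptions for illustration only, not a proposed vocabulary.

```python
# Assumed channel counts per color model; illustrative only.
CHANNELS = {"greyscale": 1, "rgb": 3, "rgba": 4, "cmyk": 4}


def bits_per_pixel(color_model, bits_per_sample):
    """Total bit depth = number of channels * bits per channel."""
    return CHANNELS[color_model] * bits_per_sample


# The same 8 bits per channel gives three different "image bit depths".
for model in ("greyscale", "rgb", "rgba"):
    print(model, bits_per_pixel(model, 8))
```

With 8 bits per channel this gives 8, 24 and 32 bits respectively, so a bare "bit depth" value only becomes unambiguous together with the color model.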

(BTW, I got the rational number comment wrong, that is for exif:compressedBitsPerPixel.)

We don't have a dedicated property for kbv:ImageBitDepth; it is (sparingly) used with kbv:digitalCharacteristic. We could define a new hasImageBitDepth subproperty for that (similar to the subproperties of kbv:provisionActivity).

But the amount of "surface properties" we define depends on expected usage. Logically, they're not necessary (general relation and specific type is enough here), but it may facilitate use (search, display, even cataloguing; albeit see next paragraph), depending on what we expect here.

(I can even see us folding colorspace and bits per channel into a set of ImageBitDepth enumerations complete with colorSpace, if the practical set of values roughly fits in a 7x7 matrix or hierarchy... 🤔)

As a counter-point, we could just "tunnel" EXIF metadata through, and "catalogue" such if it's not provided by the source. This should be fully automated (extracted from images). But that also depends on whether it's needed; it is already stored in the image ("it knows what it is").

Crucially, what is the expected usage? The answer may be "we don't know, but we don't want to lose the signal value"; then we need to know if that is provided externally or is already embedded in the images. But there should be a perceived need for search and display of what we've got based on these colorspace/bitdepth facts, which would motivate working with such metadata.

3 participants