Creating identifiers #375

gerontakos · 2022-08-18T22:49:08Z

gerontakos
Aug 18, 2022
Maintainer

When creating values for identifiers, sometimes we add a qualifier to the identifier string. For example:

(CODEN) 123456

0-123-46744-6 (hardcover)

First question:

Should there always be a space between the qualifier and the identifier string?

Second question:

Can we determine the order; should the qualifier always follow the identifier string?

pan-zhuo · 2022-08-19T02:29:46Z

pan-zhuo
Aug 19, 2022
Maintainer

Some RDA options for manifestation identifiers related to ordering:

OPTION (https://access.rdatoolkit.org/en-US_ala-95f6a60f-3d2b-32d8-9486-cf810708d4ba/div_gq5_w5s_dfb)
Record a value that includes a trade name or the name of an agent who is responsible for assigning the identifier followed by the identifier.

OPTION (https://access.rdatoolkit.org/en-US_ala-95f6a60f-3d2b-32d8-9486-cf810708d4ba/div_trv_s1t_dfb)
Record a value that includes the identifier followed by a type of binding or format, if considered important for identification.

OPTION (https://access.rdatoolkit.org/en-US_ala-95f6a60f-3d2b-32d8-9486-cf810708d4ba/div_b4f_dxs_dfb)
Record each identifier for manifestation as a whole and any identifiers for individual parts followed by a designation of the part to which it applies.

No similar instructions for work/expression (and other RDA entities) identifiers.
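As a rough illustration of how these three options affect the order of the string, here is a minimal Python sketch. The function name, its parameters, and the parentheses/single-space conventions are assumptions made for illustration only; the RDA options quoted above do not prescribe punctuation.

```python
def format_identifier(identifier, assigner=None, binding=None, part=None):
    """Build a display string for a manifestation identifier.

    Assumed conventions (not prescribed by RDA): qualifiers go in
    parentheses and are separated from the identifier by one space.
    """
    pieces = []
    if assigner:
        # First option: trade name or assigning agent precedes the identifier.
        pieces.append(f"({assigner})")
    pieces.append(identifier)
    # Second and third options: binding/format and part designation follow.
    trailing = [qualifier for qualifier in (binding, part) if qualifier]
    if trailing:
        pieces.append("(" + " ; ".join(trailing) + ")")
    return " ".join(pieces)


print(format_identifier("123456", assigner="CODEN"))
print(format_identifier("0-123-46744-6", binding="hardcover"))
```

With these assumptions, the two calls reproduce the two example strings from the opening post.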

0 replies

pan-zhuo · 2023-03-16T01:15:19Z

pan-zhuo
Mar 16, 2023
Maintainer

At the March 15, 2023 meeting, RDF datatypes were proposed as a way to record the specific type of identifier system (ISBN, DOI, etc.).

Similar approaches were brought up in a Sinopia MAP meeting. We were not heading in this direction because Sinopia didn't seem to support displaying or choosing datatypes for literals. (Might need further confirmation from @briesenberg07.)

An alternative we talked about was to define new properties for each identifier system as subproperties of more general RDA identifier properties and publish them as UW refinements/extensions for RDA.

An example would be:
"has ISBN" is subproperty of "has identifier for manifestation".
<uwlib:ISBN> rdfs:subPropertyOf rdam:P30004 .

Reasons for this refinement approach:

Precedence in RDA to proliferate properties
"has ISSN" is subproperty of "has identifier for work".
rdaw:P10366 rdfs:subPropertyOf rdaw:P10002
Further refinement is possible for things like status of identifiers
"has canceled/invalid ISBN" (020 $z) is a subproperty of "has ISBN"

Questions:

@GordonDunsire mentioned "ISBN-10"/"ISBN-13". Could potentially be covered by more granular refinements?

Also the different editions of DDC. (082 $2)

Would love feedback on this @CECSpecialistI @gerontakos @AdamSchiff @SitaKB @JianPLee @junghaelee @szapoun @lake44me

3 replies

GordonDunsire Mar 16, 2023
Maintainer

The RDA property "has ISSN" is a pre-3R legacy property that is retained because it does not break LRM/RDA semantics. 3R would not have added it if it didn't already exist, because the preferred approach is data provenance. That's why 3R did not add "has ISBN", etc. That is, this is not "RDA precedence".

The general data provenance approach is to reify a "has identifier" statement and give the value vocabulary from which the identifier is taken (e.g. ISBN Register). This covers all situations for now and in the future.

Adding local subproperties for specific value vocabularies is an alternative approach that may be suitable for special communities. It is not usually suitable for general communities because a local subproperty for every possible identifier value vocabulary is expensive to develop and maintain. We are sure that there are many, many identifier value vocabularies that are in use around the world.

For the "MARC2RDA" community, the number of "named" identifier schemes is limited, so it is feasible to take the subproperty approach.

If MARC2RDA transformed data is re-used by another community, etc., the subproperty can be easily collapsed into the "has identifier" property if it wants to exclude the local extension. The data provenance can also be automatically assigned.

Note that the same is true in reverse: data that uses data provenance for the source can be automatically transformed into a local subproperty.
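A minimal Python sketch of that round trip, with triples held as plain tuples. The lookup table, the function names, and the scheme label "ISBN" are hypothetical; uwlib:ISBN is the example subproperty proposed earlier in this thread, not a published property.

```python
# Hypothetical local subproperties, each mapped to its general RDA
# property and to the value vocabulary (scheme) it implies.
LOCAL_SUBPROPERTIES = {
    "uwlib:ISBN": ("rdam:P30004", "ISBN"),
}


def collapse(triple):
    """Rewrite a local-subproperty triple using the general property.

    Returns the collapsed triple plus the scheme to record as data
    provenance (None when the predicate is not a local subproperty).
    """
    subject, predicate, obj = triple
    if predicate in LOCAL_SUBPROPERTIES:
        parent, scheme = LOCAL_SUBPROPERTIES[predicate]
        return (subject, parent, obj), scheme
    return triple, None


def expand(triple, scheme):
    """Reverse direction: pick a local subproperty from the provenance."""
    subject, predicate, obj = triple
    for local, (parent, local_scheme) in LOCAL_SUBPROPERTIES.items():
        if predicate == parent and scheme == local_scheme:
            return (subject, local, obj)
    return triple
```

Both directions are a table lookup, which is why the choice between the two approaches can be deferred or reversed later.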

gerontakos Mar 16, 2023
Maintainer Author

I was the person who recommended following Dodds/Davis Chapter 3 "Custom Datatype." I'm wondering why Sinopia doesn't support datatypes -- not even XML Schema datatypes. I mean, I'm truly wondering if there is a reason. I think it's common to find that custom datatypes aren't "supported" in general, but custom datatypes seem to be fine with RDF (if they're compatible with the XML Schema datatype requirements, as I believe the entries in the LC "Subjects Schemes" are compatible with XML Schema, if I understand lexical and value spaces). Similarly, I believe RDA will accommodate custom-typed literals as the values of datatype properties -- am I correct? RDA will consider the values structured (in the case of subject schemes) or identifiers (like in the case Zhuo described), in which cases a datatype (at least for the structured literal) seems, what, "appropriate." In addition, RDA datatype properties are not instances of owl:DatatypeProperty (they are instances of rdf:Property), so there are no OWL considerations (like a mismatch in lexical spaces, for example). Finally, I don't think we should create a solution for Source Vocabularies based on the limitations of Sinopia. Either a literal with a custom datatype is a good solution or it's not. But I'm writing this reply mainly because I don't favor custom properties or statement reification if a simpler and generally effective solution (like, hopefully, custom datatypes!) is an option.
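To make the lexical-space point concrete, here is a minimal Python sketch that serializes an ISBN as a typed literal in N-Triples syntax. The datatype IRI is a placeholder and the lexical space (10 or 13 characters, no hyphens, no qualifiers) is an assumption; nothing in this thread has settled either.

```python
import re

# Hypothetical datatype IRI; the thread does not settle on one.
ISBN_DATATYPE = "http://example.org/datatype/isbn"

# Assumed lexical space: 13 digits, or 9 digits plus a digit or X.
ISBN_LEXICAL = re.compile(r"\d{9}[\dX]|\d{13}")


def typed_literal(value, datatype=ISBN_DATATYPE):
    """Serialize a value as an N-Triples typed literal."""
    if not ISBN_LEXICAL.fullmatch(value):
        raise ValueError(f"{value!r} is outside the assumed lexical space")
    return f'"{value}"^^<{datatype}>'


print(typed_literal("0123467446"))
```

A string that carries a qualifier, such as "0123467446 (v. 1)", would be rejected by this check, which is one way to see why qualifiers and a single custom datatype do not combine easily.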

CECSpecialistI Apr 18, 2023
Maintainer

I like the datatype approach and honestly think that users in Sinopia haven't pushed for datatypes, and that's the reason it's not an option. I haven't looked through GitHub for their discussions on it, but I'd bet that if we opened a ticket and explained why we wanted to use custom datatypes they could make it happen. They made the property drop-down lists for us, after all.

I don't see anything wrong with reifying has identifier statements to give a source value vocabulary as Gordon suggests, or with creating local extension subproperties for "has identifier" as Zhuo points out since we are, as Gordon mentions, working with a reasonably limited number of source vocabularies.

Which of these approaches is the least labor-intensive? Which will make the most sense to RDA and MARC21 users? If we're using datatypes for $2, should we follow the same approach for identifiers for consistency?

pan-zhuo · 2023-03-17T02:12:30Z

pan-zhuo
Mar 17, 2023
Maintainer

Thanks @gerontakos and @GordonDunsire for weighing in.

A problem is that there are qualifications you can add to an identifier.

"0123467446"
"(Canceled/invalid) 0123467446"
"0123467446 (Random House ; paperback)"
"0123467446 (v. 1)"

Surely these cannot be the same datatype. Or can they?
Can a datatype IRI be associated with an SES? Certainly not LOC IRIs.

And wouldn't it be better to record an identifier and its qualifications separately?
(Okay, maybe I'm leaning towards reification now.)
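To illustrate recording an identifier and its qualifications separately, here is a minimal Python sketch that splits strings like the four above. The pattern assumes at most one parenthetical before the identifier and one after it, with " ; " separating multiple trailing qualifiers; the function name and the dictionary keys are made up for this example.

```python
import re

# Assumed shape: optional "(qualifier) " before the identifier and an
# optional " (qualifier ; qualifier)" after it.
PATTERN = re.compile(
    r"(?:\((?P<lead>[^)]*)\)\s*)?"
    r"(?P<id>[^()\s]+)"
    r"(?:\s*\((?P<trail>[^)]*)\))?"
)


def split_identifier(value):
    """Separate an identifier string from its parenthetical qualifiers."""
    match = PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognized identifier string: {value!r}")
    trail = match.group("trail")
    return {
        "identifier": match.group("id"),
        "leading": match.group("lead"),
        "trailing": [part.strip() for part in trail.split(";")] if trail else [],
    }


print(split_identifier("0123467446 (Random House ; paperback)"))
```

Once the parts are separated, each one can be mapped on its own, whichever modeling approach is chosen.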

Reification

For identifiers (NOT subjects!), are we overlooking the 'built-in' reification in RDA, i.e. Nomens?

<ex:Man> rdamo:P30004[has identifier for manifestation] <ex:Nomen> .
<ex:Nomen> rdand:P80068[has nomen string] "0123467446" .
<ex:Nomen> rdan:P80069[has scheme of nomen] <http://id.loc.gov/vocabulary/identifiers/isbn> .
<ex:Nomen> rdan:P80078[has category of nomen] <http://id.loc.gov/vocabulary/mstatus/cancinv> .
<ex:Nomen> rdand:P80071[has note on nomen] "Random House ; paperback" .

Unsure if these are correct uses of Nomen properties, but look better than all clumped together to me!

Definitely cannot be used with subjects since they are not RDA entities!
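A minimal Python sketch that emits statements of this shape from separated parts. The property CURIEs are copied from the example above, so the same caveat applies about whether they are the right Nomen elements; the function name and parameters are hypothetical, and literal escaping is not handled.

```python
def nomen_statements(manifestation, nomen, string, scheme=None,
                     category=None, note=None):
    """Return Turtle-style lines describing an identifier as a nomen.

    Only the parts that are supplied produce a statement.
    """
    lines = [
        f"{manifestation} rdamo:P30004 {nomen} .",   # has identifier for manifestation
        f'{nomen} rdand:P80068 "{string}" .',        # has nomen string
    ]
    if scheme:
        lines.append(f"{nomen} rdan:P80069 <{scheme}> .")    # has scheme of nomen
    if category:
        lines.append(f"{nomen} rdan:P80078 <{category}> .")  # has category of nomen
    if note:
        lines.append(f'{nomen} rdand:P80071 "{note}" .')     # has note on nomen
    return lines


for line in nomen_statements(
    "ex:Man", "ex:Nomen", "0123467446",
    scheme="http://id.loc.gov/vocabulary/identifiers/isbn",
    note="Random House ; paperback",
):
    print(line)
```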

UW extensions

I only noticed the UW RDA extensions recently, after I started working on Sinopia MAPs, and I know they are pre-3R, but we already have properties like these:

https://doi.org/10.6069/uwlib.55.d.4#hasLcClassificationPartA
https://doi.org/10.6069/uwlib.55.d.4#hasNlmClassificationPartB
https://doi.org/10.6069/uwlib.55.d.4#hasLcGeographicClassification
https://doi.org/10.6069/uwlib.55.d.4#hasSuDocClassificationNumber

Don't know about the design choices back then, but it seems that we went for properties rather than datatypes.

Of course, custom properties are unsustainable and we may have changed our minds since then...

Sinopia

I have not seen an option to list datatypes for a literal as a dropdown/checkbox in Sinopia, only language tags. (Correct me if I'm wrong @briesenberg07.)
Maybe because BIBFRAME prefers proliferating Classes like bf2:Isbn (and blank nodes) so there are no use cases for datatypes.

I agree that we shouldn't create a solution for Source Vocabularies based on the limitations of Sinopia, but still, I would prefer a uniform approach.

5 replies

GordonDunsire Mar 17, 2023
Maintainer

Re Reification:

@pan-zhuo is correct about data provenance for nomen strings, including identifiers. A nomen is already a reified statement that follows the general pattern entity has-appellation "nomen string".

The only statement in the example that I will quibble with is "has note on nomen".

The "has type of binding" property was added to take care of ISBN qualifiers. Strictly speaking, paperback and hardback versions are distinct manifestations, so original RDA cataloguers are expected to say:

ex:Man rdamo:P30309[has type of binding] rdatb:1001[perfect binding] .

The publisher is assigned to the manifestation, so:

ex:Man rdamd:P30176[has name of publisher] "Random House" .

This is better than the note: ex:Man is already included in the reified nomen. Plus a cataloguer can use rdamo:P30420[has publisher corporate body] with a value taken from an authority file of "providers".

However, it may be too difficult to parse this out of the MARC 21 data, so the note may have to suffice.

Note also that other typical ISBN qualifiers, such as sub-unit designation ("v.1") are also normally given as specific manifestation statements rather than nomen statements:

ex:Man rdamd:P30014[has numbering within sequence] "v.1" .

lake44me Apr 18, 2023
Maintainer

@pan-zhuo and @GordonDunsire I like the approach of minting a URI for a nomen, and I think it would work fine for 088, the example we're looking at. But I'm starting a survey of the other identifier tags, at least the ones in 01X and 02X, to see how many would need that reified modeling to capture all the data and how many could get by with a custom datatype. For example:

MARC identifier tag	Single Custom Datatype could capture all data?	Reified identifier nomen could capture all data?
010 LCCN	Yes (values of subfields as labeled)	Yes
013 Patent Control Information	No	Not sure - complex - come back to later
015 National Bibliography Number	No; multiple values to capture	Yes, $q "qualifying information" and $z Canceled/invalid

Darn, Markdown table didn't do like I wanted
... but, do you think doing this is useful? I may not have time to glance through them all before our meeting but will try. I think the choice comes down to those two approaches, and depending on the numbers, we may choose to be consistent with one.
@CECSpecialistI

lake44me Apr 18, 2023
Maintainer

And by the way, there are a lot of identifier "canceled/invalid" designations as either indicators or subfields. I know @GordonDunsire wants to throw them out, but the reality is, there are cases where they need to be searchable or at least accessible to programmers/catalogers because... it shows a history and could be useful for matching/verification purposes. It's difficult to divine WHY the number is labeled canceled/invalid without having more context (or the item in hand) and it probably varies depending on what the identifier is, but it could indicate a relationship to another manifestation, or a publisher error, or something else.

CECSpecialistI Apr 18, 2023
Maintainer

Possibly unpopular opinion: I think we should keep identifiers marked canceled/invalid. They are often qualified by something that makes it clear to human readers why it is included (identifier is for a different manifestation, made clear by $q (print), for instance) and are also often added because they appear on the manifestation as a result of a publisher error, as Laura points out. In my cataloging work, I often add such canceled/invalid identifiers when I think they will aid in discovery/selection. If we can retain these and express clearly that they are canceled/invalid, I vote we keep them.

JianPLee Apr 18, 2023
Maintainer

I agree with Laura and Crystal that we should keep canceled/invalid identifiers. Publisher error is not uncommon for Chinese publications. I've seen ISBNs where the number printed on the label is completely different from the number that comes up when the barcode is scanned. In situations like this, I record the number on the label as the invalid one.

Also I have question about this statement: ex:Man rdamd:P30014[has numbering within sequence] "v.1". Can we tell which ISBN relates to which volume if we map it this way?

AdamSchiff · 2023-04-18T20:26:39Z

AdamSchiff
Apr 18, 2023
Maintainer

Crystal wrote: “I often add such canceled/invalid identifiers when I think they will aid in discovery/selection.” This isn’t a choice. If an ISBN appears on a resource it is required to record it. Adam L. Schiff Principal Cataloger University of Washington Libraries Box 352900 Seattle, WA 98195-2900 aschiff @ uw.edu


1 reply

CECSpecialistI Apr 18, 2023
Maintainer

Are ISBN's for other manifestations required as well? I always record if I know about them but always understood that as optional. For example, if a physical book has an incorrect ISBN and the eBook version has an ISBN, I add them both but thought the eBook version was optional to add while the one that appears on the resource is not optional.

AdamSchiff · 2023-04-19T21:05:49Z

AdamSchiff
Apr 19, 2023
Maintainer

It doesn't appear that recording all ISBNs is required, but it is standard cataloging practice to record all that appear on a manifestation being cataloged, even those that are for a different format (print vs. ebook).

2.15.1.7: If the manifestation has more than one identifier of the same type, record a brief qualification after the identifier, if considered important for identification.

LC policy (note there isn't a PCC policy) says in LC-PCC PS 2.15.1.7:

LC practice: When transcribing multiple ISBNs, transcribe first the number that is applicable to the manifestation being described; transcribe other numbers in the order presented, with appropriate qualification to distinguish. Record ISBNs in $z (Canceled/invalid) of MARC field 020 (https://desktop.loc.gov/saved/Mabibl_020__z) if they clearly represent a different manifestation from the resource being cataloged and would require a separate record (e.g., an ISBN for the large print version, e-book, or teacher’s manual on the record for a regular trade publication). If separate records would not be made (e.g., most cases where ISBNs are given for both the hardback and paperback simultaneously), or in cases of doubt, record the ISBNs in $a (International Standard Book Number) of MARC field 020 (https://desktop.loc.gov/saved/Maauth_020__a).


0 replies

GordonDunsire · 2023-04-21T09:08:08Z

GordonDunsire
Apr 21, 2023
Maintainer

The self-reification of a nomen provides advantages in simplifying data provenance, but must be used with caution.

Suppose we have a manifestation that has two ISBNs and an ISSN printed on the verso of the title page (e.g. the current consolidated ISBD):

"ISBN 978-3-11-026379-4
e-ISBN 978-3-11-026380-0
ISSN 1868-8438"

Official RDA records this data in two places: in a manifestation statement that reflects how the manifestation describes itself, and as an identifier for the manifestation:

ex:m1 rdamd:P30286 "ISBN 978-3-11-026379-4, e-ISBN 978-3-11-026380-0, ISSN 1868-8438". // This uses the normalized transcription option for adding punctuation (commas) for clarity.

ex:m1 rdamd:P30004 "9783110263794" . // For the printed volume (in a hardback binding)

The e-ISBN is not an identifier for the manifestation of the printed volume; the ISSN is not an identifier for the manifestation of the issue of the series (printed or e-book). The distinction is already apparent in the manifestation statement, so there is no need to add a note, etc.

Note that the identifier is normalized by removing the hyphens; "ISBN" is not part of the identifier. (On the other hand, "ISSN" is part of the identifier when it is recorded.)
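A minimal Python sketch of that normalization, with a check-digit test added as a sanity check on the result. The function names are made up for this example; the check uses the standard ISBN-13 weighting (alternating 1 and 3, total divisible by 10).

```python
def normalize_isbn(printed):
    """Drop a leading 'ISBN' or 'e-ISBN' label, hyphens, and spaces."""
    text = printed.upper()
    for label in ("E-ISBN", "ISBN"):
        if text.startswith(label):
            text = text[len(label):]
            break
    return text.replace("-", "").replace(" ", "")


def isbn13_is_valid(isbn):
    """Check an ISBN-13: weights alternate 1 and 3, total divisible by 10."""
    if len(isbn) != 13 or not isbn.isdigit():
        return False
    total = sum(int(digit) * (1 if index % 2 == 0 else 3)
                for index, digit in enumerate(isbn))
    return total % 10 == 0


print(normalize_isbn("ISBN 978-3-11-026379-4"))
```

Applied to the printed form "ISBN 978-3-11-026379-4", this yields the normalized string used in the identifier statement above.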

An IRI for the ISBN nomen string is the URN, so we can also state:

ex:m1 rdamo:P30004 URN:ISBN:978-3-11-026379-4 .

There is no guarantee that this IRI will de-reference, or what it will de-reference to (e.g. metadata about the manifestation (what we want), a purchase order form, or a list of forthcoming titles, etc.). But that is not strictly necessary; we can add our own statements:

URN:ISBN:978-3-11-026379-4 rdand:P80068 "9783110263794" . // has nomen string
URN:ISBN:978-3-11-026379-4 rdano:P80048 ex:m1 . // is identifier for manifestation of
URN:ISBN:978-3-11-026379-4 rdand:P80073 "The International ISBN Agency" . // is assigned by agent

If we try and use the LC MARC 21 approach while conforming to RDA/LRM and treating the hardback and e-book as distinct manifestations, we want to say something like:

URN:ISBN:978-3-11-026380-0 rdand:P80168 "Invalid". // has status of identification

Meanwhile, another agency creates metadata for the e-book, so they say:

URN:ISBN:978-3-11-026379-4 rdand:P80168 "Invalid". // has status of identification

The problem is that the nomen IRIs refer to two distinct manifestations; a distinction that is acknowledged by the publisher, but not by LC. According to LC:

URN:ISBN:978-3-11-026379-4 rdano:P80048 ex:m1 . // hardback
URN:ISBN:978-3-11-026380-0 rdano:P80048 ex:m1 . // e-book

=> URN:ISBN:978-3-11-026379-4 owl:sameAs URN:ISBN:978-3-11-026380-0 .

Further complications ensue depending on how the two URNs de-reference.
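A minimal Python sketch that flags this situation in a batch of statements before it reaches a reasoner. It takes the inference shown above as a given: any two nomen IRIs asserted as identifiers for the same manifestation are reported as an implied owl:sameAs pair. The function name and the pair-based input format are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def implied_same_as(statements):
    """List nomen pairs that share a manifestation.

    `statements` holds (nomen, manifestation) pairs taken from
    "is identifier for manifestation of" statements.
    """
    by_manifestation = defaultdict(set)
    for nomen, manifestation in statements:
        by_manifestation[manifestation].add(nomen)
    pairs = []
    for nomens in by_manifestation.values():
        # Every pair of nomens on one manifestation is a potential merge.
        pairs.extend(combinations(sorted(nomens), 2))
    return pairs


print(implied_same_as([
    ("URN:ISBN:978-3-11-026379-4", "ex:m1"),
    ("URN:ISBN:978-3-11-026380-0", "ex:m1"),
]))
```

If the e-book ISBN is attached to its own manifestation instead of ex:m1, the function returns an empty list.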

To avoid this, do not map subfield z for invalid ISBNs. We cannot distinguish the reason for invalidity:

not an identifier for this manifestation
identifier for this manifestation, but replaced by publisher for some reason
identifier for this manifestation, but known to be incorrect

CECSpecialistI Apr 25, 2023
Maintainer

Thank you for clarifying, Gordon! I understand what you're saying now.

lake44me · 2023-05-02T14:29:18Z

lake44me
May 2, 2023
Maintainer

Here's what I hope is a readable table of all the "Identifier" tags I could find (there may be more outside of 01X-09X).
My conclusion from doing this is that there may be a few tags that could be fully mapped using a custom datatype, maybe more if we ignore $z invalid identifiers, but most have too many data elements and would be better served by minting a Nomen and using nomen properties to describe it. Your opinion may differ.
Question would be, better/more consistent to just treat all with the same (nomen) structure?
Any advantage to using custom datatype with nomen structure?

| MARC identifier tag | Single custom datatype could capture all data? | Reified identifier nomen could capture all data? |
| --- | --- | --- |
| 010 LCCN | Yes; values of subfields as labeled | Yes |
| 013 Patent Control Information | No | Not sure - complex - come back to later |
| 015 National Bibliography Number | No; multiple values to capture | Yes, $q "qualifying information" and $z Canceled/invalid |
| 016 National Bibliographic Agency Control Number | Maybe; only $a and $z for invalid + source | Yes |
| 017 Copyright or Legal Deposit Number | No; multiple values to capture | Think so - complex - come back to later |
| 018 Copyright article-fee code | Probably, if we determine how to map | Not sure - complex (indicates aggregation) come back |
| 020 International Standard Book Number | No; multiple values to capture | Think so - qualifying info and price |
| 022 International Standard Serial Number | Probably not; multiple values to capture | Probably - odd indicator values - come back to later |
| 024 Other Standard Identifier | No; multiple values to capture | Yes |
| 025 Overseas Acquisition Number | Yes; datatype would be the tag label | Yes |
| 026 Fingerprint Identifier | No; multiple values to capture | Probably - complex - come back to later |
| 027 Standard Technical Report Number | No; multiple values to capture | Yes - type of number, qualifying info, cancelled/invalid |
| 028 Publisher or Distributor Number | No; multiple values to capture | Yes - type, source, qualifying info |
| 030 CODEN Designation | Maybe; datatype CODEN, $a | Yes - only $a and $z if we map cancelled/invalid |
| 031 Musical Incipits Information | No; multiple values to capture | Not sure - complex - come back later |
| 032 Postal Registration Number | No; multiple values (type and source agency) | Not sure - how is this related to an RDA entity? |
| 035 System Control Number | Probably not; type, plus prefix plus number | Not sure - this is MARC data provenance, come back |
| 036 Original Study Number for Computer Data Files | Probably not; type, plus source agency | Yes |
| 074 GPO Item Number | Maybe; datatype GPO Item Number | Yes - only $a and $z if we map cancelled/invalid |
| 088 Report Number | Maybe; datatype Report Number | Yes - only $a and $z if we map cancelled/invalid |

0 replies

CECSpecialistI · 2023-05-02T21:26:02Z

CECSpecialistI
May 2, 2023
Maintainer

Thank you for this work, Laura! It looks pretty ready for discussion, can I put this on the agenda for tomorrow?

0 replies

lake44me · 2023-05-12T16:16:37Z

lake44me
May 12, 2023
Maintainer

@gerontakos @pan-zhuo Sorry to take so long, I should learn my lesson. Was writing this comment yesterday, power went out, lost it. Always draft somewhere that saves your keystrokes first, like Notepad++.

Among the identifier mappings awaiting review - from pan-zhuo - all the mappings would need to be redone because he was using the approach used in note fields to add a prefix indicating the "type" of identifier, which we agreed wasn't appropriate for identifiers.

So, whether we mint a nomen URI or not, a custom datatype would be in order to distinguish type of identifier - at least at tag level, or more granular if we can get that info.

088 Report Number - only $a (and $z IF we map cancelled/invalid number - separately) Good candidate for custom datatype = "report number" or "cancelled/invalid report number"
027 Standard Technical Report Number - Two types are defined for the field (but not coded): International Standard Technical Report Number (ISRN) or Standard Technical Report Number (STRN). Zhuo has some fancy conditions, based on punctuation, to identify which kind is in $a. Using a custom datatype of, say, "ISRN", "STRN", "ISRN invalid", or "STRN invalid" would not capture the $q qualifier, so it would either need to be added parenthetically in the identifier string or be omitted.
030 CODEN designation - Again, Zhuo has some fancy conditions, based on whether the identifier starts with a letter or a number, to determine whether it's a serial or a monograph identifier, using the identifier for work property P10002 for a serial identifier and the identifier for manifestation property P30004 for a monograph. I haven't really reviewed this to see if it's necessary, but it sounds right. Otherwise, there is just $a and $z defined, so it could be treated as custom datatype = "CODEN" or "CODEN invalid".
020 International Standard Book Number. Subfields a, c, q and z . $c, Terms of availability, gets separately mapped to manifestation property P30160 Has terms of availability. (Not sure of the mapping note Zhuo put there - should it matter which other subfields are in the field?) Same choices as with 027 - if using custom datatype, what to do with $q qualifier, which is repeatable? Do we map $z cancelled/invalid?
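For the 030 case, the kind of condition described can be sketched in a few lines of Python. This is not Zhuo's actual mapping code. The direction of the test (letter means serial, digit means monograph) is an assumption, since the description above does not say which way round it goes, and the function name and sample values are hypothetical.

```python
def coden_property(coden):
    """Pick an RDA identifier property for an 030 $a value.

    Assumption: a CODEN starting with a letter is for a serial, and
    one starting with a digit is for a monograph.
    """
    first = coden.strip()[:1]
    if first.isalpha():
        return "rdaw:P10002"   # has identifier for work (serial)
    if first.isdigit():
        return "rdam:P30004"   # has identifier for manifestation (monograph)
    raise ValueError(f"cannot classify CODEN {coden!r}")


print(coden_property("ABCDEF"))
print(coden_property("123456"))
```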

I'm skipping 001 and 003 because they work together and this needs to be better accounted for in Sita's mapping. 003 is the source organization for 001. We could decide to make this parenthetical before the identifier string (for the manifestation) in 001, or use a nomen/nomen string and use "Related entity of nomen" property for the institution code... . The 001 field name is Control Number; I'd suggest "MARC record control number" as the datatype.

Also skipping Sophia's 024 Other standard identifier, it seems to be missing a $a mapping.

Also skipping Sita's 028 Publisher Number or Distributor Number which has a lot of question marks, and includes ind.1 values for type of publisher number, $b for Source (agent), as well as $q and $z ...

I haven't done a full review of these mappings, just glanced over, but it'd be better to wait until we resolve the BIG QUESTIONS about how to map identifiers, right?

HTH

0 replies

AdamSchiff · 2023-05-13T00:03:37Z

AdamSchiff
May 13, 2023
Maintainer

The other one that I thought of that isn't in this list is 758 (Resource Identifier): https://www.loc.gov/marc/bibliographic/bd758.html

0 replies

pan-zhuo · 2023-05-17T07:39:37Z

pan-zhuo
May 17, 2023
Maintainer

A very small test dataset implementing different options for ISBNs.
The specific elements or value IRIs used to describe the nomens are for demonstration only and may be used incorrectly.

Simple literal without qualifiers
Simple literal with qualifiers
Typed literal without qualifiers
Typed literal with qualifiers
Nomen

https://github.com/uwlib-cams/MARC2RDA/blob/main/Working%20Documents/transformationCode/outputDataForReview/identifiers-withLabels.rdf
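
For readers who do not want to open the RDF file, here is a rough Python sketch of what the five options might look like as triples. It is not taken from the dataset itself. The namespace IRIs are my assumption about the RDA Registry namespaces, `ISBN_DATATYPE` and the example.org IRIs are hypothetical placeholders, and the function names are made up for illustration. The nomen properties are the ones quoted later in this thread.

```python
# Sketch only: ISBN_DATATYPE and the example.org IRIs are hypothetical placeholders,
# and the namespace IRIs are assumed, not copied from the test dataset.
RDAMD = "http://rdaregistry.info/Elements/m/datatype/"
RDAMO = "http://rdaregistry.info/Elements/m/object/"
RDAN = "http://rdaregistry.info/Elements/n/"
RDAND = "http://rdaregistry.info/Elements/n/datatype/"
ISBN_SCHEME = "http://id.loc.gov/vocabulary/identifiers/isbn"
ISBN_DATATYPE = "http://example.org/datatype/isbn"  # hypothetical datatype IRI


def iri(value):
    """Wrap an IRI in angle brackets, N-Triples style."""
    return "<" + value + ">"


def literal(value, datatype=None):
    """Format a plain or typed literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    suffix = "^^" + iri(datatype) if datatype else ""
    return '"' + escaped + '"' + suffix


def isbn_options(manifestation, nomen, isbn, qualifier):
    """Return the five options as lists of (subject, predicate, object) terms."""
    man, nom = iri(manifestation), iri(nomen)
    # Identifier first, qualifier second, with a space in between.
    qualified = isbn + " (" + qualifier + ")"
    has_identifier = iri(RDAMD + "P30004")
    return {
        "simple literal without qualifiers": [(man, has_identifier, literal(isbn))],
        "simple literal with qualifiers": [(man, has_identifier, literal(qualified))],
        "typed literal without qualifiers": [(man, has_identifier, literal(isbn, ISBN_DATATYPE))],
        "typed literal with qualifiers": [(man, has_identifier, literal(qualified, ISBN_DATATYPE))],
        "nomen": [
            (man, iri(RDAMO + "P30004"), nom),
            (nom, iri(RDAND + "P80068"), literal(isbn)),
            (nom, iri(RDAN + "P80069"), iri(ISBN_SCHEME)),
            (nom, iri(RDAND + "P80071"), literal(qualifier)),
        ],
    }


options = isbn_options(
    "http://example.org/man/1", "http://example.org/nom/1", "0764212001", "cloth"
)
for name, triples in options.items():
    print("# " + name)
    for s, p, o in triples:
        print(s, p, o, ".")
```

Only the nomen option keeps the identifier string, its scheme, and its qualifier as separate statements; the literal options fold everything into one value.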

2 replies

lake44me Oct 3, 2023
Maintainer

@gerontakos @CECSpecialistI @AdamSchiff @GordonDunsire Time to catch up with this. I don't know if pan-zhuo is still watching this space, but having reviewed the example transform for 020, I think creating the nomen and assigning properties to reflect the type of identifier, etc. is the way to go. This is consistent with our discussion on June 7 (Identifiers: Strings or Things?) and the decision made then to go with Nomens for identifiers (although it doesn't look like this was clearly added to the Decisions index).

These excerpts from the test dataset output make sense in terms of the properties used, to me:

</rdf:Description>

<fake:marcfield>F020 ## $a 0764212001 $q (cloth ; $q alk. paper)</fake:marcfield>

<rdf:Description rdf:about="http://marc2rda.edu/fake/nom/d14e14">
<rdand:P80068>0764212001</rdand:P80068>
<rdano:P80048 rdf:resource="http://fakeIRI2.edu/020-test1man"/>
<rdan:P80069 rdf:resource="http://id.loc.gov/vocabulary/identifiers/isbn"/>
<rdand:P80071>cloth; alk. paper</rdand:P80071>
</rdf:Description>

<fake:marcfield>F020 ## $z 9780764212017 $q (large print ; $q alk. paper)</fake:marcfield>

<rdf:Description rdf:about="http://marc2rda.edu/fake/nom/d14e21">
<rdand:P80068>9780764212017</rdand:P80068>
<rdano:P80048 rdf:resource="http://fakeIRI2.edu/020-test1man"/>
<rdan:P80069 rdf:resource="http://id.loc.gov/vocabulary/identifiers/isbn"/>
<rdan:P80168 rdf:resource="http://id.loc.gov/vocabulary/mstatus/cancinv"/>
<rdand:P80071>large print; alk. paper</rdand:P80071>
</rdf:Description>
</rdf:RDF>

It looks like Zhuo updated the mapping for the 088 field to follow this pattern on June 13, but did not do so for 020. He used this shorthand for the transformation in the mapping:

[Manifestation]-->rdamo:P30004-->[Nomen]-->rdand:P80068-->{$a} . [Nomen]-->rdand:P80078-->"Report number" -- ZP 6/12/2023

and for an invalid report number in $z:
[Manifestation]-->rdamo:P30004-->[Nomen]-->rdand:P80068-->{$z} . [Nomen]-->rdand:P80078-->"Report number" . [Nomen]-->rdan:P80168--><http://id.loc.gov/vocabulary/mstatus/cancinv> -- ZP 6/12/2023

The only difference from the 020 mapping is the use of rdand:P80078 (has category of nomen) instead of rdan:P80069 (has scheme of nomen). Both properties can take a value at any of the four levels, from unstructured description up through IRI. I agree with his choice - when the identifier is in a known scheme, use that property.
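
To make the pattern concrete, here is a minimal Python sketch of how the 020 subfields in the excerpts above could be turned into nomen statements. It is not the project's transform. The function names are hypothetical, the subfield parsing is deliberately naive (a literal dollar sign inside a value, such as a price, is not handled), and attaching every $q to every nomen in the field is a simplifying assumption.

```python
import re

ISBN_SCHEME = "http://id.loc.gov/vocabulary/identifiers/isbn"
CANCELLED_INVALID = "http://id.loc.gov/vocabulary/mstatus/cancinv"


def parse_subfields(field):
    """Split '$a 0764212001 $q (cloth ;' style text into (code, value) pairs.

    Naive: assumes '$' only ever introduces a subfield code.
    """
    pairs = re.findall(r"\$(\w)\s*([^$]*)", field)
    return [(code, value.strip()) for code, value in pairs]


def nomen_properties(field, manifestation_iri):
    """Return one list of (property, value) pairs per $a or $z in an 020 field."""
    subfields = parse_subfields(field)
    # Drop the parentheses and separators that surround each $q.
    qualifiers = [value.strip(" ();") for code, value in subfields if code == "q"]
    nomens = []
    for code, value in subfields:
        if code not in ("a", "z"):
            continue
        properties = [
            ("rdand:P80068", value),
            ("rdano:P80048", manifestation_iri),
            ("rdan:P80069", ISBN_SCHEME),
        ]
        if code == "z":
            # $z is a cancelled or invalid ISBN, so add the status statement.
            properties.append(("rdan:P80168", CANCELLED_INVALID))
        if qualifiers:
            # Simplification: every $q in the field is attached to every nomen.
            properties.append(("rdand:P80071", "; ".join(qualifiers)))
        nomens.append(properties)
    return nomens


example = "$z 9780764212017 $q (large print ; $q alk. paper)"
for properties in nomen_properties(example, "http://fakeIRI2.edu/020-test1man"):
    for prop, value in properties:
        print(prop, value)
```

For 088 the same shape would apply with rdand:P80078 and the literal "Report number" in place of the scheme IRI, per the shorthand above.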

Unless there are second thoughts, or offers, I will attempt to edit the 020 mapping transform notes (which use prefixes for the type of identifier in the identifier string, with no nomen IRI) to follow this approach. Other mappings Zhuo worked on that need similar updating: 027 (Standard Technical Report Number), 030 (CODEN).

When these are done, I will look at other identifier fields awaiting review to see if handling should be similar.

CECSpecialistI Oct 4, 2023
Maintainer

Following up on discussion from this morning - Added June 7 decision about identifiers to the decisions index here. Thanks for the prompt @lake44me !

cspayne · 2024-06-11T16:36:56Z

cspayne
Jun 11, 2024
Maintainer

Our decision on $6 when minting nomens is to use [Nomen1] isEquivalentTo ["literal value of 880"].
Is this something that should be coded for identifiers? Will there be linked 880s for the 0XX fields?

0 replies

AdamSchiff · 2024-06-11T17:23:11Z

AdamSchiff
Jun 11, 2024
Maintainer

I'm not aware of 0XX using $6, but $6 is defined for most of those fields. I cannot recall ever seeing a paired 0XX field, though there must be some out there.


0 replies

GordonDunsire · 2025-01-02T11:49:32Z

GordonDunsire
Jan 2, 2025
Maintainer

All identifiers in RDA are nomens with the category 'identifier'. The category is hard-wired into the appellation elements with a top-level property 'has identifier for {entity}'.

Properties for specific kinds of identifier are subtypes of the top-level property and are based on legacy elements. The preferred RDA approach is to use nomen data provenance elements such as 'has scheme of nomen' (rdan:P80069).

The default entity for an identifier in a MARC 21 record is Manifestation. Some identifier fields may indicate that the identifier is assigned to a different entity, but in the absence of such indications it can be assumed that the identifier is for the instance of Manifestation that is being described.

It can be assumed that an identifier that is assigned from a 'bibliography' scheme is associated with a manifestation (the output of publication processes).

The same identifier may be assigned to more than one instance of an entity. For example, the hardback and paperback ISBNs are often treated as two identifiers for the same manifestation. It is not a problem if a manifestation description set includes both ISBNs, or just the 'correct' ISBN for its binding. The result is false drops in information retrieval, with a paperback manifestation being included in the hits when the hardback is specified in the search, but this is no worse than the current level of retrievability in MARC 21 systems.

One instance of an entity may be assigned more than one identifier. This may occur within a single scheme (as noted above for the ISBN scheme) or between multiple schemes.

If the identifier is not associated with the manifestation that is described in a MARC 21 record, the usual problem occurs of determining which of the manifestation's related works or expressions to associate with the identifier.

In the past year, we have decided on the use of datatypes, minted nomens, nomen properties for data provenance, IRIs for schemes, etc. for 6XX and other fields. This supersedes the discussion above, through October 2023. Note that invalid identifiers can be transformed as nomens with a category or status of 'invalid' or similar. This is noted in the decisions index.

I think the remaining work to be done is to decide which kind of entity is associated with the identifiers in specific tags, with the default being Manifestation. Subfields can be mapped to nomen provenance properties, as with 6XX.

0 replies

