Skip to content

Latest commit

 

History

History
338 lines (223 loc) · 14.4 KB

argot_spreadsheet_documentation.adoc

File metadata and controls

338 lines (223 loc) · 14.4 KB

Argot spreadsheet documentation

How to use/interpret argot.xlsx and the CSV files derived from it.

Warning
Argot.xlsx is a supplemental aid for quickly digesting information about the Argot model and mappings. It is not meant to be a source of truth or to be 100% accurate. The marc-to-argot code itself is the canonical form of documentation.

Tabs

  • elements - Defines all Argot elements and subelements

  • mappings - Records rules for mapping MARC (and other source data) to Argot.

  • elements_issues - place to record and track outstanding questions or issues related to Argot elements

  • mappings_issues - place to record and track outstanding questions or issues related to specific mapping rules

General data conventions

Cells should not be left blank because a blank cell has an ambiguous meaning.

  • y : yes

  • n : no

  • x : not applicable, so no (used instead of n/a to minimize Excel "helpfully" suggesting "n/a" when you type "n" or vice versa)

  • ? : don’t know — outstanding question(s) or will determine later

  • . : blank because it hasn’t been examined yet.

When these codes have a more specific meaning in a given column, they are defined for that column below.

elements tab

argot_element

Name of Argot element or subelement.

Subelements are included in square brackets after the name of their parent element:

Example 1. Example

items is an element.

items[call_no] is the call_co subelement of items.

parent element

For subelements: name of parent element

For parent elements or simple/top-level-only elements: x

definition

Definition of the element.

rationale

Why this element is necessary, or what function it plays in the end application.

category

Column for working use. Sort/group/limit elements by their general category.

provisional

Status of the element in the data model.

  • n - No. Directly related to feature upcoming for development, as currently specified; implement with confidence

  • y - Yes. Proposed or to be proposed to extend/enhance feature which may not yet be selected for development. Included in hopes that certain distinctions/behaviors specified will make more sense.

  • ? : questions about how to model

responsibility

Whether this element is handled in the shared MARC-to-Argot config, or must be handled locally.

Note that any institution may opt to override any element that is handled in the MARC-to-Argot config. That will not be captured here. This represents elements known to be non-standard across institiutions, that are not handled in the shared config.

  • local - we expect that the mappings for this will be all (or mostly) institution-specific

  • shared - the mappings from standard data format to this field should be mostly applicable to all institutions providing data in that format.

Clarifications on selected columns

obligation

Represents optionality and cardinality of the element.

When recorded for a top-level (simple or parent) element, the second number refers to how many values may occur in that element.

When recorded for a subelement, the second number refers to how many values may occur for that subelement in a given instance.

Example 2. Example

url (parent element) = {0,n} = A record may have no urls, in which case there should not be a url element in the record. Or a record may have any number of urls, each of which will be described using a set of the url subelements.

url[href] = {1} = for every url value recorded, there must be one and only one url[href] value recorded.

  • {1} : required, single value only

  • {0, 1} : not required. If provided, must be single value

  • {0, n} : not required. Any number of values may be provided

  • {1, n} : required. One or more values may be provided

searchable in

Index in which this element’s data is expected to be searchable.

retain order

Whether it is important or not to retain order of fields

  • y : it is important

  • x : not important or not applicable

Our assumption, which has held so far, is that whatever order we send data into Solr is retained in the Argon application. It seems order is always retained.

In Endeca, all the values for a given field were alphabetized in the index, so we had to jump through a lot of hoops to retain record order. We were tracking the important ordered fields in this column in case we needed to do anything special to them in TRLN Discovery.

facet

Which facet gets values set from this element

Brief display

Brief diplay refers to the brief bibliographic display shown in search results lists and at the top of the full record page.

This column records whether data from this element should appear in brief display, and how it should appear (with a label, mapped to a display value, etc.).

This column was intended to inform development and may not currently reflect the actual Argon configuration.

Full display

Where data from this element appears in the full record page. Refers to the headings used to break up the full record page in the base Argon application, and 'clusters' within those headings that were intended to group similar information together

This column was intended to inform development and may not currently reflect the actual Argon configuration.

note on display

Special notes on how data from an element should display or behave in the display

relevance importance

Filled in only for searchable elements where values from different elements should receive different weighting in the same search index.

For example, in a search on the title index, title_main[value] should be ranked the highest, followed by title_variant[value] and included_work[title], followed by related_work[title].

endeca equivalent

The Endeca property or dimension name equivalent to this Argot element.

Used for comparing data model coverage as we developed Argot.

This column can eventually go away.

notes

Special notes on the end behavior of data in this field

implementation details

Notes or references for the person creating or maintaining transformation code

documentation

Link to fuller Argot documentation for this element (or the pattern it follows)

JIRA issue

The JIRA issue for implementing this element.

This column can eventually go away.

is parent?

Whether the element has subelements or not.

Populated by formula

vernacular treatment

How non-Roman character data in the element is treated.

  • na - no vernacular expected : we don’t expect any non-Roman data in this element, so we don’t do any special processing on it.

  • na - parent element — see subelements : non-Roman processing is handled only in simple elements and subelements

  • pass through/store vernacular — no special processing : special non-Roman processing is only needed for searchable elements.

  • vernacular processing needed : there is special non-Roman processing for this element

vernacular status

Temporary column supporting work being done on non-Roman processing. Indicates status of work on this element.

  • {na} : no work is needed

  • convert to nested element w/lang subelement - map/doc needed : Simple element needing to be converted to nested element. KMS needs to document this in the spreadsheet and relevant spec_doc, and write MTA test(s)

  • define new subelement - map/doc needed : Already a nested element, but needs lang subelement defined/specified and implemented

  • done : All documentation and implementation is complete. Final behavior in Solr/TRLN Discovery has been verified.

  • implementation needed : has been specified and MTA xit tests written. Needs implementation in MTA and final verification

  • implementation needed, institution-specific : has been specified and MTA xit tests written. Needs institution-specific implementation in MTA and final verification

  • partially mapped/doced : KMS is in the middle of specifying/writing MTA tests for this

  • spec-ed in work_entry pattern — implementation needed : support for this is specified/documented in the work_entry pattern. Needs implementation in MTA and final verification

  • test/verification needed : initial MTA implementation complete. Final Solr/TRLN Discovery behavior verification needed

element type

What type of element is this?

  • simple element : top-level element with no subelements

  • parent element : top-level element with subelements

  • subelement : child of a parent element

Populated by formula

argot-ruby processor/pattern

Which argot-ruby flattener pattern is applied to this element.

The logic of the different flatteners is in code at: https://github.com/trln/argot-ruby/tree/master/lib/argot/flatten

abstract processing pattern

Column for working use. Defines the data structure/behavior of the field. May be used to identify further argot-ruby processor/patterns

issue ct

Number of issues recorded for this element in the elements_issues tab.

Working column. Can eventually go away.

mapping ct

Count of how many rows in mappings tab are mappings to this element.

Working/validation column — every non-parent element should have at least one mapping.

Also possibly of interest to keep around.

done in mta?

Whether the field is implemented in MARC-to-Argot

This column has been used and updated spottily and should not be trusted overall.

tests?

Whether data transformation tests have been written for this element in MARC-to-Argot.

This column has been used and updated spottily and should not be trusted overall.

in schema?

Whether this element is represented in the in-progress Argot JSON schema.

This column is updated consistently and is trustworthy.

mappings tab

Records the rules for mapping from MARC/ICE/EAD/whatever into Argot. Does this in a structured way that will allow us to compare our transformation logic to source data specifications to check coverage as standards change.

parent element

The parent Argot element into which source data will be mapped.

Used for sorting/gathering mappings in a useful manner in spreadsheet.

When target element is a simple top-level element, value should be the same as in element column.

element

The specific target Argot element or subelement.

source schema

Metadata format of source data.

  • MARC - MARC 21 Format for Bibliographic Data (expressed either in binary files or as MARC-XML)

  • MARCish - refers to non-MARC data that has been smooshed into MARC fields in a non-standard way for TRLN Discovery-related transformation/ingest.

  • MARC-to-Argot - hard-coded in or derived by the MARC-to-Argot application

provisional?

Whether or not this is a provisional mapping

  • y : I’m proposing this, but it isn’t approved, or putting it in as a placeholder until a question is answered

  • n : proceed with as much confidence as we can muster for anything…​ :-)

institution

  • standard : based on current MARC standard and known legacy MARC data practices. Should apply more or less consistently to any MARC from any institution.

  • DUKE|NCCU|NCSU|UNC : institution-specific mapping

source data element

  • main field tag (MARC) or element (other schema) from which data is mapped

source data subelement

  • subfield(s) (MARC variable fields), byte positions (MARC fixed fields), or element refinement/qualification/subelement (other schema) from which data is to be mapped

constraints

  • further defines which fields data will be mapped from, based on MARC indicator values, values in subfields in the fields, or values in other parts of the record.

Conventions used here - i1, i2 = MARC indicators 1, 2 (in the field being examined) - $x = MARC subfield (in the field being examined) - LDR/06 = the value of byte position 06 of the MARC LDR - LDR/06-07 = the concatenated values of byte positions 06 and 07 of the MARC LDR - I’ve tried to follow a clear/simple method of logical expression, with logical operators in all caps and parentheses used to set up sub-logic

processing_type

  • Basic pattern of processing that is followed. Values explained:

concat_subelements

  • concatenate the contents of any subelements listed

  • keep original order of subelements

  • repeating subelements are fine

The other way of putting this:

  • take the whole field

  • remove any subelements not included in list

  • remove subfield/subelement delimiters

Either way:

  • keep any punctuation provided in between subelements

  • /unless otherwise specified/, add a space at the end of each subelement

Example 3. Example concat subelements

Example subelement/field(s) specified: abcde(g)jqu4

Example incoming data: "700 1 2 $aVaughan Williams, Ralph,$d1872-1958,$ecomposer.$tNorfolk rhapsody,$nno. 1.$0http://id.loc.gov/authorities/names/n79139255$0http://viaf.org/viaf/89801735"

Example mapped data: "Vaughan Williams, Ralph, 1872-1958, composer."

subelement_to_value

  • each instance of listed subelement(s) mapped to separate value (in multivalue field)

Example 4. Example subelement to value

Example subelement/field(s) specified: ax

Example incoming data: "650 _ 0 $aMapuche Indians$zPatagonia (Argentina and Chile)$xRites and ceremonies$xHistory."

Example mapped data: ["Mapuche Indians", "Rites and ceremonies", "History"]

processing instructions

  • special instructions beyond the general processing rules listed below

notes

  • notes of any type

  • separate individual notes in field with ";;;"

mapping_id

  • string derived from concatenating other columns

  • will be used to link up these mapping rules with fields, issues, examples and maybe, ambitiously, tests

mapping issue ct

Number of issues recorded for this mapping in the mappings_issues tab.

Working column. Can eventually go away.

field issue ct

Number of issues recorded for the Argot element in the elements_issues tab.

Working column. Can eventually go away.

field defined?

Dunno, but all values are "yes", so probably not useful information.

done in mta?

Whether the mapping is implemented in MARC-to-Argot

This column has been used and updated spottily and should not be trusted overall.

tests done?

Whether data transformation tests have been written for this mapping in MARC-to-Argot.

This column has been used and updated spottily and should not be trusted overall.