How to use/interpret argot.xlsx and the CSV files derived from it.
Warning
Argot.xlsx is a supplemental aid for quickly digesting information about the Argot model and mappings. It is not meant to be a source of truth or to be 100% accurate. The marc-to-argot code itself is the canonical form of documentation.
-
elements - Defines all Argot elements and subelements
-
mappings - Records rules for mapping MARC (and other source data) to Argot.
-
elements_issues - place to record and track outstanding questions or issues related to Argot elements
-
mappings_issues - place to record and track outstanding questions or issues related to specific mapping rules
Cells should not be left blank because a blank cell has an ambiguous meaning.
-
y : yes
-
n : no
-
x : not applicable, so no (used instead of n/a to minimize Excel "helpfully" suggesting "n/a" when you type "n" or vice versa)
-
? : don’t know — outstanding question(s) or will determine later
-
. : blank because it hasn’t been examined yet.
When these codes have a more specific meaning in a given column, they are defined for that column below.
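As a rough illustration of the general codes above, here is a minimal Python sketch for reading coded cells from one of the derived CSV files. The names `CELL_CODES` and `describe_cell` are hypothetical and not part of any TRLN tooling, and the sketch ignores the column-specific meanings defined below.

```python
# General cell codes as listed above. Column-specific overrides are not modeled.
CELL_CODES = {
    "y": "yes",
    "n": "no",
    "x": "not applicable",
    "?": "unknown or to be determined",
    ".": "not yet examined",
}


def describe_cell(value):
    """Return the general meaning of a coded cell, or None for any other value."""
    text = value.strip()
    if text == "":
        # Blank cells are ambiguous by the rule above, so flag them loudly.
        raise ValueError("blank cell: meaning is ambiguous")
    return CELL_CODES.get(text)
```

Values that are not one of the five codes (for example `shared` or `{0, n}`) return `None`, so a caller can fall back to column-specific handling.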
Name of Argot element or subelement.
Subelements are included in square brackets after the name of their parent element:
items is an element.
items[call_no] is the call_no subelement of items.
For subelements: name of parent element
For parent elements or simple/top-level-only elements: x
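The naming convention and the parent column can be related with a short Python sketch. The function names `split_element_name` and `expected_parent` are hypothetical helpers written for this illustration. The sketch assumes names never nest more than one level, which is all the convention above describes.

```python
import re

# Matches "parent[sub]" names such as items[call_no].
_SUBELEMENT = re.compile(r"^(?P<parent>[^\[\]]+)\[(?P<sub>[^\[\]]+)\]$")


def split_element_name(name):
    """Return (top_level_element, subelement), with subelement None if absent."""
    cleaned = name.strip()
    match = _SUBELEMENT.match(cleaned)
    if match:
        return match.group("parent"), match.group("sub")
    return cleaned, None


def expected_parent(name):
    """Return the value expected in the parent column for this element name."""
    top_level, sub = split_element_name(name)
    # Parent and simple elements carry "x" rather than a blank cell.
    return top_level if sub is not None else "x"
```

For example, `expected_parent("items[call_no]")` gives `"items"`, while `expected_parent("items")` gives `"x"`.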
Status of the element in the data model.
-
n - No. Directly related to feature upcoming for development, as currently specified; implement with confidence
-
y - Yes. Proposed or to be proposed to extend/enhance feature which may not yet be selected for development. Included in hopes that certain distinctions/behaviors specified will make more sense.
-
? : questions about how to model
Whether this element is handled in the shared MARC-to-Argot config, or must be handled locally.
Note that any institution may opt to override any element that is handled in the MARC-to-Argot config. That will not be captured here. This represents elements known to be non-standard across institutions, that are not handled in the shared config.
-
local - we expect that the mappings for this will be all (or mostly) institution-specific
-
shared - the mappings from standard data format to this field should be mostly applicable to all institutions providing data in that format.
Represents optionality and cardinality of the element.
When recorded for a top-level (simple or parent) element, the second number refers to how many values may occur in that element.
When recorded for a subelement, the second number refers to how many values may occur for that subelement in a given instance.
url (parent element) = {0,n} = A record may have no urls, in which case there should not be a url element in the record. Or a record may have any number of urls, each of which will be described using a set of the url subelements.
url[href] = {1} = for every url value recorded, there must be one and only one url[href] value recorded.
-
{1} : required, single value only
-
{0, 1} : not required. If provided, must be single value
-
{0, n} : not required. Any number of values may be provided
-
{1, n} : required. One or more values may be provided
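The four notations above can be read mechanically. The following Python sketch parses a cardinality code into a minimum and maximum and checks a value count against it. `parse_cardinality` and `count_allowed` are hypothetical names, and representing "n" as `None` (no upper limit) is a choice made for this sketch.

```python
import re

# Accepts {1}, {0, 1}, {0, n}, {1, n}, with or without spaces after the comma.
_CARDINALITY = re.compile(r"^\{\s*(\d+)\s*(?:,\s*(\d+|n)\s*)?\}$")


def parse_cardinality(code):
    """Return (minimum, maximum); maximum is None when the code allows any number."""
    match = _CARDINALITY.match(code.strip())
    if not match:
        raise ValueError(f"unrecognized cardinality: {code!r}")
    minimum = int(match.group(1))
    upper = match.group(2)
    if upper is None:
        maximum = minimum  # {1} means exactly one value
    elif upper == "n":
        maximum = None  # no upper limit
    else:
        maximum = int(upper)
    return minimum, maximum


def count_allowed(code, count):
    """Check whether a number of values satisfies the cardinality code."""
    minimum, maximum = parse_cardinality(code)
    return count >= minimum and (maximum is None or count <= maximum)
```

Under this reading, `count_allowed("{0, 1}", 2)` is false and `count_allowed("{1, n}", 3)` is true. For a subelement such as url[href], the count would be taken per url instance, not per record.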
Whether it is important or not to retain order of fields
-
y : it is important
-
x : not important or not applicable
Our assumption, which has held so far, is that the order in which we send data into Solr is retained in the Argon application.
In Endeca, all the values for a given field were alphabetized in the index, so we had to jump through a lot of hoops to retain record order. We were tracking the important ordered fields in this column in case we needed to do anything special to them in TRLN Discovery.
Brief display refers to the brief bibliographic display shown in search results lists and at the top of the full record page.
This column records whether data from this element should appear in brief display, and how it should appear (with a label, mapped to a display value, etc.).
This column was intended to inform development and may not currently reflect the actual Argon configuration.
Where data from this element appears in the full record page. Refers to the headings used to break up the full record page in the base Argon application, and 'clusters' within those headings that were intended to group similar information together.
This column was intended to inform development and may not currently reflect the actual Argon configuration.
Filled in only for searchable elements where values from different elements should receive different weighting in the same search index.
For example, in a search on the title index, title_main[value] should be ranked the highest, followed by title_variant[value] and included_work[title], followed by related_work[title].
The Endeca property or dimension name equivalent to this Argot element.
Used for comparing data model coverage as we developed Argot.
This column can eventually go away.
Notes or references for the person creating or maintaining transformation code
How non-Roman character data in the element is treated.
-
na - no vernacular expected : we don’t expect any non-Roman data in this element, so we don’t do any special processing on it.
-
na - parent element — see subelements : non-Roman processing is handled only in simple elements and subelements
-
pass through/store vernacular — no special processing : special non-Roman processing is only needed for searchable elements.
-
vernacular processing needed : there is special non-Roman processing for this element
Temporary column supporting work being done on non-Roman processing. Indicates status of work on this element.
-
{na} : no work is needed
-
convert to nested element w/lang subelement - map/doc needed : Simple element needing to be converted to nested element. KMS needs to document this in the spreadsheet and relevant spec_doc, and write MTA test(s)
-
define new subelement - map/doc needed : Already a nested element, but needs lang subelement defined/specified and implemented
-
done : All documentation and implementation is complete. Final behavior in Solr/TRLN Discovery has been verified.
-
implementation needed : has been specified and MTA xit tests written. Needs implementation in MTA and final verification
-
implementation needed, institution-specific : has been specified and MTA xit tests written. Needs institution-specific implementation in MTA and final verification
-
partially mapped/doced : KMS is in the middle of specifying/writing MTA tests for this
-
spec-ed in work_entry pattern — implementation needed : support for this is specified/documented in the work_entry pattern. Needs implementation in MTA and final verification
-
test/verification needed : initial MTA implementation complete. Final Solr/TRLN Discovery behavior verification needed
What type of element is this?
-
simple element : top-level element with no subelements
-
parent element : top-level element with subelements
-
subelement : child of a parent element
Populated by formula
Which argot-ruby flattener pattern is applied to this element.
The logic of the different flatteners is in code at: https://github.com/trln/argot-ruby/tree/master/lib/argot/flatten
Column for working use. Defines the data structure/behavior of the field. May be used to identify further argot-ruby processor/patterns
Number of issues recorded for this element in the elements_issues tab.
Working column. Can eventually go away.
Count of how many rows in mappings tab are mappings to this element.
Working/validation column — every non-parent element should have at least one mapping.
Also possibly of interest to keep around.
Whether the field is implemented in MARC-to-Argot
This column has been used and updated spottily and should not be trusted overall.
Whether data transformation tests have been written for this element in MARC-to-Argot.
This column has been used and updated spottily and should not be trusted overall.
Records the rules for mapping from MARC/ICE/EAD/whatever into Argot. Does this in a structured way that will allow us to compare our transformation logic to source data specifications to check coverage as standards change.
The parent Argot element into which source data will be mapped.
Used for sorting/gathering mappings in a useful manner in spreadsheet.
When target element is a simple top-level element, value should be the same as in element column.
Metadata format of source data.
-
MARC - MARC 21 Format for Bibliographic Data (expressed either in binary files or as MARC-XML)
-
MARCish - refers to non-MARC data that has been smooshed into MARC fields in a non-standard way for TRLN Discovery-related transformation/ingest.
-
MARC-to-Argot - hard-coded in or derived by the MARC-to-Argot application
Whether or not this is a provisional mapping
-
y : I’m proposing this, but it isn’t approved, or putting it in as a placeholder until a question is answered
-
n : proceed with as much confidence as we can muster for anything… :-)
-
standard : based on current MARC standard and known legacy MARC data practices. Should apply more or less consistently to any MARC from any institution.
-
DUKE|NCCU|NCSU|UNC : institution-specific mapping
-
subfield(s) (MARC variable fields), byte positions (MARC fixed fields), or element refinement/qualification/subelement (other schema) from which data is to be mapped
-
further defines which fields data will be mapped from, based on MARC indicator values, values in subfields in the fields, or values in other parts of the record.
Conventions used here:
-
i1, i2 = MARC indicators 1, 2 (in the field being examined)
-
$x = MARC subfield (in the field being examined)
-
LDR/06 = the value of byte position 06 of the MARC LDR
-
LDR/06-07 = the concatenated values of byte positions 06 and 07 of the MARC LDR
-
I’ve tried to follow a clear/simple method of logical expression, with logical operators in all caps and parentheses used to set up sub-logic
-
Basic pattern of processing that is followed. Values explained:
-
concatenate the contents of any subelements listed
-
keep original order of subelements
-
repeating subelements are fine
The other way of putting this:
-
take the whole field
-
remove any subelements not included in list
-
remove subfield/subelement delimiters
Either way:
-
keep any punctuation provided in between subelements
-
unless otherwise specified, add a space at the end of each subelement
Example subelement/field(s) specified: abcde(g)jqu4
Example incoming data: "700 1 2 $aVaughan Williams, Ralph,$d1872-1958,$ecomposer.$tNorfolk rhapsody,$nno. 1.$0http://id.loc.gov/authorities/names/n79139255$0http://viaf.org/viaf/89801735"
Example mapped data: "Vaughan Williams, Ralph, 1872-1958, composer."
-
each instance of listed subelement(s) mapped to separate value (in multivalue field)
Example subelement/field(s) specified: ax
Example incoming data: "650 _ 0 $aMapuche Indians$zPatagonia (Argentina and Chile)$xRites and ceremonies$xHistory."
Example mapped data: ["Mapuche Indians", "Rites and ceremonies", "History"]
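The two processing patterns described above can be sketched in Python using the example fields. This is only an illustration, not how MARC-to-Argot implements them. The function names are hypothetical, and a field is represented as a list of (subfield code, value) pairs. Three assumptions are made: parentheses in a specification such as abcde(g)jqu4 are ignored and the enclosed code is treated as included; the trailing space after the last subelement is dropped; and a single final period is removed from separate values, inferred from "History." becoming "History" in the example.

```python
def wanted_codes(spec):
    """Turn a spec such as "abcde(g)jqu4" into a set of subfield codes.

    Assumption: parentheses are ignored and the enclosed code is included.
    """
    return {ch for ch in spec if ch not in "()"}


def concatenate(subfields, spec):
    """Concatenation pattern: keep listed subfields in original order."""
    wanted = wanted_codes(spec)
    kept = [value for code, value in subfields if code in wanted]
    # Existing punctuation is kept; values are separated by a single space.
    return " ".join(kept)


def separate_values(subfields, spec, strip_final_period=True):
    """Separate-value pattern: one output value per listed subfield instance."""
    wanted = wanted_codes(spec)
    values = [value for code, value in subfields if code in wanted]
    if strip_final_period:
        # Assumption inferred from the example output, not a documented rule.
        values = [v[:-1] if v.endswith(".") else v for v in values]
    return values


field_700 = [
    ("a", "Vaughan Williams, Ralph,"),
    ("d", "1872-1958,"),
    ("e", "composer."),
    ("t", "Norfolk rhapsody,"),
    ("n", "no. 1."),
    ("0", "http://id.loc.gov/authorities/names/n79139255"),
    ("0", "http://viaf.org/viaf/89801735"),
]
field_650 = [
    ("a", "Mapuche Indians"),
    ("z", "Patagonia (Argentina and Chile)"),
    ("x", "Rites and ceremonies"),
    ("x", "History."),
]

print(concatenate(field_700, "abcde(g)jqu4"))
print(separate_values(field_650, "ax"))
```

Blindly removing a final period would damage values that end in an abbreviation, which is why it is an optional flag in this sketch.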
-
string derived from concatenating other columns
-
will be used to link up these mapping rules with fields, issues, examples and maybe, ambitiously, tests
Number of issues recorded for this mapping in the mappings_issues tab.
Working column. Can eventually go away.
Number of issues recorded for the Argot element in the elements_issues tab.
Working column. Can eventually go away.
Whether the mapping is implemented in MARC-to-Argot
This column has been used and updated spottily and should not be trusted overall.