Description of the data files in this directory
by John Mark Ockerbloom (last updated 1 Aug 2014)
The libraries and cattype files give instructions on how to link
to searches at various libraries registered in the Forward to Libraries
service.
The wikimap file gives correspondences between selected Library of
Congress authorized headings and Wikipedia article titles.
=====
I. LIBRARIES AND CATTYPE FILES
The libraries and cattype files are made up of records,
one for each library or catalog type,
separated by blank lines. Lines beginning with # are comments and are
ignored. Each line is in the form
ATTRIBUTE value
where ATTRIBUTE is all-caps with no spaces, and value extends to the end
of the line. At present, attributes are not repeatable in a record.
Here is how the attributes in each file are interpreted:
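A minimal sketch of reading this record format follows. It assumes the file has already been loaded as a string and that comment lines never end a record (the description above does not say either way). The name `parse_records` is hypothetical. Since attributes are not repeatable, a repeated attribute here simply keeps the last value.

```python
def parse_records(text):
    """Parse a libraries/cattype-style file into a list of attribute dicts."""
    records = []
    current = {}
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.startswith("#"):
            continue  # comment line: ignored entirely (assumed not to end a record)
        if not line.strip():
            # A blank line ends the current record, if one is open.
            if current:
                records.append(current)
                current = {}
            continue
        # ATTRIBUTE is everything before the first space; value is the rest.
        attr, _, value = line.partition(" ")
        current[attr] = value.strip()
    if current:
        records.append(current)  # last record may not be followed by a blank line
    return records
```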
ID
In the libraries file, this gives the identifier used for the library
in question. We try to use ISIL identifiers (see http://biblstandard.dk/isil/)
whenever possible, and often use the ones OCLC assigns, in the interest
of compatibility with the WorldCat registry. Certain characters from the
ISIL identifiers are altered when they might cause problems in URLs; e.g.
# becomes -SHARP
$ becomes -DOLLAR
Identifiers without any hyphens are not ISIL identifiers, but were
coined by me for library systems that didn't appear to have ISIL identifiers
at the time I added them to the database.
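The character substitutions above amount to a simple per-character rewrite. This sketch covers only the two substitutions listed (they are given as examples, so the real set may be larger). The names `url_safe_id` and `looks_like_isil` are hypothetical; the hyphen test mirrors the rule that identifiers coined locally contain no hyphens, and should be applied to the identifier before escaping.

```python
# Only the substitutions listed above; the actual set may include more.
ID_ESCAPES = {"#": "-SHARP", "$": "-DOLLAR"}


def url_safe_id(isil):
    """Rewrite characters in an ISIL identifier that could cause problems in URLs."""
    return "".join(ID_ESCAPES.get(ch, ch) for ch in isil)


def looks_like_isil(identifier):
    """Identifiers without any hyphens were coined locally, not taken from ISIL."""
    return "-" in identifier
```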
WCRID
In the libraries file, this gives the numeric WorldCat Registry ID used
for the library in question. We don't normally use this field
except when a library does not have an OCLC- ID or has one with special
punctuation in it (which makes searching on the OCLC ID difficult),
or has most of its WorldCat registry information in a record not associated
with its OCLC- ID.
This field is not necessary for basic Forward to Libraries functionality,
but can be useful for other applications that want to retrieve other
information on the library, such as URLs and geolocation, from its
WorldCat Registry entry. (Note that any library is eligible to create
and maintain a free WorldCat Registry entry, regardless of
whether they are OCLC members or WorldCat subscribers. See
http://www.worldcat.org/registry/Institutions for details.)
CATTYPE
In the cattype file, CATTYPE gives the identifier for a catalog type record.
In the libraries file, it imports the attribute values from the catalog type
record with that identifier into the current library's attributes. Imported
attribute values can be overridden in the libraries file by a re-declaration
of the same attribute.
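A sketch of how a CATTYPE import with local overrides might be resolved. It assumes the catalog types have already been read into a dictionary keyed by their CATTYPE identifier, and that imports are only one level deep. The name `resolve_library` is hypothetical.

```python
def resolve_library(library, cattypes):
    """Merge a library record with the catalog type record it imports.

    library:  attribute dict from the libraries file
    cattypes: dict mapping a CATTYPE identifier to that type's attribute dict
    """
    merged = {}
    cattype_id = library.get("CATTYPE")
    if cattype_id is not None:
        merged.update(cattypes[cattype_id])  # imported values first
    # Re-declared attributes in the library record override imported ones.
    merged.update(library)
    return merged
```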
NAME
This is the name displayed for the library.
LOCATION
This is the location information displayed for the library.
COUNTRY
This is usually an uppercased rendition of the ISO 3166-1 alpha-2 country
code for the country in which the library is located (which is also its
top-level DNS domain). May be omitted for libraries in the US and Canada.
Exceptional values include "00", for libraries placed under "Global
library services" instead of a country, and "SUPPRESS", for libraries not
included in the list of destinations, but for which we still want to
record some information.
STATE
This is the two-letter postal code of the state, territory, or district
in which the library is located. If set, this currently implies that the
library is in the US. (But we might use it for Australian states,
or other regions within a country, in the future.)
PROVINCE
This is the two-letter postal code of the province or Canadian territory
in which the library is located. If set, this currently implies that the
library is in Canada.
SOURCE
This is one or more strings (separated by | when there is more than one) that
indicate patterns that will show up in a referer URL when a visitor is
coming from the website of the current library. We currently
use this when FTL's behavior varies depending on whether a link
is coming from the Online Books Page, Wikipedia, or some other source.
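A sketch of referer matching, assuming each SOURCE pattern is a plain substring rather than a regular expression (the description does not specify). The name `comes_from` is hypothetical.

```python
def comes_from(source_attr, referer):
    """Return True if the referer URL contains any |-separated SOURCE pattern."""
    if not source_attr or not referer:
        return False
    patterns = [p for p in source_attr.split("|") if p]
    # Assumption: patterns are matched as plain substrings of the referer.
    return any(p in referer for p in patterns)
```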
SUBURL, AUTURL, TITURL, ATIURL, KEYURL
These are patterns that indicate how to construct a URL for a search
at the current library by subject, author, title, author/title, and keyword,
respectively. The value generally contains one or more substitution
instructions, indicated by ${var} or ${var:filters}. In these substitution
instructions, ${var} is replaced by the value of the attribute named var,
and ${var:filters} is replaced by the value of the attribute named var
filtered through the filters denoted by filters. For example,
${BASEURL} will substitute the value of the BASEURL attribute in the
libraries record. The following special attributes are provided by
the search itself:
The ARG attribute is used for the subject search term for SUBURL,
the author search term for AUTURL, the title search term for TITURL,
and the keyword search terms for KEYURL.
The AUTHOR and TITLE attributes are used for the author and title
search terms, respectively, in ATIURL.
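The substitution rule can be sketched with a regular expression. This assumes variable names and filter strings consist of word characters, that filter letters are applied left to right, and that the caller supplies the filter functions as a mapping. The description does not say whether or how values are URL-encoded, so this sketch does no encoding. The name `expand` is hypothetical.

```python
import re

# ${var} or ${var:filters}, where filters is a run of filter letters.
PLACEHOLDER = re.compile(r"\$\{(\w+)(?::(\w+))?\}")


def expand(pattern, attrs, filters=None):
    """Replace each ${var} / ${var:filters} in pattern with attrs[var].

    filters maps a filter letter to a str -> str function. Letters are
    applied left to right (an assumption); an unknown letter or a missing
    attribute raises KeyError.
    """
    filters = filters or {}

    def substitute(match):
        name, letters = match.group(1), match.group(2) or ""
        value = attrs[name]
        for letter in letters:
            value = filters[letter](value)
        return value

    return PLACEHOLDER.sub(substitute, pattern)
```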
The filters, if present, consist of one or more letters that indicate
a transformation of the attribute value, as follows:
A : Remove initial article ("A", "An", "The") from a value.
This is sometimes used for title searches when the target
search system doesn't expect them.
K : "Keyword-ify" a value by taking out extra punctuation and
other non-word content.
N : "Normalize" a value by turning any non-ASCII characters into
their closest ASCII equivalent (e.g. accented-e becomes e).
Unknown non-ASCII characters are removed.
S : "Simplify" a value by removing material past the second comma,
or the first parenthesis after the first comma. This is sometimes
used, for instance, to take out parts of a name heading that
might not be represented, or handled correctly, in a given
library catalog. For instance, "Twain, Mark, 1835-1910"
would become simply "Twain, Mark", and
"Chesterton, G. K. (Gilbert Keith), 1874-1936" would become
simply "Chesterton, G. K.".
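Possible implementations of the four filters follow. The A and S versions reproduce the behavior and examples described above. The exact punctuation rules for K are not specified, so treating every non-word, non-space character as a separator is an assumption, as is using Unicode NFKD decomposition for N. All function names are hypothetical.

```python
import re
import unicodedata


def strip_article(value):
    """A: remove an initial "A", "An", or "The" (case-insensitive here)."""
    for article in ("a ", "an ", "the "):
        if value.lower().startswith(article):
            return value[len(article):]
    return value


def keywordify(value):
    """K: turn punctuation into spaces and collapse runs of whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", value).split())


def normalize_ascii(value):
    """N: decompose accented letters, then drop anything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", value)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def simplify(value):
    """S: cut at the second comma or the first "(" after the first comma."""
    first = value.find(",")
    if first == -1:
        return value
    cuts = [i for i in (value.find(",", first + 1), value.find("(", first + 1)) if i != -1]
    if not cuts:
        return value
    return value[:min(cuts)].rstrip()


FILTER_FUNCTIONS = {"A": strip_article, "K": keywordify, "N": normalize_ascii, "S": simplify}
```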
When a particular search cannot be determined for a given library, FTL
will try to use a simpler search that is defined. For example, a library
with no ATIURL might have author/title searches handled via TITURL,
AUTURL, or KEYURL, depending on what definitions and data are provided.
Similarly, a subject search requested for terms that do not appear to
correspond to known library headings might get handled via a general
keyword search. General keyword search is the search of last resort,
so all library records should declare or import KEYURL at least.
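One way to express this fallback order, for illustration only: the order in which TITURL and AUTURL are tried for author/title searches, and the `known_heading` flag standing in for FTL's heading check, are assumptions rather than FTL's actual logic. The name `choose_search` is hypothetical; it returns the name of the pattern attribute together with the substitution values to use with it.

```python
def choose_search(lib, kind, author=None, title=None, arg=None, known_heading=True):
    """Pick a *URL pattern for a search, falling back toward KEYURL.

    lib:  resolved library attributes
    kind: "subject", "author", "title", "authortitle", or "keyword"
    Returns (pattern_attribute_name, substitution_values).
    """
    if kind == "authortitle":
        if "ATIURL" in lib:
            return "ATIURL", {"AUTHOR": author, "TITLE": title}
        # Assumed order: try a title search, then an author search.
        if "TITURL" in lib and title:
            return "TITURL", {"ARG": title}
        if "AUTURL" in lib and author:
            return "AUTURL", {"ARG": author}
        arg = " ".join(term for term in (author, title) if term)
    else:
        direct = {"subject": "SUBURL", "author": "AUTURL", "title": "TITURL"}.get(kind)
        # Subject terms that don't look like known headings skip SUBURL.
        usable = direct in lib and (kind != "subject" or known_heading)
        if usable:
            return direct, {"ARG": arg}
    # General keyword search is the search of last resort.
    return "KEYURL", {"ARG": arg}
```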
DEFAULT
Indicates the URL that should be used when FTL cannot figure out how
to construct a search at all. (For example, if FTL is called with
a VIAF identifier that it cannot map onto a heading, and there is
no way defined to search a catalog directly by VIAF, the best thing
to do is to give up and just put the user at the library's search home.)
If the value of DEFAULT is DOMAIN, then the default URL will be
the domain part of the BASEURL attribute value. If there is no value
defined for this attribute, the complete value of BASEURL will be used.
The same process is used to determine where a general link to the library
(without a search) should go.
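A sketch of the DEFAULT rule, assuming that "the domain part" of BASEURL means its scheme and host followed by a slash. The name `default_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def default_url(lib):
    """Return the fallback URL for a library record."""
    default = lib.get("DEFAULT")
    base = lib.get("BASEURL", "")
    if default == "DOMAIN":
        parts = urlsplit(base)
        # Keep only scheme and host (assumed meaning of "domain part").
        return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
    if default:
        return default
    return base  # no DEFAULT: use the complete BASEURL
```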
FILTERS
Indicates additional filters that should be applied to substitutions
in the *URL attributes for a particular library.
IPRANGE
This is one or more CIDR declarations (separated by | when more than one) that
indicate IP ranges used within the current library or institution.
When defined, FTL will route users within those IP ranges to that
library unless another preference has been expressed.
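Python's standard `ipaddress` module handles both IPv4 and IPv6 CIDR notation, so the range test can be sketched as below. The names `in_iprange` and `library_for_ip` are hypothetical; the latter returns the first matching library, which is an assumption about how overlapping ranges would be resolved.

```python
import ipaddress


def in_iprange(iprange_attr, client_ip):
    """True if client_ip falls in any |-separated CIDR block of IPRANGE."""
    addr = ipaddress.ip_address(client_ip)
    for cidr in iprange_attr.split("|"):
        cidr = cidr.strip()
        if not cidr:
            continue
        # strict=False tolerates host bits set in the declaration;
        # an IPv4 address is never "in" an IPv6 network, and vice versa.
        if addr in ipaddress.ip_network(cidr, strict=False):
            return True
    return False


def library_for_ip(libraries, client_ip, preferred_id=None):
    """Route to a preferred library if one was expressed, else by IP range."""
    if preferred_id is not None:
        return preferred_id
    for lib in libraries:
        if lib.get("IPRANGE") and in_iprange(lib["IPRANGE"], client_ip):
            return lib["ID"]
    return None
```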
FORWARDER
This gives the URL of the forwarding service for this site, if applicable.
EXCLUDE
This gives the ID(s) of forwarding services to exclude from the
"You can also choose this" list when one picks a forwarder for one service.
(This is not often used, but can be useful if for some reason a service
is not updatable for a while, or cannot connect directly to a given library.)
====
Any other attribute value is simply made available for substitutions.
The most common such attribute is BASEURL, but some catalog types also
use other substitutions for things like location codes, search index
names, and so on.
======================
II. WIKIMAP FILE
The wikimap file is made up of lines, one for each correspondence.
Lines beginning with # are comments and are ignored. Each line consists
of 3 fields, in the form
LC heading|relation|Wikipedia article title
For largely historical reasons, non-ASCII
characters are represented as UTF-8 in the Wikipedia article title field,
and as HTML character entities in the LC heading field.
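A sketch of reading one wikimap line, decoding the HTML entities in the LC heading with the standard `html.unescape` so that both fields end up as ordinary Unicode strings. When reading the file itself, it should be opened as UTF-8. The sample entry in the test is illustrative, and `parse_wikimap_line` is a hypothetical name.

```python
import html


def parse_wikimap_line(line):
    """Split one wikimap line into (LC heading, relation, Wikipedia title).

    Returns None for comments and blank lines.
    """
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    heading, relation, title = line.split("|", 2)
    # LC heading uses HTML entities; the title is already UTF-8 text.
    return html.unescape(heading), relation, title
```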
There are four relations defined:
-> People exploring the given LC heading can be usefully referred to the
given Wikipedia article title.
<- People reading the given Wikipedia article can be usefully referred to
the given LC heading.
<-> People reading either the given Wikipedia article or exploring the
given LC heading can be usefully referred to the other.
= Same as <->, but also notes that the Wikipedia article and the
LC heading describe the same thing.
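The four relations can be encoded as a small table of directions. This is an illustrative encoding, not the one used in the FTL code; `Relation`, `RELATIONS`, and `referrals` are hypothetical names.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    lc_to_wp: bool    # refer people exploring the LC heading to the article
    wp_to_lc: bool    # refer readers of the article to the LC heading
    same_thing: bool  # both describe the same thing


RELATIONS = {
    "->": Relation(True, False, False),
    "<-": Relation(False, True, False),
    "<->": Relation(True, True, False),
    "=": Relation(True, True, True),
}


def referrals(heading, relation, title):
    """Return the (source_kind, source, target) referrals one entry implies."""
    rel = RELATIONS[relation]
    out = []
    if rel.lc_to_wp:
        out.append(("lc", heading, title))
    if rel.wp_to_lc:
        out.append(("wp", title, heading))
    return out
```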
There should be only one referral specified from any given Wikipedia article
or LC heading. (But there can be multiple referrals to either.) The
"wpmapcheck" script in the code directory checks for this, and a few
other consistency issues.
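The one-referral rule can be checked by grouping referrals by their source. This is a sketch of that single check under hypothetical names, not the wpmapcheck script itself, which also checks other issues. Exact duplicate lines collapse into one referral here and are not reported.

```python
from collections import defaultdict

# Directions each relation refers in: (LC heading -> article, article -> LC heading)
DIRECTIONS = {"->": (True, False), "<-": (False, True), "<->": (True, True), "=": (True, True)}


def duplicate_referrals(entries):
    """Return sources with more than one outgoing referral.

    entries: iterable of (lc_heading, relation, wikipedia_title) tuples.
    Multiple referrals *to* the same target are allowed and not reported.
    """
    targets = defaultdict(set)
    for heading, relation, title in entries:
        lc_to_wp, wp_to_lc = DIRECTIONS[relation]
        if lc_to_wp:
            targets[("lc", heading)].add(title)
        if wp_to_lc:
            targets[("wp", title)].add(heading)
    return {source: sorted(t) for source, t in targets.items() if len(t) > 1}
```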
The wikimap file only contains correspondences that need to be assigned
(or overridden) after automatic correspondence mappings are done. Most
Wikipedia correspondences are in fact automatically assigned, by one
of these mechanisms:
* A common VIAF or LCCN identifier (for names)
* Exact name match, after normalizing accents, capitalization, and such
* Various other lexical matches of headings that exist in either
the Library of Congress subjects of The Online Books Page or
the titles of Wikipedia articles. These include (among others):
* <X> -- History = History of [the] <X>
* <X> (<Known Geolocation Abbrev.>) = <X>, <Known Geolocation>
* <X> in [the] <Known Geolocation> = <X> -- <Known Geolocation>
* <X [plural form]> = <X [singular form]>, when there is a common
singular-plural mapping (e.g. adding s), and there is a
Wikipedia redirect from the plural to the singular
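A sketch of candidate generation for the first three lexical patterns, starting from an LC heading. The geolocation tables here are tiny hypothetical stand-ins, the candidates would still need to be checked against existing Wikipedia titles, and the plural/singular rule is omitted because it depends on Wikipedia redirect data. The name `wikipedia_candidates` is hypothetical.

```python
import re

# Hypothetical stand-ins for the "known geolocation" tables.
GEO_ABBREVS = {"Ill.": "Illinois", "Mass.": "Massachusetts"}
KNOWN_GEOS = {"France", "Illinois", "United States"}


def wikipedia_candidates(lc_heading):
    """Candidate Wikipedia titles for an LC heading, per the lexical patterns."""
    candidates = []
    # <X> -- History  =>  History of [the] <X>
    m = re.fullmatch(r"(.+) -- History", lc_heading)
    if m:
        x = m.group(1)
        candidates += [f"History of {x}", f"History of the {x}"]
    # <X> (<Known Geolocation Abbrev.>)  =>  <X>, <Known Geolocation>
    m = re.fullmatch(r"(.+) \((.+)\)", lc_heading)
    if m and m.group(2) in GEO_ABBREVS:
        candidates.append(f"{m.group(1)}, {GEO_ABBREVS[m.group(2)]}")
    # <X> -- <Known Geolocation>  =>  <X> in [the] <Known Geolocation>
    m = re.fullmatch(r"(.+) -- (.+)", lc_heading)
    if m and m.group(2) in KNOWN_GEOS:
        x, geo = m.groups()
        candidates += [f"{x} in {geo}", f"{x} in the {geo}"]
    return candidates
```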