While there seems to be an endless variety of citation formats, many publishers fortunately standardize on one, or at most a few, citation formats for their collection of journals. This reduces the number of "independent" CSL citation styles we need to maintain and allows us to create lightweight "dependent" CSL styles for the individual journals.
Whereas independent styles contain a full description of a citation format, dependent styles are light-weight and act as an alias to an independent style (while dependent styles inherit their complete style format from their parent style, they can re(define) one setting: the style locale).
This journals
repository holds a collection of journal metadata for publishers that have standardized their use of citation formats.
At a minimum, this metadata should include the journal titles and ISSNs (print and online).
Through a script, we can quickly generate dependent CSL styles from this metadata.
This repository was created on 2016-03-10 as a dedicated home for the journal metadata previously stored in the https://github.com/citation-style-language/utilities/ repository (in the now-removed "generate_dependent_styles" subdirectory).
The first step in creating dependent styles for the journals of a publisher is collecting the metadata. Some publishers conveniently publish spreadsheets with all the necessary details of their journals. Sometimes we contact publishers to ask for their journal metadata, or for small publishers, compile the metadata by hand.
Each publisher gets its own metadata directory in the root of this repository (e.g., asm
for the American Society for Microbiology).
When creating a new publisher directory, update the file publishers.json to match the directory name to a full publisher name.
Each publisher directory must contain the following files:
_journals.tab
is the main file for storing a publisher's journal metadata.
It must be a tab-delimited file, with one column per metadata field.
Metadata fields are identified by the field names in the column header.
By convention these field names are in all-caps, although case doesn't matter for the script.
Each row below the header should cover a single journal.
For example, the first few rows could look like:
TITLE ISSN EISSN DOCUMENTATION
Antimicrobial Agents and Chemotherapy 0066-4804 1098-6596 http://aac.asm.org/
Applied and Environmental Microbiology 0099-2240 1098-5336 http://aem.asm.org/
Genome Announcements x 2169-8287 http://genomea.asm.org/
When preparing _journals.tab
, keep the following in mind:
_journals.tab
should ideally only contain metadata that differs between journals. If, for example, all journals have the same documentation URL, it's better to add this URL to the shared style template.- the only required metadata field is "TITLE", providing the style title.
The field names "IDENTIFIER" and "XML-COMMENT" are reserved and may not be used within
_journals.tab
. There are no other restrictions on the naming of column headers. - for consistency, we recommend the following names for common metadata fields:
- "TITLESHORT": abbreviated journal title
- "PARENT": independent parent style
- "DOCUMENTATION": documentation URL
- "FORMAT": citation format (e.g., "numeric")
- "FIELD": field (e.g., "medicine")
- "ISSN": print ISSN
- "EISSN": electronic ISSN
- "LOCALE": locale code for style localization (e.g., "en-US" for US English; for a list of common locale codes, see the CSL locales repository)
- ampersands are automatically escaped for the "TITLE", "TITLESHORT", and "DOCUMENTATION" fields.
- not every journal will have a value for each metadata field. For example, online-only journals won't have a print ISSN. In this case, just leave the field empty, or fill it with one of the accepted placeholder characters, "x", "X", or a space.
_template.csl
is the template for the dependent CSL styles.
An example:
<?xml version="1.0" encoding="utf-8"?>
<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0" default-locale="#LOCALE#">
<!-- #XML-COMMENT# -->
<info>
<title>#TITLE#</title>
<id>http://www.zotero.org/styles/#IDENTIFIER#</id>
<link href="http://www.zotero.org/styles/#IDENTIFIER#" rel="self"/>
<link href="http://www.zotero.org/styles/american-society-for-microbiology" rel="independent-parent"/>
<link href="#DOCUMENTATION#" rel="documentation"/>
<category citation-format="numeric"/>
<category field="biology"/>
<issn>#ISSN#</issn>
<eissn>#EISSN#</eissn>
<updated>2015-05-08T17:36:22+00:00</updated>
<rights license="http://creativecommons.org/licenses/by-sa/3.0/">This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 License</rights>
</info>
</style>
This template file should include all metadata that is the same for all journals.
In contrast, template placeholders should be used for metadata that differs between journals.
These placeholders, in all-caps and enclosed in hashes (e.g., #TITLE#
), are replaced by journal metadata during style generation.
Except for the automatically generated "IDENTIFIER" and "XML-COMMENT" metadata fields, all placeholders must therefore match metadata field names from _journals.tab
.
If the metadata field for a journal is empty or contains only a placeholder character ("x", "X", or space are accepted), the entire line containing the matching placeholder is deleted in the generated style.
The "#IDENTIFIER#" placeholder must be used to generate the style ID and "self" link, and generated styles are automatically named after the "IDENTIFIER" field. The value of the "IDENTIFIER" field is derived from "TITLE", by, among others, removing diacritics and punctuation, and replacing spaces by hyphens. For example, for the journal "Applied and Environmental Microbiology", the "IDENTIFIER" will become "applied-and-environmental-microbiology", the file name "applied-and-environmental-microbiology.csl", and the style ID should be "http://www.zotero.org/styles/applied-and-environmental-microbiology".
The "#XML-COMMENT#" placeholder should be included above the <info/>
element, and is replaced by a string that tells us that the style was automatically generated, and which subdirectory produced the style.
For example, the string could read "Generated with https://github.com/citation-style-language/utilities/tree/master/generate_dependent_styles/data/**subdirectory**", with "subdirectory" matching the name of the subdirectory being processed (this particular URL is outdated following our move to the journals
repository).
Each publisher directory may also contain any of the following files:
Some publishers helpfully publish a spreadsheet of their journal catalog with all required metadata. In this case, it's often handy to save a copy of this spreadsheet in the repository.
It's always a good idea to document your work in a _README.mdown
file, so that other contributors (or your future self!) can figure out what you did.
For instance, where does the metadata come from?
How up-to-date is the metadata you collected?
And if you got help from the publisher, who were your contacts?
_skip.txt
may be used to list journals that should be skipped during style generation.
Journals can be specified by their "TITLE" or "IDENTIFIER" value, one per line.
_rename.tab
may be used to change the value of "IDENTIFIER".
Each line in this tab-delimited file should list the original generated "IDENTIFIER" value, followed by the new value.
An example of _rename.tab
is shown below:
acm-computing-surveys computing-surveys
acm-transactions-on-algorithms transactions-on-algorithms
_extra.tab
may be used to add additional journals, or to overwrite the metadata for journals already stored in _journals.tab
.
For example, for large publishers it's handier to store the publisher-provided metadata untouched in _journals.tab
, while making any manual corrections in _extra.tab
.
_extra.tab
must be a tab-delimited file, with the exact same header and column order as _journals.tab
.
When entries in _journals.tab
and _extra.tab
generate the same "IDENTIFIER" value, the entry in _extra.tab
completely replaces the entry in _journals.tab
.
Journals defined in _extra.tab
are still subject to skipping (through _skip.txt
) and renaming (through _rename.tab
).
Once the journal metadata has been properly formatted, dependent styles can be generated by running the generate_styles.rb
Ruby script from the command line:
ruby generate_styles.rb
Generated styles are placed in the _dependent
subdirectory of the journals
repository, which will be created if it does not yet exist.
Existing directory contents will be deleted for each run of the script.
Since the _dependent
directory is listed in .gitignore
, you won't accidentally commit any generated styles to the journals
repository.
By default, generate_styles.rb
generates styles for all publisher directories.
However, you can limit style generation to a single directory with the --dir
(or -d
) option.
Just specify the name of the desired directory directly after the option. For example:
ruby generate_styles.rb -d asm
If you use the --sync
(or -s
) option, the script will automatically update your local styles
repository with the newly generated styles.
This requires that you cloned the CSL styles
and journals
repositories into the same parent directory.
When using the --sync
option, the script will by default delete all existing generated dependent styles from the local styles
repository, and copy over the newly generated styles.
Only those dependent styles are deleted that have an XML comment that matches one of the publisher directories processed by the script (e.g., "Generated with https://github.com/citation-style-language/utilities/tree/master/generate_dependent_styles/data/**subdirectory**").
The --sync
option may be followed by a qualifier to limit which styles are updated. The qualifiers are "additions", "deletions", and "modifications", which together cover all possible updates:
- "additions": only styles that don't yet exist in the
styles
repository are copied over - "deletions": only previously generated styles that aren't present in the newly generated set are deleted from the
styles
directory. - "modifications": only styles that already exist in the
styles
directory are copied over
The --dir
and --sync
options can be used in combination. Also note that it's enough to type the first letter of the desired qualifier.
For example, to generate the styles for the "asm" directory and copy over only the new styles, use:
ruby generate_styles.rb -d asm -s a
When you are preparing a pull request for a big metadata update for a large publisher, it's often a good idea to create separate commits for style additions, deletions, and modifications. This makes it easier to review the pull requests, since Git gets easily confused if a commit contains both style deletions and additions and will often mistake these for style renamings.
When using the --sync
option, the script will by default not touch those styles in the styles
repository that are identical to a newly generated style, or if the only differences are in the styles' timestamps (stored in <updated/>
) and/or the styles' XML comments.
This allows you to update the timestamp in _template.tab
with every metadata update, while avoiding changes to all styles for which the journal metadata stayed the same.
However, you can force the replacement of all styles by using the --sync
option in combination with --force
(or -f
) option.
You may encounter the following encoding error when running the Ruby script under Windows:
generate_styles.rb:21:in `delete!': incompatible character encodings: CP850 and UTF-8 (Encoding::CompatibilityError)
If this happens, try running Ruby with the -E UTF-8
option:
ruby -E UTF-8 generate_styles.rb