For this tutorial we need some command-line tools to process MARC data. You can download a VirtualBox image containing most of the required tools from the Catmandu project and follow its installation instructions, or you can install all the tools on your own system. The steps below assume a Debian-based system.
Some of these tools require a Perl interpreter. It is recommended to install a local Perl environment on your system using perlbrew:
# install perlbrew
$ \curl -L https://install.perlbrew.pl | bash
# edit .bashrc
$ echo -e '\nsource ~/perl5/perlbrew/etc/bashrc\n' >> ~/.bashrc
$ source ~/.bashrc
# initialize
$ perlbrew init
# see what versions are available
$ perlbrew available
# install a Perl version
$ perlbrew install -j 2 -n perl-5.28.2
# see installed versions
$ perlbrew list
# switch to an installation and set it as default
$ perlbrew switch perl-5.28.2
# install cpanm
$ perlbrew install-cpanm
Catmandu is a data toolkit that can be used for ETL (extract, transform, load) processes. The project website provides detailed instructions on how to install Catmandu on different systems.
# install dependencies
$ sudo apt install autoconf build-essential dconf-cli libexpat1-dev \
libgdbm-dev libssl-dev libxml2-dev libxslt1-dev libyaz-dev parallel perl-doc \
yaz zlib1g zlib1g-dev
# install Catmandu modules
$ cpanm Catmandu Catmandu::Breaker Catmandu::Exporter::Table \
Catmandu::Identifier Catmandu::Importer::getJSON Catmandu::MARC Catmandu::OAI \
Catmandu::PICA Catmandu::PNX Catmandu::RDF Catmandu::SRU Catmandu::Stat \
Catmandu::Template Catmandu::VIAF Catmandu::Validator::JSONSchema \
Catmandu::Wikidata Catmandu::XLS Catmandu::XSD Catmandu::Z3950
MARC::Schema provides the command-line utility marcvalidate to validate MARC records.
$ cpanm MARC::Schema
MARC::Record::Stats provides the command-line utility marcstats.pl to generate statistics for your MARC records.
$ cpanm MARC::Record::Stats
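Once the cpanm runs have finished, it can be worth confirming that the modules actually load in the active Perl. The following is a minimal sketch, assuming `perl` on your PATH is the perlbrew installation; `check_module` is a hypothetical helper name chosen for this example, and the module list is only a sample of those installed above.

```shell
# Sketch: report whether a Perl module can be loaded by the perl on PATH.
# check_module is a hypothetical helper, not part of Catmandu or cpanm.
check_module() {
    # -M loads the module; -e 1 runs a no-op program.
    if perl -M"$1" -e 1 2>/dev/null; then
        echo "$1: ok"
    else
        echo "$1: missing"
    fi
}

for module in Catmandu Catmandu::MARC MARC::Schema MARC::Record::Stats; do
    check_module "$module"
done
```

If a module is reported as missing, rerunning cpanm for that module usually shows the underlying build error.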
For Unicode normalization we need the command-line utility uconv, which is part of the ICU project.
$ sudo apt install libicu-dev
YAZ is a free, open source toolkit from Index Data that includes command-line utilities such as yaz-client and yaz-marcdump.
$ sudo apt install yaz
xmllint is a command-line tool to process XML data.
$ sudo apt install libxml2-utils
To transform XML data with XSL stylesheets we need an XSLT processor.
$ sudo apt install xsltproc
For more information about these tools, read their man or help pages, e.g.:
$ man yaz-marcdump
$ xmllint --help
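To see at a glance whether everything from the steps above ended up on your PATH, a small check like the following may help. It is a sketch rather than an official installer check: `check_tools` is a hypothetical helper name, and the tool list simply collects the commands named in this section.

```shell
# Sketch: report which command-line tools resolve on PATH.
# check_tools is a hypothetical helper name used only for this example.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: ok"
        else
            echo "$tool: missing"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every tool was found.
    [ "$missing" -eq 0 ]
}

check_tools perl cpanm catmandu marcvalidate marcstats.pl uconv \
    yaz-client yaz-marcdump xmllint xsltproc || echo "some tools are missing"
```

The function returns a non-zero status when any tool is missing, so it can also be used as a guard at the top of your own scripts.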
Several other software tools and libraries for processing MARC data are available; see:
- MARC Specialized Tools https://www.loc.gov/marc/marctools.html
- Working with MARC https://wiki.code4lib.org/Working_with_MARC
- Catmandu - Related projects https://librecat.org/Catmandu/#selected-formats