Lesson: Define a terminology with a nested hierarchy of terms
This lesson is known to work with om version 3.0.4.
Please update this wiki to reflect any other versions that have been tested.
Most XML is not flat. It has hierarchies of nodes nested in semantically relevant ways. For example, we might group a person's given name and family name within a <name> node:
<fields>
<title>ZOIA! Memoirs of Zoia Horn, Battler for the People's Right to Know.</title>
<name>
<givenName>Zoia</givenName>
<familyName>Horn</familyName>
<role>
<text>Author</text>
<code>AUT</code>
</role>
</name>
</fields>
Now we'll create a file called fancy_book_metadata.rb and paste the following code into it:
require "om"
class FancyBookMetadata
include OM::XML::Document
set_terminology do |t|
t.root(path: "fields")
t.title
# The underscore is purely to avoid namespace conflicts.
t.name_ {
t.family_name(path: "familyName")
t.given_name(path: "givenName")
t.role {
t.text
t.code
}
}
end
# This method is called when you create new XML documents from scratch.
# It must return a Nokogiri::XML::Document. Other than that, you can make your "default" documents look however you want.
def self.xml_template
Nokogiri::XML.parse("<fields/>")
end
end
Note that we are using the :path option to define Terms with names like family_name that correspond to XML elements with names like familyName. This lets you keep method names that follow Ruby conventions (or whatever conventions you prefer) even though the element names in your XML can vary widely.
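To make the mapping concrete, here is a minimal plain-Ruby sketch of how a chain of Ruby-style Term names could be resolved to the XML element names they stand for. It is an illustration only, not OM's actual implementation: the TERM_PATHS table and the element_path helper are hypothetical names invented for this example.

```ruby
# Hypothetical lookup table mirroring the terminology above:
# each term maps to its XML element name (:path) and its child terms.
TERM_PATHS = {
  title: { path: "title", children: {} },
  name: {
    path: "name",
    children: {
      family_name: { path: "familyName", children: {} },
      given_name:  { path: "givenName",  children: {} },
      role: {
        path: "role",
        children: {
          text: { path: "text", children: {} },
          code: { path: "code", children: {} }
        }
      }
    }
  }
}.freeze

# Resolve a chain of term names (e.g. :name, :family_name) into the
# element names they stand for, joined as a simple XPath-like string.
def element_path(*terms)
  level = TERM_PATHS
  segments = terms.map do |term|
    entry = level.fetch(term) { raise ArgumentError, "unknown term: #{term}" }
    level = entry[:children]
    entry[:path]
  end
  "//" + segments.join("/")
end

puts element_path(:name, :family_name)
puts element_path(:name, :role, :code)
```

The Ruby-facing names (family_name) and the XML-facing names (familyName) live side by side in the table, which is the same separation the :path option gives you in a Terminology.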
Restart the console
bundle console
Require the FancyBookMetadata class definition.
require "./fancy_book_metadata"
fancybook = FancyBookMetadata.new
puts fancybook.to_xml
<?xml version="1.0"?>
<fields/>
Now you can use the Terms to edit the XML. Call the Terms according to how they are nested in the Terminology. For example, in this Terminology we have nested given_name inside name, so you can call .name.given_name on your document.
fancybook.name.given_name = "Zoia"
=> "Zoia"
fancybook.name.family_name = "Horn"
=> "Horn"
fancybook.name.role.text = "author"
=> "author"
fancybook.name.role.code = "AUT"
=> "AUT"
puts fancybook.to_xml
<?xml version="1.0"?>
<fields>
<name><givenName>Zoia</givenName><familyName>Horn</familyName><role><text>author</text><code>AUT</code></role></name>
</fields>
=> nil
Notice that we never had to explicitly create the <role> node before inserting the <text> and <code> nodes into it. OM handled that for us.
Say we have two authors. We should create two <name> nodes, each with a <givenName>, a <familyName>, and a <role>:
<fields>
<title>ZOIA! Memoirs of Zoia Horn, Battler for the People's Right to Know.</title>
<name>
<givenName>Zoia</givenName>
<familyName>Horn</familyName>
<role>
<text>Author</text>
<code>AUT</code>
</role>
</name>
<name>
<givenName>Julius</givenName>
<familyName>Caesar</familyName>
<role>
<text>Contributor</text>
<code>CON</code>
</role>
</name>
</fields>
How do we prevent the values from getting bunched up into a single <name> node, as shown below?
<fields>
<title>ZOIA! Memoirs of Zoia Horn, Battler for the People's Right to Know.</title>
<name>
<givenName>Zoia</givenName>
<givenName>Julius</givenName>
<familyName>Horn</familyName>
<familyName>Caesar</familyName>
<role>
<text>Author</text>
<code>AUT</code>
</role>
<role>
<text>Contributor</text>
<code>CON</code>
</role>
</name>
</fields>
The answer is to specify which node you want to create, read, or update by passing its index to the Term. Indexes start at 0, as with Arrays in Ruby, Java, C, etc., so fancybook.name(1) addresses the second name entry in the document, not the first.
fancybook.name(1).family_name = "Caesar"
=> "Caesar"
fancybook.name(1).given_name = "Julius"
=> "Julius"
fancybook.name(1).role.text = "Contributor"
=> "Contributor"
fancybook.name(1).role.code = "CON"
=> "CON"
puts fancybook.to_xml
<?xml version="1.0"?>
<fields>
<name><givenName>Zoia</givenName><familyName>Horn</familyName><role><text>author</text><code>AUT</code></role></name>
<name><familyName>Caesar</familyName><givenName>Julius</givenName><role><text>Contributor</text><code>CON</code></role></name>
</fields>
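One detail worth keeping in mind if you later write XPath queries yourself: Term indexes are 0-based, while XPath position predicates are 1-based. The sketch below shows the off-by-one conversion in plain Ruby. The indexed_xpath helper is a hypothetical name made up for this illustration, and it is not necessarily how OM builds its own queries.

```ruby
# Hypothetical helper: build an XPath for an element name and an optional
# zero-based index. XPath position predicates are 1-based, so the index
# is shifted by one.
def indexed_xpath(element, index = nil)
  return "//#{element}" if index.nil?
  raise ArgumentError, "index must be >= 0" if index.negative?

  "//#{element}[#{index + 1}]"
end

puts indexed_xpath("name")     # all <name> nodes
puts indexed_xpath("name", 0)  # the first <name> node
puts indexed_xpath("name", 1)  # the second <name> node
```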
Say I want to iterate over all of the names in a document and then do something to their child nodes. It is tempting to think that I could use fancybook.name.each do |name| ... but that won't work, because when you use a Term to access elements, OM returns the values of the elements rather than the elements themselves. This means that calling fancybook.name returns an Array of Strings, each one the concatenated values of that node's child elements.
fancybook.name
=> ["ZoiaHornauthor", "CaesarJuliusContributor"]
That's not useful for the kind of task we're trying to handle here. The solution is to iterate over the Nokogiri NodeSet instead. You can then use anything from the Nokogiri API to navigate through the node set and update nodes.
fancybook.name.nodeset.each {|namenode| puts "Node: "; puts namenode.inspect}
Node:
#<Nokogiri::XML::Element:0x80a0c2a8 name="name" children=[#<Nokogiri::XML::Text:0x8085fc34 "">, #<Nokogiri::XML::Element:0x809d0690 name="givenName" children=[#<Nokogiri::XML::Text:0x8085f284 "Zoia">]>, #<Nokogiri::XML::Element:0x8043ce7c name="familyName" children=[#<Nokogiri::XML::Text:0x80861df4 "Horn">]>, #<Nokogiri::XML::Element:0x8082799c name="role" children=[#<Nokogiri::XML::Text:0x808663e0 "author">]>]>
Node:
#<Nokogiri::XML::Element:0x809ec4bc name="name" children=[#<Nokogiri::XML::Text:0x80a81058 "">, #<Nokogiri::XML::Element:0x809f3758 name="familyName" children=[#<Nokogiri::XML::Text:0x80aae530 "Caesar">]>, #<Nokogiri::XML::Element:0x80a06934 name="givenName" children=[#<Nokogiri::XML::Text:0x80aade3c "Julius">]>, #<Nokogiri::XML::Element:0x80a0c7bc name="role" children=[#<Nokogiri::XML::Text:0x80ab2c34 "Contributor">]>]>
=> 0
Go on to Lesson: Make Terms that reference attributes on XML elements or return to the Tame your XML with OM page.