Lesson: Control Indexing of your RDF metadata (AF7)

Every ActiveFedora datastream can build a representation of itself to store in Solr by using the built-in method to_solr. This method requires that the datastream is associated with a model, so that the RDF can be written with the pid as the subject of its assertions. Let's take a look at what the default to_solr method gives us:

class DublinCoreDatastream < ActiveFedora::NtriplesRDFDatastream
  property :title, predicate: RDF::DC.title
  property :created, predicate: RDF::DC.created
  #...
end
class MyObj < ActiveFedora::Base
  has_metadata 'descMetadata', type: DublinCoreDatastream
  has_attributes :title, :created, datastream: 'descMetadata', multiple: false
end

m = MyObj.new(title: 'One Hundred Years of Solitude', created: '1967')
=> #<MyObj pid: nil, title: "One Hundred Years of Solitude", created: "1967">
m.descMetadata.to_solr
=> {}

As you can see, the default behavior is to return an empty hash, so nothing is indexed in Solr by default. Now let's tweak the behavior to produce a more interesting and useful document:

class DublinCoreDatastream < ActiveFedora::NtriplesRDFDatastream
  property :title, predicate: RDF::DC.title do |index|
    index.as :sortable, :searchable
  end
  property :created, predicate: RDF::DC.created do |index|
    index.as :stored_searchable
  end
  #...
end
class MyObj < ActiveFedora::Base
  has_metadata 'descMetadata', type: DublinCoreDatastream
  has_attributes :title, :created, datastream: 'descMetadata', multiple: false
end

m = MyObj.new(title: 'One Hundred Years of Solitude', created: '1967')
=> #<MyObj pid: nil, title: "One Hundred Years of Solitude", created: "1967">
m.descMetadata.to_solr
=> {"desc_metadata__title_si"=>"One Hundred Years of Solitude",
    "desc_metadata__title_teim"=>["One Hundred Years of Solitude"],
    "desc_metadata__created_tesim"=>["1967"]}

This time the Solr document has three fields, derived from the two data fields. Notice that different arguments on the index.as line produce different suffixes in the output. These suffixes control how Solr indexes each field. The first one or two characters of the suffix determine the Solr field type:

For example:

dt = date
s = string (not tokenized)
te = text (tokenized with english assumptions)
i = integer
b = boolean

The remaining characters of the suffix are flags:

s = if present, stored (can be displayed after retrieval)
i = if present, index this field (for searching)
m = if present, multivalued (can't sort on multivalued fields)

See https://github.com/projecthydra/active_fedora/blob/master/lib/generators/active_fedora/config/solr/templates/solr_conf/conf/schema.xml#L13-L152 for an exhaustive list.
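To make the suffix convention concrete, here is a minimal sketch in plain Ruby that splits a field name into its type and flags. It covers only the type codes and flags listed above, not the full schema, and the names TYPE_CODES and describe_suffix are hypothetical helpers invented for this illustration; they are not part of ActiveFedora or Solrizer.

```ruby
# Hypothetical helper for illustration only; not part of ActiveFedora or Solrizer.
# Type codes are limited to the ones listed in this lesson.
TYPE_CODES = {
  'dt' => :date,
  'te' => :text,
  's'  => :string,
  'i'  => :integer,
  'b'  => :boolean
}.freeze

# Splits a field name such as "desc_metadata__title_tesim" into its base
# name, field type, and stored/indexed/multivalued flags.
# Returns nil when the suffix does not follow the convention described above.
def describe_suffix(field_name)
  base, _, suffix = field_name.rpartition('_')
  # Try the two-character type codes before the one-character ones.
  codes = TYPE_CODES.keys.sort_by { |code| -code.length }
  type_code = codes.find { |code| suffix.start_with?(code) }
  return nil if type_code.nil?

  flags = suffix[type_code.length..-1]
  # Flags may only be s, i, m, each at most once and in that order.
  return nil unless flags =~ /\As?i?m?\z/

  {
    name: base,
    type: TYPE_CODES[type_code],
    stored: flags.include?('s'),
    indexed: flags.include?('i'),
    multivalued: flags.include?('m')
  }
end

p describe_suffix('desc_metadata__title_si')
p describe_suffix('desc_metadata__created_tesim')
```

Reading the result for the title_si field shows why it is usable for sorting: it is an indexed, single-valued string.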

Solrizer gives us some shortcuts that build the appropriate suffixes:

:stored_searchable
- _tesim - for strings or text fields
- _dtsim - for dates
- _isim - for integers
:searchable
- _teim - for strings or text fields
- _dtim - for dates
- _iim - for integers
:facetable
- _sim
:symbol
- _ssim
and others. See https://github.com/projecthydra/solrizer/blob/master/lib/solrizer/default_descriptors.rb
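The mapping from index types to field names can be sketched as a simple lookup table. This is a standalone illustration rather than Solrizer's actual implementation: the SUFFIXES table is copied from the list above, the :sortable entry is inferred from the title_si key in the earlier example output, and field_name and field_names are hypothetical helpers.

```ruby
# Illustration only; not Solrizer's real code. Suffixes are copied from the
# list in this lesson. The :sortable entry is an assumption inferred from the
# "desc_metadata__title_si" key in the example to_solr output.
SUFFIXES = {
  stored_searchable: { string: 'tesim', date: 'dtsim', integer: 'isim' },
  searchable:        { string: 'teim',  date: 'dtim',  integer: 'iim' },
  facetable:         'sim',
  symbol:            'ssim',
  sortable:          { string: 'si' }
}.freeze

# Hypothetical helper: builds the Solr field name for a single index type.
# Raises KeyError for an index type or data type that is not in the table.
def field_name(prefix, name, index_type, type: :string)
  entry = SUFFIXES.fetch(index_type)
  suffix = entry.is_a?(Hash) ? entry.fetch(type) : entry
  "#{prefix}__#{name}_#{suffix}"
end

# Hypothetical helper: one field name per index type, mirroring how
# index.as accepts several arguments.
def field_names(prefix, name, *index_types, type: :string)
  index_types.map { |index_type| field_name(prefix, name, index_type, type: type) }
end

puts field_names('desc_metadata', 'title', :sortable, :searchable)
puts field_names('desc_metadata', 'created', :stored_searchable)
```

The two calls at the end reproduce the three keys from the to_solr output shown earlier, which is a quick way to check that the table matches the behavior described in this lesson.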

Next Step

Go on to Lesson: using typed predicates in your models (AF7) or return to the Tame your RDF Metadata with ActiveFedora landing page.
