Skip to content

Commit

Permalink
schema/config updates and nameSort change
Browse files Browse the repository at this point in the history
  • Loading branch information
qqmyers committed Jan 30, 2025
1 parent c19d07e commit f1d9eeb
Show file tree
Hide file tree
Showing 2 changed files with 26 additions and 94 deletions.
51 changes: 26 additions & 25 deletions conf/solr/schema.xml
Original file line number Diff line number Diff line change
Expand Up @@ -38,36 +38,37 @@
catchall "text" field, and use that for searching.
-->

<schema name="default-config" version="1.6">
<schema name="default-config" version="1.7">
<!-- attribute "name" is the name of this schema and is only used for display purposes.
version="x.y" is Solr's version number for the schema syntax and
version="x.y" is Solr's version number for the schema syntax and
semantics. It should not normally be changed by applications.
1.0: multiValued attribute did not exist, all fields are multiValued
1.0: multiValued attribute did not exist, all fields are multiValued
by nature
1.1: multiValued attribute introduced, false by default
1.2: omitTermFreqAndPositions attribute introduced, true by default
1.1: multiValued attribute introduced, false by default
1.2: omitTermFreqAndPositions attribute introduced, true by default
except for text fields.
1.3: removed optional field compress feature
1.4: autoGeneratePhraseQueries attribute introduced to drive QueryParser
behavior when a single string produces multiple tokens. Defaults
behavior when a single string produces multiple tokens. Defaults
to off for version >= 1.4
1.5: omitNorms defaults to true for primitive field types
1.5: omitNorms defaults to true for primitive field types
(int, float, boolean, string...)
1.6: useDocValuesAsStored defaults to true.
1.7: docValues defaults to true, uninvertible defaults to false.
-->

<!-- Valid attributes for fields:
name: mandatory - the name for the field
type: mandatory - the name of a field type from the
type: mandatory - the name of a field type from the
fieldTypes section
indexed: true if this field should be indexed (searchable or sortable)
stored: true if this field should be retrievable
docValues: true if this field should have doc values. Doc Values is
recommended (required, if you are using *Point fields) for faceting,
grouping, sorting and function queries. Doc Values will make the index
faster to load, more NRT-friendly and more memory-efficient.
They are currently only supported by StrField, UUIDField, all
faster to load, more NRT-friendly and more memory-efficient.
They are currently only supported by StrField, UUIDField, all
*PointFields, and depending on the field type, they might require
the field to be single-valued, be required or have a default value
(check the documentation of the field type you're interested in for
Expand All @@ -82,9 +83,9 @@
given field.
When using MoreLikeThis, fields used for similarity should be
stored for best performance.
termPositions: Store position information with the term vector.
termPositions: Store position information with the term vector.
This will increase storage costs.
termOffsets: Store offset information with the term vector. This
termOffsets: Store offset information with the term vector. This
will increase storage costs.
required: The field is required. It will throw an error if the
value does not exist
Expand All @@ -102,10 +103,10 @@
<!-- In this _default configset, only four fields are pre-declared:
id, _version_, and _text_ and _root_. All other fields will be type guessed and added via the
"add-unknown-fields-to-the-schema" update request processor chain declared in solrconfig.xml.
Note that many dynamic fields are also defined - you can use them to specify a
Note that many dynamic fields are also defined - you can use them to specify a
field's type via field naming conventions - see below.
WARNING: The _text_ catch-all field will significantly increase your index size.
If you don't need it, consider removing it and the corresponding copyField directive."
-->
Expand All @@ -115,12 +116,12 @@
<field name="_version_" type="plong" indexed="false" stored="false"/>
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />

<!-- Start: Dataverse-specific -->




<!-- Start: Dataverse-specific -->

<!-- catchall field, containing all other searchable text fields (implemented
via copyField further on in this schema -->
<!-- Dataverse solr 7.3.0: for some reason the old text wasn't working so switched to _text_ for copyfields -->
Expand Down Expand Up @@ -212,7 +213,7 @@
<!-- https://redmine.hmdc.harvard.edu/issues/3482 -->
<!-- 'Sorting can be done on the "score" of the document, or on any multiValued="false" indexed="true" field provided that field is either non-tokenized (ie: has no Analyzer) or uses an Analyzer that only produces a single Term (ie: uses the KeywordTokenizer)' http://wiki.apache.org/solr/CommonQueryParameters#sort -->
<!-- http://stackoverflow.com/questions/13360706/solr-4-0-alphabetical-sorting-trouble/13361226#13361226 -->
<field name="nameSort" type="alphaOnlySort" indexed="true" stored="true"/>
<field name="nameSort" type="string" indexed="true" stored="true"/>

<field name="dateSort" type="pdate" indexed="true" stored="true"/>

Expand Down Expand Up @@ -751,7 +752,7 @@
<filter class="solr.TrimFilterFactory" />
<!-- The PatternReplaceFilter gives you the flexibility to use
Java Regular expression to replace any sequence of characters
matching a pattern with an arbitrary replacement string,
matching a pattern with an arbitrary replacement string,
which may include back references to portions of the original
string matched by the pattern.
Expand All @@ -764,8 +765,8 @@
<!-- https://redmine.hmdc.harvard.edu/issues/3482#note-11 -->
<!-- <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all" /> -->
</analyzer>
</fieldType>
</fieldType>

<!-- The StrField type is not analyzed, but indexed/stored verbatim. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true" />
<fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true" docValues="true" />
Expand Down
69 changes: 0 additions & 69 deletions conf/solr/solrconfig.xml
Original file line number Diff line number Diff line change
Expand Up @@ -37,51 +37,6 @@
-->
<luceneMatchVersion>9.11</luceneMatchVersion>

<!-- <lib/> directives can be used to instruct Solr to load any Jars
identified and use them to resolve any "plugins" specified in
your solrconfig.xml or schema.xml (ie: Analyzers, Request
Handlers, etc...).
All directories and paths are resolved relative to the
instanceDir.
Please note that <lib/> directives are processed in the order
that they appear in your solrconfig.xml file, and are "stacked"
on top of each other when building a ClassLoader - so if you have
plugin jars with dependencies on other jars, the "lower level"
dependency jars should be loaded first.
If a "./lib" directory exists in your instanceDir, all files
found in it are included as if you had used the following
syntax...
<lib dir="./lib" />
-->

<!-- A 'dir' option by itself adds any files found in the directory
to the classpath, this is useful for including all jars in a
directory.
When a 'regex' is specified in addition to a 'dir', only the
files in that directory which completely match the regex
(anchored on both ends) will be included.
If a 'dir' option (with or without a regex) is used and nothing
is found that matches, a warning will be logged.
The example below can be used to load a Solr Module along
with their external dependencies.
-->
<!-- <lib dir="${solr.install.dir:../../../..}/modules/ltr/lib" regex=".*\.jar" /> -->

<!-- an exact 'path' can be used instead of a 'dir' to specify a
specific jar file. This will cause a serious error to be logged
if it can't be loaded.
-->
<!--
<lib path="../a-jar-that-does-not-exist.jar" />
-->

<!-- Data Directory
Used to specify an alternate directory to hold all index data
Expand Down Expand Up @@ -256,16 +211,9 @@
is recommended (see below).
"dir" - the target directory for transaction logs, defaults to the
solr data directory.
"numVersionBuckets" - sets the number of buckets used to keep
track of max version values when checking for re-ordered
updates; increase this value to reduce the cost of
synchronizing access to version buckets during high-volume
indexing, this requires 8 bytes (long) * numVersionBuckets
of heap space per Solr core.
-->
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
<int name="numVersionBuckets">${solr.ulog.numVersionBuckets:65536}</int>
</updateLog>

<!-- AutoCommit
Expand Down Expand Up @@ -509,23 +457,6 @@
-->
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>

<!-- Use Filter For Sorted Query
A possible optimization that attempts to use a filter to
satisfy a search. If the requested sort does not include
score, then the filterCache will be checked for a filter
matching the query. If found, the filter will be used as the
source of document ids, and then the sort will be applied to
that.
For most situations, this will not be useful unless you
frequently get the same search repeatedly with different sort
options, and none of them ever use "score"
-->
<!--
<useFilterForSortedQuery>true</useFilterForSortedQuery>
-->

<!-- Query Related Event Listeners
Various IndexSearcher related events can trigger Listeners to
Expand Down

0 comments on commit f1d9eeb

Please sign in to comment.