6142 - Flexible Solr schema deployment #6146

Merged: 25 commits, Sep 12, 2019
Changes from 21 commits
Commits
5db3425
Split up Solr schema.xml.
poikilotherm Sep 5, 2019
7f53762
Change relevant scripts (Installer Makefile, Docker AIO, Vagrant) to …
poikilotherm Sep 5, 2019
020ba11
Fix docs regarding schema.xml copies.
poikilotherm Sep 5, 2019
a0c12ad
Add processing script for Solr schema.
poikilotherm Sep 6, 2019
e82a345
Change relevant scripts (Installer Makefile, Docker AIO) to include t…
poikilotherm Sep 6, 2019
8db5cac
Add TODO in developer guide tips section about moving and extending S…
poikilotherm Sep 6, 2019
9898b01
Change metadata customization docs with explanations for using the ne…
poikilotherm Sep 6, 2019
5ddcfa0
Remove Harvard-specific fields from Solr schema.
poikilotherm Sep 9, 2019
811c6cb
Solves #3976. Fix docs not mentioning Journals metadata schema. See #…
poikilotherm Sep 9, 2019
5fb21f5
Update appendix.rst
jggautier Sep 9, 2019
9439056
fix 404 to tsv file for journal metadata block
pdurbin Sep 9, 2019
555d1ce
Rename updateSchemaCMB.sh to updateSchemaMDB.sh.
poikilotherm Sep 10, 2019
a5cb7e8
Fix JATS reference for #3976.
poikilotherm Sep 10, 2019
17f03a4
Extend updateSchemaMDB.sh with option handling in addition to env vars.
poikilotherm Sep 10, 2019
27bc4b2
Enhance documentation for updateSchemaMDB.sh in guides.
poikilotherm Sep 10, 2019
8a2d9f5
Add updateSchemaMDB.sh to setup-optional-harvard.sh as a usage example.
poikilotherm Sep 10, 2019
ca83607
Merge branch 'develop' into 6142-flex-solr-schema #6142
pdurbin Sep 10, 2019
886dc8d
make updateSchemaMDB.sh downloadable #6142
pdurbin Sep 10, 2019
4889887
Rename XML files from schema_dv_cmb_XXX.xml to schema_dv_mdb_XXX.xml.
poikilotherm Sep 11, 2019
ecbaca8
Add release notes for Solr schema.xml separation. Relates to #6142.
poikilotherm Sep 11, 2019
87a8e53
tweak the release notes #6142
pdurbin Sep 11, 2019
d8ae325
enumerate files, fix formatting #6142
pdurbin Sep 11, 2019
aebc850
echo a *suggestion* to run updateSchemaMDB.sh #6142
pdurbin Sep 11, 2019
7b2becd
explain updateSchemaMDB.sh must be run on Solr server #6142
pdurbin Sep 11, 2019
863af4f
Refactor update script to avoid csplit and use plain grep.
poikilotherm Sep 11, 2019
3 changes: 2 additions & 1 deletion conf/docker-aio/1prep.sh
@@ -4,8 +4,9 @@
 # this was based off the phoenix deployment; and is likely uglier and bulkier than necessary in a perfect world

 mkdir -p testdata/doc/sphinx-guides/source/_static/util/
-cp ../solr/7.3.1/schema.xml testdata/
+cp ../solr/7.3.1/schema*.xml testdata/
 cp ../solr/7.3.1/solrconfig.xml testdata/
+cp ../solr/7.3.1/updateSchemaMDB.sh testdata/
 cp ../jhove/jhove.conf testdata/
 cp ../jhove/jhoveConfig.xsd testdata/
 cd ../../
5 changes: 3 additions & 2 deletions conf/docker-aio/c7.dockerfile
@@ -7,7 +7,7 @@ RUN yum install -y jq lsof awscli

 # copy and unpack dependencies (solr, glassfish)
 COPY dv /tmp/dv
-COPY testdata/schema.xml /tmp/dv
+COPY testdata/schema*.xml /tmp/dv/
 COPY testdata/solrconfig.xml /tmp/dv

 # ITs need files
@@ -29,7 +29,7 @@ RUN sudo -u postgres /usr/pgsql-9.6/bin/initdb -D /var/lib/pgsql/data
 # copy configuration related files
 RUN cp /tmp/dv/pg_hba.conf /var/lib/pgsql/data/
 RUN cp -r /opt/solr-7.3.1/server/solr/configsets/_default /opt/solr-7.3.1/server/solr/collection1
-RUN cp /tmp/dv/schema.xml /opt/solr-7.3.1/server/solr/collection1/conf/schema.xml
+RUN cp /tmp/dv/schema*.xml /opt/solr-7.3.1/server/solr/collection1/conf/
 RUN cp /tmp/dv/solrconfig.xml /opt/solr-7.3.1/server/solr/collection1/conf/solrconfig.xml

 # skipping glassfish user and solr user (run both as root)
@@ -58,6 +58,7 @@ COPY dv/install/ /opt/dv/
 COPY install.bash /opt/dv/
 COPY entrypoint.bash /opt/dv/
 COPY testdata /opt/dv/testdata
+COPY testdata/updateSchemaMDB.sh /opt/dv/testdata/
 COPY testscripts/* /opt/dv/testdata/
 COPY setupIT.bash /opt/dv
 WORKDIR /opt/dv
460 changes: 4 additions & 456 deletions conf/solr/7.3.1/schema.xml

Large diffs are not rendered by default.

157 changes: 157 additions & 0 deletions conf/solr/7.3.1/schema_dv_mdb_copies.xml
@@ -0,0 +1,157 @@
<schema>
<copyField source="accessToSources" dest="_text_" maxChars="3000"/>
<copyField source="actionsToMinimizeLoss" dest="_text_" maxChars="3000"/>
<copyField source="alternativeTitle" dest="_text_" maxChars="3000"/>
<copyField source="alternativeURL" dest="_text_" maxChars="3000"/>
<copyField source="astroFacility" dest="_text_" maxChars="3000"/>
<copyField source="astroInstrument" dest="_text_" maxChars="3000"/>
<copyField source="astroObject" dest="_text_" maxChars="3000"/>
<copyField source="astroType" dest="_text_" maxChars="3000"/>
<copyField source="author" dest="_text_" maxChars="3000"/>
<copyField source="authorAffiliation" dest="_text_" maxChars="3000"/>
<copyField source="authorIdentifier" dest="_text_" maxChars="3000"/>
<copyField source="authorIdentifierScheme" dest="_text_" maxChars="3000"/>
<copyField source="authorName" dest="_text_" maxChars="3000"/>
<copyField source="characteristicOfSources" dest="_text_" maxChars="3000"/>
<copyField source="city" dest="_text_" maxChars="3000"/>
<copyField source="cleaningOperations" dest="_text_" maxChars="3000"/>
<copyField source="collectionMode" dest="_text_" maxChars="3000"/>
<copyField source="collectorTraining" dest="_text_" maxChars="3000"/>
<copyField source="contributor" dest="_text_" maxChars="3000"/>
<copyField source="contributorName" dest="_text_" maxChars="3000"/>
<copyField source="contributorType" dest="_text_" maxChars="3000"/>
<copyField source="controlOperations" dest="_text_" maxChars="3000"/>
<copyField source="country" dest="_text_" maxChars="3000"/>
<copyField source="coverage.Depth" dest="_text_" maxChars="3000"/>
<copyField source="coverage.ObjectCount" dest="_text_" maxChars="3000"/>
<copyField source="coverage.ObjectDensity" dest="_text_" maxChars="3000"/>
<copyField source="coverage.Polarization" dest="_text_" maxChars="3000"/>
<copyField source="coverage.Redshift.MaximumValue" dest="_text_" maxChars="3000"/>
<copyField source="coverage.Redshift.MinimumValue" dest="_text_" maxChars="3000"/>
<copyField source="coverage.RedshiftValue" dest="_text_" maxChars="3000"/>
<copyField source="coverage.SkyFraction" dest="_text_" maxChars="3000"/>
<copyField source="coverage.Spatial" dest="_text_" maxChars="3000"/>
<copyField source="coverage.Spectral.Bandpass" dest="_text_" maxChars="3000"/>
<copyField source="coverage.Spectral.CentralWavelength" dest="_text_" maxChars="3000"/>
<copyField source="coverage.Spectral.MaximumWavelength" dest="_text_" maxChars="3000"/>
<copyField source="coverage.Spectral.MinimumWavelength" dest="_text_" maxChars="3000"/>
<copyField source="coverage.Spectral.Wavelength" dest="_text_" maxChars="3000"/>
<copyField source="coverage.Temporal" dest="_text_" maxChars="3000"/>
<copyField source="coverage.Temporal.StartTime" dest="_text_" maxChars="3000"/>
<copyField source="coverage.Temporal.StopTime" dest="_text_" maxChars="3000"/>
<copyField source="dataCollectionSituation" dest="_text_" maxChars="3000"/>
<copyField source="dataCollector" dest="_text_" maxChars="3000"/>
<copyField source="dataSources" dest="_text_" maxChars="3000"/>
<copyField source="datasetContact" dest="_text_" maxChars="3000"/>
<copyField source="datasetContactAffiliation" dest="_text_" maxChars="3000"/>
<copyField source="datasetContactEmail" dest="_text_" maxChars="3000"/>
<copyField source="datasetContactName" dest="_text_" maxChars="3000"/>
<copyField source="datasetLevelErrorNotes" dest="_text_" maxChars="3000"/>
<copyField source="dateOfCollection" dest="_text_" maxChars="3000"/>
<copyField source="dateOfCollectionEnd" dest="_text_" maxChars="3000"/>
<copyField source="dateOfCollectionStart" dest="_text_" maxChars="3000"/>
<copyField source="dateOfDeposit" dest="_text_" maxChars="3000"/>
<copyField source="depositor" dest="_text_" maxChars="3000"/>
<copyField source="deviationsFromSampleDesign" dest="_text_" maxChars="3000"/>
<copyField source="distributionDate" dest="_text_" maxChars="3000"/>
<copyField source="distributor" dest="_text_" maxChars="3000"/>
<copyField source="distributorAbbreviation" dest="_text_" maxChars="3000"/>
<copyField source="distributorAffiliation" dest="_text_" maxChars="3000"/>
<copyField source="distributorLogoURL" dest="_text_" maxChars="3000"/>
<copyField source="distributorName" dest="_text_" maxChars="3000"/>
<copyField source="distributorURL" dest="_text_" maxChars="3000"/>
<copyField source="dsDescription" dest="_text_" maxChars="3000"/>
<copyField source="dsDescriptionDate" dest="_text_" maxChars="3000"/>
<copyField source="dsDescriptionValue" dest="_text_" maxChars="3000"/>
<copyField source="eastLongitude" dest="_text_" maxChars="3000"/>
<copyField source="frequencyOfDataCollection" dest="_text_" maxChars="3000"/>
<copyField source="geographicBoundingBox" dest="_text_" maxChars="3000"/>
<copyField source="geographicCoverage" dest="_text_" maxChars="3000"/>
<copyField source="geographicUnit" dest="_text_" maxChars="3000"/>
<copyField source="grantNumber" dest="_text_" maxChars="3000"/>
<copyField source="grantNumberAgency" dest="_text_" maxChars="3000"/>
<copyField source="grantNumberValue" dest="_text_" maxChars="3000"/>
<copyField source="journalArticleType" dest="_text_" maxChars="3000"/>
<copyField source="journalIssue" dest="_text_" maxChars="3000"/>
<copyField source="journalPubDate" dest="_text_" maxChars="3000"/>
<copyField source="journalVolume" dest="_text_" maxChars="3000"/>
<copyField source="journalVolumeIssue" dest="_text_" maxChars="3000"/>
<copyField source="keyword" dest="_text_" maxChars="3000"/>
<copyField source="keywordValue" dest="_text_" maxChars="3000"/>
<copyField source="keywordVocabulary" dest="_text_" maxChars="3000"/>
<copyField source="keywordVocabularyURI" dest="_text_" maxChars="3000"/>
<copyField source="kindOfData" dest="_text_" maxChars="3000"/>
<copyField source="language" dest="_text_" maxChars="3000"/>
<copyField source="northLongitude" dest="_text_" maxChars="3000"/>
<copyField source="notesText" dest="_text_" maxChars="3000"/>
<copyField source="originOfSources" dest="_text_" maxChars="3000"/>
<copyField source="otherDataAppraisal" dest="_text_" maxChars="3000"/>
<copyField source="otherGeographicCoverage" dest="_text_" maxChars="3000"/>
<copyField source="otherId" dest="_text_" maxChars="3000"/>
<copyField source="otherIdAgency" dest="_text_" maxChars="3000"/>
<copyField source="otherIdValue" dest="_text_" maxChars="3000"/>
<copyField source="otherReferences" dest="_text_" maxChars="3000"/>
<copyField source="producer" dest="_text_" maxChars="3000"/>
<copyField source="producerAbbreviation" dest="_text_" maxChars="3000"/>
<copyField source="producerAffiliation" dest="_text_" maxChars="3000"/>
<copyField source="producerLogoURL" dest="_text_" maxChars="3000"/>
<copyField source="producerName" dest="_text_" maxChars="3000"/>
<copyField source="producerURL" dest="_text_" maxChars="3000"/>
<copyField source="productionDate" dest="_text_" maxChars="3000"/>
<copyField source="productionPlace" dest="_text_" maxChars="3000"/>
<copyField source="publication" dest="_text_" maxChars="3000"/>
<copyField source="publicationCitation" dest="_text_" maxChars="3000"/>
<copyField source="publicationIDNumber" dest="_text_" maxChars="3000"/>
<copyField source="publicationIDType" dest="_text_" maxChars="3000"/>
<copyField source="publicationURL" dest="_text_" maxChars="3000"/>
<copyField source="redshiftType" dest="_text_" maxChars="3000"/>
<copyField source="relatedDatasets" dest="_text_" maxChars="3000"/>
<copyField source="relatedMaterial" dest="_text_" maxChars="3000"/>
<copyField source="researchInstrument" dest="_text_" maxChars="3000"/>
<copyField source="resolution.Redshift" dest="_text_" maxChars="3000"/>
<copyField source="resolution.Spatial" dest="_text_" maxChars="3000"/>
<copyField source="resolution.Spectral" dest="_text_" maxChars="3000"/>
<copyField source="resolution.Temporal" dest="_text_" maxChars="3000"/>
<copyField source="responseRate" dest="_text_" maxChars="3000"/>
<copyField source="samplingErrorEstimates" dest="_text_" maxChars="3000"/>
<copyField source="samplingProcedure" dest="_text_" maxChars="3000"/>
<copyField source="series" dest="_text_" maxChars="3000"/>
<copyField source="seriesInformation" dest="_text_" maxChars="3000"/>
<copyField source="seriesName" dest="_text_" maxChars="3000"/>
<copyField source="socialScienceNotes" dest="_text_" maxChars="3000"/>
<copyField source="socialScienceNotesSubject" dest="_text_" maxChars="3000"/>
<copyField source="socialScienceNotesText" dest="_text_" maxChars="3000"/>
<copyField source="socialScienceNotesType" dest="_text_" maxChars="3000"/>
<copyField source="software" dest="_text_" maxChars="3000"/>
<copyField source="softwareName" dest="_text_" maxChars="3000"/>
<copyField source="softwareVersion" dest="_text_" maxChars="3000"/>
<copyField source="southLongitude" dest="_text_" maxChars="3000"/>
<copyField source="state" dest="_text_" maxChars="3000"/>
<copyField source="studyAssayCellType" dest="_text_" maxChars="3000"/>
<copyField source="studyAssayMeasurementType" dest="_text_" maxChars="3000"/>
<copyField source="studyAssayOrganism" dest="_text_" maxChars="3000"/>
<copyField source="studyAssayOtherMeasurmentType" dest="_text_" maxChars="3000"/>
<copyField source="studyAssayOtherOrganism" dest="_text_" maxChars="3000"/>
<copyField source="studyAssayPlatform" dest="_text_" maxChars="3000"/>
<copyField source="studyAssayTechnologyType" dest="_text_" maxChars="3000"/>
<copyField source="studyDesignType" dest="_text_" maxChars="3000"/>
<copyField source="studyFactorType" dest="_text_" maxChars="3000"/>
<copyField source="subject" dest="_text_" maxChars="3000"/>
<copyField source="subtitle" dest="_text_" maxChars="3000"/>
<copyField source="targetSampleActualSize" dest="_text_" maxChars="3000"/>
<copyField source="targetSampleSize" dest="_text_" maxChars="3000"/>
<copyField source="targetSampleSizeFormula" dest="_text_" maxChars="3000"/>
<copyField source="timeMethod" dest="_text_" maxChars="3000"/>
<copyField source="timePeriodCovered" dest="_text_" maxChars="3000"/>
<copyField source="timePeriodCoveredEnd" dest="_text_" maxChars="3000"/>
<copyField source="timePeriodCoveredStart" dest="_text_" maxChars="3000"/>
<copyField source="title" dest="_text_" maxChars="3000"/>
<copyField source="topicClassValue" dest="_text_" maxChars="3000"/>
<copyField source="topicClassVocab" dest="_text_" maxChars="3000"/>
<copyField source="topicClassVocabURI" dest="_text_" maxChars="3000"/>
<copyField source="topicClassification" dest="_text_" maxChars="3000"/>
<copyField source="unitOfAnalysis" dest="_text_" maxChars="3000"/>
<copyField source="universe" dest="_text_" maxChars="3000"/>
<copyField source="weighting" dest="_text_" maxChars="3000"/>
<copyField source="westLongitude" dest="_text_" maxChars="3000"/>
</schema>
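
The new file above is a flat list of `copyField` entries, one per metadata field, each copying into `_text_` with `maxChars="3000"`. As a rough illustration of that structure, the sketch below generates a file of the same shape from a plain list of field names. The function names `gen_copyfield` and `gen_copies_file` are hypothetical, and this is not the `updateSchemaMDB.sh` script added in this pull request; it only assumes one field name per input line.

```shell
#!/bin/sh
# Hypothetical helpers (not part of this PR): emit copyField entries
# in the same shape as schema_dv_mdb_copies.xml.

# Print one copyField line for the field name given as $1.
gen_copyfield() {
    printf '<copyField source="%s" dest="_text_" maxChars="3000"/>\n' "$1"
}

# Read field names from stdin (one per line), sort them in byte order,
# drop duplicates, and wrap the entries in a <schema> root element.
gen_copies_file() {
    echo '<schema>'
    LC_ALL=C sort -u | while IFS= read -r name; do
        # Skip empty lines so they do not produce empty source attributes.
        if [ -n "$name" ]; then
            gen_copyfield "$name"
        fi
    done
    echo '</schema>'
}
```

For example, `printf 'title\nauthor\ntitle\n' | gen_copies_file` would emit a `<schema>` element containing one entry for `author` and one for `title`. Byte-order sorting (`LC_ALL=C`) is used because the entries in the file above appear to follow that order, with uppercase letters sorting before lowercase ones.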