
Geoportal Facets app and DCAT in 1.2.9 #313

Open
cybersea opened this issue Dec 18, 2018 · 16 comments

Comments

@cybersea

I have deployed the Geoportal Facets application (Solr v4.1.0) to index a Geoportal v1.2.9 database. It is not indexing all the records; it appears to be missing the new records harvested from a DCAT source. I followed the instructions from the wiki here: https://github.com/Esri/geoportal-server/wiki/Geoportal-Facets-using-Apache-Solr

Any suggestions for debugging this issue? Is there a configuration that I'm missing?

@mhogeweg
Member

Hi, I recommend switching to Geoportal Server 2.x. This is a new 'generation' of Geoportal Server, based on Elasticsearch, with configurable faceted search built in from the start. You can find the application in its own GitHub repository.

@cybersea
Author

Thanks @mhogeweg

We would like to switch, but we have a front-end application that depends on the v1.2.x architecture, including the (optional) Solr index. At this time we do not have the resources to rewrite this front-end app, so I need to make the latest 1.2.x version work for now. Hopefully we can migrate the app to your new architecture in the future.

Any assistance with debugging this issue or pointers is greatly appreciated.

@mhogeweg
Member

Is your site public, by chance?

@cybersea
Author

The front-end production website is here: http://portal.westcoastoceans.org/discover/

If you mean the upgraded Geoportal 1.2.9 site with Solr that I'm trying to debug, I'm experimenting with that on our dev server, and it's not yet hooked up to the front-end.
http://207.141.116.172/geoportal129/catalog/main/home.page
http://207.141.116.172/gc129/ (this Geoportal-Solr webpage is not working for some reason, so I am viewing the Solr index from our Tomcat Manager app, but that requires a login).

@mhogeweg
Member

I checked out the gc129 site and can see filters and apply them:

[screenshot: gc129 search page with filters applied]

@mhogeweg
Member

What seems to break is the link to the XML. For example, for the first entry on the page above, the links are:

The XML link points to 127.0.0.1, which would be my machine.

Also, the link in the Solr JSON response field url.metadata_s points to 127.0.0.1:8080.

I suggest checking the configuration to see what gpt.instance.url is set to.
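To spot links that were generated with a loopback base URL (as seems to happen here when gpt.instance.url points to 127.0.0.1), a check like the one below could be run over the url.metadata_s values of the indexed documents. This is a minimal sketch. It assumes the Solr documents have already been fetched and parsed into Python dicts, and that each one carries an `id` field. The function names are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """Return True if the URL's host is 'localhost' or a loopback IP (127.0.0.0/8, ::1)."""
    host = urlsplit(url).hostname  # lowercased, without port or IPv6 brackets
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a DNS name), so treat it as non-loopback.
        return False


def loopback_links(docs, field="url.metadata_s"):
    """Yield (id, url) for documents whose link field points at a loopback host.

    Assumes each document is a dict with a hypothetical 'id' key.
    """
    for doc in docs:
        url = doc.get(field)
        if url and is_loopback_url(url):
            yield doc.get("id"), url
```

Any hit means the link was built from a local base URL. Links pointing at the dev server's public address (207.141.116.172) pass the check.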

I see this page has about 1600 items, while the vanilla gpt site has some 2100. Did you follow step 7 and deploy the GcService web app?

@cybersea
Author

Thanks @mhogeweg. Glad to see the gc129 site is working on your end. I'll do some more debugging on this end.

I copied the configuration from our existing site and noticed that it is set up to point to localhost, which I assumed was intended. I'm not too worried about the links not working since we are not using that feature, but I could change it to the main (dev) URL: http://207.141.116.172

The discrepancy that you see between the Geoportal-Solr page and the vanilla gpt site is what I'm trying to debug. That difference is equal to the number of records that were pulled from the DCAT source: http://geo.wa.gov/data.json (WA Geospatial Open Data Portal)

I followed step 7 and deployed a new GcService web app to go with this Geoportal instance, naming it gc129 (instead of GcService). It has been working since yesterday, and as it spun up I could see the count of indexed records increase until it reached 1585.
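The gap can be measured directly instead of read off the two web pages. Solr returns the total hit count in `response.numFound` when queried with `q=*:*&rows=0&wt=json`. The sketch below builds that query URL and parses the count. The core URL passed in is a placeholder, because the actual Solr path behind gc129 depends on how it was deployed. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode


def count_query_url(solr_core_url: str) -> str:
    """Build a Solr select URL that matches all documents but returns no rows."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{solr_core_url.rstrip('/')}/select?{params}"


def solr_num_found(response_text: str) -> int:
    """Extract response.numFound from a Solr JSON (wt=json) response body."""
    return int(json.loads(response_text)["response"]["numFound"])


def missing_from_index(catalog_total: int, solr_total: int) -> int:
    """Number of catalog records that did not make it into the Solr index."""
    return catalog_total - solr_total
```

The response body could be fetched with `urllib.request.urlopen(count_query_url(...)).read()`. With the counts reported in this thread, `missing_from_index(2129, 1585)` gives 544. That is the number that should match the DCAT record count.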

@mhogeweg
Member

This app was created before we harvested DCAT. The app takes metadata and applies an XSLT transformation, and that transformation did not support DCAT as a structure. I'm making some updates and will share them shortly.

@mhogeweg
Member

Attached are two XSLT files that should replace the corresponding files in the folder:
...\GcService\WEB-INF\classes\gc-config\xmltypes

These transformations take the metadata in the Geoportal Server index and prepare it for Solr. The DCAT items were not indexed because the XSLT did not yet know how to handle that format.

xmltypes.zip

Please try these and see whether the DCAT items get indexed (this may require a Tomcat stop/start).

@cybersea
Author

Thank you very much @mhogeweg! This is a big help to us.

I have installed the new config files and restarted Tomcat, but haven't seen a change in the indexed records. Is there a way to trigger the indexing manually? I know it is scheduled via one of the config files to run in the middle of the night.

@cybersea
Author

I did not detect any changes between the dc-toSolr.xslt you provided in the zip file and the one in the existing repo. Should that file have changed, or only dc-base-toSolr.xslt?
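A plain text diff is a quick way to confirm which of the supplied XSLT files actually differ from the copies already in the xmltypes folder. The sketch below uses Python's standard `difflib`. The paths passed to the hypothetical `diff_files` helper are placeholders for the unzipped attachment and the existing deployment.

```python
import difflib
from pathlib import Path


def xslt_diff(old_text: str, new_text: str, old_name: str = "old", new_name: str = "new") -> list:
    """Return unified-diff lines between two XSLT documents (empty list if identical)."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=old_name,
            tofile=new_name,
        )
    )


def diff_files(old_path, new_path) -> list:
    """Diff two files on disk, e.g. the repo copy vs. the copy from the zip."""
    old, new = Path(old_path), Path(new_path)
    return xslt_diff(
        old.read_text(encoding="utf-8"),
        new.read_text(encoding="utf-8"),
        str(old),
        str(new),
    )
```

An empty result for dc-toSolr.xslt and a non-empty one for dc-base-toSolr.xslt would be consistent with only the base file having changed.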

@mhogeweg
Copy link
Member

It is just the dc-base one. The other one imports it, so you may keep the existing one; I included it because they 'go together'. I'll check on forcing Solr to reindex the content.

@cybersea
Author

I stopped tomcat, deleted all the files from the solr index (data folder), restarted tomcat and watched the solr index repopulate from 0 records and stop at 1585 again. So, unfortunately, this .xslt file change does not appear to be working for me.
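Instead of stopping Tomcat and deleting the data folder, the index could also be emptied through Solr's update handler with a delete-by-query (`<delete><query>*:*</query></delete>`) followed by a commit. The sketch below only builds that request and does not send it. The core URL is a placeholder. Whether GcService then repopulates the empty index on its next scheduled run, the way it did after the data folder was deleted, is an assumption that has not been tested here.

```python
from urllib.request import Request


def delete_all_request(solr_core_url: str) -> Request:
    """Build (but do not send) a Solr update request that deletes every document and commits."""
    body = b"<delete><query>*:*</query></delete>"
    return Request(
        f"{solr_core_url.rstrip('/')}/update?commit=true",
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Because this wipes the whole index, it is best tried on the dev server only.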

@cybersea
Author

@mhogeweg I added your XSLT files to my Geoportal Facets deployment, and Solr is still not indexing the DCAT entries.
http://207.141.116.172/geoportal129/catalog/main/home.page (2129 results)
http://207.141.116.172/gc129/ (1585 results)

Any suggestions?

@mhogeweg
Member

I'm going to look into this a bit more. I harvested the geoportal129 site into our Geoportal 2 sandbox: http://geoss.esri.com/geoportal2/#. If you open the 'source of origin' facet, you'll see your IP address listed with 1477 documents. My harvest indicated that 651 docs failed to publish (2128 retrieved in total).

I'll try to understand why so many failed (likely a validation issue).

Do you see any errors in your solr logs?

@cybersea
Author

Thanks @mhogeweg.

There are no Solr errors in the Tomcat (Catalina) logs. Is there another set of logs I should check?

We have customized our site's validation to loosen it up a bit, so maybe that's why validation fails on your side(?)
