Installing

Paul Cuddihy GE Research edited this page Jan 5, 2023 · 46 revisions

These instructions get the SemTK services and UI up and running. They presume Linux, but SemTK can run on Windows as well. An easy way to do this on Windows is to run the commands shown on this page inside a bash shell such as Git Bash, which is included in the Git for Windows distribution.

Prerequisite: Install a triple store

Install a triple store such as Virtuoso or Fuseki.

Install Fuseki

Fuseki is the recommended triple store for SemTK. The latest distribution is at https://jena.apache.org/download/index.cgi.

Startup instructions are at https://jena.apache.org/documentation/fuseki2/fuseki-quick-start.html

Create a dataset (e.g. named "SemTK") that persists across Fuseki restarts.
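One way to create such a dataset is through Fuseki's HTTP administration endpoint. The sketch below assumes Fuseki is already running at http://localhost:3030 and accepts admin requests from the local machine without credentials; the function name create_fuseki_dataset is illustrative and not part of SemTK or Fuseki.

```shell
# Sketch: create a persistent (TDB2) dataset through Fuseki's admin endpoint.
# Assumptions: Fuseki listens on http://localhost:3030 and allows admin
# requests from this machine without credentials.
create_fuseki_dataset() {
    name="$1"                                # dataset name, e.g. SemTK
    base_url="${2:-http://localhost:3030}"   # Fuseki base URL (assumed default)
    # POST to /$/datasets; dbType=tdb2 stores the data on disk so it
    # survives Fuseki restarts (dbType=mem would not).
    curl --fail --silent --show-error -X POST "${base_url}/\$/datasets" \
        --data "dbName=${name}" --data "dbType=tdb2"
}

# Usage (requires a running Fuseki server):
# create_fuseki_dataset SemTK
```

The same dataset can also be created from the Fuseki web UI; the command-line form is mainly useful for scripted installs.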

Or, Install Virtuoso

Virtuoso is available through OpenLink Software. Installation instructions are at http://virtuoso.openlinksw.com/howto/

Prerequisite: Install a web server

Install a web server such as Apache Tomcat or Apache HTTP Server (httpd)

Create a directory (referred to below as WEBAPPS) within your web server for the SemTK web app.

  • Example for Tomcat: /no_backup/tomcat/apache-tomcat-8.0.18/webapps/semtk
  • Example for httpd: /var/www/html

Install SemTK from source code or binary distribution

Create a directory (referred to below as SEMTK) for SemTK.

If updating/replacing an existing SemTK installation, be sure to save the existing ENV_OVERRIDE file.
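A minimal sketch of saving the file before an update follows. It assumes ENV_OVERRIDE sits in the existing installation directory passed as the first argument; the function name backup_env_override and the timestamped .bak naming are illustrative choices.

```shell
# Sketch: copy an existing ENV_OVERRIDE aside before replacing an installation.
# The function name and the ".bak" naming are illustrative, not part of SemTK.
backup_env_override() {
    install_dir="$1"    # directory holding ENV_OVERRIDE (assumed location)
    backup_dir="$2"     # where to keep the saved copy
    src="${install_dir}/ENV_OVERRIDE"
    if [ ! -f "$src" ]; then
        echo "no ENV_OVERRIDE found in ${install_dir}; nothing to save" >&2
        return 0
    fi
    mkdir -p "$backup_dir" || return 1
    dest="${backup_dir}/ENV_OVERRIDE.$(date +%Y%m%d-%H%M%S).bak"
    # -p keeps the original timestamps and permissions; print the saved path
    cp -p "$src" "$dest" && echo "$dest"
}

# Usage:
# backup_env_override SEMTK/semtk-opensource "$HOME/semtk-backups"
```

After the new installation is in place, copy the saved file back as ENV_OVERRIDE.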

To install SemTK from source code

If you need to install Git, this might work for you:

$ sudo yum install git
$ git config --global user.name "Your Name"
$ git config --global user.email "you@example.com"

If you need to install Maven, this might work for you:

$ sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
$ sudo sed -i s/\$releasever/6/g /etc/yum.repos.d/epel-apache-maven.repo
$ sudo yum install -y apache-maven

Clone and build SemTK:

$ cd SEMTK
$ git clone https://github.com/ge-semtk/semtk.git
$ cd semtk
$ mvn clean install -DskipTests

To install SemTK from binary distribution

  1. Download the binary distribution file (e.g. semtk-opensource-*-dist.tar.gz) from GitHub Releases to the SEMTK directory
  2. Unzip/untar the binary distribution file, which will create SEMTK/semtk-opensource
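The unpack step can be sketched as a small shell function. The function name unpack_semtk_dist is hypothetical; the archive path is whatever file you downloaded from GitHub Releases.

```shell
# Sketch: unpack a SemTK binary distribution into the SEMTK directory.
unpack_semtk_dist() {
    archive="$1"      # path to the downloaded *-dist.tar.gz file
    semtk_dir="$2"    # the SEMTK directory
    mkdir -p "$semtk_dir" || return 1
    # -x extract, -z gunzip, -f archive file, -C target directory
    tar -xzf "$archive" -C "$semtk_dir"
}

# Usage (the glob should match exactly one downloaded file):
# unpack_semtk_dist SEMTK/semtk-opensource-*-dist.tar.gz SEMTK
```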

Create a local config (ENV_OVERRIDE) file

A default configuration file (.env) can be found in the top-level semtk-opensource directory. Typically, some of the settings in this file will need to be overridden for the local environment. This should be done by creating a file called ENV_OVERRIDE (do not change the .env file). Some common ENV_OVERRIDE entries are as follows:

To start only a subset of the SemTK services (this example represents the 10 core SemTK services):

export ENABLED_SERVICES="nodeGroupExecutionService nodeGroupService nodeGroupStoreService ontologyInfoService sparqlExtDispatchService sparqlGraphIngestionService sparqlGraphResultsService sparqlGraphStatusService sparqlQueryService utilityService"

To change the temporary results file directory:

export resultsFileLocation=/directory12345/semtk-results

If you are using Fuseki, your ENV_OVERRIDE should contain these settings:

export SERVICES_DATASET_SERVER_URL=http://localhost:3030/SemTK
export SERVICES_DATASET_ENDPOINT_TYPE=fuseki

Note: the ENV_OVERRIDE file will not be changed if the SemTK code is updated from Git (e.g. with a git pull).
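Putting the Fuseki and results-directory entries together, a minimal ENV_OVERRIDE could be written and sanity-checked as in the sketch below. The values are the example values from this page, the function name write_env_override is hypothetical, and the target directory is assumed to be the one that holds the default .env file.

```shell
# Sketch: write a minimal ENV_OVERRIDE for a Fuseki setup.
# Values are the example values from this page; adjust them locally.
write_env_override() {
    target_dir="$1"    # directory that contains the default .env file (assumed)
    cat > "${target_dir}/ENV_OVERRIDE" <<'EOF'
export SERVICES_DATASET_SERVER_URL=http://localhost:3030/SemTK
export SERVICES_DATASET_ENDPOINT_TYPE=fuseki
export resultsFileLocation=/directory12345/semtk-results
EOF
}

# Usage: write the file, then source it to confirm that it parses.
# write_env_override SEMTK/semtk-opensource
# . SEMTK/semtk-opensource/ENV_OVERRIDE && echo "$SERVICES_DATASET_ENDPOINT_TYPE"
```

Sourcing the file in a throwaway shell is a quick way to catch quoting mistakes before starting the services.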

Start the SemTK services:

$ ./startServices.sh

Install the SemTK UI (SPARQLgraph):

Install the SemTK UI to your web server with the following command, where WEBAPPS is the web server directory described above:

$ ./updateWebapps.sh WEBAPPS

Test that the UI is working by opening my.machine.com/sparqlGraph/index.html in a browser (substitute your web server's host name for my.machine.com).
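The same check can be scripted from a shell. This sketch assumes the web server answers plain HTTP on its default port and that curl is installed; the function name check_sparqlgraph is illustrative.

```shell
# Sketch: report whether the SPARQLgraph page answers with HTTP 200.
# Assumption: the web server is reachable over plain http on the default port.
check_sparqlgraph() {
    host="$1"    # web server host name, e.g. my.machine.com
    url="http://${host}/sparqlGraph/index.html"
    # --output /dev/null discards the page body; --write-out prints only the status code
    status=$(curl --silent --output /dev/null --write-out '%{http_code}' "$url")
    if [ "$status" = "200" ]; then
        echo "OK: ${url}"
    else
        echo "FAILED (HTTP ${status}): ${url}" >&2
        return 1
    fi
}

# Usage:
# check_sparqlgraph my.machine.com
```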

Optionally try the "Hello World" demo.

Working with a reverse proxy

If your web machine can only be reached on ports like 80, 8080, or 443, then you’ll need to use a reverse proxy.

There are many ways to do this, but here are some example lines for a reverse proxy .conf file (e.g. /etc/httpd/conf.d/default-site.conf):

    ProxyPass               /sparqlquery          http://127.0.0.1:12050/
    ProxyPassReverse        /sparqlquery          http://127.0.0.1:12050/

    ProxyPass               /ingestion            http://127.0.0.1:12091/
    ProxyPassReverse        /ingestion            http://127.0.0.1:12091/

In this case, the services are running on the same machine as the web server. If they are running somewhere else, use that URL or IP address instead of 127.0.0.1. Your configuration file will already have a line for:

    ProxyPass               /            http://127.0.0.1:8080/

(but it might not direct to port 8080). In any event, make sure the lines are inserted into the reverse proxy config file before this default line.
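The ordering matters because Apache httpd checks ProxyPass rules in configuration order and the first match wins, so a catch-all "/" rule placed first would swallow the service paths. The sketch below is one way to check the ordering in a .conf file; the function name check_proxypass_order is hypothetical, and it only looks at plain ProxyPass lines (not ProxyPassMatch or rules nested in other blocks).

```shell
# Sketch: verify that every path-specific ProxyPass line precedes the
# catch-all "ProxyPass /" line in a reverse proxy .conf file.
check_proxypass_order() {
    conf="$1"    # path to the reverse proxy .conf file
    awk '
        # Remember that the catch-all rule has been seen
        $1 == "ProxyPass" && $2 == "/" { catchall = 1; next }
        # Any path-specific rule after the catch-all is misplaced
        $1 == "ProxyPass" && catchall {
            print "line " NR ": " $2 " comes after the catch-all"
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$conf"
}

# Usage:
# check_proxypass_order /etc/httpd/conf.d/default-site.conf && echo "order looks fine"
```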

When using a reverse proxy, the URLs in the “Configuration” step above would change to use the new URLs instead of the port numbers:

url : "http://my.machine.ge.com/ingestion/ingestion/",                       
url : "http://my.machine.ge.com/sparqlquery/sparqlQueryService/",

Installing on AWS

Set up an Apache web server (httpd) as described in the AWS docs.

  • Start httpd: sudo service httpd start
  • Find the root directory for the server: grep DocumentRoot /etc/httpd/conf/httpd.conf. We'll refer to this directory (e.g. /var/www/html) as WEBAPPS
  • Set up /etc/httpd/conf.d/default-site.conf as outlined above in the "proxy" section.

Download the binary distribution file, move it to the AWS EC2 instance, and unzip it (all as described above).

The ENV_OVERRIDE file should look something like this:

# I needed to copy this whole folder to the node
export storeTemplateLocation=/run/semtk/semtk-opensource/sparqlGraphLibrary/src/main/resources/nodegroups/store.json

# TODO: I needed to create this folder
export resultsFileLocation=/tmp/DISPATCH_RESULTS

# TODO: on this host I can't find a name that works
export HOST_IP=10.200.100.200
export WEB_INGESTION_HOST=${HOST_IP}
export WEB_SPARQL_QUERY_HOST=${HOST_IP}
export WEB_STATUS_HOST=${HOST_IP}
export WEB_RESULTS_HOST=${HOST_IP}
export WEB_DISPATCH_HOST=${HOST_IP}
export WEB_HIVE_HOST=${HOST_IP}
export WEB_NODEGROUPSTORE_HOST=${HOST_IP}
export WEB_ONTOLOGYINFO_HOST=${HOST_IP}
export WEB_NODEGROUPEXECUTION_HOST=${HOST_IP}
export WEB_NODEGROUP_HOST=${HOST_IP}

# set the ports to use a proxy
export WEB_NODEGROUP_PORT=80/nodegroup
export WEB_INGESTION_PORT=80/ingestion
export WEB_SPARQL_QUERY_PORT=80/sparqlquery
export WEB_STATUS_PORT=80/status
export WEB_RESULTS_PORT=80/results
export WEB_HIVE_PORT=80/hive
export WEB_DISPATCH_PORT=80/dispatch
export WEB_NODEGROUPSTORE_PORT=80/nodegroupstore
export WEB_ONTOLOGYINFO_PORT=80/ontologyinfo
export WEB_NODEGROUPEXECUTION_PORT=80/nodegroupexec

# this is the only way to load a GE-specific variable that is needed in semtk-oss
export DISPATCHER_CLASS_NAME=com.ge.research.semtk.sparqlX.dispatch.EdcDispatcher

# set this to FQDN in order to get maximum speed to DGX within GE network
export resultsBaseURL=http://10.200.100.200/${PORT_SPARQLGRAPH_RESULTS_SERVICE}

TODO: I needed to edit semtk-opensource .fun because the host function didn't work

function sethostname
{
    export HOST_NAME=$(hostname)
}

Install the SemTK UI, as described above.

Start the SemTK Services, as described above.

SemTK Docker image

Here are instructions to build and run a SemTK Docker image: https://github.com/ge-semtk/semtk/blob/master/deploy/README.md

Google Analytics

To optionally attach Google Analytics to your SemTK UI (SPARQLgraph) installation:

  • Create a Google Analytics account
  • Get your Tracking ID from Google
  • Download googleAnalyticsLogger.js
  • In googleAnalyticsLogger.js, replace YOUR_GOOGLE_ANALYTICS_TRACKING_ID with your tracking id from Google
  • Copy the file to your webapps, overwriting sparqlForm/main-oss/KDLEasyLoggerConfig.js
  • Copy the file to your webapps, overwriting sparqlGraph/main-oss/KDLEasyLoggerConfigOss.js

Reload your web page and Google Analytics data will begin flowing.
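The edit-and-copy steps above can be sketched as a shell function. The function name install_ga_logger is hypothetical, and the sketch assumes the tracking ID contains no "/" or "&" characters (which would need escaping for sed) and that both target directories already exist under WEBAPPS.

```shell
# Sketch: fill in the tracking ID and write the logger config into place.
# Assumption: the tracking ID has no "/" or "&" characters.
install_ga_logger() {
    tracking_id="$1"   # your Google Analytics tracking ID
    logger_js="$2"     # path to the downloaded googleAnalyticsLogger.js
    webapps="$3"       # the WEBAPPS directory
    for dest in \
        "${webapps}/sparqlForm/main-oss/KDLEasyLoggerConfig.js" \
        "${webapps}/sparqlGraph/main-oss/KDLEasyLoggerConfigOss.js"
    do
        # Substitute the placeholder and overwrite the target file;
        # the downloaded file itself is left unchanged.
        sed "s/YOUR_GOOGLE_ANALYTICS_TRACKING_ID/${tracking_id}/g" "$logger_js" > "$dest" || return 1
    done
}

# Usage:
# install_ga_logger YOUR_TRACKING_ID ./googleAnalyticsLogger.js WEBAPPS
```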