
Hello World from CORB

Mads Hansen edited this page Feb 4, 2018 · 6 revisions

Easiest Example EVER

To illustrate how easy it is to build a CORB job, consider the following trivial example. Suppose we wanted to find all documents containing the phrase "Hello World" and print each document's title, author and URI. If you didn't have many documents, I would suggest that you NOT use CORB for such an easy task; it could be run in MarkLogic's Query Console instead. But let's say that a million documents out of several million, for some inexplicable reason, contain the phrase "Hello World". In that case, I would suggest using CORB. CORB is great for running transforms across multiple threads against thousands or millions of documents that have to be opened (filtered) in order to perform the task at hand. In other words, the transform can't be answered from indexes alone, so each document takes a little longer to process.

What You'll Need

To tackle this task, you'll need the CORB jar, MarkLogic's XCC connection jar, and a MarkLogic XDBC server attached to a MarkLogic database. You'll also need a selector module and a transform module. In this example, we use XQuery for both, although you may use JavaScript modules if you prefer. Finally, we'll put a few properties in a properties file to produce the report that we want to generate.
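Before launching a job, it can help to confirm that the files listed above are actually in place. The following is a minimal sketch, not part of CORB itself; the function name `missing_files` is hypothetical, and the file names you pass to it are whatever you chose for your own jars, modules and properties file.

```shell
#!/bin/sh
# Print every path in the argument list that is not an existing regular file.
# Returns non-zero if at least one file is missing.
# (missing_files is a hypothetical helper name, not a CORB command.)
missing_files() {
  status=0
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f"
      status=1
    fi
  done
  return $status
}
```

For the files used on this page, a call might look like `missing_files "$LIB/corb2.jar" "$LIB/marklogic-xcc-6.0.2.jar" selector.xqy transform.xqy my.properties`. The check only covers local files; it says nothing about whether the XDBC server is reachable.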

Sample Document

Assume the following document is in a database under the URI /document/corbnewworld.xml

<document>
  <title>It's a CORB New World</title>
  <author>Corbyn Corbado</author>
  <synopsis>Like many bands, CORB has struggled for years in relative obscurity before finally having the overnight success that it is now experiencing.  This book serves to aid CORB in finally saying Hello World where have you been all my life?</synopsis>
</document>

Selector: selector.xqy

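(: Select the URIs of all documents containing the phrase. cts:uris requires the URI lexicon to be enabled. :)
let $uris := cts:uris((), (), cts:word-query("Hello World"))
(: CORB expects the count of URIs first, followed by the URIs themselves. :)
return (fn:count($uris), $uris)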

Transform: transform.xqy

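(: CORB binds each selected URI to this external variable. :)
declare variable $URI as xs:string external;

let $document := fn:doc($URI)/document
let $author := $document/author/string()
let $title := $document/title/string()

(: One CSV row per document: title, author, URI. :)
return fn:string-join(($title, $author, $URI), ",")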

Properties: my.properties

THREAD-COUNT=8
URIS-MODULE=selector.xqy|ADHOC
PROCESS-MODULE=transform.xqy|ADHOC
PROCESS-TASK=com.marklogic.developer.corb.ExportBatchToFileTask
EXPORT-FILE-NAME=HelloWorldReport.csv
PRE-BATCH-TASK=com.marklogic.developer.corb.PreBatchUpdateFileTask
EXPORT-FILE-TOP-CONTENT=Title,Author,URI

Command Line Syntax: script.sh

Assume there is a database called FFE set up on a MarkLogic instance running locally with user/password of admin/admin and a MarkLogic XDBC server listening on port 9000.

LIB=/path/to/where/your/jars/are/located

java -cp "$LIB/marklogic-xcc-6.0.2.jar:$LIB/corb2.jar" \
     -DOPTIONS-FILE=my.properties \
     com.marklogic.developer.corb.Manager \
     xcc://admin:admin@localhost:9000/FFE 

Note: On a Windows system, use ; instead of : as the delimiter between the jars in the classpath, for example: java -cp "$LIB/marklogic-xcc-6.0.2.jar;$LIB/corb2.jar"
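If the same script has to run on both Unix-like and Windows shells, the delimiter can be chosen at run time. The following is a minimal sketch under stated assumptions: the helper name `classpath_sep` is hypothetical, and it assumes that Cygwin, MSYS and Git Bash (MINGW) shells launch a Windows JVM, which is the usual case but not guaranteed.

```shell
#!/bin/sh
# Return the Java classpath separator for a given `uname -s` value.
# Assumption: Cygwin, MSYS and Git Bash (MINGW) shells start a Windows JVM,
# which expects ';' between classpath entries; everything else gets ':'.
classpath_sep() {
  case "$1" in
    CYGWIN*|MINGW*|MSYS*) printf ';' ;;
    *)                    printf ':' ;;
  esac
}

# Same placeholder jar directory as in script.sh, unless LIB is already set.
LIB=${LIB:-/path/to/where/your/jars/are/located}
SEP=$(classpath_sep "$(uname -s)")
CP="$LIB/marklogic-xcc-6.0.2.jar${SEP}$LIB/corb2.jar"
```

The resulting `$CP` can then be passed to `java -cp "$CP"` in place of the hard-coded classpath. Some Windows shells also rewrite Unix-style paths in arguments, so treat this as a starting point and check the classpath that Java actually receives.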

Results

Executing the shell script should produce a file called HelloWorldReport.csv in the directory the script is run from. If the sample document is the only match, opening the file should reveal the following two lines of text:

Title,Author,URI
It's a CORB New World,Corbyn Corbado,/document/corbnewworld.xml
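With a million matching documents, opening the file by hand is less practical. A quick sanity check is to confirm the header line and count the rows after it. This is a minimal sketch; `check_report` is a hypothetical helper name, not something shipped with CORB.

```shell
#!/bin/sh
# Check a CORB export file: the first line must equal the expected header.
# On success, print the number of lines that follow the header.
# (check_report is a hypothetical helper name.)
check_report() {
  report=$1
  expected_header=$2
  if [ ! -f "$report" ]; then
    echo "missing: $report" >&2
    return 1
  fi
  header=$(head -n 1 "$report")
  if [ "$header" != "$expected_header" ]; then
    echo "unexpected header: $header" >&2
    return 1
  fi
  # awk counts the last line even if it has no trailing newline.
  awk 'NR > 1 { n++ } END { print n + 0 }' "$report"
}
```

Running `check_report HelloWorldReport.csv "Title,Author,URI"` against the two-line file above prints `1`. The count is of lines, not parsed CSV records, so a title or author containing a line break would inflate it.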