Java Client

Matthew Davis edited this page May 31, 2022 · 44 revisions

Zulia Java Client

Gradle

repositories {
    mavenCentral()
    maven {
        url "https://maven.ascend-tech.us/repo/"
    }
}


dependencies {
    implementation 'io.zulia:zulia-client:2.4.1'
    implementation 'org.mongodb:mongodb-driver-sync:4.5.0'
}

Maven

<repository>
   <id>astMaven</id>
   <name>AST Maven</name>
   <url>https://maven.ascend-tech.us/repo</url>
</repository>
<dependencies>
  <dependency>
      <groupId>io.zulia</groupId>
      <artifactId>zulia-client</artifactId>
      <version>2.4.1</version>
  </dependency>
  <dependency>
      <groupId>org.mongodb</groupId>
      <artifactId>mongodb-driver-sync</artifactId>
      <version>4.5.0</version>
  </dependency>
</dependencies>

Creating a Client

The Zulia Java client is named ZuliaWorkPool. ZuliaWorkPool is a thread-safe connection pool that uses a gRPC connection to Zulia on the service port. Every method also has an async version that returns a ListenableFuture<> of the result.

Simple Client creation

ZuliaWorkPool zuliaWorkPool = new ZuliaWorkPool(new ZuliaPoolConfig().addNode("someIp")); 

Full Client Configuration

ZuliaPoolConfig zuliaPoolConfig = new ZuliaPoolConfig();
zuliaPoolConfig.addNode("someIp");
//optionally give ports if not default values
//zuliaPoolConfig.addNode("localhost", 32191, 32192);

//optional settings (default values shown)
zuliaPoolConfig.setDefaultRetries(0); //Number of retries before throwing an exception
zuliaPoolConfig.setMaxConnections(10); //Maximum connections per server
zuliaPoolConfig.setMaxIdle(10); //Maximum idle connections per server
zuliaPoolConfig.setCompressedConnection(false); //Use this for WAN client connections
zuliaPoolConfig.setPoolName(null); //For logging purposes only, null gives default of zuliaPool-n
zuliaPoolConfig.setNodeUpdateEnabled(true); //Periodically update the list of cluster nodes to enable smart routing to the correct node. Do not use this with SSH port forwarding. Nodes can also be updated manually with zuliaWorkPool.updateNodes();
zuliaPoolConfig.setNodeUpdateInterval(10000); //Interval to update the nodes in ms
zuliaPoolConfig.setRoutingEnabled(true); //Route indexing requests to the correct server. This only works if automatic node updating is enabled or zuliaWorkPool.updateNodes() is called periodically.

//create the connection pool
ZuliaWorkPool zuliaWorkPool = new ZuliaWorkPool(zuliaPoolConfig);

Creating an Index

ClientIndexConfig indexConfig = new ClientIndexConfig().setIndexName("test").addDefaultSearchField("test");
indexConfig.addFieldConfig(FieldConfigBuilder.createString("title").indexAs(DefaultAnalyzers.STANDARD));
indexConfig.addFieldConfig(FieldConfigBuilder.createString("issn").indexAs(DefaultAnalyzers.LC_KEYWORD).facet());
indexConfig.addFieldConfig(FieldConfigBuilder.createInt("an").index().sort());
// createLong, createFloat, createDouble, createBool, createDate, createVector, and createUnitVector are also available
// or create(storedFieldName, fieldType)
CreateIndex createIndex = new CreateIndex(indexConfig);
zuliaWorkPool.createIndex(createIndex);
  • The number of shards and the unique id field cannot be changed once the index is created. Set the number of shards to at least the maximum number of nodes the cluster may have.

  • Zulia supports indexes created from object annotations. For more info, see the section on Object Persistence.

  • Changing or adding analyzers for fields that are already indexed may require re-indexing for desired results.

Index Config Details

Full ClientIndexConfig settings are explained below:

defaultSearchField - The field searched when a query term has no field (no query fields are set on the query and the term is not qualified with a field)
defaultAnalyzer - The default analyzer for all fields not specified by a field config
fieldConfig - Overrides the default analyzer for a field
shardCommitInterval - Number of index or delete operations on a shard before a commit is forced (default 3200)
idleTimeWithoutCommit - Seconds without indexing before a commit is forced (0 disables) (default 30)
applyUncommitedDeletes - Apply all deletes before search (default true)
shardQueryCacheSize - Number of queries cached at the shard level
shardQueryCacheMaxAmount - Queries that return more than this number of documents are not cached
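
As a rough illustration of how shardCommitInterval and idleTimeWithoutCommit interact, the sketch below decides whether a commit would be due. It is not Zulia's implementation: the class and method names are hypothetical, and it assumes that the thresholds are inclusive and that an idle commit only applies when there is uncommitted work.

```java
final class CommitPolicySketch {

    /**
     * Returns true when a commit would be forced under the two settings.
     * Assumptions (not confirmed by the Zulia docs): thresholds are inclusive,
     * and the idle rule only fires when something is waiting to be committed.
     */
    public static boolean commitDue(int uncommittedOps, long idleSeconds,
                                    int shardCommitInterval, int idleTimeWithoutCommit) {
        // Rule 1: too many index/delete operations since the last commit
        boolean tooManyOps = uncommittedOps >= shardCommitInterval;
        // Rule 2: shard has been idle long enough (0 disables this rule)
        boolean idleTooLong = idleTimeWithoutCommit > 0
                && uncommittedOps > 0
                && idleSeconds >= idleTimeWithoutCommit;
        return tooManyOps || idleTooLong;
    }

    public static void main(String[] args) {
        // defaults from the list above: shardCommitInterval 3200, idleTimeWithoutCommit 30
        System.out.println(commitDue(3200, 0, 3200, 30)); // operation threshold reached
        System.out.println(commitDue(10, 5, 3200, 30));   // neither rule applies yet
        System.out.println(commitDue(10, 30, 3200, 30));  // idle threshold reached
        System.out.println(commitDue(10, 500, 3200, 0));  // idle rule disabled with 0
    }
}
```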

//The following are used in optimizing federation of shards when more than one shard is used. 
//The amount requested from each shard on a query is (((amountRequestedByQuery / numberOfShards) + minShardRequest) * requestFactor).
requestFactor - Used in calculation of request size for a shard (default 2.0)
minShardRequest - Added to the calculated request for a shard (default 2)
shardTolerance - Difference in scores between shards tolerated before requesting full results (query request amount) from the shard (default 0.05)
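
The per-shard request formula above can be worked through with a few lines of Java. This is only a sketch of the arithmetic: the class and method names are hypothetical, and the use of floating point division with rounding up is an assumption, since the formula does not say how fractions are handled.

```java
final class ShardRequestSketch {

    /**
     * (((amountRequestedByQuery / numberOfShards) + minShardRequest) * requestFactor)
     * Assumption: the division is done in floating point and the result is rounded up.
     */
    public static int shardRequestAmount(int amountRequestedByQuery, int numberOfShards,
                                         int minShardRequest, double requestFactor) {
        double perShard = (double) amountRequestedByQuery / numberOfShards;
        return (int) Math.ceil((perShard + minShardRequest) * requestFactor);
    }

    public static void main(String[] args) {
        // defaults from the list above: minShardRequest 2, requestFactor 2.0
        System.out.println(shardRequestAmount(100, 5, 2, 2.0)); // (20 + 2) * 2.0 = 44
        System.out.println(shardRequestAmount(10, 4, 2, 2.0));  // (2.5 + 2) * 2.0 = 9
    }
}
```

With the defaults, a query for 100 results over 5 shards asks each shard for 44 results rather than 20, which leaves room for an uneven distribution of top hits across shards.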

These Field Types are Available

STRING
NUMERIC_INT
NUMERIC_LONG
NUMERIC_FLOAT 
NUMERIC_DOUBLE
DATE
BOOL 
UNIT_VECTOR
VECTOR

These built-in Analyzers are available (DefaultAnalyzers)

KEYWORD - Field is searched as one token
LC_KEYWORD - Field is searched as one token in lowercase (case insensitive, use for wildcard searches)
LC_CONCAT_ALL
STANDARD - Standard lucene analyzer (good for general full text)
MIN_STEM - Minimal English Stemmer
KSTEMMED - K Stemmer
LSH - Locality Sensitive Hash
TWO_TWO_SHINGLE - (n-grams)
THREE_THREE_SHINGLE - (n-grams)

Custom Analyzer

clientIndexConfig.addAnalyzerSetting("myAnalyzer", Tokenizer.WHITESPACE, Arrays.asList(Filter.ASCII_FOLDING, Filter.LOWERCASE), Similarity.BM25);
clientIndexConfig.addFieldConfig(FieldConfigBuilder.create("abstract", FieldType.STRING).indexAs("myAnalyzer"));

Delete Index

zuliaWorkPool.deleteIndex("myIndex");

Storing / Indexing Documents

Zulia supports indexing and storing from object annotations. For more info, see the section on Object Persistence.

BSON Document (org.mongodb.bson)

Document document = new Document();
document.put("id", "myid222");
document.put("title", "Magic Java Beans");
document.put("issn", "4321-4321");

Store store = new Store("myid222", "myIndexName");

ResultDocBuilder resultDocumentBuilder = new ResultDocBuilder().setDocument(document);
//optional metadata document 
resultDocumentBuilder.setMetadata(new Document().append("test1", "val1").append("test2", "val2"));
store.setResultDocument(resultDocumentBuilder);

zuliaWorkPool.store(store);

Storing Associated Documents

AssociatedBuilder associatedBuilder = new AssociatedBuilder();
associatedBuilder.setFilename("myfile2.txt");
// either set as text
associatedBuilder.setDocument("Some Text3");
// or as bytes
associatedBuilder.setDocument(new byte[]{0, 1, 2, 3});
associatedBuilder.setMetadata(new Document().append("mydata", "myvalue2").append("sometypeinfo", "text file2"));

//can be part of the same store request as the document
Store store = new Store("myid123", "someIndex");

//multiple associated documents can be added at once
store.addAssociatedDocument(associatedBuilder);

zuliaWorkPool.store(store);

Storing Large Associated Documents (Streaming)

StoreLargeAssociated storeLargeAssociated = new StoreLargeAssociated("myid333", "myIndexName", "myfilename", new File("/tmp/myFile"));
zuliaWorkPool.storeLargeAssociated(storeLargeAssociated);

Fetching Documents

Fetch Document

FetchDocument fetchDocument = new FetchDocument("myid222", "myIndex");

FetchResult fetchResult = zuliaWorkPool.fetch(fetchDocument);

if (fetchResult.hasResultDocument()) {
    Document document = fetchResult.getDocument();

    //Get optional Meta
    Document meta = fetchResult.getMeta();
}

Fetch All Associated

FetchAllAssociated fetchAssociated = new FetchAllAssociated("myid123", "myIndexName");

FetchResult fetchResult = zuliaWorkPool.fetch(fetchAssociated);

if (fetchResult.hasResultDocument()) {
    Document object = fetchResult.getDocument();

    //Get optional metadata
    Document meta = fetchResult.getMeta();
}

for (AssociatedResult ad : fetchResult.getAssociatedDocuments()) {
    //use correct function for document type
    String text = ad.getDocumentAsUtf8();
    // OR
    byte[] documentAsBytes = ad.getDocumentAsBytes();

    //get optional metadata
    Document meta = ad.getMeta();

    String filename = ad.getFilename();

}

Fetch Associated

FetchAssociated fetchAssociated = new FetchAssociated("myid123", "myIndexName", "myfile2");

FetchResult fetchResult = zuliaWorkPool.fetch(fetchAssociated);


AssociatedResult ad = fetchResult.getFirstAssociatedDocument();
//use correct function for document type
String text = ad.getDocumentAsUtf8();
// OR
byte[] documentAsBytes = ad.getDocumentAsBytes();

//get optional metadata
Document meta = ad.getMeta();

String filename = ad.getFilename();

Fetch Large Associated (Streaming)

FetchLargeAssociated fetchLargeAssociated = new FetchLargeAssociated("myid333", "myIndexName", "myfilename", new File("/tmp/myFetchedFile"));
zuliaWorkPool.fetchLargeAssociated(fetchLargeAssociated);

Querying

Simple Query with only ids returned

Search search = new Search("myIndexName").setAmount(10);
search.addQuery(new ScoredQuery("issn:1234-1234 AND title:special"));
search.setResultFetchType(ZuliaQuery.FetchType.NONE); // just return the score and unique id 

SearchResult searchResult = zuliaWorkPool.search(search);

long totalHits = searchResult.getTotalHits();

System.out.println("Found <" + totalHits + "> hits");
for (ZuliaQuery.ScoredResult sr : searchResult.getResults()) {
    System.out.println("Matching document <" + sr.getUniqueId() + "> with score <" + sr.getScore() + ">");
}

Simple Query with full documents returned

Search search = new Search("myIndexName").setAmount(10);
search.addQuery(new ScoredQuery("issn:1234-1234 AND title:special"));
search.setResultFetchType(ZuliaQuery.FetchType.FULL); //return the full bson document that was stored

SearchResult searchResult = zuliaWorkPool.search(search);

long totalHits = searchResult.getTotalHits();

System.out.println("Found <" + totalHits + "> hits");
for (Document document : searchResult.getDocuments()) {
    System.out.println("Matching document <" + document + ">");
}

Search Multiple Indexes

Search search = new Search("myIndexName", "myOtherIndex").setAmount(10);
search.addQuery(new ScoredQuery("issn:1234-1234 AND title:special"));


SearchResult searchResult = zuliaWorkPool.search(search);

long totalHits = searchResult.getTotalHits();

System.out.println("Found <" + totalHits + "> hits");
for (ZuliaQuery.ScoredResult sr : searchResult.getResults()) {
    Document doc = ResultHelper.getDocumentFromScoredResult(sr);
    System.out.println("Matching document <" + sr.getUniqueId() + "> with score <" + sr.getScore() + "> from index <" + sr.getIndexName() + ">");
    System.out.println(" full document <" + doc + ">");
}

Sorting

Search search = new Search("myIndexName").setAmount(100);
search.addQuery(new FilterQuery("title:(brown AND bear)"));
// can add multiple sorts with ascending or descending (default ascending)
// can also specify whether missing values are returned first or last (default missing first)
search.addSort(new Sort("year").descending());
search.addSort(new Sort("journal").ascending().missingLast());
SearchResult searchResult = zuliaWorkPool.search(search);

Query Fields

Query fields set the search fields used when a term does not specify one. If query fields are not set on the query and a term is not qualified, the default search fields on the index are used.

// search for lung in title,abstract AND cancer in title,abstract AND treatment in title
search.addQuery(new ScoredQuery("lung cancer title:treatment").addQueryFields("title", "abstract").setDefaultOperator(Operator.AND));

// search for lung in default index fields OR cancer in default index fields
// OR is the default operator unless set
search.addQuery(new ScoredQuery("lung cancer"));
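
To make the expansion described in the comments above concrete, the following standalone sketch rewrites a simple query so that every unqualified term is searched in each query field. It only illustrates the semantics and is not how Zulia or Lucene parse queries: the class and method names are hypothetical, and it handles only whitespace-separated terms without phrases, grouping, or embedded operators.

```java
import java.util.ArrayList;
import java.util.List;

final class QueryFieldExpansionSketch {

    /** Expands unqualified terms across queryFields and joins clauses with the operator. */
    public static String expand(String query, List<String> queryFields, String operator) {
        List<String> clauses = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            if (term.contains(":")) {
                // already qualified with a field (e.g. title:treatment), keep as-is
                clauses.add(term);
            } else {
                // unqualified term: search it in every query field
                List<String> perField = new ArrayList<>();
                for (String field : queryFields) {
                    perField.add(field + ":" + term);
                }
                clauses.add(perField.size() == 1
                        ? perField.get(0)
                        : "(" + String.join(" OR ", perField) + ")");
            }
        }
        return String.join(" " + operator + " ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(expand("lung cancer title:treatment", List.of("title", "abstract"), "AND"));
        // (title:lung OR abstract:lung) AND (title:cancer OR abstract:cancer) AND title:treatment
    }
}
```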

Filter Queries

Filter queries are the same as scored queries except they do not require the search engine to compute a score. Use them when a sort is applied and a score is not needed, or when a filter should not influence the relevance score. Filter queries and scored queries can be combined.

Search search = new Search("myIndexName").setAmount(100);
// include only years 2020 forward
search.addQuery(new FilterQuery("year:[2020 TO *]"));
// require both terms to be matched in either the title or abstract
search.addQuery(new FilterQuery("cheetah cub").setDefaultOperator(Operator.AND).addQueryFields("title", "abstract"));
// require two out of the three terms in the abstract
search.addQuery(new FilterQuery("sleep play run").setMinShouldMatch(2).addQueryField("abstract"));
// exclude the journal nature
search.addQuery(new FilterQuery("journal:Nature").exclude());
SearchResult searchResult = zuliaWorkPool.search(search);
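
The "two out of the three terms" rule used with setMinShouldMatch above can be pictured with a small standalone check. This is a conceptual sketch only: the class and method names are hypothetical, and it assumes naive lowercase whitespace tokenization, whereas the real match depends on the analyzer configured for the field.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class MinShouldMatchSketch {

    /** Returns true when at least minShouldMatch distinct query terms occur in the field text. */
    public static boolean matches(String fieldText, String queryTerms, int minShouldMatch) {
        // naive tokenization: lowercase and split on whitespace (an assumption)
        Set<String> tokens = new HashSet<>(
                Arrays.asList(fieldText.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        Set<String> terms = new HashSet<>(
                Arrays.asList(queryTerms.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        int matched = 0;
        for (String term : terms) {
            if (tokens.contains(term)) {
                matched++;
            }
        }
        return matched >= minShouldMatch;
    }

    public static void main(String[] args) {
        System.out.println(matches("cubs sleep and play all day", "sleep play run", 2)); // two terms present
        System.out.println(matches("cubs run fast", "sleep play run", 2));               // only one term present
    }
}
```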

Count Facets

// Set the number of documents to return to 0 (or omit setAmount) unless you also want the documents
// normally combined with a FilterQuery or ScoredQuery to count a set of results
Search search = new Search("myIndexName").setAmount(0);

search.addCountFacet(new CountFacet("issn").setTopN(20));

SearchResult searchResult = zuliaWorkPool.search(search);
for (ZuliaQuery.FacetCount fc : searchResult.getFacetCounts("issn")) {
    System.out.println("Facet <" + fc.getFacet() + "> with count <" + fc.getCount() + ">");
}

Numeric Stat

// show number of values, number of documents, min, max, and sum for field pubYear 
// normally combined with a FilterQuery or ScoredQuery to count a set of results
Search search = new Search("myIndexName").setAmount(100);
search.addStat(new NumericStat("pubYear")); 
SearchResult searchResult = zuliaWorkPool.search(search);

Stat Facet

// return the highest sum on author count for each journal name
Search search = new Search("myIndexName").setAmount(100);
search.addStat(new StatFacet("authorCount", "journalName"));
SearchResult searchResult = zuliaWorkPool.search(search);

Drilling Down Facets

Search search = new Search("myIndexName").setAmount(100);
search.addFacetDrillDown("issn", "1111-1111");
SearchResult searchResult = zuliaWorkPool.search(search);

Getting the second page of results with a cursor

Search search = new Search("myIndexName");
search.setAmount(100);
search.addQuery(new ScoredQuery("issn:1234-1234 AND title:special"));

// on a changing index a sort on id is necessary
// it can be a sort on another field AND id as well
search.addSort(new Sort("id"));

SearchResult firstResult = zuliaWorkPool.search(search);

search.setLastResult(firstResult);


SearchResult secondResult = zuliaWorkPool.search(search);
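
Why a sort on a unique field matters for cursors can be shown without a server. The sketch below models cursor (keyset) paging over an in-memory sorted set of ids: each page starts strictly after the last id of the previous page, so documents added between requests do not shift the page boundary. It is a conceptual model with hypothetical names, not Zulia's cursor implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class CursorPagingSketch {

    /** Returns up to amount ids that sort strictly after lastId (null means first page). */
    public static List<String> nextPage(TreeSet<String> sortedIds, String lastId, int amount) {
        NavigableSet<String> remaining =
                (lastId == null) ? sortedIds : sortedIds.tailSet(lastId, false);
        List<String> page = new ArrayList<>();
        for (String id : remaining) {
            if (page.size() == amount) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    public static void main(String[] args) {
        TreeSet<String> ids = new TreeSet<>(List.of("id1", "id3", "id5", "id7"));

        List<String> first = nextPage(ids, null, 2);
        System.out.println(first); // [id1, id3]

        // the index changes between the two requests
        ids.add("id2");

        List<String> second = nextPage(ids, first.get(first.size() - 1), 2);
        System.out.println(second); // [id5, id7]
    }
}
```

With offset-based paging, the same insert would push id3 onto the second page and return it twice; resuming after the last seen id avoids that.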

Getting all results with a cursor

Search search = new Search("myIndexName");
search.setAmount(100); //this will be the page size
search.addQuery(new ScoredQuery("issn:1234-1234 AND title:special"));

// on a changing index a sort on id is necessary
// it can be a sort on another field AND id as well
search.addSort(new Sort("id"));

//variation 1 - requires fetch type full (default)
zuliaWorkPool.searchAllAsDocument(search, document -> {
    // do something with mongo bson document
});

//variation 2 - when score is needed, searching multiple indexes and index name is needed, or fetch type is NONE/META
zuliaWorkPool.searchAllAsScoredResult(search, scoredResult -> {
    System.out.println(scoredResult.getUniqueId() + " has score " + scoredResult.getScore() + " for index " + scoredResult.getIndexName());
    // if result fetch type is full (default)
    Document document = ResultHelper.getDocumentFromScoredResult(scoredResult);
});

//variation 3 - each page is returned as a search result. Less convenient but gives access to total hits
zuliaWorkPool.searchAll(search, searchResult -> {
    System.out.println("There are " + searchResult.getTotalHits());

    // variation 3a - requires fetch type full (default)
    for (Document document : searchResult.getDocuments()) {

    }

    // variation 3b - when score is needed, searching multiple indexes and index name is needed, or fetch type is NONE/META
    for (ZuliaQuery.ScoredResult result : searchResult.getResults()) {

    }
});

Deleting

Delete From Index

//Deletes the document from the index but not any associated documents
DeleteFromIndex deleteFromIndex = new DeleteFromIndex("myid111", "myIndexName");
zuliaWorkPool.delete(deleteFromIndex);

Delete Completely

//Deletes the result document, the index documents, and all associated documents for an id
DeleteFull deleteFull = new DeleteFull("myid123", "myIndexName");
zuliaWorkPool.delete(deleteFull);

Delete Single Associated

//Removes a single associated document with the unique id and filename given
DeleteAssociated deleteAssociated = new DeleteAssociated("myid123", "myIndexName", "myfile2");
zuliaWorkPool.delete(deleteAssociated);

Delete All Associated

DeleteAllAssociated deleteAllAssociated = new DeleteAllAssociated("myid123", "myIndexName");
zuliaWorkPool.delete(deleteAllAssociated);

Other Operations

Get Current Document Count for Index

GetNumberOfDocsResult result = zuliaWorkPool.getNumberOfDocs("myIndexName");
System.out.println(result.getNumberOfDocs());

Get Fields for Index

GetFieldsResult result = zuliaWorkPool.getFields(new GetFields("myIndexName"));
System.out.println(result.getFieldNames());

Get Terms for Field

GetTermsResult getTermsResult = zuliaWorkPool.getTerms(new GetTerms("myIndexName", "title"));
for (ZuliaBase.Term term : getTermsResult.getTerms()) {
    System.out.println(term.getValue() + ": " + term.getDocFreq());
}

Get Cluster Nodes

GetNodesResult getNodesResult = zuliaWorkPool.getNodes();
for (Node node : getNodesResult.getNodes()) {
    System.out.println(node);
}

Async API

Every Function has a Corresponding Async Version

Executor executor = Executors.newCachedThreadPool();

Search search = new Search("myIndexName").setAmount(10);

ListenableFuture<SearchResult> resultFuture = zuliaWorkPool.searchAsync(search);

Futures.addCallback(resultFuture, new FutureCallback<>() {
    @Override
    public void onSuccess(SearchResult result) {

    }

    @Override
    public void onFailure(Throwable t) {

    }
}, executor);

Object Persistence / Mapping

Annotated Object Example

@Settings(indexName = "wikipedia", numberOfShards = 16, shardCommitInterval = 6000)
public class Article {

	public Article() {

	}

	@UniqueId
	private String id;

	@Indexed(analyzerName = DefaultAnalyzers.STANDARD)
	private String title;

	@Indexed
	private Integer namespace;

	@DefaultSearch
	@Indexed(analyzerName = DefaultAnalyzers.STANDARD)
	private String text;

	private Long revision;

	@Indexed
	private Integer userId;

	@Indexed(analyzerName = DefaultAnalyzers.STANDARD)
	private String user;

	@Indexed
	private Date revisionDate;

	//Getters and Setters
	//....
}

Creating Index for Annotated Class Example

Mapper<Article> mapper = new Mapper<>(Article.class);
zuliaWorkPool.createIndex(mapper.createOrUpdateIndex());

Storing an Object with Mapper

Article article = new Article();
//...
Store store = mapper.createStore(article);
zuliaWorkPool.store(store);

Querying with Mapper

Search search = new Search("wikipedia").setAmount(10);
search.addQuery(new ScoredQuery("title:technology"));

SearchResult searchResult = zuliaWorkPool.search(search);
List<Article> articles = searchResult.getMappedDocuments(mapper);