Java Client
repositories {
mavenCentral()
maven {
url "https://maven.ascend-tech.us/repo/"
}
}
dependencies {
implementation 'io.zulia:zulia-client:2.4.1'
implementation 'org.mongodb:mongodb-driver-sync:4.5.0'
}
<repository>
<id>astMaven</id>
<name>AST Maven</name>
<url>https://maven.ascend-tech.us/repo</url>
</repository>
<dependencies>
<dependency>
<groupId>io.zulia</groupId>
<artifactId>zulia-client</artifactId>
<version>2.4.1</version>
</dependency>
<dependency>
<groupId>org.mongodb</groupId>
<artifactId>mongodb-driver-sync</artifactId>
<version>4.5.0</version>
</dependency>
</dependencies>
The Zulia Java client is named ZuliaWorkPool. ZuliaWorkPool is a thread-safe connection pool that uses gRPC connections to Zulia on the service port. Every method has an async version that returns a ListenableFuture<> of the result.
ZuliaWorkPool zuliaWorkPool = new ZuliaWorkPool(new ZuliaPoolConfig().addNode("someIp"));
ZuliaPoolConfig zuliaPoolConfig = new ZuliaPoolConfig();
zuliaPoolConfig.addNode("someIp");
//optionally give ports if not default values
//zuliaPoolConfig.addNode("localhost", 32191, 32192);
//optional settings (default values shown)
zuliaPoolConfig.setDefaultRetries(0); //Number of attempts to try before throwing an exception
zuliaPoolConfig.setMaxConnections(10); //Maximum connections per server
zuliaPoolConfig.setMaxIdle(10); //Maximum idle connections per server
zuliaPoolConfig.setCompressedConnection(false); //Use this for WAN client connections
zuliaPoolConfig.setPoolName(null); //For logging purposes only, null gives default of zuliaPool-n
zuliaPoolConfig.setNodeUpdateEnabled(true); //Periodically update the cluster's node list to enable smart routing to the correct node. Do not use this with SSH port forwarding. Nodes can also be updated manually with zuliaWorkPool.updateNodes()
zuliaPoolConfig.setNodeUpdateInterval(10000); //Interval to update the nodes in ms
zuliaPoolConfig.setRoutingEnabled(true); //Route indexing requests to the correct server; this only works if automatic node updating is enabled or updateNodes() is called manually on a regular basis
//create the connection pool
ZuliaWorkPool zuliaWorkPool = new ZuliaWorkPool(zuliaPoolConfig);
ClientIndexConfig indexConfig = new ClientIndexConfig().setIndexName("test").addDefaultSearchField("test");
indexConfig.addFieldConfig(FieldConfigBuilder.createString("title").indexAs(DefaultAnalyzers.STANDARD));
indexConfig.addFieldConfig(FieldConfigBuilder.createString("issn").indexAs(DefaultAnalyzers.LC_KEYWORD).facet());
indexConfig.addFieldConfig(FieldConfigBuilder.createInt("an").index().sort());
// createLong, createFloat, createDouble, createBool, createDate, createVector, and createUnitVector are also available
// or create(storedFieldName, fieldType)
CreateIndex createIndex = new CreateIndex(indexConfig);
zuliaWorkPool.createIndex(createIndex);
- The number of shards and unique id field cannot be changed for the index once the index is created. Set the number of shards to greater than or equal to the maximum number of nodes possible in the cluster.
- Zulia supports indexes created from object annotations. For more info see the section on Object Persistence.
- Changing or adding analyzers for fields that are already indexed may require re-indexing for desired results.
Full ClientIndexConfig settings are explained below:
defaultSearchField - The field that is searched when a query term has no field (no query fields are set and the term is not fielded directly)
defaultAnalyzer - The default analyzer for all fields not specified by a field config
fieldConfig - Overrides the default analyzer for a field
shardCommitInterval - Number of index or delete operations on a shard before a commit is forced (default 3200)
idleTimeWithoutCommit - Time in seconds without indexing before a commit is forced (0 disables) (default 30)
applyUncommitedDeletes - Apply all deletes before search (default true)
shardQueryCacheSize - Number of queries cached at the shard level
shardQueryCacheMaxAmount - Queries with more than this amount of documents returned are not cached
//The following are used in optimizing federation of shards when more than one shard is used.
//The amount requested from each shard on a query is (((amountRequestedByQuery / numberOfShards) + minShardRequest) * requestFactor).
requestFactor - Used in calculation of request size for a shard (default 2.0)
minShardRequest - Added to the calculated request for a shard (default 2)
shardTolerance - Difference in scores between shards tolerated before requesting full results (query request amount) from the shard (default 0.05)
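The per-shard request formula above can be worked through with a small sketch. It assumes floating-point division and leaves any rounding to the caller, since the text states neither; the class and method names are hypothetical and not part of the Zulia client.

```java
// Hypothetical helper, not part of the Zulia API: evaluates the per-shard
// request formula described in the settings above.
final class ShardRequestSketch {

    // (((amountRequestedByQuery / numberOfShards) + minShardRequest) * requestFactor)
    // Assumption: the division is floating point; the documentation does not say.
    static double shardRequestAmount(int amountRequestedByQuery, int numberOfShards,
                                     int minShardRequest, double requestFactor) {
        double perShard = (double) amountRequestedByQuery / numberOfShards;
        return (perShard + minShardRequest) * requestFactor;
    }

    public static void main(String[] args) {
        // 10 results over 4 shards with the defaults (minShardRequest 2, requestFactor 2.0):
        // ((10 / 4) + 2) * 2.0 = (2.5 + 2) * 2.0 = 9.0
        System.out.println(shardRequestAmount(10, 4, 2, 2.0));
    }
}
```

With these inputs each shard is asked for about 9 documents rather than the full 10 requested by the query.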
These Field Types are Available
STRING
NUMERIC_INT
NUMERIC_LONG
NUMERIC_FLOAT
NUMERIC_DOUBLE
DATE
BOOL
UNIT_VECTOR
VECTOR
These built-in Analyzers are available (DefaultAnalyzers)
KEYWORD - Field is searched as one token
LC_KEYWORD - Field is searched as one token in lowercase (case insensitive, use for wildcard searches)
LC_CONCAT_ALL
STANDARD - Standard lucene analyzer (good for general full text)
MIN_STEM - Minimal English Stemmer
KSTEMMED - K Stemmer
LSH - Locality Sensitive Hash
TWO_TWO_SHINGLE - (n-grams)
THREE_THREE_SHINGLE - (n-grams)
Custom Analyzer
clientIndexConfig.addAnalyzerSetting("myAnalyzer", Tokenizer.WHITESPACE, Arrays.asList(Filter.ASCII_FOLDING, Filter.LOWERCASE), Similarity.BM25);
clientIndexConfig.addFieldConfig(FieldConfigBuilder.create("abstract", FieldType.STRING).indexAs("myAnalyzer"));
zuliaWorkPool.deleteIndex("myIndex");
Zulia supports indexing and storing from object annotations. For more info see the section on Object Persistence.
Document document = new Document();
document.put("id", "myid222");
document.put("title", "Magic Java Beans");
document.put("issn", "4321-4321");
Store store = new Store("myid222", "myIndexName");
ResultDocBuilder resultDocumentBuilder = new ResultDocBuilder().setDocument(document);
//optional metadata document
resultDocumentBuilder.setMetadata(new Document().append("test1", "val1").append("test2", "val2"));
store.setResultDocument(resultDocumentBuilder);
zuliaWorkPool.store(store);
AssociatedBuilder associatedBuilder = new AssociatedBuilder();
associatedBuilder.setFilename("myfile2.txt");
// either set as text
associatedBuilder.setDocument("Some Text3");
// or as bytes
associatedBuilder.setDocument(new byte[]{0, 1, 2, 3});
associatedBuilder.setMetadata(new Document().append("mydata", "myvalue2").append("sometypeinfo", "text file2"));
//can be part of the same store request as the document
Store store = new Store("myid123", "someIndex");
//multiple associated documents can be added at once
store.addAssociatedDocument(associatedBuilder);
zuliaWorkPool.store(store);
StoreLargeAssociated storeLargeAssociated = new StoreLargeAssociated("myid333", "myIndexName", "myfilename", new File("/tmp/myFile"));
zuliaWorkPool.storeLargeAssociated(storeLargeAssociated);
FetchDocument fetchDocument = new FetchDocument("myid222", "myIndex");
FetchResult fetchResult = zuliaWorkPool.fetch(fetchDocument);
if (fetchResult.hasResultDocument()) {
Document document = fetchResult.getDocument();
//Get optional Meta
Document meta = fetchResult.getMeta();
}
FetchAllAssociated fetchAssociated = new FetchAllAssociated("myid123", "myIndexName");
FetchResult fetchResult = zuliaWorkPool.fetch(fetchAssociated);
if (fetchResult.hasResultDocument()) {
Document object = fetchResult.getDocument();
//Get optional metadata
Document meta = fetchResult.getMeta();
}
for (AssociatedResult ad : fetchResult.getAssociatedDocuments()) {
//use correct function for document type
String text = ad.getDocumentAsUtf8();
// OR
byte[] documentAsBytes = ad.getDocumentAsBytes();
//get optional metadata
Document meta = ad.getMeta();
String filename = ad.getFilename();
}
FetchAssociated fetchAssociated = new FetchAssociated("myid123", "myIndexName", "myfile2");
FetchResult fetchResult = zuliaWorkPool.fetch(fetchAssociated);
AssociatedResult ad = fetchResult.getFirstAssociatedDocument();
//use correct function for document type
String text = ad.getDocumentAsUtf8();
// OR
byte[] documentAsBytes = ad.getDocumentAsBytes();
//get optional metadata
Document meta = ad.getMeta();
String filename = ad.getFilename();
FetchLargeAssociated fetchLargeAssociated = new FetchLargeAssociated("myid333", "myIndexName", "myfilename", new File("/tmp/myFetchedFile"));
zuliaWorkPool.fetchLargeAssociated(fetchLargeAssociated);
Search search = new Search("myIndexName").setAmount(10);
search.addQuery(new ScoredQuery("issn:1234-1234 AND title:special"));
search.setResultFetchType(ZuliaQuery.FetchType.NONE); // just return the score and unique id
SearchResult searchResult = zuliaWorkPool.search(search);
long totalHits = searchResult.getTotalHits();
System.out.println("Found <" + totalHits + "> hits");
for (ZuliaQuery.ScoredResult sr : searchResult.getResults()) {
System.out.println("Matching document <" + sr.getUniqueId() + "> with score <" + sr.getScore() + ">");
}
Search search = new Search("myIndexName").setAmount(10);
search.addQuery(new ScoredQuery("issn:1234-1234 AND title:special"));
search.setResultFetchType(ZuliaQuery.FetchType.FULL); //return the full bson document that was stored
SearchResult searchResult = zuliaWorkPool.search(search);
long totalHits = searchResult.getTotalHits();
System.out.println("Found <" + totalHits + "> hits");
for (Document document : searchResult.getDocuments()) {
System.out.println("Matching document <" + document + ">");
}
Search search = new Search("myIndexName", "myOtherIndex").setAmount(10);
search.addQuery(new ScoredQuery("issn:1234-1234 AND title:special"));
SearchResult searchResult = zuliaWorkPool.search(search);
long totalHits = searchResult.getTotalHits();
System.out.println("Found <" + totalHits + "> hits");
for (ZuliaQuery.ScoredResult sr : searchResult.getResults()) {
Document doc = ResultHelper.getDocumentFromScoredResult(sr);
System.out.println("Matching document <" + sr.getUniqueId() + "> with score <" + sr.getScore() + "> from index <" + sr.getIndexName() + ">");
System.out.println(" full document <" + doc + ">");
}
Search search = new Search("myIndexName").setAmount(100);
search.addQuery(new FilterQuery("title:(brown AND bear)"));
// can add multiple sorts with ascending or descending (default ascending)
// can also specify whether missing values are returned first or last (default missing first)
search.addSort(new Sort("year").descending());
search.addSort(new Sort("journal").ascending().missingLast());
SearchResult searchResult = zuliaWorkPool.search(search);
Query fields set the search fields used when a term does not specify one. If query fields are not set on the query and a term is not qualified, the default search fields on the index are used.
// search for lung in title,abstract AND cancer in title,abstract AND treatment in title
search.addQuery(new ScoredQuery("lung cancer title:treatment").addQueryFields("title", "abstract").setDefaultOperator(Operator.AND));
// search for lung in default index fields OR cancer in default index fields
// OR is the default operator unless set
search.addQuery(new ScoredQuery("lung cancer"));
Filter queries are the same as scored queries except they do not require the search engine to compute a score. They should be used in cases where a sort is being applied and a score is not needed or when a filter should not influence the relevance score. Filter queries and scored queries can be combined together.
Search search = new Search("myIndexName").setAmount(100);
// include only years 2020 forward
search.addQuery(new FilterQuery("year:[2020 TO *]"));
// require both terms to be matched in either the title or abstract
search.addQuery(new FilterQuery("cheetah cub").setDefaultOperator(Operator.AND).addQueryFields("title", "abstract"));
// require two out of the three terms in the abstract
search.addQuery(new FilterQuery("sleep play run").setMinShouldMatch(2).addQueryField("abstract"));
// exclude the journal nature
search.addQuery(new FilterQuery("journal:Nature").exclude());
SearchResult searchResult = zuliaWorkPool.search(search);
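The setMinShouldMatch(2) filter above requires at least two of its three terms to match. The rule itself can be sketched in plain Java, assuming simple lowercase whitespace tokenization rather than Zulia's analyzers. The class and method names are hypothetical, and this is not how Zulia evaluates the query internally.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; Zulia evaluates this inside the search engine.
final class MinShouldMatchSketch {

    // Returns true when at least minShouldMatch of the query terms occur in the text.
    // Assumption: the text is tokenized by lowercasing and splitting on whitespace.
    static boolean matches(List<String> queryTerms, String text, int minShouldMatch) {
        Set<String> tokens = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
        long matched = queryTerms.stream().distinct().filter(tokens::contains).count();
        return matched >= minShouldMatch;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("sleep", "play", "run");
        // Two of the three terms are present (sleep, play), so this text qualifies
        System.out.println(matches(terms, "Cubs sleep and play all day", 2));
        // Only one term is present (sleep), so this text does not qualify
        System.out.println(matches(terms, "Cubs sleep all day", 2));
    }
}
```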
// Can set number of documents to return to 0 or omit setAmount unless you want the documents at the same time
// normally is combined with a FilterQuery or ScoredQuery to count a set of results
Search search = new Search("myIndexName").setAmount(0);
search.addCountFacet(new CountFacet("issn").setTopN(20));
SearchResult searchResult = zuliaWorkPool.search(search);
for (ZuliaQuery.FacetCount fc : searchResult.getFacetCounts("issn")) {
System.out.println("Facet <" + fc.getFacet() + "> with count <" + fc.getCount() + ">");
}
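Conceptually, a count facet with setTopN groups documents by a field's value, counts each group, and keeps the most frequent values. A minimal in-memory sketch of that idea follows; the class and method names are hypothetical, the alphabetical tie-break is an assumption made only to keep the output deterministic, and none of this is Zulia's implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in for a top-N count facet.
final class CountFacetSketch {

    // Counts each facet value and returns the topN most frequent values.
    // Assumption: ties are broken alphabetically by value.
    static List<Map.Entry<String, Long>> topN(List<String> values, int topN) {
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(topN)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> issns = List.of("1111-1111", "2222-2222", "1111-1111",
                "3333-3333", "1111-1111", "2222-2222");
        for (Map.Entry<String, Long> fc : topN(issns, 2)) {
            System.out.println("Facet <" + fc.getKey() + "> with count <" + fc.getValue() + ">");
        }
    }
}
```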
// show number of values, number of documents, min, max, and sum for field pubYear
// normally is combined with a FilterQuery or ScoredQuery to count a set of results
Search search = new Search("myIndexName").setAmount(100);
search.addStat(new NumericStat("pubYear"));
SearchResult searchResult = zuliaWorkPool.search(search);
// return the highest sum on author count for each journal name
Search search = new Search("myIndexName").setAmount(100);
search.addStat(new StatFacet("authorCount", "journalName"));
SearchResult searchResult = zuliaWorkPool.search(search);
Search search = new Search("myIndexName").setAmount(100);
search.addFacetDrillDown("issn", "1111-1111");
SearchResult searchResult = zuliaWorkPool.search(search);
Search search = new Search("myIndexName");
search.setAmount(100);
search.addQuery(new ScoredQuery("issn:1234-1234 AND title:special"));
// on a changing index a sort is necessary
// the sort can be on another field AND id as well
search.addSort(new Sort("id"));
SearchResult firstResult = zuliaWorkPool.search(search);
search.setLastResult(firstResult);
SearchResult secondResult = zuliaWorkPool.search(search);
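The paging example above resumes from the last result of the previous page, which is why a sort that includes a unique field such as id matters on a changing index. The sketch below shows the general idea on an in-memory sorted set. It is a stand-in with hypothetical names, not Zulia's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for paging "after the last result".
final class CursorPagingSketch {

    // Returns up to pageSize ids strictly greater than lastId (null starts from the beginning).
    static List<String> nextPage(NavigableSet<String> sortedIds, String lastId, int pageSize) {
        NavigableSet<String> remaining = lastId == null ? sortedIds : sortedIds.tailSet(lastId, false);
        List<String> page = new ArrayList<>();
        for (String id : remaining) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<String> ids = new TreeSet<>(List.of("a", "b", "c", "d"));
        List<String> first = nextPage(ids, null, 2);
        // The index changes between pages: a new id that sorts before the cursor is added
        ids.add("aa");
        List<String> second = nextPage(ids, first.get(first.size() - 1), 2);
        System.out.println(first + " " + second);
    }
}
```

Because the second page is defined relative to the last id seen rather than a numeric offset, adding aa between the two requests does not cause b to be returned twice.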
Search search = new Search("myIndexName");
search.setAmount(100); //this will be the page size
search.addQuery(new ScoredQuery("issn:1234-1234 AND title:special"));
// on a changing index a sort is necessary
// the sort can be on another field AND id as well
search.addSort(new Sort("id"));
//variation 1 - requires fetch type full (default)
zuliaWorkPool.searchAllAsDocument(search, document -> {
// do something with mongo bson document
});
//variation 2 - when score is needed, searching multiple indexes and index name is needed, or fetch type is NONE/META
zuliaWorkPool.searchAllAsScoredResult(search, scoredResult -> {
System.out.println(scoredResult.getUniqueId() + " has score " + scoredResult.getScore() + " for index " + scoredResult.getIndexName());
// if result fetch type is full (default)
Document document = ResultHelper.getDocumentFromScoredResult(scoredResult);
});
//variation 3 - each page is returned as a search result. less convenient but gives access to total hits
zuliaWorkPool.searchAll(search, searchResult -> {
System.out.println("There are " + searchResult.getTotalHits());
// variation 3a - requires fetch type full (default)
for (Document document : searchResult.getDocuments()) {
}
// variation 3b - when score is needed, searching multiple indexes and index name is needed, or fetch type is NONE/META
for (ZuliaQuery.ScoredResult result : searchResult.getResults()) {
}
});
//Deletes the document from the index but not any associated documents
DeleteFromIndex deleteFromIndex = new DeleteFromIndex("myid111", "myIndexName");
zuliaWorkPool.delete(deleteFromIndex);
//Deletes the result document, the index documents and all associated documents associated with an id
DeleteFull deleteFull = new DeleteFull("myid123", "myIndexName");
zuliaWorkPool.delete(deleteFull);
//Removes a single associated document with the unique id and filename given
DeleteAssociated deleteAssociated = new DeleteAssociated("myid123", "myIndexName", "myfile2");
zuliaWorkPool.delete(deleteAssociated);
DeleteAllAssociated deleteAllAssociated = new DeleteAllAssociated("myid123", "myIndexName");
zuliaWorkPool.delete(deleteAllAssociated);
GetNumberOfDocsResult result = zuliaWorkPool.getNumberOfDocs("myIndexName");
System.out.println(result.getNumberOfDocs());
GetFieldsResult result = zuliaWorkPool.getFields(new GetFields("myIndexName"));
System.out.println(result.getFieldNames());
GetTermsResult getTermsResult = zuliaWorkPool.getTerms(new GetTerms("myIndexName", "title"));
for (ZuliaBase.Term term : getTermsResult.getTerms()) {
System.out.println(term.getValue() + ": " + term.getDocFreq());
}
GetNodesResult getNodesResult = zuliaWorkPool.getNodes();
for (Node node : getNodesResult.getNodes()) {
System.out.println(node);
}
Every Function has a Corresponding Async Version
Executor executor = Executors.newCachedThreadPool();
Search search = new Search("myIndexName").setAmount(10);
ListenableFuture<SearchResult> resultFuture = zuliaWorkPool.searchAsync(search);
Futures.addCallback(resultFuture, new FutureCallback<>() {
@Override
public void onSuccess(SearchResult result) {
}
@Override
public void onFailure(Throwable t) {
}
}, executor);
@Settings(indexName = "wikipedia", numberOfShards = 16, shardCommitInterval = 6000)
public class Article {
public Article() {
}
@UniqueId
private String id;
@Indexed(analyzerName = DefaultAnalyzers.STANDARD)
private String title;
@Indexed
private Integer namespace;
@DefaultSearch
@Indexed(analyzerName = DefaultAnalyzers.STANDARD)
private String text;
private Long revision;
@Indexed
private Integer userId;
@Indexed(analyzerName = DefaultAnalyzers.STANDARD)
private String user;
@Indexed
private Date revisionDate;
//Getters and Setters
//....
}
Mapper<Article> mapper = new Mapper<>(Article.class);
zuliaWorkPool.createIndex(mapper.createOrUpdateIndex());
Article article = new Article();
//...
Store store = mapper.createStore(article);
zuliaWorkPool.store(store);
Search search = new Search("wikipedia").setAmount(10);
search.addQuery(new ScoredQuery("title:technology"));
SearchResult searchResult = zuliaWorkPool.search(search);
List<Article> articles = searchResult.getMappedDocuments(mapper);