[OEP 11] Lucene Improvements #11
Comments
Sorry for the long delay.
Hi Roberto,

Thanks for responding. I figured I'd put the most important suggestions for the Lucene index together here. Unfortunately, I am not the initiator of some of the suggestions, so I can't answer all of your concerns or needs.

On 2: the suggestion is purely about the full-text features. Groupby and Orderby wouldn't be a part of such a search, as the ranking would determine the order. Groupings might be a consideration, if possible, but ranking would certainly only be against the Lucene-indexed fields.

On 8: I think it would be enough if we could get the results grouped by class, maybe limited to 5 or 10 results per class at first. Would that simplify the ability to search across classes? Consider classes as domain objects: a search across these domains would bring in results from any domain object, but reduced to only a few of the best ones. If a user needs more in-depth searching in one domain, they'd ask for the full list of results from that one domain (class).

On 11: here is the issue I got the suggestion from. I believe it explains more: orientechnologies/orientdb#5185

Let me add that full-text searching in other NoSQL and even SQL solutions leaves a lot to be desired. It is why solutions like Elasticsearch win in popularity. This nugget of gold in ODB, if polished to shine, would blow the socks off other NoSQL solutions, especially MongoDB, whose full-text capabilities are quite weak.

One other thing that needs looking at is 12, full-text search for non-schema-defined fields. I am personally still up in the air about this, as I am not sure how the definitions would be made in order to index properly without the schema. But this is necessary for ODB to call itself a NoSQL database. The fact that ODB can't index fields without a defined schema means, to me, that ODB isn't NoSQL. It is only SQL.

Scott
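To make point 8 above a little more concrete, here is a minimal sketch in plain Python of the grouping idea: take ranked hits from several classes and keep only the best few per class. It is not OrientDB or Lucene code. The `Hit` type, the `top_hits_per_class` function, and the sample record IDs are hypothetical names invented for illustration, and the scores are assumed to be comparable across classes.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """A single search result (hypothetical shape, for illustration only)."""
    class_name: str   # the class (domain) the document belongs to
    doc_id: str       # some record identifier
    score: float      # relevance score, higher is better


def top_hits_per_class(hits, limit=5):
    """Group hits by class and keep at most `limit` best-scoring hits per class."""
    grouped = defaultdict(list)
    # Walk the hits from best to worst so each group fills with its top hits first.
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if len(grouped[hit.class_name]) < limit:
            grouped[hit.class_name].append(hit)
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        Hit("Person", "#12:0", 2.5),
        Hit("Person", "#12:1", 3.1),
        Hit("Company", "#13:0", 1.2),
        Hit("Person", "#12:2", 0.4),
    ]
    for class_name, class_hits in top_hits_per_class(sample, limit=2).items():
        print(class_name, [h.doc_id for h in class_hits])
```

A user who wants more depth in one class would then run a second, unrestricted search against that class only.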
Hi @smolinari, we are moving forward; take a look at orientechnologies/orientdb#7155

We are moving to functions to allow more flexibility when using the search feature; this way full-text and spatial search will work in a homologous way.

BTW, I really don't understand 12, full-text search for non-schema-defined fields. I've been working in the search field for a while now, 10 years more or less. For sure we can index ALL props of all documents stored inside OrientDB. But what can you find this way? Which analyzers should be used for indexing? The StandardAnalyzer is suitable only for western languages. Try it on Chinese or Japanese and you will get a completely useless index.
Looking good! Can't wait to see how 3.0 runs.

With 12 I mean being able to create an index on any property without it being specifically created in ODB's schema system. This is actually not a problem with search, but with the fact that ODB requires a schema definition to create an index in general. What I am looking for is what Mongo offers in the way of creating indexes without schema definitions in the database. In other words, if I say as a developer that there is a property in a class, then that is all ODB should need to allow indexing. It is up to me, as a developer, to make sure that property is really there. If this concept doesn't change, ODB's claim to be a NoSQL database is only a half-truth.

Scott
Ok. Here is a go at my first OEP suggestion.
Summary:
Along with the work being done to upgrade the Lucene version to 6.2.0, this OEP takes note of the features requested by customers to make the full-text search within OrientDB one of the most powerful of any current database.
Goals:
List the requested features for the full-text indexing within OrientDB and come up with a final list to be accomplished for 3.0. If there were issues already created with more information, I've added links to them.
Non-Goals:
This lists only full-text search features. Geolocation features are not included.
Success metrics:
Unknown currently.
Motivation:
In order to be a truly multi-model database, a very good text search system should also be implemented into the database.
Description:
Here is the list of requested improvements:
Add support for all possible Lucene query types orientdb#5189
Alternatives:
A bridge/river/interface to Elasticsearch could theoretically be an alternative, where data is only entered into or removed from the Elasticsearch indexes as needed. All the other Elasticsearch APIs can be used directly by the end user for full-text searches.
Risks and assumptions:
None currently.
Impact matrix