This repository has been archived by the owner on Dec 20, 2024. It is now read-only.

When OLAP Works Best

JoeWinter edited this page Sep 22, 2014 · 7 revisions

OLAP Database Overview: When OLAP Works Best


Doradus OLAP works best for applications that fit the following criteria:
  • Partitionable data: For smaller databases (a few million objects), all data may fit in a single shard. Otherwise, an application needs some criterion for dividing data into shards. Time-based data (events, log entries, transactions, etc.) is the easiest to partition: for example, each shard holds the data from a single hour or day. Other partitioning criteria also work.

  • Immutable/semi-mutable data: Objects can be modified and deleted after they are added to a shard. However, since updates are performed in batches, OLAP is not intended for frequent, fine-grained updates. Ideally, objects are write-once or only occasionally updated.

  • Batchable data: Data must be added and updated in batches, typically thousands of objects per batch. Load performance degrades with frequent, small-batch updates.

  • Not absolute real time: Batch updates do not become visible to queries until the containing shard is merged. (Shards can be merged repeatedly, after every batch or after several batches.) Merge time is typically a few seconds to a few minutes, so there is a lag between when data is added and when it becomes queryable.

  • Emphasis on statistical queries: The fastest Doradus OLAP queries are single-shard aggregate queries. The cost of a multi-shard query grows in proportion to the number of shards queried. Object queries perform similarly to aggregate queries but are also affected by the number of fields returned for each object. Full text searching is supported, but it works best for short text fields, not large document bodies. In other words, the primary focus of OLAP is analytics via aggregate queries.
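The partitioning and batching criteria above can be illustrated with a small client-side sketch of how an application might group time-based data before loading it. It assumes time-stamped events, one shard per UTC day named `YYYY-MM-DD`, and a batch size of 5,000 objects. The function names, the `(timestamp, payload)` event shape, the naming scheme, and the batch size are all hypothetical and are not part of the Doradus API; the sketch does not load anything into Doradus.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical batch size: the criteria above call for "thousands of objects per batch".
DEFAULT_BATCH_SIZE = 5000


def shard_name(ts: datetime) -> str:
    """Return a daily shard name such as '2014-09-22' (hypothetical naming scheme).

    Expects a timezone-aware datetime; it is converted to UTC so that events
    recorded in different time zones land in the same daily shard.
    """
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def partition_into_batches(events, batch_size=DEFAULT_BATCH_SIZE):
    """Group (timestamp, payload) events by daily shard, then split each
    shard's events into load batches of at most batch_size objects.

    Returns a dict mapping shard name -> list of batches (each a list of events).
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    by_shard = defaultdict(list)
    for ts, payload in events:
        by_shard[shard_name(ts)].append((ts, payload))
    return {
        shard: [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
        for shard, items in by_shard.items()
    }


if __name__ == "__main__":
    # Hypothetical sample: 24 hourly events on one day plus one event on the next day.
    sample = [(datetime(2014, 9, 22, h, tzinfo=timezone.utc), {"hour": h}) for h in range(24)]
    sample.append((datetime(2014, 9, 23, 0, tzinfo=timezone.utc), {"hour": 0}))
    for shard, batches in sorted(partition_into_batches(sample, batch_size=10).items()):
        print(shard, [len(b) for b in batches])
```

Running the script prints `2014-09-22 [10, 10, 4]` followed by `2014-09-23 [1]`. Grouping by shard before batching means each batch in this sketch targets a single shard; in a real application each inner list would be loaded as one batch, and the shard merged after one or several batches to make the new data queryable.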

OLAP is not intended for applications with these requirements:

  • Unstructured data: All tables and fields used in a Doradus OLAP application must be defined in the schema. The schema can evolve over time, but queries are evaluated in the context of the most recent schema. Variable fields can be supported with techniques such as a link to a name/value object, but unlike Doradus Spider, OLAP does not support schemaless applications.

  • OLTP transactions: Because it requires batch loading and shard merging, OLAP does not work for applications that need frequent, fine-grained updates to data.

  • Real time applications: Because of the lag between when data is loaded and when it becomes visible to queries, OLAP does not work for applications that require data to be visible immediately after it is added.

  • Document management: OLAP doesn’t work well for applications that need to store "documents" with large text bodies that are subsequently searched with full text expressions. OLAP supports large text (and binary) fields, but text fields are not pre-indexed with term vectors as they are in Doradus Spider. Instead, full text searches tokenize each text field dynamically, which becomes slow when queries must scan many large text fields.
