-
Notifications
You must be signed in to change notification settings - Fork 22
Grouping Parameter (Spider)
[Table of Contents](https://github.com/dell-oss/Doradus/wiki/Spider Databases: Table-of-Contents) | Previous | Next
Spider Aggregate Queries: Grouping Parameter Overview
When a grouping parameter is provided, it causes the aggregate query to compute multiple values, one per group value as defined by the grouping expression. Either of two methods can be used to pass a grouping parameter depending upon whether or not composite grouping is desired. A wide range of grouping expressions are allowed as described in the following sections.
Without a grouping parameter, an aggregate query returns a single value: the metric function computed across all selected objects. Consider the following aggregate query URI REST command:
GET /Msgs/Message/_aggregate?m=MAX(Size)
This aggregate query returns the largest Size
value among all messages. The response in XML is:
<results>
<aggregate metric="MAX(Size)"/>
<value>16796009</value>
</results>
In JSON:
{"results": {
"aggregate": {"metric": "MAX(Size)"},
"value": "16796009"
}}
As shown, an aggregate
element lists the parameters used for the aggregate query. In this case, only a metric parameter was provided. For a global aggregate, the metric value is provided in the value
element.
When the grouping field is multi-valued with respect to the perspective table, the metric is applied across all values for each perspective object. For example, in this query:
GET /Msgs/Message/_aggregate?m=COUNT(Tags)
If Tags
is an MV scalar, all values for each message are counted, so the total returned may be more than the number of objects. Furthermore, an object related to a perspective object may be processed more than once. Consider this query:
GET /Msgs/Message/_aggregate?m=COUNT(ExternalRecipients.MessageAddress.Domain)
Some messages are likely to have multiple external recipients linked to the same domain. Therefore, ExternalRecipients.MessageAddress.Domain
will count the same domain multiple times for those messages.
When the grouping parameter consists of a single grouping field, objects are divided into sets based on the distinct values found for the grouping field. A separate metric value is computed for each group. For example, this aggregate query uses a single-valued scalar as the grouping field:
GET /Msgs/Message/_aggregate?m=MAX(Size)&f=Tags
The Tags field logically partitions objects into groups: one for each field value. Each object is included in each group for which is has a Tags value. If an object has no Tags value, it is placed in a (null) group. The maximum Size is then computed for each group. A typical result in XML is shown below:
<results>
<aggregate metric="MAX(Size)" group="Tags"/>
<totalobjects>6030</totalobjects>
<summary>16796009</summary>
<groups>
<group>
<metric>4875</metric>
<field name="Tags">(null)</field>
</group>
<group>
<metric>16796009</metric>
<field name="Tags">AfterHours</field>
</group>
<group>
<metric>16796009</metric>
<field name="Tags">Customer</field>
</group>
</groups>
</results>
In JSON:
{"results": {
"aggregate": {"metric": "MAX(Size)", "group": "Tags"},
"totalobjects": "6030",
"summary": "16796009",
"groups": [
{"group": {
"metric": "4875",
"field": {"Tags": "(null)"}
}},
{"group": {
"metric": "16796009",
"field": {"Tags": "AfterHours"}
}},
{"group": {
"metric": "16796009",
"field": {"Tags":"Customer"}
}}
]
}}
For grouped aggregate queries, the results
element contains a groups
element, which contains one group
for each group value. Each group
contains the field
name and value for that group and the corresponding metric
value. The totalobjects
value computes the number of objects selected in the computation. The summary
value computes the metric value across all selected objects independent of groups. For the AVERAGE
function, this provides the true average, not the average of averages.
By default, aggregate query group results use the fully-qualified path name of each grouping field. You can shorten the output result by using an alias for each grouping field. The alias name is used instead of the fully-qualified name. For example:
GET /Msgs/Message/_aggregate?m=COUNT(*)&f=TOP(3,Sender.Person.Name)
This query produces a result such as the following:
<results>
<aggregate metric="COUNT(*)" group="TOP(3,Sender.Person.Name)"/>
<totalobjects>6030</totalobjects>
<summary>6030</summary>
<totalgroups>186</totalgroups>
<groups>
<group>
<metric>5256</metric>
<field name="Sender.Person.Name">(null)</field>
</group>
<group>
<metric>82</metric>
<field name="Sender.Person.Name">Quest Support</field>
</group>
<group>
<metric>80</metric>
<field name="Sender.Person.Name">spb_setupbuilder</field>
</group>
</groups>
</results>
The fully-qualified field name Sender.Person.Name
is used for the field parameter in each group element. An alias can be substituted for a grouping field by appending AS *name*
to the grouping field. Example:
GET /Msgs/Message/_aggregate?m=COUNT(*)&f=TOP(3,Sender.Person.Name) AS Name
The alias Name is used in the output as shown below:
<results>
<aggregate metric="COUNT(*)" group="TOP(3,Sender.Person.Name)"/>
<totalobjects>6030</totalobjects>
<summary>6030</summary>
<totalgroups>186</totalgroups>
<groups>
<group>
<metric>5256</metric>
<field name="Name">(null)</field>
</group>
<group>
<metric>82</metric>
<field name="Name">Quest Support</field>
</group>
<group>
<metric>80</metric>
<field name="Name">spb_setupbuilder</field>
</group>
</groups>
</results>
When many groups are returned and/or the grouping field name is long, the alias name can significantly shorten the output results. An alias name can also improve the identity of the groups.
The grouping parameter can list multiple grouping expressions to form multi-level grouping. Each grouping expression must be a path from the perspective table to a link or scalar field. Below is an example multi-level aggregate query:
GET /Msgs/Message/_aggregate?m=COUNT(*)&f=TRUNCATE(SendDate,DAY),Tags
In this example, TRUNCATE(SendDate,DAY)
is the top-level grouping field and Tags
is the second-level grouping field. The query creates groups based on the cross-product of all grouping field values, and a metric is computed for each combination for which at least one object has a value. A perspective object is included in the metric computation for each group for which it has actual values. A summary
value is computed for each non-leaf group, and a totalobjects
value is computed for the top-level group. An example result in XML is shown below:
<results>
<aggregate metric="COUNT(*)" group="TRUNCATE(SendDate,DAY),Tags"/>
<totalobjects>6030</totalobjects>
<summary>6030</summary>
<groups>
<group>
<summary>4752</summary>
<field name="SendDate">2010-07-17 00:00:00</field>
<groups>
<group>
<metric>1</metric>
<field name="Tags">(null)</field>
</group>
<group>
<metric>4751</metric>
<field name="Tags">AfterHours</field>
</group>
<group>
<metric>1524</metric>
<field name="Tags">Customer</field>
</group>
</groups>
</group>
<group>
<summary>1278</summary>
<field name="SendDate">2010-07-18 00:00:00</field>
<groups>
<group>
<metric>1278</metric>
<field name="Tags">AfterHours</field>
</group>
<group>
<metric>700</metric>
<field name="Tags">Customer</field>
</group>
</groups>
</group>
</groups>
</results>
The same response in JSON:
{"results": {
"aggregate": {"metric": "COUNT(*)", "group": "TRUNCATE(SendDate,DAY),Tags"},
"totalobjects": "6030",
"summary": "6030",
"groups": [
{"group": {
"summary": "4752",
"field": {"SendDate":"2010-07-17 00:00:00"},
"groups": [
{"group": {
"metric": "1",
"field": {"Tags": "(null)"}
}},
{"group": {
"metric": "4751",
"field": {"Tags":"AfterHours"}
}},
{"group": {
"metric": "1524",
"field": {"Tags":"Customer"}
}}
]
}},
{"group": {
"summary": "1278",
"field": {"SendDate": "2010-07-18 00:00:00"},
"groups": [
{"group": {
"metric": "1278",
"field": {"Tags":"AfterHours"}
}},
{"group": {
"metric": "700",
"field": {"Tags":"Customer"}
}}
]
}}
]
}}
Each non-leaf group
has an inner groups
element containing its corresponding lower-level group
elements and a summary
element that provides a group-level metric value. Leaf-level groups have a metric
element. This structure is recursive if there are more than two grouping levels.
Doradus Spider allows a group field to be used as a grouping expression if all of its leaf fields are links. For example, in the example schema the group field Participants
contains links Sender
, InternalRecipients
, and ExternalRecipients
, all pointing to the Participant
table. If the Participants
field is used as a grouping parameter, the values (object IDs) for all three links are combined into a set for each perspective object. The object is included in the metric computation for each group for which it has a value. If a perspective object has no values for any of the links, it is included in the (null)
group.
Link paths and WHERE
filters can be used on the group field with the same syntax as allowed for links. For example:
f=Participants.Person.Name
f=Participants.WHERE(ReceiptDate > 2014-02-01).Person.Name
f=Participants.Person.WHERE(Department:Sales).Name
In the first example, the combined Person.Name
values of each link (ExternalRecipients
, InternalRecipients
, and Sender
) form the grouping sets; each perspective object is included in each set in which it participates. If a perspective object has no Participants.Person.Name
values, it is included in the null group.
The second and third examples using the same grouping parameter: Participants.Person.Name
. However, the presence of the WHERE
filter causes nulls to be handled differently:
-
In the second example, a perspective object is simply skipped if it has no participants or none of its participants are selected by the expression
ReceiptDate > 2014-02-01
. When a perspective is selected by theWHERE
expression but it is has noPerson.Name
values, it is included in the(null)
group. -
In the third example, a perspective object is skipped if it has no participants, no participants have
Person
values, or noPerson.Department
field includes the termSales
. Only if a perspective object is chosen but has noName
values is it included in the(null)
group.
Doradus Spider supports a special grouping feature called composite grouping. It is meaningful only for aggregate queries with 2 or more grouping levels. It causes the metric function(s) to be computed for parent (non-leaf) groups in addition to leaf-most groups. The extra computations are returned as composite results within the corresponding parent groups.
Composite grouping is requested by using a special grouping parameter instead of the normal grouping parameter. For example, consider the following URI aggregate query:
GET /Msgs/Message/_aggregate?m=COUNT(*)&f=TRUNCATE(SendDate,DAY),Tags
Composite grouping can be requested for this 2-level query by using the &cf
parameter instead of &f
:
GET /Msgs/Message/_aggregate?m=COUNT(*)&cf=TRUNCATE(SendDate,DAY),Tags
A typical result for this 2-level aggregate query in XML is shown below:
<results>
<aggregate metric="COUNT(*)" group="TRUNCATE(SendDate,DAY),Tags"/>
<totalobjects>6032</totalobjects>
<summary>6032</summary>
<groups>
<group>
<summary>4753</summary>
<field name="SendDate">2010-07-17 00:00:00</field>
<groups>
<group>
<metric>1</metric>
<field name="Tags">(null)</field>
</group>
<group>
<metric>4752</metric>
<field name="Tags">AfterHours</field>
</group>
<group>
<metric>1524</metric>
<field name="Tags">Customer</field>
</group>
</groups>
</group>
<group>
<summary>1279</summary>
<field name="SendDate">2010-07-18 00:00:00</field>
<groups>
<group>
<metric>1279</metric>
<field name="Tags">AfterHours</field>
</group>
<group>
<metric>701</metric>
<field name="Tags">Customer</field>
</group>
</groups>
</group>
<group composite="true">
<field name="SendDate">*</field>
<groups>
<group>
<metric>1</metric>
<field name="Tags">(null)</field>
</group>
<group>
<metric>6031</metric>
<field name="Tags">AfterHours</field>
</group>
<group>
<metric>2225</metric>
<field name="Tags">Customer</field>
</group>
</groups>
</group>
/groups>
</results>
The same response in JSON is shown below:
{"results": {
"aggregate": {"metric": "COUNT(*)", "group": "TRUNCATE(SendDate,DAY),Tags"},
"totalobjects": "6032",
"summary": "6032",
"groups": [
{"group": {
"summary": "4753",
"field": {"SendDate": "2010-07-17 00:00:00"},
"groups": [
{"group": {
"metric": "1",
"field": {"Tags": "(null)"}
}},
{"group": {
"metric": "4752",
"field": {"Tags": "AfterHours"}
}},
{"group": {
"metric": "1524",
"field": {"Tags": "Customer"}
}}
]
}},
{"group": {
"summary": "1279",
"field": {"SendDate": "2010-07-18 00:00:00"},
"groups": [
{"group": {
"metric": "1279",
"field": {"Tags": "AfterHours"}
}},
{"group": {
"metric": "701",
"field": {"Tags": "Customer"}
}}
]
}},
{"group": {
"composite": "true",
"field": {"SendDate": "*"},
"groups": [
{"group": {
"metric": "1",
"field": {"Tags": "(null)"}
}},
{"group": {
"metric": "6031",
"field": {"Tags": "AfterHours"}
}},
{"group": {
"metric": "2225",
"field": {"Tags":"Customer"}
}}
]
}}
]
}}
As shown, composite grouping produces an extra group for non-leaf grouping levels. This group is marked with a composite
property of true
, and the value for its field
element is "*". *Within the composite group, lower-level metric groups are provided for each lower-level grouping field value, however, these lower-level metrics are computed across all objects at the composite grouping level. In the example above, the composite group computes the metric function (COUNT(*))
for all second-level groups (Tags
) across all first-level group values (SendDate
).
Composite grouping is only meaningful for multi-level grouping.
Doradus Spider allows the aggregate query grouping parameter to consist of multiple grouping sets. Each grouping set is enclosed in a GROUP
function; multiple grouping sets are separated by commas. This feature is known as compound grouping. The general syntax is:
GROUP(<expression 1>),GROUP(<expression 2>),...,GROUP(<expression n>)
Each <expression n>
parameter must use one of the following forms:
-
A
"*"
can be used to compute a global aggregate (i.e.,GROUP(*)
). The metric function is computed for all selected objects just as in an aggregate query with no grouping parameter. TheGROUP(*)
function should be specified at most once since there is only one metric value for a global aggregate. -
A single-level grouping expression, consisting of a single scalar field or a field path (e.g.,
GROUP(Tags)
). -
A multi-level grouping expression, consisting of a comma-separated list of scalar fields and/or field paths (e.g.,
GROUP(TRUNCATE(SendDate,DAY),Tags)
).
Each single- and multi-level grouping expression must be relative to the perspective table. The same set of objects selected by the aggregate query is passed to each grouping set, and separate metric computations are performed for each grouping set. Aggregate queries that use compound grouping perform a single pass through the selected objects and computes multiple grouping sets at the same time.
Consider this aggregate query:
GET /Msgs/Message/_aggregate?m=MAX(Size)
&cf=GROUP(*),GROUP(TRUNCATE(SendDate,WEEK)),GROUP(TOP(2,TERMS(Subject)),Tags)
&q=SendDate > 2013-10-15
This compound grouping aggregate query selects messages whose SendDate
is >= 2013-10-15
, and it computes the following:
-
The maximum
Size
value of selected messages (GROUP(*)
). -
The maximum
Size
of selected messages grouped bySendDate
truncated to WEEK granularity (GROUP(TRUNCATE(SendDate,WEEK))
). -
The maximum
Size
of selected messages grouped first by the top 2 terms used in theSubject
field and then by theTags
field (GROUP(TOP(2,TERMS(Subject)),Tags)
). Because composite grouping was requested (&cf
), this multi-level grouping expression uses the composite grouping technique.
Compound aggregate queries compute all grouping sets in a single pass. The query above returns XML results such as the following:
<results>
<aggregate metric="MAX(Size)" query="SendDate > 2009-10-15"
group="GROUP(*),GROUP(TRUNCATE(SendDate,WEEK)),GROUP(TOP(2,TERMS(Subject)),Tags)"/>
<totalobjects>6032</totalobjects>
<groupsets>
<groupset>
<value>16796009</value>
</groupset>
<groupset group="TRUNCATE(SendDate,WEEK)">
<summary>16796009</summary>
<groups>
<group>
<field name="SendDate">2010-07-12 00:00:00</field>
<metric>965230</metric>
</group>
...
</groups>
</groupset>
<groupset group="TOP(2,TERMS(Subject)),Tags">
<summary>16796009</summary>
<totalgroups>15267</totalgroups>
<groups>
<group>
<summary>16796009</summary>
<field name="Subject">scalepan</field>
<groups>
<group>
<metric>16796009</metric>
<field name="Tags">AfterHours</field>
</group>
...
</groups>
</group>
...
<group composite="true">
<field name="Subject">*</field>
<groups>
<group>
<metric>7317</metric>
<field name="Tags">(null)</field>
</group>
...
</groups>
</group>
</groups>
</groupset>
</groupsets>
</results>
In JSON:
{"results": {
"aggregate": {
"group": "GROUP(*),GROUP(TRUNCATE(SendDate,WEEK)),GROUP(TOP(2,TERMS(Subject)),Tags)",
"query": "SendDate > 2009-10-15",
"metric":"MAX(Size)"
},
"totalobjects": "6032",
"groupsets": [
{"groupset": {
"value": "16796009"
}},
{"groupset": {
"group": "TRUNCATE(SendDate,WEEK)",
"summary": "16796009",
"groups": [
{"group": {
"field": {"SendDate": "2010-07-12 00:00:00"},
"metric": "16796009"
}},
...
]
}},
{"groupset": {
"group": " TOP(2,TERMS(Subject)),Tags ",
"summary": "16796009",
"totalgroups": "15267",
"groups": [
{"group": {
"summary": "16796009",
"field": {"Subject": "scalepan"},
"groups": [
{"group": {
"metric": "16796009",
"field": {"Tags": "AfterHours"}
}},
{"group": {
"metric": "16796009",
"field": {"Tags": "Customer"}
}}
]
}},
...
{"group": {
"composite": "true",
"field": {"Origin": "*"},
"groups": [
{"group": {
"metric": "7317",
"field": {"Tags": "(null)"}
}},
...
]
}}
]
}}
]
}}
Notable aspects of a compound group result:
-
As with all aggregate queries, the outer
results
element contains anaggregate
element that confirms the aggregate query parameters. -
The
results
element also contains agroupsets
element, which contains onegroupset
element per grouping set, that is, for eachGROUP
function. -
The contents of each
groupset
element follows the format applicable for global, single-level, or multi-level aggregate queries, except that they do not contain anaggregate
element. -
As with all grouped aggregate queries, each
groupset
and each non-leaf group contains asummary
value. -
Only multi-level grouping sets can contain a composite group, denoted by a
composite=true
element and afield
value of "*".
Multi-metric aggregate queries can also use compound grouping. That is, a single aggregate query can specify multiple metric functions in the &m
parameter and multiple GROUP
functions in the &f
or &cf
parameter. Such queries compute multiple metric functions, like a multi-metric query, and provide multiple grouping operations for each metric, all in a single query.
The results of compound/multi-metric queries are returned using groupset
elements for each combination of metric function and GROUP
function. If there are m metric functions and n GROUP
functions, the result will contain m x n groupset
elements. Each groupset
identifies the metric and grouping parameter for which it provides results.
As an example, the following compound/multi-metric aggregate query has 3 metric functions and 2 GROUP
functions:
GET /Msgs/Message/_aggregate?m=COUNT(*),MAX(Size),AVERAGE(Size)
&cf=GROUP(TOP(2,Tags),Subject),GROUP(TRUNCATE(SendDate,DAY))
This means the result will contain 6 groupset elements, as shown in the following XML outline:
<results>
<aggregate metric="COUNT(*),MAX(Size),AVERAGE(Size)"
group="GROUP(TOP(2,Tags),Subject),GROUP(TRUNCATE(SendDate,DAY))"/>
<groupsets>
<groupset group="TOP(2,Tags),Subject" metric="COUNT(*)">...</groupset>
<groupset group="TRUNCATE(SendDate,DAY)" metric="COUNT(*)">...</groupset>
<groupset group="TOP(2,Tags),Subject" metric="MAX(Size)">...</groupset>
<groupset group="TRUNCATE(SendDate,DAY)" metric="MAX(Size)">...</groupset>
<groupset group="TOP(2,Tags),Subject" metric="AVERAGE(Size)">...</groupset>
<groupset group="TRUNCATE(SendDate,DAY)" metric="AVERAGE(Size)">...</groupset>
</groupsets>
</results>
In JSON:
{"results": {
"aggregate": {
"group": "GROUP(TOP(2,Tags),Origin),GROUP(TRUNCATE(SendDate,DAY))",
"metric": "COUNT(*),MAX(Size),AVERAGE(Size)"
},
"groupsets": [
{"groupset": {"group": "TOP(2,Subject),Origin", "metric": "COUNT(*)", ...}},
{"groupset": {"group": "TRUNCATE(SendDate,DAY)", "metric": "COUNT(*)", ...}},
{"groupset": {"group": "TOP(2,Subject),Origin", "metric": "MAX(Size)", ...}},
{"groupset": {"group": "TRUNCATE(SendDate,DAY)", "metric": "MAX(Size)", ...},
{"groupset": {"group": "TOP(2,Subject),Origin", "metric": "AVERAGE(Size)", ...}},
{"groupset": {"group": "TRUNCATE(SendDate,DAY)", "metric": "AVERAGE(Size)", ...}}
]
}}
Though not shown here, each groupset
will contain groups
, summary
, and totalgroups
elements as required by each GROUP function. If the query requests composite grouping (&cf
), a groupset
with multi-level grouping will contain a composite group for non-leaf groups.
This section describes special functions that provide enhanced behavior for the grouping parameter.
The BATCH
function divides a scalar field’s values into specific ranges. Each range becomes a grouping field value, and objects contribute to the metric computation for the ranges for which it has values. The BATCH
function’s first value must be a scalar field. The remaining values must be literal values compatible with the field’s type (text, timestamp, or numeric), and they must be given in ascending order. Example:
GET /Msgs/Message/_aggregate?m=COUNT(*)&f=BATCH(Size,100,1000,10000,100000)
This query counts messages grouped by specific ranges of the Size
field. The ranges are divided at the given literal values: 100
, 1000
, 10000
, and 100000
. The lowest value implicitly creates an extra less than group; the highest value is open-ended and creates a greater than or equal to group. The query in the example above defines the following 5 groups:
Group 1: Size < 100
Group 2: Size >= 100 AND Size < 1000
Group 3: Size >= 1000 AND Size < 10000
Group 4: Size >= 10000 AND Size < 100000
Group 5: Size >= 100000
The example above returns a result such as the following:
<results>
<aggregate metric="COUNT(Size)" group="BATCH(Size,100,1000,10000,100000)"/>
<groups>
<group>
<field name="Size"><100</field>
<metric>0</metric>
</group>
<group>
<field name="Size">100-1000</field>
<metric>125</metric>
</group>
<group>
<field name="Size">1000-10000</field>
<metric>4651</metric>
</group>
<group>
<field name="Size">10000-100000</field>
<metric>1149</metric>
</group>
<group>
<field name="Size">>=100000</field>
<metric>105</metric>
</group>
</groups>
<summary>6030</summary>
</results>
In JSON:
{"results": {
"aggregate": {"group": "BATCH(Size,100,1000,10000,100000)", "metric": "COUNT(Size)"},
"groups": [
{"group": {
"field": {"Size": "<100"},
"metric": "0"
}},
{"group": {
"field": {"Size": "100-1000"},
"metric": "125"
}},
{"group": {
"field": {"Size": "1000-10000"},
"metric": "4651"
}},
{"group": {
"field": {"Size":"10000-100000"},
"metric": "1149"
}},
{"group": {
"field": {"Size": ">=100000"},
"metric": "105"
}}
],
"summary":"6030"
}}
As shown above in the <100
group, if no selected object has a value that falls into one of the specified groups, that group is still returned: the group’s metric is 0 for the COUNT
function and empty for all other metric functions. As with all grouped aggregate queries, a summary
value is returned that applies the metric function across all groups.
In an aggregate query, normally all values of a scalar grouping field are used to create groups. For example:
GET /Email/Message/_aggregate?f=Tags&...
All values of the Tags
field for selected objects are used to create grouping fields. To eliminate specific values from being used for grouping—without affecting the selection of the owning object—the EXCLUDE
function can be used:
GET /Email/Message/_aggregate?f=Tags.EXCLUDE('Confidential, 'Internal')&...
When the grouping field is a text field, the values passed to the EXCLUDE function are whole, case-insensitive values—not terms—and must be enclosed in quotes. In the example above, groups matching the value Confidential or Internal or case variations of these are excluded. The values used for text scalars can contain wildcards ? and *. For example:
GET /Email/Message/_aggregate?f=Tags.EXCLUDE('*sam?')&...
This aggregate query excludes all groups that end with the sam*x*
, where x
is any letter, or case variations of this sequence.
To generate only groups that match specific scalar values—without affecting the selection of the owning object—the INCLUDE
function can be used:
GET /Email/Message/_aggregate?f=Tags.INCLUDE('Confidential, 'Internal')&...
The only groups generated are those matching Confidential
and Internal
and case variations of these; all other values are skipped. Again, when the grouping field is a text scalar, the value must be enclosed in quotes, and it can contain wildcards ?
and *
.
The values passed to INCLUDE
and EXCLUDE
must be compatible with the corresponding scalar type field: integers for integer
or long
fields, Booleans for boolean
fields, etc. Additionally, the keyword NULL
(uppercase) can be used to include or exclude the (null)
group normally generated when at least one object has a null value for the grouping field. Example:
GET /Email/Message/_aggregate?f=Tags.EXCLUDE(NULL)&...
Groups can be created from the terms used within a specific field. The general format of the TERMS
function is:
TERMS(<field name> [, (<stop term 1> <stop term 2> ...)])
Any predefined scalar field can be used, but TERMS
is most effective with text fields. The optional stop term list is a list of terms, enclosed in parentheses, that are excluded from the unique set of terms found within the specified field.
For example, the following request fetches the COUNT
of messages grouped by terms found within the Subject
field for a particular sender. For brevity, the TERMS
function is wrapped by the TOP
function to limit the results to the five groups with the highest counts:
GET /Msgs/Message/_aggregate?m=COUNT(*)&q=Sender.MessageAddress.Person.Name:Support
&f=TOP(5,TERMS(Subject))
Similar to the BATCH
function, the TERMS
function creates dynamic groups from a text field based on the terms it uses. To do this, as objects matching the query parameter (if any) are found, the field passed to TERMS
is parsed into alphanumeric terms, and a group is created for each unique term. Each contributes to the group metric computation for each term it contains. If a term appears multiple times within a field (e.g., "plan for a plan"), the object is only counted once. An example result in XML is shown below:
<results>
<aggregate query="Sender.MessageAddress.Person.Name:Support" metric="COUNT(*)"
group="TOP(5,TERMS(Subject))"/>
<groups>
<group>
<field name="Subject">dilemmas</field>
<metric>14</metric>
</group>
<group>
<field name="Subject">unchary</field>
<metric>14</metric>
</group>
<group>
<field name="Subject">sundae</field>
<metric>14</metric>
</group>
<group>
<field name="Subject">tillage</field>
<metric>14</metric>
</group>
<group>
<field name="Subject">infernal</field>
<metric>14</metric>
</group>
</groups>
<summary>82</summary>
<totalgroups>85</totalgroups>
</results>
In JSON:
{"results": {
"aggregate": {
"group": "TOP(5,TERMS(Subject))",
"metric": "COUNT(*)",
"query": "Sender.MessageAddress.Person.Name:Support"
},
"groups": [
{"group": {
"field": {"Subject": "dilemmas"},
"metric": "14"
}},
{"group": {
"field": {"Subject": "unchary"},
"metric": "14"
}},
{"group": {
"field": {"Subject": "sundae"},
"metric": "14"
}},
{"group": {
"field": {"Subject": "tillage"},
"metric": "14"
}},
{"group": {
"field": {"Subject": "infernal"},
"metric": "14"
}}
],
"summary": "82",
"totalgroups": "85"
}}
If the terms "sundae" and “tillage” were considered uninteresting, they could be eliminated from the results by listing them as stop terms in a second parenthetical parameter to the TERMS
function:
GET /Msgs/Message/_aggregate?m=COUNT(*)&q=Sender.MessageAddress.Person.Name='Quest Support'
&f=TOP(5,TERMS(Subject,(sundae tillage)))
By default, all group values are returned at each grouping level, and the groups are returned in ascending order of the grouping field value. The TOP
and BOTTOM
functions can be used to return groups in the order of the metric value. Optionally, they can also be used to limit the number of groups returned to the highest or lowest metric values. The TOP
and BOTTOM
functions wrap a grouping field expression and specify a limit parameter. For example:
GET /Msgs/Message/_aggregate?m=SUM(Size)&f=TOP(3,InternalRecipients.Person.Name)
The first parameter to TOP/BOTTOM is the limit value; the second parameter is the grouping field expression. This aggregate query sums the Size
field of message objects, grouped by internal recipient names, but it only returns the groups with the three highest SUM
values. Typical results for the example above in XML:
<results>
<aggregate metric="SUM(Size)" group="TOP(3,InternalRecipients.Person.Name)"/>
<totalobjects>6030</totalobjects>
<summary>190643320</summary>
<totalgroups>836</totalgroups>
<groups>
<group>
<metric>103198352</metric>
<field name="InternalRecipients.Person.Name">(null)</field>
</group>
<group>
<metric>20060808</metric>
<field name="InternalRecipients.Person.Name">Marc Bourauel</field>
</group>
<group>
<metric>16798901</metric>
<field name="InternalRecipients.Person.Name">Nina Cantauw</field>
</group>
</groups>
</results>
In JSON:
{"results": {
"aggregate": {
"metric": "SUM(Size)",
"group": "TOP(3,InternalRecipients.Person.Name)"
},
"totalobjects": "6030",
"summary": "190643320",
"totalgroups": "836",
"groups": [
{"group": {
"metric": "103198352",
"field": {"InternalRecipients.Person.Name": "(null)"}
}},
{"group": {
"metric": "20060808",
"field": {"InternalRecipients.Person.Name": "Marc Bourauel"}
}},
{"group": {
"metric": "16798901",
"field": {"InternalRecipients.Person.Name": "Nina Cantauw"}
}}
]
}}
The BOTTOM
parameter works the same way but returns the groups with the lowest metric values. When either the TOP
or BOTTOM
function is used, the total number of groups that were actually computed is returned in the element totalgroups
, as shown above.
When the limit parameter is 0, all groups are returned, but they are returned in metric-computation order. Using 0 for the limit parameter essentially means unlimited.
When the aggregate query has multiple grouping fields, a TOP
or BOTTOM
function can be used with each grouping field. In secondary groups, TOP
and BOTTOM
return groups whose metric values are computed relative to their parent groups. Below is an example aggregate query 2-level grouping using TOP
for the outer level and BOTTOM
for the inner level:
GET /Msgs/Message/_aggregate?m=COUNT(*)&f=TOP(3,Sender.Person),BOTTOM(2,TRUNCATE(SendDate,DAY))`
The FIRST and LAST functions are similar to the TOP and BOTTOM functions except that a limited set of groups are returned based on the group names instead of the metric values.
GET /Email/Person/_aggregate?q=*&m=COUNT(*)&f=FIRST(10,LastName)&range=0
This query returns the first 10 LastName groups alphabetically. LAST returns the last groups (descending sort). This should help with aggregate queries that must be “paged”.
The TRUNCATE
function truncates a timestamp field to a given granularity, yielding a value that can be used as a grouping field. Before the timestamp field is truncated, the TRUNCATE
function can optionally shift the value to another time first. The syntax for the function is:
TRUNCATE(<timestamp field>, <precision> [, <time shift>])
For example:
GET /Msgs/Message/_aggregate?m=COUNT(*)&f=TRUNCATE(SendDate,DAY,GMT-2)
&q=SendDate >= "2010-07-17"
This query finds all messages whose SendDate
is >= "2010-07-17"
. For each one, it subtracts 2 hours from the SendDate
value and then truncates ("rounds down") to the nearest day. The count of all objects for each modified timestamp is computed in a separate group.
The value must be one of the following mnemonics:
Precision | Meaning |
---|---|
SECOND |
The milliseconds component of the timestamp is set to 0. |
MINUTE |
The milliseconds and seconds components are set to 0. |
HOUR |
The milliseconds, seconds, and minutes components are set to 0. |
DAY |
All time components are set to 0. |
WEEK |
All time components are set to 0, and the date components are set to the Monday of the calendar week in which the timestamp falls (as defined by ISO 8601). For example 2010-01-02 is truncated to the week 2009-12-28. |
MONTH |
All time components are set to 0, and the day component is set to 1. |
QUARTER |
All time components are set to 0, the day component is set to 1, and the month component is rounded "down" to January, April, July, or October. |
YEAR |
All time components are set to 0 and the day and month components are set to 1. |
The optional parameter adds or subtracts a specific amount to each object’s timestamp value before truncating it to the requested granularity. Optionally, the parameter can be quoted in single or double quotes. The syntax of the parameter is:
<timezone> | <GMT offset>
Where uses the same format as the NOW function:
GMT<sign><hours>[:<minutes>]
The meaning of each format is summarized below:
-
<timezone>
: A timezone abbreviation (e.g.,"PST"
) or name (e.g.,"America/Los_Angeles"
) can be given. Each object’s timestamp value is assumed to be in GMT (UTC) time and adjusted by the necessary amount to reflect the equivalent value in the given timezone. The allowable values for a abbreviation or name are those recognized by the Java functionjava.util.TimeZone.getAvailableIDs()
. -
GMT+<hour>
orGMT-<hour>
: The term GMT followed by a plus or minus signed followed by an integer hour value adjust each object’s timestamp up or down by the given number of hours. -
GMT+<hour>:<minute>
orGMT-<hour>:<minute>
: This is the same as the previous format except that each object’s timestamp is adjusted up or down by the given hour and minute value.
Note that in the GMT versions, the sign (‘+’ or ‘-‘) is required, and in URIs, the ‘+’ sign must be escaped as %2B
.
The timestamp field passed to the TRUNCATE
function can belong to the perspective table, or it can be at the end of a field path (e.g., TRUNCATE(Messages.SendDate)
).
When a grouping field uses the TRUNCATE
function, the truncated value is used for the field value within each group. An example in XML is shown below:
<results>
<aggregate query="SendDate >= 2010-07-17" metric="COUNT(*)" group="TRUNCATE(SendDate,HOUR)"/>
<groups>
<group>
<field name="SendDate">2010-07-17 00:00:00</field>
<metric>5</metric>
</group>
<group>
<field name="SendDate">2010-07-17 01:00:00</field>
<metric>4</metric>
</group>
<group>
<field name="SendDate">2010-07-17 02:00:00</field>
<metric>4</metric>
</group>
...
</groups>
<summary>6030</summary>
</results>
In JSON:
{"results": {
"aggregate": {
"group": "TRUNCATE(SendDate,HOUR)",
"metric": "COUNT(*)",
"query": "SendDate >= 2010-07-17"
},
"groups": [
{"group": {
"field": {"SendDate": "2010-07-17 00:00:00"},
"metric": "5"
}},
{"group": {
"field": {"SendDate": "2010-07-17 01:00:00"},
"metric": "4"
}},
{"group": {
"field": {"SendDate": "2010-07-17 02:00:00"},
"metric": "4"
}},
...
]
}}
When a text field is used as a grouping field in an aggregate query, actual field values are used to form each group value. For example, in this query:
.../_aggregate?m=COUNT(*)&f=Extension&...
If the field Extension
has identical but differently-cased values such as ".jpg" and “.JPG”, a group is created for each one and the metric function (COUNT
) is applied to each one.
When a text field is used as the grouping field, values can be case-normalized as they are used as grouping field values. This can be done with the UPPER
and LOWER
functions, which translate each text field accordingly as it is sorted into aggregated groups. Example:
.../_aggregate?m=COUNT(*)&f=LOWER(Extension)&...
This causes the Extension
field to be down-cased before it is sorted into its metric group. Hence, both values ".jpg" and “.JPG” are counted in a single group.
The WHERE
function can be used to provide filtering on a path used in a grouping expression. Most importantly, it can be used for multi-clause expressions that are bound to the same objects. To illustrate why the WHERE
clause is needed and how it is used, here’s an example.
Suppose we want to count messages grouped by the domain name of each message’s recipients, but we only want recipients that received the message after a certain date and the recipient’s address is considered external. As an example, this aggregate query won’t work:
// Doesn’t do what we want
GET /Msgs/Message/_aggregate?m=COUNT(*)
&f=Recipients.MessageAddress.Domain.Name
&q=Recipients.ReceiptDate > "2014-01-01" AND Recipients.MessageAddress.Domain.IsInternal=false
This query doesn’t work because it selects messages for which at least one recipient’s ReceiptDate
is > "2014-01-01"
, and at least one recipient has an external domain. Every such message is then counted in all of its Recipients.MessageAddress.Domain.Name
values, even for those that don’t really qualify.
Using the WHERE
filter for query expressions, we could bind the two query clauses to the same Recipients
instances. But this query still doesn’t work:
// Still not what we want
GET /Msgs/Message/_aggregate?m=COUNT(*)
&f=Recipients.MessageAddress.Domain.Name
&q=Recipients.WHERE(ReceiptDate > "2014-01-01" AND MessageAddress.Domain.IsInternal=false)
This causes the correct objects to be selected, but it still counts them in all Recipients.MessageAddress.Domain.Name
groups, not just those found with the query expression.
For this scenario, we can use the WHERE
function in the grouping parameter instead of the query parameter. In a grouping parameter, the WHERE
function filters out group values we don’t want. And, when the object selection criteria lies solely in the choice of groups, we don’t need a separate query parameter. The solution to the previous problem can be expressed as follows:
GET /Msgs/Message/_aggregate?m=COUNT(*)
&f=Recipients.WHERE(ReceiptDate > "2014-01-01" AND
MessageAddress.Domain.IsInternal=false).MessageAddress.Domain.Name
The grouping field is still Recipients.MessageAddress.Domain.Name
, but the WHERE
function inserted after Recipients
filters values used for grouping. The first field in each WHERE
clause (ReceiptDate
and MessageAddress
) must be members of the same table as Recipients
, thereby filtering the recipients in some manner. In this case, only recipients whose ReceiptDate
is > "2014-01-01"
and whose MessageAddress.Domain.IsInternal
is false. Groups are created by domains of recipients that match those constraints, and only objects within those group values are counted.
But wait! It gets better! The WHERE
function can be applied to multiple components of the same grouping path as long as each subquery is qualified to the path component to which it is attached. Exploiting this, we can factor out the redundant specification of MessageAddress.Domain
with this shorter but equivalent expression:
GET /Msgs/Message/_aggregate?m=COUNT(*)
&f=Recipients.WHERE(ReceiptDate > "2014-01-01").Address.Domain.WHERE(IsInternal=false).Name
Neat, yes?
Technical Documentation
[Doradus OLAP Databases](https://github.com/dell-oss/Doradus/wiki/Doradus OLAP Databases)
- Architecture
- OLAP Database Overview
- OLAP Data Model
- Doradus Query Language (DQL)
- OLAP Object Queries
- OLAP Aggregate Queries
- OLAP REST Commands
- Architecture
- Spider Database Overview
- Spider Data Model
- Doradus Query Language (DQL)
- Spider Object Queries
- Spider Aggregate Queries
- Spider REST Commands
- [Installing and Running Doradus](https://github.com/dell-oss/Doradus/wiki/Installing and Running Doradus)
- [Deployment Guidelines](https://github.com/dell-oss/Doradus/wiki/Deployment Guidelines)
- [Doradus Configuration and Operation](https://github.com/dell-oss/Doradus/wiki/Doradus Configuration and Operation)
- [Cassandra Configuration and Operation](https://github.com/dell-oss/Doradus/wiki/Cassandra Configuration and Operation)