- Type: Standard Library API proposal
- Author: Ilya Gorbunov
- Status: Implemented in Kotlin 1.1
- Prototype: Implemented
- Discussion: KEEP-23
Introduce a function similar to groupBy
, but folding values of each group on the fly.
- Ceylon:
Iterable.summarize
.
The following extension for Iterable
and iterable-like receivers is proposed:
public inline fun <T, K> Iterable<T>.groupingBy(
crossinline keySelector: (T) -> K
): Grouping<T, K>
where Grouping<T, K>
is an interface defined as following:
// A wrapper around a source of elements which could be iterated
// with the `keySelector: (T) -> K` function attached to it.
interface Grouping<T, out K> {
fun sourceIterator(): Iterator<T>
fun keyOf(element: T): K
}
Provide the following extensions for Grouping<T, K>
:
// Generic aggregation (fold or reduce)
// The most generic form of aggregation, that other overloads
// delegate their implementation to.
public inline fun <T, K, R> Grouping<T, K>.aggregate(
operation: (key: K, value: R?, element: T, first: Boolean) -> R
): Map<K, R>
public inline fun <T, K, R, M : MutableMap<in K, R>> Grouping<T, K>.aggregateTo(
destination: M,
operation: (key: K, accumulator: R?, element: T, first: Boolean) -> R
): M
// Key-parametrized fold
// Here the initial value and the operation depend on the key of a group.
public inline fun <T, K, R> Grouping<T, K>.fold(
initialValueSelector: (key: K, element: T) -> R,
operation: (key: K, accumulator: R, element: T) -> R
): Map<K, R>
public inline fun <T, K, R, M : MutableMap<in K, R>> Grouping<T, K>.foldTo(
destination: M,
initialValueSelector: (key: K, element: T) -> R,
operation: (key: K, accumulator: R, element: T) -> R
): M
// Simplified fold
// The `initialValue` is a constant and the operation do not depend on the group key.
public inline fun <T, K, R> Grouping<T, K>.fold(
initialValue: R,
operation: (accumulator: R, element: T) -> R
): Map<K, R>
public inline fun <T, K, R, M : MutableMap<in K, R>> Grouping<T, K>.foldTo(
destination: M,
initialValue: R,
operation: (accumulator: R, element: T) -> R
): M
// Reduce
public inline fun <S, T : S, K> Grouping<T, K>.reduce(
operation: (key: K, accumulator: S, element: T) -> S
): Map<K, S>
public inline fun <S, T : S, K, M : MutableMap<in K, S>> Grouping<T, K>.reduceTo(
destination: M,
operation: (key: K, accumulator: S, element: T) -> S
): M
// Count
public fun <T, K> Grouping<T, K>.eachCount(): Map<K, Int>
public fun <T, K, M : MutableMap<in K, Int>> Grouping<T, K>.eachCountTo(destination: M): M
// SumBy
public inline fun <T, K> Grouping<T, K>.eachSumOf(
valueSelector: (T) -> Int
): Map<K, Int>
public inline fun <T, K, M : MutableMap<in K, Int>> Grouping<T, K>.eachSumOfTo(
destination: M,
valueSelector: (T) -> Int
): M
The most common use case is doing some aggregation, broken down by some key:
-
given a text, count frequencies of words/characters;
val frequencies = words.groupingBy { it }.eachCount()
-
given orders in all stores, sum total value of orders by store;
val storeTotals = orders.groupingBy { it.store } .eachSumOf { order -> order.total }
-
given orders of all clients, find an order with the maximum total for each client.
val bestClientOrders = orders.groupingBy { it.client } .reduce { k, maxValueOrder, order -> maxOfBy(order, maxValueOrder) { it.total } }
-
Just use
groupBy
and thenmapValues
on a resulting map.- Pro: many operations are supported on a group of values, not just
fold
. - Con: intermediate map of lists is created.
Example:
val frequencies: Map<String, Int> = words.groupBy { it }.mapValues { it.value.size }
- Pro: many operations are supported on a group of values, not just
-
Use Rx.observables.
- Pro: many operations supported
- Con: observable transformations overhead,
asymptotically less than that of
mapValues
.
val frequencies: Map<String, Int> = Observable.from(values) .groupBy { it } .flatMap { g -> g.count().map { g.key to it } } .toBlocking() .toIterable() .toMap()
-
Previous version of this proposal intended to introduce fully inlined operations, such as
groupFoldBy(keySelector, initialValue, operation)
.- Pro: do not require a wrapper
Grouping
object to be allocated. - Pro: may require less intermediate boxing in case of receivers with primitive elements, such as primitive arrays and char sequences.
- Con: having several functional parameters makes an invocation awkward.
- Con: duplicating all the operations for each receiver type implies high method count.
A benchmark was conducted to study the performance impact of not inlining
keySelector
function. That impact was shown to be negligible for receivers with object elements, and on the other side noticeable for receivers with primitive elements. However, the latter is hardly to be the use case covered by this operation. - Pro: do not require a wrapper
Only a subset of Kotlin Standard Library available on all supported platforms is required.
- module:
kotlin-stdlib
- packages:
kotlin.collections
,kotlin.sequences
,kotlin.text
The prototypes are implemented in the repository kotlinx.collections.experimental.
The final implementation has been merged into the standard library, here is the commit set.
It is possible to provide groupingBy
operation for each collection-like receiver, such as
Iterable
, Sequence
, Array
, CharSequence
, (Primitive)Array
,
however for primitive arrays this operation does not make much sense.
The operation returns Map<K, R>
, where K
is the type of the key and R
is the type of the accumulator.
-
Naming options:
-
groupingBy
or justgrouping
There is a risk of confusing
groupingBy
andgroupBy
. -
count
/sumBy
can be misinterpreted as operations on the whole collection, rather on each group.- resolution:
countEach
,sumEachBy
- resolution:
-
Grouping
is often referenced in docs as "grouping source", so maybe we should name itGroupingSource
?
-
-
Which general forms of
fold
/reduce
/aggregate
should we provide?- Method count is increased
- resolution: since there's a single receiver for all operations, the impact on method count is not that big.
- Having them as overloads hurts completion during gradual typing.
- Method count is increased
-
Should we provide
To
-overloads (likegroupByTo
) with a mutable map as a target parameter?- resolution: there are no easy alternatives to achieve the same results without these overloads, so we should provide them.
-
Having primitive fold accumulators stored in a map introduces a lot of boxing.
- resolution: provide optimized implementations of
countEach
andsumEachBy
. Note that, there's no such optimization forcountEachTo
andsumEachByTo
.
- resolution: provide optimized implementations of
-
Should we provide
sumEachByLong
andsumEachByDouble
operations, or wait until KT-11265 is resolved, so we could have them as overloads ofeachSumOf
?- resolution: provide them as overloads of
eachSumOf
, but do not provideeachSumOf
itself until KT-11265 is fixed.
- resolution: provide them as overloads of
- If we do not provide some forms, evaluate whether they could be introduced later.
- Converting collection operations to folds can be error-prone, maybe we should provide some standard reducer functions, such as Count, Sum etc.
- In Java 8 it is possible to engage standard Collectors API by providing the operation
Grouping<T, K>.collectEach(Collector<T, A, R>)