docs: bitmap datatype #11607

Merged: 4 commits, May 29, 2023
76 changes: 76 additions & 0 deletions docs/doc/13-sql-reference/10-data-types/44-data-type-bitmap.md
@@ -0,0 +1,76 @@
---
title: Bitmap
---
import FunctionDescription from '@site/src/components/FunctionDescription';

<FunctionDescription description="Introduced: v1.1.45"/>

A bitmap in Databend is a compact data structure that records the presence or absence of elements or attributes in a collection. It is widely used in data analysis and querying because it supports fast set operations and aggregations.

:::tip Why Bitmap?

- Distinct Count: Bitmaps make it efficient to count the unique elements in a set. Bitwise operations quickly determine whether an element exists, so duplicates are never counted twice.

- Filtering and Selection: Bitwise operations on bitmaps quickly identify the elements that satisfy specific conditions, which makes data filtering and selection fast.

- Set Operations: Union, intersection, difference, and symmetric difference can all be computed with bitwise operations on bitmaps, so they run efficiently during data processing and analysis.

- Compressed Storage: Bitmaps compress well. Compared to traditional storage methods, they use storage space more effectively, which lowers storage costs and can improve query performance.
:::
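
The set operations named in the tip can be sketched with bitmap functions. This is a minimal sketch, not a definitive reference: it assumes `BITMAP_AND`, `BITMAP_OR`, `BITMAP_XOR`, and `BITMAP_NOT` are available in your Databend version. None of them are introduced on this page, so check the Bitmap Functions reference before relying on them.

```sql
-- Assumption: BITMAP_AND, BITMAP_OR, BITMAP_XOR and BITMAP_NOT exist in your version.
-- Two sample sets are used throughout: {1, 2, 3} and {2, 3, 4}.
SELECT
    -- values present in both sets
    TO_STRING(BITMAP_AND(BUILD_BITMAP([1, 2, 3]), BUILD_BITMAP([2, 3, 4]))) AS intersection,
    -- values present in either set
    TO_STRING(BITMAP_OR(BUILD_BITMAP([1, 2, 3]), BUILD_BITMAP([2, 3, 4])))  AS union_all,
    -- values present in exactly one of the two sets
    TO_STRING(BITMAP_XOR(BUILD_BITMAP([1, 2, 3]), BUILD_BITMAP([2, 3, 4]))) AS symmetric_difference,
    -- values present in the first set but not in the second
    TO_STRING(BITMAP_NOT(BUILD_BITMAP([1, 2, 3]), BUILD_BITMAP([2, 3, 4]))) AS difference;
```

Each result is wrapped in `TO_STRING` because a bitmap cannot be displayed directly in a result set.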

The TO_BITMAP function creates bitmaps from two input formats:

- String format: You can create a bitmap using a string of comma-separated values. For example, TO_BITMAP('1,2,3') creates a bitmap with bits set for values 1, 2, and 3.

- uint64 format: You can also create a bitmap using a uint64 value. For example, TO_BITMAP(123) creates a bitmap with bits set according to the binary representation of the uint64 value 123.

The bitmap data type in Databend is a binary type, so it is represented and displayed differently from other supported types. A bitmap cannot be shown directly in the result set of a SELECT statement. Use [Bitmap Functions](../../15-sql-functions/05-bitmap-functions/index.md) to manipulate and interpret it:

```sql
SELECT TO_BITMAP('1,2,3');

+--------------------+
| to_bitmap('1,2,3') |
+--------------------+
| <bitmap binary>    |
+--------------------+

SELECT TO_STRING(TO_BITMAP('1,2,3'));

+-------------------------------+
| to_string(to_bitmap('1,2,3')) |
+-------------------------------+
| 1,2,3                         |
+-------------------------------+
```
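
Other bitmap functions can interpret a bitmap without converting it to a string. The sketch below assumes `BITMAP_COUNT` and `BITMAP_CONTAINS` are available in your version. `BITMAP_CONTAINS` is not mentioned on this page, so treat its use here as an assumption.

```sql
-- Assumption: BITMAP_CONTAINS(bitmap, value) is available in your Databend version.
SELECT
    BITMAP_COUNT(TO_BITMAP('1,2,3'))       AS element_count,  -- number of values in the set
    BITMAP_CONTAINS(TO_BITMAP('1,2,3'), 2) AS has_two;        -- whether value 2 is in the set
```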

**Example**:

This example illustrates how bitmaps in Databend enable efficient storage and querying of data with a large number of possible values, such as user visit history.

```sql
-- Create table user_visits with user_id and page_visits columns; page_visits uses the Bitmap type.
CREATE TABLE user_visits (
    user_id INT,
    page_visits Bitmap
);

-- Insert page visits for 3 users, building each bitmap with build_bitmap.
INSERT INTO user_visits (user_id, page_visits)
VALUES
    (1, build_bitmap([2, 5, 8, 10])),
    (2, build_bitmap([3, 7, 9])),
    (3, build_bitmap([1, 4, 6, 10]));

-- Query the table, counting each user's visited pages with bitmap_count.
SELECT user_id, bitmap_count(page_visits) AS total_visits
FROM user_visits;

+--------+------------+
|user_id |total_visits|
+--------+------------+
| 1| 4|
| 2| 3|
| 3| 4|
+--------+------------+
```
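
Because each row stores a set of page IDs, a membership test can filter rows directly. This is a minimal sketch that reuses the `user_visits` table from the example and assumes `BITMAP_CONTAINS` is available in your Databend version; page ID 10 is an arbitrary value chosen for illustration.

```sql
-- Assumption: BITMAP_CONTAINS(bitmap, value) exists in your Databend version.
-- Find users whose page_visits set includes page 10.
SELECT user_id
FROM user_visits
WHERE BITMAP_CONTAINS(page_visits, 10)
ORDER BY user_id;
```

With the sample rows inserted above, users 1 and 3 are the ones whose sets include page 10.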
9 changes: 5 additions & 4 deletions docs/doc/13-sql-reference/10-data-types/index.md
@@ -25,7 +25,8 @@ Databend is capable of handling both general and semi-structured data types.

 | Data Type | Alias | Sample | Description |
 |----------------------------------------|-------|----------------------------------|-----------------------------------------------------------------------------------|
-| [ARRAY](./40-data-type-array-types.md) | N/A | `[1, 2, 3, 4]` | A collection of values of the same data type, accessed by their index. |
-| [TUPLE](./41-data-type-tuple-types.md) | N/A | `('2023-02-14','Valentine Day')` | An ordered collection of values of different data types, accessed by their index. |
-| [MAP](./42-data-type-map.md) | N/A | `{"a":1, "b":2, "c":3}` | A set of key-value pairs where each key is unique and maps to a value. | |
-| [VARIANT](./43-data-type-variant.md) | JSON | `[1,{"a":1,"b":{"c":2}}]` | Collection of elements of different data types, including `ARRAY` and `OBJECT`. |
+| [ARRAY](./40-data-type-array-types.md) | N/A | [1, 2, 3, 4] | A collection of values of the same data type, accessed by their index. |
+| [TUPLE](./41-data-type-tuple-types.md) | N/A | ('2023-02-14','Valentine Day') | An ordered collection of values of different data types, accessed by their index. |
+| [MAP](./42-data-type-map.md) | N/A | {"a":1, "b":2, "c":3} | A set of key-value pairs where each key is unique and maps to a value. | |
+| [VARIANT](./43-data-type-variant.md) | JSON | [1,{"a":1,"b":{"c":2}}] | Collection of elements of different data types, including `ARRAY` and `OBJECT`. |
+| [BITMAP](44-data-type-bitmap.md) | N/A | 0101010101 | A binary data type used to represent a set of values, where each bit represents the presence or absence of a value. |