From e38f7ebdf2b4c9d78769a37587176c411107f423 Mon Sep 17 00:00:00 2001
From: soyeric128
Date: Mon, 29 May 2023 04:05:31 -0400
Subject: [PATCH] docs: bitmap datatype (#11607)

* Create 44-data-type-bitmap.md

* Update index.md

* Update 44-data-type-bitmap.md

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
---
 .../10-data-types/44-data-type-bitmap.md    | 76 +++++++++++++++++++
 .../13-sql-reference/10-data-types/index.md |  9 ++-
 2 files changed, 81 insertions(+), 4 deletions(-)
 create mode 100644 docs/doc/13-sql-reference/10-data-types/44-data-type-bitmap.md

diff --git a/docs/doc/13-sql-reference/10-data-types/44-data-type-bitmap.md b/docs/doc/13-sql-reference/10-data-types/44-data-type-bitmap.md
new file mode 100644
index 000000000000..ab1b3cc779f6
--- /dev/null
+++ b/docs/doc/13-sql-reference/10-data-types/44-data-type-bitmap.md
@@ -0,0 +1,76 @@
+---
+title: Bitmap
+---
+import FunctionDescription from '@site/src/components/FunctionDescription';
+
+
+
+In Databend, a bitmap is a compact data structure that records the presence or absence of elements in a set. It is widely used in data analysis and querying because it supports fast set operations and aggregation.
+
+:::tip Why Bitmap?
+
+- Distinct Count: Bitmaps make it efficient to count the unique elements in a set. Each element maps to one bit, so duplicate values collapse and the number of set bits gives the distinct count.
+
+- Filtering and Selection: Bitwise operations on bitmaps quickly identify the elements that satisfy specific conditions, which makes data filtering and selection efficient.
+
+- Set Operations: Bitmaps support set operations such as union, intersection, difference, and symmetric difference. These map directly onto bitwise operations (OR, AND, AND NOT, XOR), which keeps them efficient in data processing and analysis.
+
+- Compressed Storage: Bitmaps compress well. Compared with storing each value individually, they use storage space more effectively, which lowers storage costs and can improve query performance.
+:::
+
+Databend creates bitmaps from two input formats with the TO_BITMAP function:
+
+- String format: You can create a bitmap using a string of comma-separated values. For example, TO_BITMAP('1,2,3') creates a bitmap with bits set for values 1, 2, and 3.
+
+- uint64 format: You can also create a bitmap using a uint64 value. For example, TO_BITMAP(123) creates a bitmap with bits set according to the binary representation of the uint64 value 123.
+
+The bitmap data type in Databend is a binary type, so it is represented and displayed differently from other supported types. Bitmaps cannot be shown directly in the result set of a SELECT statement; use [Bitmap Functions](../../15-sql-functions/05-bitmap-functions/index.md) to manipulate and interpret them:
+
+```sql
+SELECT TO_BITMAP('1,2,3');
+
++---------------------+
+| to_bitmap('1,2,3')  |
++---------------------+
+|                     |
++---------------------+
+
+SELECT TO_STRING(TO_BITMAP('1,2,3'));
+
++-------------------------------+
+| to_string(to_bitmap('1,2,3')) |
++-------------------------------+
+| 1,2,3                         |
++-------------------------------+
+```
+
+**Example**:
+
+This example illustrates how bitmaps in Databend enable efficient storage and querying of data with a large number of possible values, such as user visit history.
+
+```sql
+-- Create table user_visits with user_id and page_visits columns; page_visits stores a bitmap of visited page IDs.
+CREATE TABLE user_visits (
+  user_id INT,
+  page_visits BITMAP
+);
+
+-- Insert page visits for 3 users, building each bitmap with build_bitmap.
+INSERT INTO user_visits (user_id, page_visits)
+VALUES
+  (1, build_bitmap([2, 5, 8, 10])),
+  (2, build_bitmap([3, 7, 9])),
+  (3, build_bitmap([1, 4, 6, 10]));
+
+-- Count the pages each user visited with bitmap_count.
+SELECT user_id, bitmap_count(page_visits) AS total_visits
+FROM user_visits;
+
++--------+------------+
+|user_id |total_visits|
++--------+------------+
+|       1|           4|
+|       2|           3|
+|       3|           4|
++--------+------------+
+```
\ No newline at end of file
diff --git a/docs/doc/13-sql-reference/10-data-types/index.md b/docs/doc/13-sql-reference/10-data-types/index.md
index 5ea727ec0e72..b85e5b5d9f07 100644
--- a/docs/doc/13-sql-reference/10-data-types/index.md
+++ b/docs/doc/13-sql-reference/10-data-types/index.md
@@ -25,7 +25,8 @@ Databend is capable of handling both general and semi-structured data types.
 
 | Data Type                              | Alias | Sample                           | Description                                                                       |
 |----------------------------------------|-------|----------------------------------|-----------------------------------------------------------------------------------|
-| [ARRAY](./40-data-type-array-types.md) | N/A   | `[1, 2, 3, 4]`                   | A collection of values of the same data type, accessed by their index.            |
-| [TUPLE](./41-data-type-tuple-types.md) | N/A   | `('2023-02-14','Valentine Day')` | An ordered collection of values of different data types, accessed by their index. |
-| [MAP](./42-data-type-map.md)           | N/A   | `{"a":1, "b":2, "c":3}`          | A set of key-value pairs where each key is unique and maps to a value.            | |
-| [VARIANT](./43-data-type-variant.md)   | JSON  | `[1,{"a":1,"b":{"c":2}}]`        | Collection of elements of different data types, including `ARRAY` and `OBJECT`.   |
\ No newline at end of file
+| [ARRAY](./40-data-type-array-types.md) | N/A   | [1, 2, 3, 4]                     | A collection of values of the same data type, accessed by their index.            |
+| [TUPLE](./41-data-type-tuple-types.md) | N/A   | ('2023-02-14','Valentine Day')   | An ordered collection of values of different data types, accessed by their index. |
+| [MAP](./42-data-type-map.md)           | N/A   | {"a":1, "b":2, "c":3}            | A set of key-value pairs where each key is unique and maps to a value.            | |
+| [VARIANT](./43-data-type-variant.md)   | JSON  | [1,{"a":1,"b":{"c":2}}]          | Collection of elements of different data types, including `ARRAY` and `OBJECT`.   |
+| [BITMAP](./44-data-type-bitmap.md)     | N/A   | 0101010101                       | A binary data type used to represent a set of values, where each bit represents the presence or absence of a value. |
\ No newline at end of file