visit
Like in many databases, shards data into partitions, and then a partition is further divided into buckets. Partitions are typically defined by time or other continuous values. This allows query engines to quickly locate the target data during queries by pruning irrelevant data ranges.
Bucketing, on the other hand, distributes data based on the hash values of one or more columns, which prevents data skew.
: Users specify the partitions in the table creation statement or modify them through DDL statements afterwards.
: The system automatically maintains partitions within a pre-defined range based on the data ingestion time.
In Apache Doris 2.1.0, we have introduced . It supports partitioning data by RANGE or by LIST and further enhances flexibility on top of automatic partitioning.
In Doris, the data table is divided into partitions and then buckets in a hierarchical manner. The data within the same bucket then forms a data tablet, which is the minimum physical storage unit in Doris for data replication, inter-cluster data scheduling, and load balancing.
CREATE TABLE IF NOT EXISTS example_range_tbl
(
`user_id` LARGEINT NOT NULL COMMENT "User ID",
`date` DATE NOT NULL COMMENT "Data import date",
`timestamp` DATETIME NOT NULL COMMENT "Data import timestamp",
`city` VARCHAR(20) COMMENT "Location of user",
`age` SMALLINT COMMENT "Age of user",
`sex` TINYINT COMMENT "Sex of user",
`last_visit_date` DATETIME REPLACE DEFAULT "1970-01-01 00:00:00" COMMENT "Last visit date of user",
`cost` BIGINT SUM DEFAULT "0" COMMENT "User consumption",
`max_dwell_time` INT MAX DEFAULT "0" COMMENT "Maximum dwell time of user",
`min_dwell_time` INT MIN DEFAULT "99999" COMMENT "Minimum dwell time of user"
)
ENGINE=OLAP
AGGREGATE KEY(`user_id`, `date`, `timestamp`, `city`, `age`, `sex`)
PARTITION BY RANGE(`date`)
(
PARTITION `p201701` VALUES LESS THAN ("2017-02-01"),
PARTITION `p201702` VALUES LESS THAN ("2017-03-01"),
PARTITION `p201703` VALUES LESS THAN ("2017-04-01"),
PARTITION `p2018` VALUES [("2018-01-01"), ("2019-01-01"))
)
DISTRIBUTED BY HASH(`user_id`) BUCKETS 16
PROPERTIES
(
"replication_num" = "1"
);
The table is partitioned by the data import date date
, and 4 partitions have been pre-created. Within each partition, the data is further divided into 16 buckets based on the hash value of the user_id
.
With this partitioning and bucketing design, when querying data from 2018 onwards, the system only needs to scan the p2018
partition. This is what the query SQL looks like:
mysql> desc select count() from example_range_tbl where date >= '20180101';
+--------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0 |
| OUTPUT EXPRS: |
| count(*)[#11] |
| PARTITION: UNPARTITIONED |
| |
| ...... |
| |
| 0:VOlapScanNode(193) |
| TABLE: test.example_range_tbl(example_range_tbl), PREAGGREGATION: OFF. |
| PREDICATES: (date[#1] >= '2018-01-01') |
| partitions=1/4 (p2018), tablets=16/16, tabletList=561490,561492,561494 ... |
| cardinality=0, avgRowSize=0.0, numNodes=1 |
| pushAggOp=NONE |
| |
+--------------------------------------------------------------------------------------+
If the data is distributed unevenly across partitions, the hash-based bucketing mechanism can further divide the data based on the user_id
. This helps to avoid load imbalance on some machines during querying and storage.
CREATE TABLE `DAILY_TRADE_VALUE`
(
`TRADE_DATE` datev2 NOT NULL COMMENT 'Trade date',
`TRADE_ID` varchar(40) NOT NULL COMMENT 'Trade ID',
......
)
UNIQUE KEY(`TRADE_DATE`, `TRADE_ID`)
PARTITION BY RANGE(`TRADE_DATE`)
(
PARTITION p_200001 VALUES [('2000-01-01'), ('2000-02-01')),
PARTITION p_200002 VALUES [('2000-02-01'), ('2000-03-01')),
PARTITION p_200003 VALUES [('2000-03-01'), ('2000-04-01')),
PARTITION p_200004 VALUES [('2000-04-01'), ('2000-05-01')),
PARTITION p_200005 VALUES [('2000-05-01'), ('2000-06-01')),
PARTITION p_200006 VALUES [('2000-06-01'), ('2000-07-01')),
PARTITION p_200007 VALUES [('2000-07-01'), ('2000-08-01')),
PARTITION p_200008 VALUES [('2000-08-01'), ('2000-09-01')),
PARTITION p_200009 VALUES [('2000-09-01'), ('2000-10-01')),
PARTITION p_200010 VALUES [('2000-10-01'), ('2000-11-01')),
PARTITION p_200011 VALUES [('2000-11-01'), ('2000-12-01')),
PARTITION p_200012 VALUES [('2000-12-01'), ('2001-01-01')),
PARTITION p_200101 VALUES [('2001-01-01'), ('2001-02-01')),
......
)
DISTRIBUTED BY HASH(`TRADE_DATE`) BUCKETS 10
PROPERTIES (
......
);
This is an example CREATE TABLE statement for a table that is partitioned by day. The start
and end
parameters are set to -7
and 3
, respectively, meaning that data partitions for the next 3 days will be pre-created, and the historical partitions that are older than 7 days will be reclaimed.
CREATE TABLE `DAILY_TRADE_VALUE`
(
`TRADE_DATE` datev2 NOT NULL COMMENT 'Trade date',
`TRADE_ID` varchar(40) NOT NULL COMMENT 'Trade ID',
......
)
UNIQUE KEY(`TRADE_DATE`, `TRADE_ID`)
PARTITION BY RANGE(`TRADE_DATE`) ()
DISTRIBUTED BY HASH(`TRADE_DATE`) BUCKETS 10
PROPERTIES (
"dynamic_partition.enable" = "true",
"dynamic_partition.time_unit" = "DAY",
"dynamic_partition.start" = "-7",
"dynamic_partition.end" = "3",
"dynamic_partition.prefix" = "p",
"dynamic_partition.buckets" = "10"
);
Over time, the table will always maintain partitions within the range of [current date - 7, current date + 3]
. Dynamic Partition is particularly useful for real-time data ingestion scenarios, such as when the ODS (Operational Data Store) layer directly receives data from external sources like Kafka.
The start
and end
parameters define a fixed range for the partitions, allowing the user to manage the partitions only within this range. However, if the user needs to include more historical data, they would have to dial up the start
value, and that could lead to unnecessary metadata overhead in the cluster.
The former stands for automation and the latter for flexibility. The essence of realizing them both is associating partition creation with the actual data.
AUTO PARTITION BY RANGE (FUNC_CALL_EXPR)
()
FUNC_CALL_EXPR ::= DATE_TRUNC ( <partition_column>, '<interval>' )
The <partition_column>
above is the partition column (i.e., the column that the partitioning is based on). <interval>
specifies the partition unit, which is the desired width of each partition.
For example, if the partition column is k0
and you want to partition by month, the partition statement would be AUTO PARTITION BY RANGE (DATE_TRUNC(k0, 'month'))
. For all the imported data, the system will call DATE_TRUNC(k0, 'month')
to calculate the left endpoint of the partition, and then the right endpoint by adding one interval
.
Now, we can apply Auto Partition to the DAILY_TRADE_VALUE
table introduced in the previous section on Dynamic Partition.
CREATE TABLE DAILY_TRADE_VALUE
(
`TRADE_DATE` DATEV2 NOT NULL COMMENT 'Trade Date',
`TRADE_ID` VARCHAR(40) NOT NULL COMMENT 'Trade ID',
......
)
AUTO PARTITION BY RANGE (DATE_TRUNC(`TRADE_DATE`, 'month'))
()
DISTRIBUTED BY HASH(`TRADE_DATE`) BUCKETS 10
PROPERTIES
(
......
);
mysql> show partitions from DAILY_TRADE_VALUE;
Empty set (0.10 sec)
mysql> insert into DAILY_TRADE_VALUE values ('2015-01-01', 1), ('2020-01-01', 2), ('2024-03-05', 10000), ('2024-03-06', 10001);
Query OK, 4 rows affected (0.24 sec)
{'label':'label_2a7353a3f991400e_ae731988fa2bc568', 'status':'VISIBLE', 'txnId':'85097'}
mysql> show partitions from DAILY_TRADE_VALUE;
+-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables |
+-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
| 588395 | p200 | 2 | 2024-06-01 19:02:40 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2015-01-01]; ..types: [DATEV2]; keys: [2015-02-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | true | NULL |
| 588437 | p20200101000000 | 2 | 2024-06-01 19:02:40 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2020-01-01]; ..types: [DATEV2]; keys: [2020-02-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | true | NULL |
| 588416 | p20240301000000 | 2 | 2024-06-01 19:02:40 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2024-03-01]; ..types: [DATEV2]; keys: [2024-04-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | true | NULL |
+-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
3 rows in set (0.09 sec)
Auto Partition by LIST is to shard data based on non-time-based dimensions, such as region
and department
. It fills that gap for Dynamic Partition, which does not support data partitioning by LIST.
AUTO PARTITION BY LIST (`partition_col`)
()
This is an example of Auto Partition by LIST using city
as the partition column:
mysql> CREATE TABLE `str_table` (
-> `city` VARCHAR NOT NULL,
-> ......
-> )
-> DUPLICATE KEY(`city`)
-> AUTO PARTITION BY LIST (`city`)
-> ()
-> DISTRIBUTED BY HASH(`city`) BUCKETS 10
-> PROPERTIES (
-> ......
-> );
Query OK, 0 rows affected (0.09 sec)
mysql> insert into str_table values ("Denver"), ("Boston"), ("Los_Angeles");
Query OK, 3 rows affected (0.25 sec)
mysql> show partitions from str_table;
+-------------+-----------------+----------------+---------------------+--------+--------------+-------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables |
+-------------+-----------------+----------------+---------------------+--------+--------------+-------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
| 589685 | pDenver7 | 2 | 2024-06-01 20:12:37 | NORMAL | city | [types: [VARCHAR]; keys: [Denver]; ] | city | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | true | NULL |
| 589643 | pLos5fAngeles11 | 2 | 2024-06-01 20:12:37 | NORMAL | city | [types: [VARCHAR]; keys: [Los_Angeles]; ] | city | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | true | NULL |
| 589664 | pBoston8 | 2 | 2024-06-01 20:12:37 | NORMAL | city | [types: [VARCHAR]; keys: [Boston]; ] | city | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | true | NULL |
+-------------+-----------------+----------------+---------------------+--------+--------------+-------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
3 rows in set (0.10 sec)
Manually adjust historical partitions
For tables that receive both real-time data and occasional historical updates, since Auto Partition does not automatically reclaim historical partitions, we recommend two options:
Use Auto Partition and manually create a LESS THAN
partition to accommodate the historical updates. This allows for a clearer separation of historical and real-time data and makes data management easier.
mysql> CREATE TABLE DAILY_TRADE_VALUE
-> (
-> `TRADE_DATE` DATEV2 NOT NULL COMMENT 'Trade Date',
-> `TRADE_ID` VARCHAR(40) NOT NULL COMMENT 'Trade ID'
-> )
-> AUTO PARTITION BY RANGE (DATE_TRUNC(`TRADE_DATE`, 'DAY'))
-> (
-> PARTITION `pHistory` VALUES LESS THAN ("2024-01-01")
-> )
-> DISTRIBUTED BY HASH(`TRADE_DATE`) BUCKETS 10
-> PROPERTIES
-> (
-> "replication_num" = "1"
-> );
Query OK, 0 rows affected (0.11 sec)
mysql> insert into DAILY_TRADE_VALUE values ('2015-01-01', 1), ('2020-01-01', 2), ('2024-03-05', 10000), ('2024-03-06', 10001);
Query OK, 4 rows affected (0.25 sec)
{'label':'label_96dc3d20c6974f4a_946bc1a674d24733', 'status':'VISIBLE', 'txnId':'85092'}
mysql> show partitions from DAILY_TRADE_VALUE;
+-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables |
+-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
| 577871 | pHistory | 2 | 2024-06-01 08:53:49 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [0000-01-01]; ..types: [DATEV2]; keys: [2024-01-01]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | true | NULL |
| 577940 | p20240305000000 | 2 | 2024-06-01 08:53:49 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2024-03-05]; ..types: [DATEV2]; keys: [2024-03-06]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | true | NULL |
| 577919 | p20240306000000 | 2 | 2024-06-01 08:53:49 | NORMAL | TRADE_DATE | [types: [DATEV2]; keys: [2024-03-06]; ..types: [DATEV2]; keys: [2024-03-07]; ) | TRADE_DATE | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | true | NULL |
+-------------+-----------------+----------------+---------------------+--------+--------------+--------------------------------------------------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
3 rows in set (0.10 sec)
NULL partition
With Auto Partition by LIST, Doris supports storing NULL values in NULL partitions. For example:
mysql> CREATE TABLE list_nullable
-> (
-> `str` varchar NULL
-> )
-> AUTO PARTITION BY LIST (`str`)
-> ()
-> DISTRIBUTED BY HASH(`str`) BUCKETS auto
-> PROPERTIES
-> (
-> "replication_num" = "1"
-> );
Query OK, 0 rows affected (0.10 sec)
mysql> insert into list_nullable values ('123'), (''), (NULL);
Query OK, 3 rows affected (0.24 sec)
{'label':'label_f5489769c2f04f0d_bfb65510f9737fff', 'status':'VISIBLE', 'txnId':'85089'}
mysql> show partitions from list_nullable;
+-------------+---------------+----------------+---------------------+--------+--------------+------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
| PartitionId | PartitionName | VisibleVersion | VisibleVersionTime | State | PartitionKey | Range | DistributionKey | Buckets | ReplicationNum | StorageMedium | CooldownTime | RemoteStoragePolicy | LastConsistencyCheckTime | DataSize | IsInMemory | ReplicaAllocation | IsMutable | SyncWithBaseTables | UnsyncTables |
+-------------+---------------+----------------+---------------------+--------+--------------+------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
| 577297 | pX | 2 | 2024-06-01 08:19:21 | NORMAL | str | [types: [VARCHAR]; keys: [NULL]; ] | str | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | true | NULL |
| 577276 | p0 | 2 | 2024-06-01 08:19:21 | NORMAL | str | [types: [VARCHAR]; keys: []; ] | str | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | true | NULL |
| 577255 | p1233 | 2 | 2024-06-01 08:19:21 | NORMAL | str | [types: [VARCHAR]; keys: [123]; ] | str | 10 | 1 | HDD | 9999-12-31 23:59:59 | | NULL | 0.000 | false | tag.location.default: 1 | true | true | NULL |
+-------------+---------------+----------------+---------------------+--------+--------------+------------------------------------+-----------------+---------+----------------+---------------+---------------------+---------------------+--------------------------+----------+------------+-------------------------+-----------+--------------------+--------------+
3 rows in set (0.11 sec)
However, Auto Partition by RANGE does not support NULL partitions, because the NULL values will be stored in the smallest LESS THAN
partition, and it is impossible to reliably determine the appropriate range for it. If Auto Partition were to create a NULL partition with a range of (-INFINITY, MIN_VALUE), there would be a risk of this partition being inadvertently deleted in production, as the MIN_VALUE boundary may not accurately represent the intended business logic.
Auto Partition by LIST supports partitioning based on multiple columns, but each automatically created partition only contains one single value, and the partition name cannot exceed 50 characters in length. Note that the partition names follow specific naming conventions, which have particular implications for metadata management. That means not all of the 50-character space is at the user's disposal.
Auto Partition by RANGE only supports a single partition column, which must be of type DATE or DATETIME.
Auto Partition by LIST supports NULLABLE partition column and inserting NULL values. Auto Partition by RANGE does not support NULLABLE partition column.
Previously, without Auto Partition, when a table does not have the required partition, the behavior in Doris is for the BE nodes to accumulate errors until a DATA_QUALITY_ERROR
is reported. Now, with Auto Partition enabled, a request will be initiated to the Doris Frontend to create the necessary partition on-the-fly. After the partition creation transaction is completed, the Doris Frontend responds to the Coordinator, which then opens the corresponding communication channels (Node Channel and Tablets Channel) to continue the data ingestion process. This is a seamless experience for users.
Case 1: 1 Frontend + 3 Backend; 6 randomly generated datasets, each having 100 million rows and 2,000 partitions; ingested the 6 datasets concurrently into 6 tables
Objective: Evaluate the performance of Auto Partition under high pressure and check for any performance degradation.
Results: Auto Partition brings an average performance loss less than 5%, with all import transactions running stably.
Case 2: 1 Frontend + 3 Backend; ingesting 100 rows per second from Flink by Routine Load; testing with 1, 10, and 20 concurrent transactions (tables), respectively
Objective: Identify any potential data backlog issues that could arise with Auto Partition under different concurrency levels.
Results: With or without Auto Partition enabled, the data ingestion was successful without any backpressure issues across all the concurrency levels tested, even at 20 concurrent transactions when the CPU utilization reached close to 100%.