Updated March 4, 2023
Introduction to Teradata Partition by
Partition by Teradata usually specifies that one or more partitioning levels partition the table on which it is applied. There are various partition types with a wide range in the no. of altogether combined partitions. However, there are both positive and negative impacts of defining a partition, which is an improvement in performance over the subsequent queries of that table alongside the disc storage, respectively.
- A partitioning level can-not be specified if it includes a row-level security constraint column.
- A character partitioning level can-not is specified for the columns or the constants having Kanji1 or KanjiSJIS server character sets.
- For Global Temporary and Volatile table types, you can not specify or define the column partitioning.
- Parting for NOPI or PA table can be row-level or column level.
How to use Partition by in Teradata?
Partition by is defined as a part of the CREATE TABLE statement to specify. How to distribute the columns over the available AMP’s?
Let us take the below examples to understand them.
Partition Non-Unique Primary Index by Current date function.
CREATE TABLE cust ( customer_name CHARACTER(8), policy_num INTEGER, policy_exp_dt DATE FORMAT 'YYYY/MM/DD') PRIMARY INDEX (customer_name, policy_num) PARTITION BY CASE_N(policy_exp_dt>= CURRENT_DATE, NO CASE);
- Here the partitioning is based on a Non-unique column that is the policy expiration date.
- The data having the expired policies are separated from the non-expired policies by using the current date for partitioning by the policy expiry date.
Partitioning based on a Non-Unique primary index with a unique secondary index defined on the same column with the partitioning by Range_N function.
CREATE TABLE order_table ( order_id INTEGER NOT NULL, cust_id INTEGER, order_stats CHARACTER(1) CASESPECIFIC, total_price DECIMAL(13,2) NOT NULL, order_date DATE FORMAT 'yyyy-mm-dd' NOT NULL, order_priority CHARACTER(21), clerk CHARACTER(16), shipment_priority INTEGER, remarks VARCHAR(79)) PRIMARY INDEX (order_id) PARTITION BY RANGE_N(order_date BETWEEN DATE '1992-01-01' AND DATE '1998-12-31' EACH INTERVAL '1' MONTH) UNIQUE INDEX (order_id);
- In this example, the Order_id column is the Primary Index of the order_table.
- RANGE_N function is used for partitioning, using the order_date between a specified range and an interval of 1 month.
- Order_id column is the unique index as well in this case.
Partitioning by a unique primary Index over a narrow range defined by the RANGE_N function.
CREATE TABLE sales_table ( st_id INTEGER NOT NULL, prod_id INTEGER NOT NULL, sales_dt DATE FORMAT 'yyyy-mm-dd' NOT NULL, tot_revenue DECIMAL(13,2), tot_sold INTEGER, note_remarks VARCHAR(256)) UNIQUE PRIMARY INDEX (st_id, prod_id, sales_dt) PARTITION BY RANGE_N(sales_dt BETWEEN DATE '2001-01-01' AND DATE '2001-05-31' EACH INTERVAL '1' DAY);
- This example creates a unique primary index on the st_id, prod_id and the sales_dt, Whereas the partition is based on the RANGE_N function over sales_dt with a specified date range along within interval specified.
- The records are distributed over the AMP’s based on the partitioning; that’s why partition by is defined during the CREATE TABLE statement.
How partition works in Teradata. Partitioning by NUSI without a USI and partitioning by RANGE_N function.
CREATE TABLE lineitem_table ( orderkey INTEGER NOT NULL, partkey INTEGER NOT NULL, suppkey INTEGER, linenumber INTEGER, quantity INTEGER NOT NULL, extendedprice DECIMAL(13,2) NOT NULL, discount DECIMAL(13,2), tax DECIMAL(13,2), returnflag CHARACTER(1), linestatus CHARACTER(1), shipdate DATE FORMAT 'yyyy-mm-dd', commitdate DATE FORMAT 'yyyy-mm-dd', receiptdate DATE FORMAT 'yyyy-mm-dd', shipinstruct VARCHAR(25), shipmode VARCHAR(10), comment VARCHAR(44)) PRIMARY INDEX (orderkey) PARTITION BY RANGE_N(shipdate BETWEEN DATE '1992-01-01' AND DATE '1998-12-31' EACH INTERVAL '1' MONTH);
- This example creates a non-unique primary index on the column orderkey, whereas the partition expression is based on the RANGE_N function on shipdate.
- No secondary index is defined in the column.
Partitioning on Unique primary index defined on Extract function.
This example creates a unique primary index based on st_id, prod_id, sales_dt. The partitioning is based on the Extract function by sales_dt. No secondary index is defined.
CREATE TABLE sales_month ( st_id INTEGER NOT NULL, prod_id INTEGER NOT NULL, sales_dt DATE FORMAT 'yyyy-mm-dd' NOT NULL, tot_revenue DECIMAL(13,2), to_sold INTEGER, remarks VARCHAR(256)) UNIQUE PRIMARY INDEX (st_id, prod_id, sales_dt) PARTITION BY EXTRACT(MONTH FROM sales_dt);
- Here the data of the sales_month table will be distributed by the month of the same date.
For example, the sales had the same month of sales date will be partitioned together.
- This usually helps in extracting the data having the sales from the same month more efficiently. That’s why partitioning is done to have a better execution plan.
- The execution plan for any query can be seen by simply modifying the query and putting EXPLAIN in front of it.
EXPLAIN select * from sales_month
- First of all, the table is locked to read the reserved row hash in a single partition.
- Thereafter, the all-AMP retrieving step is followed by a residue condition built in the spool space itself.
- Finally, an END TRANSACTION command is sent out to all AMP’s involved in processing the request.
How can we qualify the partitioning to avoid ambiguity?
SELECT * FROM orders_table , lineitem_table WHERE orders.PARTITION = 3 AND lineitem_table.PARTITION = 5 AND orders_table.o_orderkey = lineitem_table.l_orderkey;
SELECT orders_table.*, lineitem_table.*, orders_table.PARTITION FROM orders_table, lineitem_table WHERE orders_table.PARTITION = 3 AND lineitem_table.PARTITION = 5 AND orders_table.o_orderkey = lineitem_table.l_orderkey;
- Here in partitioning is utilized with the select clause to avoid the ambiguity between the partition values in the order table.
Advantages of Teradata Partition by
Below are the advantages:
- If partitioning is defined, then full scans of the tables can be avoided improving the query time.
- Usage of the secondary index can be avoided, which requires additional query time.
- We can access a part of a subset of the large table quickly if partitions are defined.
- The data can be dropped quickly, as well.
- The reason being, we the PE, know where the data resides and eventually, the scan time is reduced.
Partitions are defined during the CREATE TABLE statement Partitioning allows us to distribute the data in a table across various AMP’s based on the selected field and the selected logic so that the retrieval process isn’t time-consuming. The query can be optimized using the partition information by the Parsing engine.
We hope that this EDUCBA information on “Teradata Partition by” was beneficial to you. You can view EDUCBA’s recommended articles for more information.