Introduction to Teradata Performance Tuning
Teradata performance tuning involves a. Optimization of its database by normalizing key tables, using correct data types, rightsizing the number of columns, avoiding null values, using right join operations, and deploying correct indexes and server cache b. Fine-tuning database quires by tweaking SQL to the benchmark standards and following certain dos and don’ts while coding SQL statements and c. Altering Compute, storage, and network environment and parameters to enhance the database performance.
Performance tuning is essential to any organization to improve customer experience by improving front-end response time with faster retrieval of data from the database and cutting unwanted database activities. Any slowness in the database will reflect in application performance. In digital business operations, any such slowness will drive the customers to move away from competitors, resulting in a financial loss to the company.
In this article, let’s examine ways and means of performance tuning in Teradata applications.
Features of Teradata
- One of the renowned relational database management used mainly in building very large data warehousing applications.
- Owned by a company called Teradata started in 1979 and constantly upgraded till 2014
- The workload is shared among multiple Processors and offers abundant parallel processing capabilities.
- Has dedicated processor nodes with attached disk storage to store records evenly and retrieve them independently.
- It is scalable linearly with the addition of nodes, and it can be easily connected to mainframe or Network systems.
- It supports SQL, a host of utilities to facilitate the uploading of data into the Teradata database.
Steps in Teradata performance tuning
As mentioned earlier, Teradata performance tuning can occur at three levels viz., a. Database level, b. Optimizing SQL queries, c. Tweaking hardware environment. Various steps in all the levels are explained below.
1. Database Tuning
The database structure should be sound enough to store large volumes of data and retrieve them faster to ensure better performance of the Teradata system.
Normalizing the tables
- Each column should have a unique name, and it should have a single value
- Every table should have a primary key field, and all the non-key fields in the table should be functionally fully dependent on the primary key.
- There should not be any dependency between non-key fields in a table
- Multivalued dependency should be avoided.
Benefits of Normalization are
- It avoids duplication of records and the wrong multiplication of records during the join operation.
- Improves consistency of data and accuracy in query results
- Faster Execution due to nil redundant records and quicker Search operations using Index.
Indexing helps in retrieving records faster by cutting short full table access. There are two types of Indexes, namely the Primary Index and Secondary Index. The primary index column should be unique, and the other columns should be dependent on the primary index column. Secondary indexes are created depending on the data demography and how that column is used in the where clause.
This process splits a large table into multiple smaller tables. Logical grouping of tables facilitates easy maintenance of records and quicker retrievals of records as the search is restricted to a smaller volume of data resulting in an overall reduction in response time. The rows should be grouped and partitioned on a column in such a way that user accesses are restricted within the group and do not cut across partitions.
Usage of right data types
- For the numeric field, the right-sized data type should be chosen as per the maximum value to be stored, and for non-numeric fields, it should be as per the maximum length. Shorter length data types are always better for better performance.
- Date and time types should be used wherever it is necessary to avoid wasteful conversion.
- The column used in the join operation should have the same type in all the tables to avoid unnecessary conversion during Join.
Avoid Null values
Null is a no value in database parlance, and generally, it should be avoided in columns involved in arithmetic, aggregation, string manipulations, and conditionals operations since the results are unpredictable. One way to avoid this error is to explicitly check the null value in the coding, but it slows the performance level.
Too many Columns
Avoid too many columns in a table as it has heavy overhead and pulls down the performance of the database wherever possible design tables with a smaller number of columns split logically for smoother and faster operation.
Server Query Caching
This Caching is also known as the pinning of tables in the memory of the server. Copies of the frequently accessed tables are maintained in the memory, and any queries accessing this table will refer to the hot copies from memory instead of accessing the database to get the results quickly and speed up the performance. Any change in the contents of the table will trigger refreshing the contents in the memory.
2. Query optimization
- In where clause, the column with Indexes should only be used as far as possible, and a full table scan should be avoided.
- Leading wild cards should be avoided since Indexes will not be used in such search operations.
- Logical OR conditions should be used judiciously, and if one of the OR conditions is not on an indexed column, then the optimizer may choose a full table scan. Alternatively, a union clause can be used.
- Like expression where clause should be avoided as far as possible, the optimizer can choose a full table scan.
- Instead of IN expression, EXISTS should be used to improve database performance.
- Use DISTICT in lieu of GROUP BY to save query time
- Avoid UNION and use UNION ALL since UNION takes more time in filtering duplicates
- Have manageable smaller SQL statements against a big one.
- Use Joints with a fewer number of tables and fine-tune them.
3. Monitoring Database performance and fine-tuning
- Statistics should be collected for critical columns depending upon the data demography, and it should also be collected for the non-index column used in the where clause. Statistics should be updated frequently.
- Monitor whether the execution plan uses an index or resorting to a full table scan and study the bottlenecks.
- Use Optimizer and Explain options along with statistics data to track whether Indexes created are used effectively and their impact on the performance of the database.
- Add new indexes and drop existing indexes.
- Suggest code changes in SQL where clause to overcome bottleneck.
The performance of the database should be monitored continuously using DB tools, and any bottle necks should be analyzed and resolved. Along with DB monitoring, application design and coding should also be revisited for a stable performance level.
This is a guide to Teradata Performance Tuning. Here we discuss the ways and means of performance tuning in Teradata applications along with the features and steps. You may also have a look at the following articles to learn more –