Introduction to Fastload in Teradata
Fastload is primarily a Teradata’s utility that is utilized to load pretty large datasets in the Teradata system. The unique property that makes quite unique is the lightning speed with which it loads the data in the tables within the system.
Using Fastload, the data can be loaded from multiple sources, out of which some are given below:
- Magnetic tape or Disc drives.
- Data from flat files over the network.
- Any other device that contains properly formatted data.
Teradata’s fastload utilizes more than one session as well, along with block-level operations to load the data into the table. That is the reason fastload loads the data much faster as compared to other available utilities.
What is Fastload in Teradata?
Teradata Fastload being a command-driven utility, can be invoked either in inactive mode or either in batches to load large volumes of data into empty Teradata tables. Teradata Fastload inbuilt utility creates multiple sessions, generally one session per Amp by default. Thus enabling the transfer of data by blocks instead of row-level load. Moreover, it does not even utilize the transient journal to load the empty tables. It loads only a single table at a time. If in case you wish to load multiple tables, then the same can be achieved using multiple fastload jobs at a time.
Certain prerequisites are required in place before running a fastload utility in Teradata:
1. Log Table
To keep track of every fastload session running on the system, a log table is required. Under the database named SYSADMIN, Teradata maintains the table called fastlog as the log table. If you want to use these tables, you will require certain privileges that are inserted, delete and update.
2. Empty Target Table
An empty target table is required before inserting data using Teradata’s Fastload process. These empty tables are created automatically by the system. A log tables are also maintained during this process of populating data into this table.
3. Two Error Tables
If in case any errors come up during the population of the empty table under Teradata’s fastload process, then errors are logged into these error tables.
Two error tables are maintained due for the same and are created automatically:
- First Error Table: Any translational or other types of constraining violations are captured in this table. For example, a row with the wrong data type would be reported in this table.
- Second Error Table: If the errors are caused due to duplicates arising for unique primary key indexes, then the errors will be logged to this table.
- Fastload will load only the first instance of the unique primary index, and the second or the duplicate one will be stored in the second error table.
- If in case the entire row is a duplicate, then the fastload will count it but won’t store the same in the second error table.
How does Fastload work in Teradata?
Fastload divides the complete data load process into two subsequent phases as given below:
Phase 1: Acquisition
- The primary objective of phase one is to move the host computer’s data or from the source data file to the Teradata environment. In short, transferring the data to the access module processor (AMP) as soon as feasible.
- To take care of this, the parsing engine does not hash each row one by one.
- Actually, the parsing engine parses the statement only once, once it receives the insert statement from Teradata’s fast load utility.
- Thereafter the parsing engine opens a session connecting the fastload client and the AMP directly. It will open one connection per AMP. For instance, if there are 100 AMP’s then, in that case, 100 connections will be set up, and that’s why Teradata’s fastload loads data so quickly.
- One of these client sessions packs the raw data into 64K Blocks.
- Thereafter, this 64K block of data is randomly sent to the amp, without any concern about which it’s being sent. That’s how data across multiple AMP’s is distributed evenly in Teradata.
- Once the AMP’s receives the data black, it hashes the rows based on the unique primary index and sends it to the concerned AMP. This is how the data is redistributed amongst multiple AMP’s in the Teradata System.
- After the redistribution of rows between AMP’s, the data is then written to the internal worktable, but this remains unsorted.
Phase 2: Application
- This phase’s main responsibility is to write the concerned data to the actual table space, which is the empty table in the disc of the Teradata system.
- Data stored in each AMP’s work tables gets sorted after preparing the same during phase 1.
- Once the sorting is done, these rows are then written to the actual space, which is the empty tables available in the disc permanently.
- Any error encountered during this phase will be stored in the second error table. That is the errors due to duplicates arising for unique primary key indexes that will be stored in the second error table.
Example of Fastload in Teradata
Given below is the example mentioned:
BEGIN LOADING teradatauser.Employee_Staging_table
ERRORFILES Employee_ErrorTab, Employee_Tab_UV
SET RECORD VARTEXT ",";
DEFINE in_EmployeeNUM (VARCHAR(1O0)),
FILE = employee_Data.txt;
INSERT INTO Employee_Stg (
:in_Birth_Date (FORMAT 'YYYY/MM/DD'),
:in_Join_Date (FORMAT 'YYYY/MM/DD'),
This is distributed into three sections:
- Logon to the database and setting up the staging and the error tables.
- Data will be loaded to the staging table using the Insert statement.
- Before that, the staging table is being defined.
- Finally, the connection is closed using the Logoff statement.
Fastload is a Teradata utility that loads the data to Teradata tables quite quickly. It loads the data into two phases, known as the acquisition and the application phase. The errors encountered during this process are saved in the error tables, whereas the log into the log table, respectively.
This is a guide to Fastload in Teradata. Here we discuss the introduction to fastload in Teradata, along with the working and example. You may also have a look at the following articles to learn more –