Introduction to Talend Data Integration
Talend Data Integration means combining data from different sources into a single, unified view so that a company or organization can analyze it, derive meaningful insights, and improve its business. Integration covers extracting the data, cleansing it, applying the required transformations, and then loading it into a data warehouse.
What is Talend Data Integration?
- Talend is an ETL tool used for data integration. It provides solutions for data preparation, data quality, data integration, and big data.
- Talend offers Open Studio, an open-source tool for data integration and big data.
- Talend Open Studio can handle huge volumes of data with its big data components. It provides more than 800 components for various integration purposes. We will discuss some of these components here. The following example illustrates the idea:
- A SIM operator holds huge amounts of data about plans, customers, SIM details, etc. Because the data is so large, big data tooling is also used in the integration.
Customer A buys a SIM using a government ID, giving:
Name: AB C
Address: Chennai, Chennai
Phone number: 1234567890
After data integration:
First name: AB
Address: Chennai, India
Here the data is cleansed and transformed into something more meaningful.
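The cleansing step above can be sketched in plain Java, outside Talend. This is a minimal illustration of the kind of transformation described, not Talend's actual generated code; the splitting rule and the hard-coded country are assumptions for the example.

```java
// A minimal sketch of the cleansing transformation described above.
// The name-splitting rule and the country value are hypothetical.
public class CleanseExample {
    public static void main(String[] args) {
        String rawName = "AB C";
        String rawAddress = "Chennai, Chennai";

        // Split the full name into first and last on the final space.
        int split = rawName.lastIndexOf(' ');
        String firstName = rawName.substring(0, split);
        String lastName = rawName.substring(split + 1);

        // Replace the duplicated city with "city, country".
        String city = rawAddress.split(",")[0].trim();
        String address = city + ", India";

        System.out.println("First name: " + firstName); // First name: AB
        System.out.println("Last name: " + lastName);   // Last name: C
        System.out.println("Address: " + address);      // Address: Chennai, India
    }
}
```

In a real Talend job, this logic would typically live inside a tMap expression rather than hand-written Java.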
Benefits of Data Integration
Here we will be discussing the Benefits of Data Integration.
- Analyzing Business trends using data integration
- Combining data into a single system
- Saves time, improves efficiency, and reduces rework
- Easy report generation for use by BI tools
- Maintaining and inserting data into data warehouse and data marts
Application of Talend Data Integration
Here we will be discussing the Application of Talend Data Integration.
1. Working with Talend
- Make sure you have Java installed and the environment variables set.
- Download the open-source from the Talend website and install the software.
- Create a new project and finish the setup
- Talend will open with the designer tab.
- Talend is an Eclipse-based tool; components can be dragged from the Palette, or you can click in the designer and type the component's name.
2. First Job Reading a File
- Search for the tFileInputDelimited component. This component is used for reading delimited files.
- Place the tFileInputDelimited component. Then search for tLogRow and place it in the job designer.
- Right-click tFileInputDelimited, select Row -> Main, and draw a line to tLogRow.
- In the Component tab, select the path of the file you want to read and give the row separator as \n. If the file has a field delimiter, you can specify it here.
- Click the schema and give the column type details, or read the entire row as a single string column by leaving the field delimiter empty.
- You can also skip header and footer rows.
- In the tLogRow component, select how you want to see the data: table format or single-line format.
- tLogRow displays output in the run console.
- After connecting tFileInputDelimited and tLogRow, run the job from the Run tab.
- You can see the file contents printed in the console.
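Conceptually, this first job behaves like the plain-Java sketch below: read a file line by line, split each row on a delimiter, and print the columns. The file name and the `;` delimiter are assumptions for illustration; Talend generates its own (more elaborate) Java for this.

```java
// Roughly what the tFileInputDelimited -> tLogRow job does, sketched in
// plain Java. The input file name and ";" delimiter are hypothetical.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReadDelimited {
    public static void main(String[] args) throws IOException {
        Path input = Path.of("customers.csv");          // hypothetical input file
        List<String> rows = Files.readAllLines(input);  // rows separated by \n

        for (String row : rows) {
            // With a ";" field delimiter, each row becomes a set of columns,
            // matching the schema defined on the component.
            String[] columns = row.split(";");
            // tLogRow-style output to the console.
            System.out.println(String.join(" | ", columns));
        }
    }
}
```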
3. Second Job Using tMap
- Read a file and filter it into different output files.
- Read the file with the tFileInputDelimited component, using a one-column schema named record.
- tMap component: this component helps in transforming data with built-in operations such as lookups, joins, and filters.
- In tMap, create two outputs, out1 and out2.
- In the out1 filter, add record.contains("talend") and draw the record column to out1.
- Draw the record line to the other output, out2.
- From tMap, take the main rows and connect them to two tFileOutputDelimited components.
- Link out1 to tFileOutputDelimited1 as file1.txt and out2 to tFileOutputDelimited2 as file2.txt.
- file1.txt will have the records that contain talend.
- file2.txt will have the remaining records.
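The routing that tMap performs in this job can be sketched in plain Java as below. The input file name is an assumption; the output file names follow the job description, and the `record.contains("talend")` test mirrors the out1 filter expression.

```java
// A plain-Java sketch of the tMap routing: rows containing "talend" go to
// file1.txt, everything else to file2.txt. The input file name is hypothetical.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class FilterJob {
    public static void main(String[] args) throws IOException {
        List<String> out1 = new ArrayList<>(); // records containing "talend"
        List<String> out2 = new ArrayList<>(); // all other records

        for (String record : Files.readAllLines(Path.of("input.txt"))) {
            if (record.contains("talend")) {   // mirrors the out1 filter
                out1.add(record);
            } else {
                out2.add(record);
            }
        }
        Files.write(Path.of("file1.txt"), out1);
        Files.write(Path.of("file2.txt"), out2);
    }
}
```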
4. Built-in and Repository
- Built-in means you must set the schema or connection details manually every time you connect to a database.
- The Repository comes in handy for saving those details as metadata so you can reuse them every time without entering them manually. In the metadata you can save file schemas, database connections, Hadoop connections, Hive connections, S3 connections, and many more.
Components of Talend Data Integration
Here we will be discussing the components of Talend Data Integration.
1. tFileList: This component lists the files in a directory or folder with a given file mask pattern.
2. tMysqlConnection: This component is used for connecting to a MySQL database. Other MySQL components can reuse this connection, which simplifies database setup.
3. tMysqlInput: This component runs a query against a MySQL database and retrieves the table or columns. It is used for SELECT queries.
4. tMysqlOutput: This component is used for inserting or updating data in a MySQL database.
5. tPrejob: This component is the first to execute in the job and can be connected to other components with an OnSubjobOk trigger.
6. tPostjob: This component is the last to execute in the job. You can connect this with connection close components.
7. tLogCatcher: This component catches the warnings and errors in a job and is the most important component in the error handling technique. Error logs can be written with this component together with tFileOutputDelimited.
8. Context variables: Context variables can be used anywhere in a job. They hold values and can also be passed to another job using the tRunJob component. Their advantage is that the values can be changed for different purposes: for example, one set of values for a development context group and a different set for production. This way you don't have to change the job; changing the context parameters is enough.
9. Building a job: To build a job, right-click the job and select Build Job. You can import the built job into TAC (Talend Administration Console), where you can schedule the job, trigger it, and set dependencies. You can also import the job from a Nexus repository as an artifact task.
10. Create a task in TAC: Open Job Conductor in TAC. Create a new task and select a normal or artifact task. Import the built job, or select it from Nexus. Select the job server on which the job will run. Save the task. Now you can deploy and run the job.
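The context-variable idea can be sketched in plain Java: the same job logic reads its connection details from a per-environment set of values instead of hard-coding them. The keys, host names, and environment names below are hypothetical; in Talend these would be context groups defined in the Repository.

```java
// A plain-Java sketch of per-environment context values. All keys and
// values here are hypothetical stand-ins for a Talend context group.
import java.util.Properties;

public class ContextExample {
    static Properties loadContext(String env) {
        Properties ctx = new Properties();
        if (env.equals("production")) {
            ctx.setProperty("db_host", "prod-db.example.com"); // hypothetical host
            ctx.setProperty("db_port", "3306");
        } else { // development context group
            ctx.setProperty("db_host", "localhost");
            ctx.setProperty("db_port", "3306");
        }
        return ctx;
    }

    public static void main(String[] args) {
        Properties ctx = loadContext("development");
        // The job logic stays unchanged; only the context values differ.
        System.out.println("Connecting to " + ctx.getProperty("db_host")
                + ":" + ctx.getProperty("db_port"));
    }
}
```

Switching from development to production then means passing a different context name, not editing the job.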
- "Simplify ETL and ELT with the leading free open source ETL tool for big data." is the tagline for Open Studio.
- Talend Big Data has many components for handling huge volumes of data.
- Standard jobs, Big Data batch jobs, and Big Data streaming jobs are the different job types available in Talend.
- Big Data jobs can be created on the Spark or MapReduce framework.
This is a guide to Talend Data Integration. Here we discussed the introduction to Talend Data Integration and its benefits, along with its applications and components. You can also go through our other suggested articles to learn more.