Updated June 26, 2023

Ab initio Interview Questions And Answers

If you are preparing for an Ab Initio interview, you may be wondering which questions are likely to come up in 2023. Every interview is different, and so is the scope of every job. With this in mind, we have compiled the most common Ab Initio interview questions and answers for 2023 to help you succeed in your interview.

Below are the Ab Initio interview questions that are asked most frequently. They are divided into two parts:

Part 1 – Ab Initio Interview Questions (Basic)

This first part covers basic Ab initio Interview Questions and Answers.

1. What are the components or functions available in ab initio?

Answer:

The main components in Ab Initio are listed below.

Component	Purpose
Dedup	Removes duplicate records.
Join	Joins multiple input datasets on a common key value.
Sort	Reorders the data. It applies the collation order and sorts the data in memory, up to the max-core limit.
Filter	Removes records that do not satisfy a condition.
Replicate	Writes a copy of the input data to each of its output flows, so that several downstream components can process the same data in parallel.
Merge	Combines multiple input datasets into one.
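As a rough illustration only, the behaviour of some of these components can be mimicked in plain Python. Ab Initio graphs are built in the GDE rather than written like this; the record layout and all function names below are hypothetical and are not part of any Ab Initio API.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input records; in Ab Initio these would arrive on a flow.
orders = [
    {"cust_id": 2, "amount": 40},
    {"cust_id": 1, "amount": 15},
    {"cust_id": 2, "amount": 40},  # duplicate of the first record
    {"cust_id": 3, "amount": 0},
]
customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]


def sort_records(records, key):
    """Sort analogue: reorder records by a key field."""
    return sorted(records, key=itemgetter(key))


def dedup_sorted(records, key):
    """Dedup analogue: keep the first record of each key group.

    The input must already be sorted by the same key.
    """
    return [next(group) for _, group in groupby(records, key=itemgetter(key))]


def filter_records(records, predicate):
    """Filter analogue: keep only records that satisfy the condition."""
    return [rec for rec in records if predicate(rec)]


def join_records(left, right, key):
    """Join analogue: inner join of two inputs on a common key."""
    lookup = {rec[key]: rec for rec in right}
    return [{**rec, **lookup[rec[key]]} for rec in left if rec[key] in lookup]


deduped = dedup_sorted(sort_records(orders, "cust_id"), "cust_id")
paid = filter_records(deduped, lambda rec: rec["amount"] > 0)
joined = join_records(paid, customers, "cust_id")
```

Here `joined` holds one enriched record each for customers 1 and 2; customer 3 is dropped by the filter because its amount is 0.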

2. What are the types of parallel processing?

Answer:

This is one of the most common Ab Initio interview questions. The different types of parallel processing are:

Component parallelism: Multiple components of an application run simultaneously on the system, each working on separate data.
Data parallelism: The data is split into segments, and the same operation runs on all segments simultaneously.
Pipeline parallelism: Multiple components of an application run simultaneously on the same data, with each downstream component processing records as soon as the upstream component produces them.
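The three kinds of parallelism can be sketched in plain Python. This is an analogy under stated assumptions, not Ab Initio code: the input data and function names are hypothetical, threads stand in for separate processes or nodes, and the generator pipeline interleaves its stages rather than running them on separate processors.

```python
from concurrent.futures import ThreadPoolExecutor

records = list(range(1, 9))  # hypothetical input: the integers 1..8


def square(x):
    return x * x


# Component parallelism: two different operations run at the same time
# on separate data.
with ThreadPoolExecutor(max_workers=2) as pool:
    total = pool.submit(sum, [1, 2, 3])
    longest = pool.submit(max, ["a", "bbb", "cc"], key=len)
    component_parallel = (total.result(), longest.result())


# Data parallelism: split the data into partitions and run the same
# operation on every partition at the same time.
partitions = [records[i::4] for i in range(4)]


def process_partition(part):
    return [square(x) for x in part]


with ThreadPoolExecutor(max_workers=4) as pool:
    data_parallel = list(pool.map(process_partition, partitions))


# Pipeline parallelism: chained stages in which a downstream stage
# consumes each record as soon as the upstream stage yields it.
def read_stage(source):
    for rec in source:
        yield rec


def transform_stage(stream):
    for rec in stream:
        yield square(rec)


def filter_stage(stream):
    for rec in stream:
        if rec % 2 == 0:
            yield rec


pipelined = list(filter_stage(transform_stage(read_stage(records))))
```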

3. What are the different ways to partition data?

Answer:

There are multiple ways to partition data.

Partition method	Description
Expression	Splits the data according to a Data Manipulation Language (DML) expression.
Key	Groups the data by specific keys.
Load balance	Distributes the data dynamically, according to the load on each partition.
Percentage	Splits the data so that each output receives a specified percentage of the total, with the percentages adding up to 100.
Range	Splits the data among the nodes based on a key and a set of ranges, so that the partitions are roughly even.
Round robin	Distributes the data evenly, in block-size chunks, across the output partitions.
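A minimal Python sketch of three of these partitioning methods follows. The function names, parameters, and sample data are hypothetical and only illustrate the idea; in particular, taking the expression result modulo the partition count is an assumption of this sketch.

```python
from bisect import bisect_left


def partition_by_expression(records, expr, n_parts):
    """Send each record to the partition number computed by an expression."""
    parts = [[] for _ in range(n_parts)]
    for rec in records:
        # The modulo keeps the result in range (an assumption of this sketch).
        parts[expr(rec) % n_parts].append(rec)
    return parts


def partition_by_range(records, key, splitters):
    """Send each record to the range its key falls in.

    `splitters` are sorted, inclusive upper bounds, so
    len(splitters) + 1 partitions are produced.
    """
    parts = [[] for _ in range(len(splitters) + 1)]
    for rec in records:
        parts[bisect_left(splitters, key(rec))].append(rec)
    return parts


def partition_by_percentage(records, percentages):
    """Split records in order so each partition gets its share of the total."""
    if sum(percentages) != 100:
        raise ValueError("percentages must add up to 100")
    parts, start, cumulative = [], 0, 0
    for pct in percentages:
        cumulative += pct
        end = len(records) * cumulative // 100
        parts.append(records[start:end])
        start = end
    return parts


ages = [5, 17, 18, 42, 64, 65, 90]
by_range = partition_by_range(ages, key=lambda a: a, splitters=[17, 64])
by_pct = partition_by_percentage(list(range(10)), [50, 30, 20])
by_expr = partition_by_expression(ages, expr=lambda a: a // 10, n_parts=3)
```

With these inputs, `by_range` groups the ages into up to 17, 18 to 64, and above 64, while `by_pct` produces partitions of 5, 3, and 2 records.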

Let us move on to the next Ab Initio interview questions.

4. What is a multifile system?

Answer:

A multifile system is a set of directories on different nodes in a cluster, all with an identical directory structure. It improves performance because the data resides on multiple disks and can therefore be processed in parallel.
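The idea can be imitated on a single machine with a few lines of Python. This is only a sketch: the directory names and helper functions are hypothetical, temporary directories stand in for disks on different nodes, and threads stand in for the parallel readers.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def create_multifile_system(root, n_partitions):
    """Create n partition directories that share the same layout."""
    dirs = []
    for i in range(n_partitions):
        part_dir = Path(root) / f"partition_{i}" / "data"
        part_dir.mkdir(parents=True)
        dirs.append(part_dir)
    return dirs


def write_multifile(dirs, name, records):
    """Spread records over one file of the same name per partition directory."""
    for i, part_dir in enumerate(dirs):
        lines = records[i::len(dirs)]
        (part_dir / name).write_text("\n".join(lines))


def count_records(path):
    """Count the records stored in one partition file."""
    return len(path.read_text().splitlines())


with tempfile.TemporaryDirectory() as root:
    dirs = create_multifile_system(root, n_partitions=4)
    write_multifile(dirs, "customers.dat", [f"rec{i}" for i in range(10)])
    # Each partition file can be read by its own worker at the same time.
    with ThreadPoolExecutor(max_workers=4) as pool:
        counts = list(pool.map(count_records, [d / "customers.dat" for d in dirs]))
```

Ten records spread over four partitions give per-partition counts of 3, 3, 2, and 2, and each partition is read independently of the others.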

5. What is the Difference between Hadoop and Ab initio?

Answer:

Hadoop	Ab Initio
Open source	Proprietary software
Parallel processing through mappers and reducers	Parallel processing built into the architecture
Well suited to any variety of data	Best suited to traditional EDW implementations
Fault tolerance is achieved	Fault tolerance is not achieved
Any component or function has to be implemented through MapReduce	Operations such as join, group, and sort are readily available as components
Cheap because it is open source, so any business use case can be tried out	Expensive, so applicable mainly to high-value business cases
Loosely coupled components for which custom functions are built	Tightly coupled components, chosen according to the business use case

Part 2 – Ab initio Interview Questions (Advanced)

Let us now have a look at the advanced Ab initio Interview Questions.

6. What kind of layouts does Ab Initio support?

Answer:

Ab Initio supports both serial and parallel layouts.
A graph can use serial and parallel layouts at the same time.
The parallel layout depends on the degree of parallelism. For example, if the multifile system is 4-way parallel, a component in the graph can run 4 ways in parallel.

7. What is the relation between the Enterprise Meta-Environment (EME), the Graphical Development Environment (GDE), and the Co>Operating System?

Answer:

Co>Operating System: It runs on top of the native operating system, is provided by Ab Initio, and is the base for all Ab Initio processes. It can be installed on different operating systems such as UNIX, Linux, and IBM systems, and air commands are one of its features.

It provides the following features:
– Manages and runs Ab Initio graphs and controls the ETL processes
– Provides the Ab Initio extensions to the operating system
– Monitors and debugs ETL processes
– Manages metadata and interacts with the EME

GDE: It is the design component, used to build and run Ab Initio graphs.

Graphs are formed from components (predefined or user-defined), flows, and parameters. In Ab Initio, an ETL process is represented by a graph.

The GDE also provides the ability to run and debug jobs and to trace execution logs.

Enterprise Meta-Environment (EME): It is an environment for storing and managing metadata, both business and technical. The metadata can be accessed from the GDE, from a web browser, or from the Co>Operating System command line. It serves as the Ab Initio repository.

Let us move on to the next Ab Initio interview questions.

8. How is data processed, and what are the fundamentals of this approach?

Answer:

Many activities require data to be collected first, and in many cases the processing depends heavily on that collected data. Before data can be processed, it has to reside in well-defined storage. Data processing depends on the following major factors:

Collection of Data
Presentation
Final Outcomes
Analysis
Sorting

9. What is the difference between partitioning with a key and a round-robin?

Answer:

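Partition by key: We specify the key on which the partitioning occurs, and records with the same key value go to the same partition. The resulting partitions may or may not be well balanced, depending on how the key values are distributed. It is useful for key-dependent parallelism.

Partition by round-robin: Records are distributed across the partitions in block-size chunks, regardless of their content. It results in well-balanced data and is useful for record-independent parallelism.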
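To see how the two schemes spread records, here is a minimal Python sketch. The function names, the integer customer IDs, and the skewed sample data are hypothetical; a real partitioner would use its own hash function rather than Python's `hash`.

```python
def partition_by_key(records, key, n_parts):
    """Records with the same key value always land in the same partition."""
    parts = [[] for _ in range(n_parts)]
    for rec in records:
        parts[hash(key(rec)) % n_parts].append(rec)
    return parts


def partition_round_robin(records, n_parts, block_size=1):
    """Deal records out in blocks, ignoring their content."""
    parts = [[] for _ in range(n_parts)]
    for i, rec in enumerate(records):
        parts[(i // block_size) % n_parts].append(rec)
    return parts


# Skewed hypothetical input: customer 7 accounts for most of the records.
records = [(7, "order")] * 6 + [(1, "order"), (2, "order"), (3, "order")]

by_key = partition_by_key(records, key=lambda rec: rec[0], n_parts=3)
by_rr = partition_round_robin(records, n_parts=3)

key_sizes = [len(part) for part in by_key]
rr_sizes = [len(part) for part in by_rr]
```

On this input, key partitioning gives partition sizes of 1, 7, and 1, because every record for customer 7 must land in the same partition, while round-robin gives 3, 3, and 3 but scatters customer 7 across all partitions.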

10. How do you improve the performance of a graph?

Answer:

Reduce the number of components used in each phase.
Set a well-tuned max-core value for sort and join components.
Minimize the use of regular expression functions such as re_index in transform functions.
Minimize sorted joins and, where possible, replace them with in-memory (hash) joins.
Pass only the required fields through sort, reformat, and join components.
Use phasing or flow buffers with merge components and sorted joins.
Use a hash join if the two inputs are small; for very large inputs, a sorted join is the better choice.
For large datasets, avoid using broadcast as the partitioner.
Reduce the number of sort components.
Avoid repartitioning data unnecessarily.
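The trade-off between in-memory (hash) joins and sorted joins can be sketched in Python. This is an illustration of the two general join techniques, not of how Ab Initio implements its Join component; the function names and sample records are hypothetical.

```python
def hash_join(small, large, key):
    """In-memory join: build a lookup table on the smaller input,
    then stream the larger input past it."""
    table = {}
    for rec in small:
        table.setdefault(rec[key], []).append(rec)
    for rec in large:
        for match in table.get(rec[key], []):
            yield {**match, **rec}


def sort_merge_join(left, right, key):
    """Sorted join: sort both inputs by key, then merge them in one pass."""
    left = sorted(left, key=lambda rec: rec[key])
    right = sorted(right, key=lambda rec: rec[key])
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Pair this left record with the whole run of equal keys on the right.
            j_run = j
            while j_run < len(right) and right[j_run][key] == lk:
                yield {**left[i], **right[j_run]}
                j_run += 1
            i += 1


customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [
    {"id": 2, "amount": 40},
    {"id": 1, "amount": 15},
    {"id": 2, "amount": 5},
    {"id": 9, "amount": 1},
]

hashed = list(hash_join(customers, orders, "id"))
merged = list(sort_merge_join(customers, orders, "id"))
```

Both functions return the same three joined records, in a different order. The hash join needs the smaller input to fit in memory but skips the sort, whereas the sorted join pays for sorting and then only has to look at the current position in each input while merging.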
