Ab initio Interview Questions And Answers
Below are the top Ab initio interview questions that are frequently asked in interviews. The questions are divided into two parts:
Part 1 – Ab initio Interview Questions (Basic)
This first part covers basic Ab initio Interview Questions and Answers.
1. What are the components or functions available in ab initio?
The main components in Ab initio are listed below:
|Dedup||Removes duplicate records.|
|Join||Joins multiple input datasets based on a common key value.|
|Sort||Reorders the data by a key, following the specified collation order, and buffers data in memory while sorting.|
|Filter||Removes records that do not satisfy a condition.|
|Replicate||Copies each input record to every output flow, so the same data can feed several downstream components.|
|Merge||Combines multiple input flows into one.|
2. What are the types of parallel processing?
This is one of the common Ab initio interview questions asked in an interview. The different types of parallel processing are:
- Component parallelism
- Data parallelism
- Pipeline parallelism
Component parallelism: Multiple components of an application run simultaneously, each on separate data.
Data parallelism: The data is split into segments, and the same operation runs on all segments simultaneously.
Pipeline parallelism: Multiple components of an application run simultaneously on the same data, with each downstream component processing records as the upstream component produces them.
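The three types can be sketched in Python. This is only an illustration of the ideas, not Ab initio code: the function names and the `clean` transform are made up for the example, and the generator chain in `pipeline_parallel` only mimics the record-by-record hand-off between components rather than running them concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def clean(record):
    """Hypothetical per-record transform: trim and upper-case a string."""
    return record.strip().upper()


def component_parallel(customers, orders):
    """Component parallelism: two different components run at the same
    time, each on its own separate data."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        sorted_customers = pool.submit(sorted, customers)
        order_count = pool.submit(len, orders)
        return sorted_customers.result(), order_count.result()


def data_parallel(records, ways=2):
    """Data parallelism: split the data into `ways` segments and run the
    same operation on every segment at the same time."""
    segments = [records[i::ways] for i in range(ways)]
    with ThreadPoolExecutor(max_workers=ways) as pool:
        results = pool.map(lambda seg: [clean(r) for r in seg], segments)
        return [r for seg in results for r in seg]


def pipeline_parallel(records):
    """Pipeline parallelism: two stages chained on one stream; the second
    stage consumes each record as soon as the first stage yields it."""
    stage_one = (r.strip() for r in records)    # upstream component
    stage_two = (r.upper() for r in stage_one)  # downstream component
    return list(stage_two)
```

Note that `data_parallel` returns the records grouped by segment, so the output order can differ from the input order.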
3. What are the different ways to partition data?
There are multiple ways to partition data.
|Expression||Splits the data according to a data manipulation language (DML) expression.|
|Key||Groups the data by specific keys.|
|Load balance||Distributes the data through dynamic load balancing.|
|Percentage||Distributes the data so that the output sizes are given as fractions of 100.|
|Range||Splits the data evenly among the nodes based on a key and a set of ranges.|
|Round robin||Distributes the data evenly, in block-size chunks, across the output partitions.|
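As a rough illustration of two of these methods, the sketch below partitions a list of records by range and by expression. It is plain Python, not DML, and the function names, the `split_points` parameter, and the record layout are assumptions made for the example.

```python
from bisect import bisect_left


def partition_by_range(records, key, split_points):
    """Send each record to the partition whose key range contains it.

    `split_points` is a sorted list of inclusive upper bounds, so N split
    points produce N + 1 partitions.
    """
    partitions = [[] for _ in range(len(split_points) + 1)]
    for record in records:
        partitions[bisect_left(split_points, record[key])].append(record)
    return partitions


def partition_by_expression(records, expression, ways):
    """Send each record to the partition number computed by `expression`
    (taken modulo `ways` so that it always names a valid partition)."""
    partitions = [[] for _ in range(ways)]
    for record in records:
        partitions[expression(record) % ways].append(record)
    return partitions
```

For example, `partition_by_range(records, "id", [10, 20])` yields three partitions: ids up to 10, ids from 11 to 20, and ids above 20. How even the split is depends on choosing split points that match the key distribution.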
Let us move on to the next Ab initio interview questions.
4. What is a multifile system?
A multifile system is a set of directories on different nodes in a cluster, all with an identical directory structure. It improves performance because the data resides on multiple disks and can be processed in parallel.
It is created with a control partition on one node and data partitions on the other nodes, so that processing is distributed across them.
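The layout described above can be imitated on a single machine with ordinary directories. The sketch below is a toy model only: the directory names, the control-file format, and the helper functions are invented for illustration and do not reflect how Ab initio actually stores multifiles. In a real cluster the data partitions would sit on different nodes or disks.

```python
import tempfile
from pathlib import Path


def make_multifile_system(root, ways):
    """Create one control directory and `ways` data directories that all
    share the same layout (hypothetical names, not Ab initio's own)."""
    root = Path(root)
    control = root / "control"
    control.mkdir(parents=True)
    data_dirs = []
    for i in range(ways):
        data_dir = root / f"partition_{i}"
        data_dir.mkdir()
        data_dirs.append(data_dir)
    return control, data_dirs


def write_multifile(control, data_dirs, name, records):
    """Spread `records` over one file per data partition and list the
    partition files in a small control file."""
    for i, data_dir in enumerate(data_dirs):
        chunk = records[i::len(data_dirs)]
        (data_dir / name).write_text("\n".join(chunk))
    (control / name).write_text("\n".join(str(d / name) for d in data_dirs))


def read_multifile(control, name):
    """Read back every partition file listed in the control file."""
    paths = (control / name).read_text().splitlines()
    return [line for p in paths for line in Path(p).read_text().splitlines()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        control, data_dirs = make_multifile_system(Path(tmp) / "mfs", ways=4)
        write_multifile(control, data_dirs, "customers.dat", ["a", "b", "c", "d", "e"])
        print(read_multifile(control, "customers.dat"))
```

Each partition file can then be handed to a separate worker, which is where the performance gain comes from.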
5. What is the difference between Hadoop and Ab initio?
|Hadoop||Ab initio|
|Open source||Proprietary software|
|Parallel processing through mappers and reducers||Parallel processing architecture|
|Well suited to any variety of data||Best suited to traditional EDW implementations|
|Fault tolerance is achieved||Fault tolerance is not achieved|
|MapReduce is controlled on any components or functions||Components like join, group, sort are easily
|Cheap because it is open source, so any business use case can be tried out||Expensive, so it is applicable mainly to high-value business cases|
|Loosely coupled components, with custom functions built as needed||Tightly coupled components, chosen according to the business use case|
Part 2 – Ab initio Interview Questions (Advanced)
Let us now have a look at the advanced Ab initio Interview Questions.
6. What kind of layouts does Ab initio support?
- Ab initio supports serial and parallel layouts.
- A graph can use both serial and parallel layouts at the same time.
- The parallel layout follows the multifile system: for example, a 4-way multifile system gives a 4-way parallel layout.
- A component in a graph that uses that layout runs 4-way parallel.
7. What is the relation between the Enterprise Meta Environment (EME), the Graphical Development Environment (GDE) and the Co-Operating System?
Co-Operating System: It runs on top of the native operating system, is provided by Ab Initio, and is the base for all Ab Initio processes. Air commands are one of its features. It can be installed on different operating systems such as UNIX, Linux, and IBM platforms.
It provides the following features:
– Manages and runs Ab Initio graphs and controls the ETL processes
– Provides Ab Initio extensions to the operating system
– Monitors and debugs ETL processes
– Manages metadata and interacts with the EME
GDE: It is the design component and is used to build and run Ab Initio graphs.
A graph represents an ETL process in Ab Initio and is made up of components (predefined or user-defined), flows, and parameters.
The GDE can also run and debug jobs and trace execution logs.
Enterprise Meta Environment (EME): It is an environment for storage and metadata management (both business and technical metadata). The metadata can be accessed from the GDE, from a web browser, or from the Co-Operating System command line. It serves as the Ab Initio repository.
Let us move to the next Ab initio interview questions.
8. How is data processed, and what are the fundamentals of this approach?
Certain activities require data to be collected, and in many cases the processing depends heavily on that collection. Before data can be processed, it has to reside in a well-defined storage location. This task depends on some major factors, which are:
1. Collection of Data
9. What is the difference between partitioning with key and round robin?
This is one of the advanced Ab initio interview questions asked in an interview.
Partition by key: Here we specify the key on which the partitioning is based. Records with the same key value always go to the same partition, so how well balanced the data is depends on how the key values are distributed. It is useful for key-dependent parallelism.
Partition by round robin: Here the records are distributed evenly, in block-size chunks, in sequential order across the output partitions. It is not key based and results in well-balanced data, especially with a block size of 1. It is useful for record-independent parallelism.
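The difference can be seen in a small Python sketch. It is an illustration under stated assumptions, not Ab initio's implementation: the function names are made up, and `zlib.crc32` is used here simply as a convenient stable hash for choosing a partition.

```python
import zlib


def partition_by_key(records, key, ways):
    """Route each record by a hash of its key, so that all records with
    the same key value land in the same partition."""
    partitions = [[] for _ in range(ways)]
    for record in records:
        index = zlib.crc32(str(record[key]).encode()) % ways
        partitions[index].append(record)
    return partitions


def partition_round_robin(records, ways, block_size=1):
    """Deal records out in blocks of `block_size`, cycling through the
    partitions in order and ignoring the record contents."""
    partitions = [[] for _ in range(ways)]
    for position, record in enumerate(records):
        partitions[(position // block_size) % ways].append(record)
    return partitions
```

With ten records of which eight share one customer key, `partition_by_key` keeps those eight together, which is what a later per-key rollup or join needs. `partition_round_robin` with a block size of 1 splits the same ten records 5 and 5, but records for one customer end up in both partitions.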
10. How do you improve the performance of a graph?
There are many ways to improve the performance of a graph:
1) Reduce the number of components used in a phase.
2) Use well-tuned max-core values for the sort and join components.
3) Minimize the use of regular expression functions such as re_index in transform functions.
4) Minimize the use of sorted joins and, where possible, replace them with in-memory (hash) joins.
5) Pass only the required fields through the sort, reformat, and join components.
6) Use phasing or flow buffering when merges or sorted joins are involved.
7) Use a hash join when the two inputs are small; otherwise prefer a sorted join for very large inputs.
8) For large datasets, avoid using broadcast as the partitioner.
9) Reduce the number of sort components.
10) Avoid repartitioning data unnecessarily.
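To make the trade-off between the two join styles in points 4 and 7 concrete, here is a minimal Python sketch of each. It shows the general techniques only, not Ab initio's Join component; the function names and the dictionary record layout are assumptions for the example.

```python
def hash_join(left, right, key):
    """In-memory join: build a lookup table from one input, then probe it
    with the other. The whole `right` input must fit in memory."""
    table = {}
    for row in right:
        table.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]


def sorted_merge_join(left, right, key):
    """Join two inputs that are already sorted on `key`, walking them in
    key order without building a lookup table."""
    result, j = [], 0
    for l in left:
        # Skip right-hand rows whose key is smaller than the current left key.
        while j < len(right) and right[j][key] < l[key]:
            j += 1
        # Emit every right-hand row that matches the current left key.
        k = j
        while k < len(right) and right[k][key] == l[key]:
            result.append({**l, **right[k]})
            k += 1
    return result
```

The hash join needs no sorting but holds one input in memory, which suits small inputs. The merge join avoids the lookup table but requires both inputs to be sorted first, a cost that pays off mainly when the inputs are too large to hold in memory.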
This has been a guide to Ab initio interview questions and answers, intended to help candidates handle these questions with confidence. In this post, we covered the top Ab initio interview questions that are often asked in interviews.