Ab initio Interview Questions And Answers
Below are the top Ab Initio interview questions that are frequently asked in interviews. The questions are divided into two parts:
Part 1 – Ab Initio Interview Questions (Basic)
This first part covers basic Ab initio Interview Questions and Answers.
1. What are the components or functions available in Ab Initio?
Answer:
The main components in Ab Initio are listed below.
Component | Purpose |
Dedup | Removes duplicate records. |
Join | Joins multiple input datasets on a common key value. |
Sort | Reorders the data. It takes a collation order and holds the data in memory while sorting. |
Filter | Removes records based on a condition. |
Replicate | Produces additional copies of the data, mainly so that several branches of a graph can work on the same data in parallel. |
Merge | Combines multiple input data flows into one. |
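The purpose of several of these components can be illustrated on plain Python lists. This is only a sketch of the ideas, not Ab Initio code: the record layout, the field names (`cust`, `amount`, `name`), and the sample data are all hypothetical.

```python
# Hypothetical sample records; the field names are illustrative only.
orders = [
    {"cust": 2, "amount": 30},
    {"cust": 1, "amount": 10},
    {"cust": 2, "amount": 30},  # exact duplicate of the first record
    {"cust": 3, "amount": 0},
]
customers = {1: "Ann", 2: "Bob"}  # second input of the join, keyed by cust

# Sort: reorder the records by a key.
sorted_orders = sorted(orders, key=lambda r: r["cust"])

# Dedup: keep only the first record for each (cust, amount) pair.
seen = set()
deduped = []
for rec in sorted_orders:
    k = (rec["cust"], rec["amount"])
    if k not in seen:
        seen.add(k)
        deduped.append(rec)

# Filter: remove records that fail a condition.
non_zero = [r for r in deduped if r["amount"] > 0]

# Join: match records from two inputs on a common key (inner join).
joined = [
    {**r, "name": customers[r["cust"]]}
    for r in non_zero
    if r["cust"] in customers
]

print(joined)
```

In a real graph each step would be a separate component connected by flows; here the steps simply run one after another.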
2. What are the types of parallel processing?
Answer:
This is a common Ab Initio interview question. The types of parallel processing are:
- Component parallelism: An application runs multiple components on the system simultaneously, each working on separate data.
- Data parallelism: Data is split into segments, and the same operation runs on every segment simultaneously.
- Pipeline parallelism: An application runs multiple components on the same dataset simultaneously, with each component working on records as the previous component passes them along.
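The three types can be sketched with Python threads. This is a minimal illustration under assumptions, not how Ab Initio implements them: the functions `double` and `total`, the queue-based pipeline, and the sample data are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue
from threading import Thread

def double(values):
    return [v * 2 for v in values]

def total(values):
    return sum(values)

with ThreadPoolExecutor(max_workers=4) as pool:
    # Component parallelism: different components run at the same time
    # on separate data.
    f1 = pool.submit(double, [1, 2, 3])
    f2 = pool.submit(total, [10, 20, 30])
    component_results = (f1.result(), f2.result())

    # Data parallelism: the same component runs on every segment of the data.
    segments = [[1, 2], [3, 4], [5, 6]]
    data_results = list(pool.map(double, segments))

# Pipeline parallelism: two stages connected by a queue. The second stage
# can start on early records while the first stage is still producing.
DONE = object()  # sentinel marking the end of the flow

def stage_one(out_q):
    for v in range(5):
        out_q.put(v * 2)
    out_q.put(DONE)

def stage_two(in_q, results):
    while True:
        item = in_q.get()
        if item is DONE:
            break
        results.append(item + 1)

flow = Queue()
pipeline_results = []
stages = [
    Thread(target=stage_one, args=(flow,)),
    Thread(target=stage_two, args=(flow, pipeline_results)),
]
for t in stages:
    t.start()
for t in stages:
    t.join()

print(component_results, data_results, pipeline_results)
```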
3. What are the different ways to achieve partitioning?
Answer:
There are multiple ways to partition data.
Partition | Description |
Expression | Splits the data according to a Data Manipulation Language (DML) expression. |
Key | Groups the data by specific keys. |
Load balance | Distributes the data dynamically according to load. |
Percentage | Distributes the data so that each output's size is a specified fraction of 100. |
Range | Splits the data among the nodes based on a key and a set of ranges. |
Round robin | Distributes the data evenly, in block-size chunks, across the output partitions. |
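Three of the methods in the table (expression, range, and percentage) can be sketched as small Python functions. The function names, the splitter values, and the way percentages are applied to a list of known length are assumptions made for illustration; the real components work on streaming flows and may behave differently.

```python
from bisect import bisect_left

def partition_by_expression(records, expr, n):
    """Send each record to the partition number computed by expr."""
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[expr(rec) % n].append(rec)
    return parts

def partition_by_range(records, key, splitters):
    """Partition i receives keys up to and including splitters[i];
    keys above the last splitter go to one extra final partition."""
    parts = [[] for _ in range(len(splitters) + 1)]
    for rec in records:
        parts[bisect_left(splitters, key(rec))].append(rec)
    return parts

def partition_by_percentage(records, percentages):
    """Give each partition its share of the records, as fractions of 100."""
    if sum(percentages) != 100:
        raise ValueError("percentages must add up to 100")
    parts, start, cumulative = [], 0, 0
    for pct in percentages:
        cumulative += pct
        end = len(records) * cumulative // 100
        parts.append(records[start:end])
        start = end
    return parts

records = list(range(10))
by_expr = partition_by_expression(records, lambda r: r % 2, 2)
by_range = partition_by_range(records, lambda r: r, [3, 6])
by_pct = partition_by_percentage(records, [50, 30, 20])
print(by_expr, by_range, by_pct)
```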
Let us move on to the next Ab Initio interview questions.
4. What is a multifile system?
Answer:
A multifile system is a set of directories on different nodes in a cluster, all with an identical directory structure. It improves performance because the data resides on multiple disks and can be processed in parallel.
It is created with a control partition on one node and data partitions on the other nodes, which distributes the processing and improves performance.
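The layout described above can be imitated with ordinary directories. The sketch below is a rough analogy only: the directory names (`control`, `node0/data`, and so on), the `partitions.txt` file, and the helper functions are hypothetical, and a real multifile system is created and managed by Ab Initio's own tooling.

```python
import tempfile
from pathlib import Path

def make_multifile_system(root, n_data_partitions):
    """Create one control directory and n identically structured data directories."""
    control = Path(root) / "control"
    control.mkdir()
    data_dirs = []
    for i in range(n_data_partitions):
        d = Path(root) / f"node{i}" / "data"  # stands in for a directory on node i
        d.mkdir(parents=True)
        data_dirs.append(d)
    # The control partition records where the data partitions live.
    (control / "partitions.txt").write_text("\n".join(str(d) for d in data_dirs))
    return control, data_dirs

def write_multifile(data_dirs, name, records):
    """Spread the records so every data partition holds a file with the same name."""
    for i, d in enumerate(data_dirs):
        part = records[i::len(data_dirs)]
        (d / name).write_text("\n".join(part))

def read_partition(data_dir, name):
    text = (data_dir / name).read_text()
    return text.split("\n") if text else []

with tempfile.TemporaryDirectory() as root:
    control, data_dirs = make_multifile_system(root, 3)
    write_multifile(data_dirs, "customers.dat", ["r0", "r1", "r2", "r3", "r4", "r5", "r6"])
    partition_contents = [read_partition(d, "customers.dat") for d in data_dirs]
    partition_count = len((control / "partitions.txt").read_text().splitlines())

print(partition_count, partition_contents)
```

Each data directory can then be read and processed independently, which is where the parallel speed-up comes from.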
5. What is the difference between Hadoop and Ab Initio?
Answer:
Hadoop | Ab Initio |
Open source | Proprietary software |
Parallel processing through mappers and reducers | Parallel processing architecture |
Suited to any variety of data | Best for traditional EDW implementations |
Fault tolerance is achieved | Fault tolerance is not achieved |
Operations are written as MapReduce functions | Operations such as join, group, and sort are easily performed with components |
Cheap because it is open source, so any business use case can be tried out | Expensive, so it applies to high-value business cases that justify the cost |
Loosely coupled components, with custom functions built as needed | Tightly coupled components, chosen according to the business use case |
Part 2 – Ab initio Interview Questions (Advanced)
Let us now have a look at the advanced Ab initio Interview Questions.
6. What kind of layouts does Ab Initio support?
Answer:
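- Ab Initio supports serial and parallel layouts.
- A graph can use both serial and parallel layouts at the same time.
- A parallel layout follows the multifile system it is based on. For example, a multifile system with four data partitions is a 4-way parallel system.
- A component in a graph that uses such a layout runs 4-way parallel.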
7. What is the relation between the Enterprise Meta>Environment (EME), the Graphical Development Environment (GDE), and the Co>Operating System?
Answer:
Co>Operating System: It runs on top of the native operating system, is provided by Ab Initio, and is the base for all Ab Initio processes. It can be installed on different operating systems such as UNIX, Linux, and IBM platforms, and air commands are one of its features.
It provides the following features:
- Manages and runs Ab Initio graphs and controls the ETL processes
- Provides extensions to the operating system
- Monitors and debugs ETL processes
- Manages metadata and interacts with the EME
GDE: It is the design component and is also used to run Ab Initio graphs.
Graphs are built from components (predefined or user-defined), flows, and parameters. Each graph represents an ETL process in Ab Initio.
The GDE can also run and debug jobs and trace execution logs.
Enterprise Meta>Environment (EME): It is an environment for storage and metadata management, covering both business and technical metadata. The metadata can be accessed from the Graphical Development Environment, from a web browser, or from the Co>Operating System command line. It serves as the Ab Initio repository.
Let us move on to the next Ab Initio interview questions.
8. How is data processed, and what are the fundamentals of this approach?
Answer:
Many activities require data to be collected, and in many cases the processing depends largely on that collected data. Before data can be processed, it has to reside in well-defined storage. The task depends on the following major factors:
- Collection of Data
- Presentation
- Final Outcomes
- Analysis
- Sorting
9. What is the difference between partitioning with key and round-robin?
Answer:
This is an advanced Ab Initio interview question.
Partition by key: The key on which the partitioning occurs must be specified. Records with the same key value go to the same partition, so how well balanced the data is depends on how the key values are distributed. It is useful for key-dependent parallelism.
Partition by round-robin: Records are distributed evenly, in block-size chunks, to the output partitions in sequence. It is not key-based, and it results in well-balanced data, especially with a block size of 1. It is useful for record-independent parallelism.
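The contrast between the two methods can be shown in a few lines of Python. This is a sketch under assumptions: the function names, the use of Python's built-in `hash` on integer keys, and the sample records are illustrative and say nothing about how Ab Initio assigns partitions internally.

```python
def partition_by_key(records, key, n):
    """Records with the same key value always land in the same partition."""
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[hash(key(rec)) % n].append(rec)
    return parts

def partition_round_robin(records, n, block_size=1):
    """Deal records out in block_size chunks, cycling through the partitions."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[(i // block_size) % n].append(rec)
    return parts

# Hypothetical (customer_id, payload) records; customer 1 dominates the input.
records = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (3, "e"), (1, "f")]

by_key = partition_by_key(records, key=lambda r: r[0], n=2)
by_rr = partition_round_robin(records, n=2)
by_rr_block2 = partition_round_robin(records, n=2, block_size=2)

key_sizes = [len(p) for p in by_key]
rr_sizes = [len(p) for p in by_rr]
rr_block2_sizes = [len(p) for p in by_rr_block2]
print(key_sizes, rr_sizes, rr_block2_sizes)
```

With this skewed input, key partitioning keeps every record for customer 1 together at the cost of uneven partition sizes, while round-robin with a block size of 1 splits the records evenly but scatters the keys.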
10. How do you improve the performance of a graph?
Answer:
There are many ways to improve the performance of a graph:
- Limit the number of components in any one phase.
- Use well-tuned max-core values for sort and join components.
- Minimize the use of regular expression functions such as re_index in transform functions.
- Minimize sorted join components and, where possible, replace them with in-memory (hash) joins.
- Use only the required fields in sort, reformat, and join components.
- Use phasing or flow buffering with merge or sorted joins.
- Use a hash join if the two inputs are small; otherwise, choose a sorted join for huge inputs.
- For large datasets, avoid using broadcast as the partitioner.
- Reduce the number of sort components.
- Avoid repartitioning data unnecessarily.
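The trade-off between an in-memory (hash) join and a sorted join can be sketched in Python. Both functions below are generic textbook versions written for illustration, with hypothetical names and sample data; they are not the Ab Initio Join component.

```python
def hash_join(left, right, key):
    """Build an in-memory table from one input, then probe it with the other.
    No sorting is needed, but the whole right input must fit in memory."""
    table = {}
    for r in right:
        table.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in table.get(l[key], [])]

def sorted_merge_join(left, right, key):
    """Walk two inputs that are ALREADY sorted by key.
    Memory use stays small, but both inputs have to be sorted first."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right records that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left record with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out

left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 2, "a": "z"}, {"id": 4, "a": "w"}]
right = [{"id": 2, "b": 10}, {"id": 3, "b": 20}, {"id": 4, "b": 30}]

hash_result = hash_join(left, right, "id")
merge_result = sorted_merge_join(left, right, "id")
print(hash_result == merge_result)
```

Both joins return the same matches here. The difference is in cost: the hash join avoids sort components entirely, which is why it suits small inputs, while the merge join scales to inputs too large to hold in memory.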
This has been a guide to Ab Initio interview questions and answers, listing 10 of the most useful questions to help job seekers prepare for an interview.