Updated March 23, 2023

Introduction to Database Parallelism

Database parallelism is the use of parallel processing within a database system so that more work is completed in less time. It applies to the operations a database routinely performs, such as loading data, transforming data, and executing queries, by running them concurrently. Spreading this work across multiple processors, memory, and disks improves system performance and allows concurrent access to data, so tasks finish sooner than they would if each process were executed individually.

Types of Database Parallelism

Database parallelism has the following types:

Interquery Parallelism
Intraquery Parallelism
Pipelined Parallelism
Intraoperative Parallelism

1. Interquery Parallelism

In interquery parallelism, different queries or transactions run in parallel with one another, which increases throughput. The response time of an individual transaction, however, is no faster than it would be if the transaction ran in isolation. The main purpose of interquery parallelism is to scale up transaction processing so that the system supports a larger number of transactions per second. It is well suited to multi-server and multithreaded systems.

It can efficiently handle a large number of client requests in a short time. When multiple requests are submitted, the system executes them in parallel, with different server threads handling different requests at the same time, which increases throughput.

Interquery parallelism does not speed up any individual query, because each query is still executed by a single processor. Each query is independent and typically takes only a short time to execute. The more users there are, the more queries are generated. Without interquery parallelism, all queries share a single processor in a time-shared manner; with it, the queries are distributed over multiple processors. Interquery parallelism can be implemented successfully on SMP (symmetric multiprocessing) systems, where it increases throughput and supports many concurrent users.
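The idea of several server threads each handling an independent query can be sketched in Python. This is only an illustration under stated assumptions: the ORDERS table, the total_for_customer query, and the run_queries_concurrently helper are hypothetical names invented for the example, and a thread pool stands in for the server threads of a real database.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "table" of (order_id, customer, amount) rows.
ORDERS = [
    (1, "alice", 120),
    (2, "bob", 75),
    (3, "alice", 30),
    (4, "carol", 200),
    (5, "bob", 45),
]


def total_for_customer(customer):
    """One independent 'query': sum the order amounts for one customer."""
    return sum(amount for _, cust, amount in ORDERS if cust == customer)


def run_queries_concurrently(customers, workers=3):
    """Run one query per customer on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with `customers`.
        return list(pool.map(total_for_customer, customers))


if __name__ == "__main__":
    customers = ["alice", "bob", "carol"]
    print(dict(zip(customers, run_queries_concurrently(customers))))
```

Each query still runs on a single worker from start to finish, so no individual query gets faster; only the number of queries completed per unit of time can rise.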

2. Intraquery Parallelism

Intraquery parallelism is the execution of a single query in parallel on multiple processors and disks. It breaks a single query into multiple subtasks, which can run in parallel, each on a different processor. As a result, the overall elapsed time needed to execute the query is reduced. This kind of parallelism is useful in decision support systems.

Decision support systems run long, complex queries that are expensive for the system to execute. These systems are widely used, and database vendors are therefore increasing their support for this type of parallelism. The database system decomposes the serial SQL query into lower-level operations such as scan, join, sort, and aggregation.

These lower-level operations are then executed concurrently. Intraquery parallelism can divide a database operation such as index creation, database load, or an SQL query into parts that execute in parallel within a single database partition, which takes advantage of the multiple processors of a multiprocessor server. It draws on both data parallelism and pipeline parallelism.

It is well suited to scanning large indexes and tables. The index or data being used can be partitioned dynamically so that the query executes in parts. The data can be partitioned by key value, for example, and each partition of the table scanned separately. The distinct operations that result are executed in parallel.
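As a rough sketch of how one query can be split into parts, the Python example below computes a single average by partitioning the rows, scanning each partition on its own worker, and combining the partial results. The function names (split_into_chunks, partial_sum_and_count, parallel_average) are hypothetical, and threads are used only to show the structure; in CPython they do not give a real CPU speedup for this kind of work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(rows, parts):
    """Split rows into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(rows), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def partial_sum_and_count(chunk):
    """Subtask: scan one partition and return its partial aggregate."""
    return sum(chunk), len(chunk)


def parallel_average(rows, parts=4):
    """Answer one AVG-style query by combining per-partition results."""
    chunks = split_into_chunks(rows, parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(partial_sum_and_count, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


if __name__ == "__main__":
    print(parallel_average(list(range(1, 101))))
```

The average is carried as a (sum, count) pair per partition because averages of partitions cannot simply be averaged again when the partitions differ in size.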

3. Pipelined Parallelism

Pipelined parallelism breaks a task into a sequence of processing stages. Each stage takes the output of the previous stage as its input and passes its own results on as input to the next stage. Its scalability is limited, because the degree of parallelism is bounded by the number of stages. Its benefit is that it can parallelize tasks that depend on one another, which allows more operations to run in parallel.

A stage may need to consume multiple values before it produces any output, which can stall the rest of the pipeline. Execution starts with the reading stage running on one processor and filling the pipeline with the data it reads. As soon as data is available in the pipeline, the next stage starts running on another processor and begins filling the next pipeline.
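The staged flow described above can be sketched with one thread per stage and a bounded queue between neighbouring stages. The stage names (scan, select, aggregate), the DONE sentinel, and the run_pipeline helper are assumptions made for this illustration, not part of any particular database.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of the stream


def scan(rows, out_q):
    """Stage 1: read rows and feed them into the pipeline."""
    for row in rows:
        out_q.put(row)
    out_q.put(DONE)


def select(in_q, out_q, predicate):
    """Stage 2: forward only the rows that satisfy the predicate."""
    while True:
        row = in_q.get()
        if row is DONE:
            out_q.put(DONE)
            return
        if predicate(row):
            out_q.put(row)


def aggregate(in_q, result):
    """Stage 3: consume every row before emitting a single total."""
    total = 0
    while True:
        row = in_q.get()
        if row is DONE:
            result.append(total)
            return
        total += row


def run_pipeline(rows, predicate):
    """Wire the three stages together and return the aggregate."""
    q1, q2 = queue.Queue(maxsize=8), queue.Queue(maxsize=8)
    result = []
    stages = [
        threading.Thread(target=scan, args=(rows, q1)),
        threading.Thread(target=select, args=(q1, q2, predicate)),
        threading.Thread(target=aggregate, args=(q2, result)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return result[0]


if __name__ == "__main__":
    print(run_pipeline(range(1, 11), lambda x: x % 2 == 0))
```

The select stage starts working as soon as the first row arrives, while the aggregate stage is an example of a stage that must consume all of its input before it sends any output.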

4. Intraoperative Parallelism

Intraoperative parallelism, also called intraoperation parallelism, parallelizes the execution of a single relational operator within a query. In short, it parallelizes the work of one operation inside an individual query. Consider a query that joins two tables on a common attribute. Parallelism is needed when the tables are huge. The order of tuples does not matter in a relational database.

As a result, the tuples of a table can be stored in any order and split across partitions freely. A join must compare each record of one table with the records of the other table that could match it, and parallelism improves the performance of this step. Many relational operations lend themselves to parallel execution.

The work of the query is divided into subsets, which can involve many relational operators or sorting techniques, so that operations take place in parallel. Examples include range-partitioning sort, parallel external sort-merge, partitioned join, fragment-and-replicate join, partitioned parallel hash join, projection, and aggregation. Breaking up the operations of an individual query in this way improves performance.
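One of the operations named above, the partitioned parallel hash join, can be sketched as follows. Both tables are hash-partitioned on the join key so that matching rows land in the same partition, and each pair of partitions is then joined independently. The function names and the tuple-based row layout are hypothetical choices for this example, and a thread pool stands in for separate processors.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key_index, parts):
    """Assign each row to a partition by hashing its join key."""
    buckets = [[] for _ in range(parts)]
    for row in rows:
        buckets[hash(row[key_index]) % parts].append(row)
    return buckets


def hash_join(left, right, left_key, right_key):
    """Join one pair of partitions: build a hash table on left, probe with right."""
    table = {}
    for row in left:
        table.setdefault(row[left_key], []).append(row)
    return [l + r for r in right for l in table.get(r[right_key], [])]


def partitioned_hash_join(left, right, left_key=0, right_key=0, parts=4):
    """Equi-join two lists of tuples, one worker per partition pair."""
    left_parts = hash_partition(left, left_key, parts)
    right_parts = hash_partition(right, right_key, parts)

    def join_pair(pair):
        return hash_join(pair[0], pair[1], left_key, right_key)

    with ThreadPoolExecutor(max_workers=parts) as pool:
        pieces = list(pool.map(join_pair, zip(left_parts, right_parts)))
    # Concatenate the per-partition results; row order is not significant.
    return [row for piece in pieces for row in piece]


if __name__ == "__main__":
    customers = [(1, "alice"), (2, "bob"), (3, "carol")]
    orders = [(1, 120), (2, 75), (1, 30), (4, 10)]
    for row in sorted(partitioned_hash_join(customers, orders)):
        print(row)
```

Because rows with equal keys always hash to the same partition, no partition ever needs to look at rows held by another, which is what lets the per-partition joins run independently.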

Advantages and Disadvantages of Database Parallelism

The advantages and disadvantages of database parallelism are listed below.

It breaks a query into parts and runs them over multiple nodes, and its different types each optimize a different part of the process.
Parallelism runs the parts of a query as separate threads, each working on its own portion of the data.
Work is distributed across the available resources so that they are used evenly.
Parallelism improves the performance of the system.
The main disadvantage of database parallelism is that its scalability is limited.

Conclusion

Thus, parallelism is an efficient way of using a database. Distributing the data and the work makes fuller use of the available resources. Parallelism improves system performance, because a large task divided into smaller tasks that run concurrently completes sooner.