Updated April 4, 2023

Introduction to Data Flow Architecture

In a data flow architecture, a software system is modeled as a series of transformations applied to input data. Raw data is pulled into the system, flows through a sequence of modules, and is transformed by independent operations along the way until it reaches its destination, which is either the final output or a data store. The data and the transformations that act on it are independent of each other, so transformations can be reused and modified.

Modules & Components of Data Flow Architecture

There are several approaches to building a data flow architecture. We will discuss three of them: the first is very basic, the second is intermediate, and the third offers a wide range of possibilities. They differ in how execution is sequenced between modules.

Batch Sequential
Pipe and Filter
Process Control

1. Batch Sequential

As the name suggests, the task is divided into several batches, each responsible for its own subtask. Each batch performs its subtask and passes its result to the next batch. This is a very basic processing model: the next batch starts only when the previous batch has finished.

A simple example of this architecture is a banking transaction: only once all the details required for the transaction have been provided is the next page processed, and only then can the payment be made.
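The batch sequential flow described above can be sketched in a few lines of Python. This is only an illustrative sketch: the stage names (validate, normalize, deduplicate) and the sample records are assumptions made for the example, not part of any particular system. The point it shows is that each stage receives the complete output of the previous stage before it starts.

```python
from typing import Callable, List

# A batch stage takes the whole data set and returns a whole new data set.
Stage = Callable[[List[str]], List[str]]


def validate(records: List[str]) -> List[str]:
    # Hypothetical stage 1: drop records that are empty or only whitespace.
    return [r for r in records if r.strip()]


def normalize(records: List[str]) -> List[str]:
    # Hypothetical stage 2: trim whitespace and lower-case every record.
    return [r.strip().lower() for r in records]


def deduplicate(records: List[str]) -> List[str]:
    # Hypothetical stage 3: remove duplicates, keeping first occurrences in order.
    return list(dict.fromkeys(records))


def run_batch_sequential(records: List[str], stages: List[Stage]) -> List[str]:
    data = records
    for stage in stages:
        # The next stage starts only after this call has returned the full result.
        data = stage(data)
    return data


result = run_batch_sequential(
    [" Alice ", "", "BOB", "alice"],
    [validate, normalize, deduplicate],
)
print(result)
```

Because every stage waits for the full result of its predecessor, the total latency is the sum of the stage latencies, which is the trade-off of this model.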

2. Pipe and Filter

Pipe and Filter emphasizes the incremental transformation of data. Because the data is processed incrementally, filters can work concurrently and independently of one another, and their results can later be combined to produce useful output.

As the name suggests, a Pipe and Filter architecture decomposes the whole system into pipes, filters, and data sinks. The pipes are interconnected and carry the data as streams, for example in FIFO (first in, first out) order. Both sequential and parallel flows are possible. If all the filters are connected in series without any parallel connection, the pipeline performs a sequential transformation. Note that this is different from batch sequential, where the next batch does not start until the current batch has finished.

Pipe: Pipes connect filters and possess no state. The job of a pipe is to pass the data stream from one filter to another. Several pipes may lead from one filter to different filters, depending on the needs of the problem, and different architectures may use different connections. The pipe connections are independent of each other.

Filter: A filter is a transformation unit in this data flow structure and is independent of the data stream. Data is transformed incrementally, meaning that the transformation starts as soon as data arrives from the connected pipe.
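One way to picture incremental transformation is with Python generators, where each filter is a generator function and the "pipe" is simply the iterator that links one filter to the next. The sketch below is an assumption-laden illustration: the filter names and the events log are hypothetical and exist only to make the order of processing visible.

```python
from typing import Iterable, Iterator, List

# Records the order in which filters handle items (illustration only).
events: List[str] = []


def source(lines: Iterable[str]) -> Iterator[str]:
    # Data source: emits items one at a time.
    for line in lines:
        yield line


def strip_blank(stream: Iterable[str]) -> Iterator[str]:
    # Filter 1: trim whitespace and drop blank items.
    for line in stream:
        text = line.strip()
        if text:
            events.append(f"strip {text}")
            yield text


def to_upper(stream: Iterable[str]) -> Iterator[str]:
    # Filter 2: upper-case every item it receives.
    for line in stream:
        events.append(f"upper {line}")
        yield line.upper()


def sink(stream: Iterable[str]) -> List[str]:
    # Data sink: collects the transformed items.
    return list(stream)


output = sink(to_upper(strip_blank(source(["a", " ", " b "]))))
print(output)
print(events)
```

The events log shows the two filters interleaving: the first item passes through both filters before the second item is even read, which is what distinguishes this from a batch sequential run.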

A filter is of two types:

Active filter: pulls data in from the connected pipes and pushes the transformed data out.
Passive filter: lets the connected pipes push data in and pull data out, through a read/write mechanism.
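The difference between the two filter types is mostly a question of who drives the flow. The sketch below follows one common reading: an active filter runs its own loop, reading from an inbound pipe and writing to an outbound one, while a passive filter does nothing until something else hands it data. The Pipe, ActiveFilter, and PassiveFilter classes are hypothetical helpers written for this example.

```python
from collections import deque
from typing import Any, Callable, Deque


class Pipe:
    """Hypothetical FIFO connector; it only buffers items between filters."""

    def __init__(self) -> None:
        self._items: Deque[Any] = deque()

    def write(self, item: Any) -> None:
        self._items.append(item)

    def read(self) -> Any:
        return self._items.popleft()

    def empty(self) -> bool:
        return not self._items


class ActiveFilter:
    """Drives the flow itself: pulls from one pipe and pushes to another."""

    def __init__(self, transform: Callable[[Any], Any],
                 inbound: Pipe, outbound: Pipe) -> None:
        self.transform = transform
        self.inbound = inbound
        self.outbound = outbound

    def run(self) -> None:
        while not self.inbound.empty():
            self.outbound.write(self.transform(self.inbound.read()))


class PassiveFilter:
    """Waits to be called: the caller pushes an item in and takes the result."""

    def __init__(self, transform: Callable[[Any], Any]) -> None:
        self.transform = transform

    def process(self, item: Any) -> Any:
        return self.transform(item)


# Active: the filter empties the inbound pipe on its own.
inbound, outbound = Pipe(), Pipe()
for n in (1, 2, 3):
    inbound.write(n)
ActiveFilter(lambda n: n * 2, inbound, outbound).run()
active_result = [outbound.read() for _ in range(3)]

# Passive: the surrounding code decides when each item is processed.
passive = PassiveFilter(lambda n: n + 1)
passive_result = [passive.process(n) for n in (1, 2, 3)]

print(active_result, passive_result)
```

In a real system an active filter would typically run in its own thread or process; the single-threaded loop here is a simplification.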

3. Process Control Architecture

Here the data is not processed in batches or through a pipeline; rather, it is processed based on the variables passed to the system. The whole system is decomposed into several modules, which are connected to process the stream of data.

A process control system has two main units: the processing unit and the controller unit. The processing unit changes the process variables, and the controller unit takes into account the changes that have been made.

Below are the elements of the controller unit.

Controlled Variable: The value that is being controlled, as measured by sensors.
Input Variable: Holds the input information for the controller unit to process.
Manipulated Variable: An adjustment variable that the controller can change.
Process Definition: Defines the process by which the different variables are manipulated.
Sensor: Records the values of the variables and provides the feedback that triggers recalculation of the manipulated variable.
Setpoint: The desired value of the controlled variable.
Control Algorithm: Decides how the process variables should be manipulated.
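The elements listed above can be tied together in a small feedback loop. The sketch below assumes a toy heating process and a simple proportional control algorithm; the process model, the gain, and all numeric constants are hypothetical values chosen for illustration, not taken from any real controller.

```python
from typing import List


def read_sensor(temperature: float) -> float:
    # Sensor: reports the current value of the controlled variable.
    return temperature


def control_algorithm(setpoint: float, measured: float, gain: float) -> float:
    # Proportional control (an assumption for this sketch): the manipulated
    # variable is proportional to the error, and heater power cannot be negative.
    return max(0.0, gain * (setpoint - measured))


def process(temperature: float, heater_power: float, ambient: float) -> float:
    # Hypothetical process definition: the heater warms the room while heat
    # leaks towards the ambient temperature (the input variable).
    return temperature + 0.1 * heater_power - 0.05 * (temperature - ambient)


def run_control_loop(setpoint: float, initial: float, steps: int,
                     gain: float = 2.0, ambient: float = 20.0) -> List[float]:
    temperature = initial  # controlled variable
    history: List[float] = []
    for _ in range(steps):
        measured = read_sensor(temperature)
        heater_power = control_algorithm(setpoint, measured, gain)  # manipulated variable
        temperature = process(temperature, heater_power, ambient)
        history.append(temperature)
    return history


history = run_control_loop(setpoint=50.0, initial=20.0, steps=50)
print(round(history[0], 2), round(history[-1], 2))
```

With these particular constants the temperature rises steadily and settles below the setpoint, because a purely proportional controller leaves a steady-state offset. Closing that gap would need a richer control algorithm, which is outside the scope of this sketch.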

Advantages and Disadvantages of Data Flow Architecture

Each of the data flow structures has its advantages and disadvantages. We will discuss each of them separately.

Batch Sequential

Advantages: The division into subsystems is simple, and each subsystem is independent of the others. Each batch transforms its input data and produces output independently of the adjacent batches.

Disadvantages: Latency is high, because each batch must complete before the next one can start. The architecture also offers no flexibility for interaction between batches.

Pipe and Filter

Advantages: Pipe and Filter supports concurrency and high throughput, and it is flexible enough to support both sequential and parallel execution. Because data flows continuously through independent filters, the filters are reusable and each one can be maintained on its own. It is also easy to modify the connections between filters, because they are linked by simple pipes.

Disadvantages: It does not support dynamic interaction, and there is always the possibility of data transformation overhead between the filters. Maintaining a large pipeline with many interconnected filters is not simple.

Process Control Architecture

Advantages: Runtime control of the processes is easy, even when the control algorithms are subject to change. These architectures can handle dynamic systems and can process a continuous flow of data.

Disadvantages: Specifying the time characteristics is a difficult part of this type of architecture. Unexpected disturbances are also difficult to handle.

Conclusion

Data flow architecture models a software system as a series of transformations on input data, where the data and the operations are independent of each other. In this article we discussed three data flow architectures: Batch Sequential, Pipe and Filter, and Process Control. We also discussed the advantages and disadvantages of each. Which architecture fits best depends on the problem at hand.