Introduction to PostgreSQL Architecture
As we know that PostgreSQL is the most advanced relational database management system. PostgreSQL architecture uses two models.
- Client
- Server
The client sends a request to PostgreSQL server and the PostgreSQL server gives a response to the client request. In typical application client and server is on different host in that scenario they will communicate through TCP/IP network connection. PostgreSQL server handling multiple concurrent sessions from the client. To achieve this by starting a new process for each connection. From that point, the new server process and the client will communicate without the intervention of another process. PostgreSQL has its own background processes to manage the PostgreSQL server.
Architecture of PostgreSQL
The physical structure of PostgreSQL is very simple, it consists of the following components:
- Shared Memory
- Background processes
- Data directory structure / Data files
The below figure shows the PostgreSQL Architecture.
1. Shared Memory
Shared memory refers to the memory reserved for transactional and another log catches. Shared memory consist of the following components:
Shared Buffers
- We need to set some amount of memory to a database server for uses of shared buffers. The default value of shared buffers in 9.2 and the older version is 32 megabytes (32 MB) from 9.3 and the later default value of shared buffers is 128 megabytes (128 MB).
- If we have a dedicated server for PostgreSQL, reasonable starting to set shared buffers value is 25% of total memory. The purpose of shared buffers is to minimize server DISK IO.
WAL Buffers
- WAL buffers temporarily store changes in the database, which changes in the WAL buffers are written to the WAL file at a predetermined time. At the time of backup and recovery, WAL buffers and WAL files are very important to recover the data at some peak of time.
- The minimum value of shared buffers is 32 KB. If we set this parameter as wal_buffers = -1 it will set based on shared_buffers.
Work Memory
- Set the specific memory to per client connections to be used by internal sort of operations and hash tables to writing data into temporary disk files.
- The default value of work memory in 9.3 and the older version is 1 megabyte (1 MB) from 9.4 and later default value of work memory is 4 megabytes (4 MB).
Maintenance Work Memory
- We need to specify the maximum amount of memory for database maintenance operations such as VACUUM, ANALYZE, ALTER TABLE, CREATE INDEX, and ADD FOREIGN KEY, etc.
- The default value of maintenance work memory in 9.3 and the older version is 16 megabytes (16 MB) from 9.4 and later default value of maintenance work memory is 64 megabytes (64 MB).
- It is safe to set maintenance work memory is large as compared to work memory. Larger settings will improve the performance of maintenance (VACUUM, ANALYZE, ALTER TABLE, CREATE INDEX, and ADD FOREIGN KEY, etc.) operations.
2. Background Processes
Below are the background processes of PostgreSQL. Each process has its individual features and PostgreSQL internals. Each process details will be described as follows:
Background Writer process: In PostgreSQL 9.1 version background writer regularly does the checkpoint processing. But in PostgreSQL 9.2 version checkpointer process was separated from the background writer process. It will keep logs and backup information up to date.
WAL Writer: This process writes and flushes periodically the WAL data on the WAL buffer to persistent storage.
Logging Collector: This process also called a logger. It will write a WAL buffer to WAL file.
Autovacuum Launcher: When autovacuum is enabled, this process has the responsibility of the autovacuum daemon to carry vacuum operations on bloated tables. This process relies on the stats collector process for perfect table analysis.
Archiver: If we enable archive mode, this process has the responsibility to copy the WAL log files to a specified directory.
Stats Collector: In this process, Statistics information like pg_stat_activity and for pg_stat_database is collected. Information is passed from the stats collector via temporary files to requesting processes.
Checkpointer Process: In PostgreSQL 9.2 and later versions checkpoint process is performed. The actual work of this process is when a checkpoint occurs it will write dirty buffer into a file.
Checkpointer: Checkpointer will write all dirty pages from memory to disk and clean shared buffers area. If PostgreSQL database is crashed, we can measure data loss between last checkpoint time and PostgreSQL stopped time. The checkpoint command forces an immediate checkpoint when the command is executed manually. Only database superuser can call checkpoint.
The checkpoint will occur in the following scenarios:
- The pages are dirty.
- Starting and restarting the DB server (pg_ctl STOP | RESTART).
- Issue of the commit.
- Starting the database backup (pg_start_backup).
- Stopping the database backup (pg_stop_backup).
- Creation of the database.
3. Data Files / Data Directory Structure
- PostgreSQL consist of multiple databases this is called a database cluster. When we initialize PostgreSQL database template0, template1 and Postgres databases are created.
- Template0 and template1 are template databases for new database creation of user it contains the system catalog tables.
- The user database will be created by cloning the template1 database.
PGDATA directory contains several subdirectories and control files are as follows.
- pg_version: It contains database version information.
- base: Containing database subdirectories.
- global: Containing cluster wise tables such as pg_user.
- pg_clog: Containing transaction commits status data.
- pg_multixact: Containing multi transaction status data (used for shared row locks).
- pg_notify: Containing LISTEN/NOTIFY status data.
- pg_serial: Containing information about committed serializable transactions.
- pg_snapshots: Containing exported snapshots.
- pg_stat_tmp: Containing temporary files for the statistics subsystem.
- pg_subtrans: Containing sub-transaction status data.
- pg_tblspc: Containing symbolic links to tablespaces.
- pg_twophase: Containing state files for prepared transactions.
- pg_xlog: Containing WAL (Write Ahead Log) files.
- pid: This file containing the current postmaster process ID (PID).
Conclusion
PostgreSQL architecture mainly divided into two models client and server. The client sends a request to the server, PostgreSQL server process the data using shared buffers and background processes and send a response back to the client. The data directory contains the physical file of the PostgreSQL database server.
Recommended Articles
This has been a guide to PostgreSQL Architecture. Here we discuss the basic concept and architecture of PostgreSQL respectively. You may also have a look at the following articles to learn more –