Introduction to PostgreSQL Replication
For maintaining high availability of database and load balancing when a large number of users are accessing the PostgreSQL database, various strategies need to be implemented, and multiple database servers can be used that work together to the server the users. In this case, it is very easy to provide this facility if all the operations are read-only, which means no operation is performed on the data present in the database servers; only the available data is retrieved. In this topic, we are going to learn about PostgreSQL Replication.
But in the case of reading/write, servers that involve all kinds of manipulations on the database’s data becomes difficult. This is because a single change in one database should be reflected in all the database servers that are serving the users. This is called replication of the data. A single change in the database server needs to be done in all the database servers with which it is working to provide high-availability to the end-users.
Working of PostgreSQL Replication
In PostgreSQL, there is usually one database server on which all the modification relate operations and requests are served. Those changes are applied to other database servers called the slave servers or standby database servers. That means the master nodes are always the replicated data’s senders while slave nodes receive the replicated data and apply the same changes over there.
The built-in replication feature that is provided in PostgreSQL is the streaming replication that follows a master-slave structure. However, when cascading is used, even the slave or stand-by database servers can send as well as receive the replicated data to and from the master database server.
The slave of standby servers always keep track of the changes made by the master server and tries to keep themselves updated. There are two types of standby database servers, namely hot and warm standby servers. A server that cannot accept the user connection until and unless it is promoted to the master server is called warm servers. The servers that can accept the user connections and provide the facility of read-only operations on that server are called hot standby servers.
Write-Ahead-Log (WAL) file logging system is used in the streaming replication that marks the transaction committed even before the database in which the change is being done; that is, the target database is changed. However, this difference is minimal; hence there is no inconsistency between the main and replicated database values.
Settings
There are some settings available and that need to be set on each of the master, slave or sending users that will be discussed in the upcoming sections of this article –
Sending database servers
This is the server that sends the replicated data to all the other servers. The master server is always a sending server, and all these settings are required to be set on that server, however in the case where the slave servers also function as sending servers, as in cascading, these settings need to be set on those servers too.
1. max_wal_senders(integer)
This parameter specifies the maximum number of allowed connections with other slave database servers at a single instance are allowed. This parameter’s default value is zero, which means no replication is allowed, and no replicated database connections are supported. This value cannot exceed max_connections as this value is equivalent to the number of current wal connections.
2. wal_keep_segments(integer)
The standby servers require 16-megabytes segments of past log files that are stored in the pg_xlog directory for streaming replication in Postgres. This parameter specifies the minimum number of those segments that are required by the standby servers for proper replication to be performed. This parameter’s value can either be set from the server command line prompt or postgresql.conf configuration file of PostgreSQL.
3. replication_timeout(integer)
If the standby servers are crashed, or some network issues occur, then the connection to that server needs to be terminated. This parameter specifies that after how much time span the connection should be terminated if no response is being received. This parameter’s default value is set to 60 seconds and can only be modified from the server command line prompt or postgresql.conf configuration file of PostgreSQL.
Master database servers
These parameters need to be set on the server that performs the role of the master node or primary node and can also be set on the stand-by servers that can be promoted to master in the coming future.
1. synchronous_standby_names( string )
Whenever the replication is being done, it can be either synchronous or asynchronous. The operation that will lead to modification of the data is not considered committed until and unless all the servers commit that operation in synchronous replication that maintains consistency of data across database servers, while in asynchronous replication, there is a bit time difference between the commitment of transaction and actual change in all the target databases that can return stale results sometimes. synchronous_standby_names parameter stores the comma-separated list of all the standby database servers that support synchronous replication.
2. vacuum_defer_cleanup_age( integer )
This parameter specifies the number of transactions of dead row version cleanups that can be deferred while performing vacuum or hot operations.
Slave/Standby database servers
These settings need to be done on the slave database servers or stand-by database servers that will receive the data to be replicated.
1. hot_standby(boolean)
This parameter has a default value set to false and specifies whether it is allowed to connect and run the queries when the data is being recovered.
2. max_standby_archive_delay( integer )
This parameter has 30 seconds as the default value and is used in the case of hot standby servers specifying the waiting time before cancelling queries of the stand-by servers that might lead to conflict with about-to-be-applied WAL entries.
Besides the above, three more settings need to be set at the slave server, namely max_standby_streaming_delay, wal_receiver_status_interval, and hot_standby_feedback.
Conclusion
We can provide high-availability and balance the load on the database server by maintaining multiple database servers that can work together and implement replication to serve a large number of users without affecting the performance much in the PostgreSQL database. However, you need to be careful while performing replication and make sure that all the settings are set correctly for the respective database servers depending on the role which they are going to perform in replication and request handling.
Recommended Articles
This is a guide to PostgreSQL Replication. Here we discuss the working and settings of PostgreSQL Replication along with the database servers. You may also have a look at the following articles to learn more –