Updated May 17, 2023
Introduction to PostgreSQL Replication
It is necessary to implement various strategies to maintain the high availability of the PostgreSQL database and achieve load balancing in scenarios with a large number of users. One such strategy is utilizing multiple database servers collaborating to serve the users effectively. In this case, it is very easy to provide this facility if all the operations are read-only, which means no operation is performed on the data present in the database servers; only the available data is retrieved. In this topic, we are going to learn about PostgreSQL Replication.
But in the case of reading/write, servers that involve all kinds of manipulations on the database’s data becomes difficult. This is because a single change in one database should be reflected in all the database servers that are serving the users. In this scenario, data replication is utilized to duplicate or replicate data across multiple database servers. It involves making the same change in all the database servers with which the server is operating, ensuring high availability for the end-users.
Working on PostgreSQL Replication
In PostgreSQL, typically, there is one database server that handles all modification-related operations and requests. Other database servers, commonly referred to as slave servers or standby database servers, then replicate or propagate these changes. That means the master nodes are always the replicated data’s senders, while slave nodes receive the replicated data and apply the same changes over there.
PostgreSQL provides a built-in replication feature known as streaming replication, which operates on a master-slave architecture. When utilizing cascading, both the slave or standby database servers have the capability to send and receive replicated data to and from the master database server.
The slave of standby servers always keep track of the changes made by the master server and tries to keep themselves updated. There are two types of standby database servers, namely hot and warm standby servers. Warm servers are servers that require promotion to the master server role before they can accept user connections. On the other hand, hot standby servers are capable of accepting user connections and performing read-only operations.
The streaming replication system in PostgreSQL utilizes the Write-Ahead Log (WAL) file logging system. This system marks a transaction as committed before applying the corresponding change to the target database. However, this difference is minimal; hence there is no inconsistency between the main and replicated database values.
Settings
In the upcoming sections of this article, we will discuss the settings that need to be configured on each of the master, slave, or sending users.
Sending database servers
This is the server that sends the replicated data to all the other servers. The master server always functions as a sending server, and it is necessary to configure all these settings on that server. However, in the case where the slave servers also serve as sending servers, such as in cascading, it is also essential to configure these settings on those servers.
1. max_wal_senders(integer)
The maximum number of allowed connections with other slave database servers at a single instance can be specified by this parameter. The parameter’s default value is zero, indicating the absence of replication support and disallowing replicated database connections. This value cannot exceed max_connections as this value is equivalent to the number of current wal connections.
2. wal_keep_segments(integer)
The standby servers require 16-megabytes of past log files stored in the pg_xlog directory for streaming replication in Postgres. The minimum number of segments required by the standby servers for proper replication can be specified by this parameter. You can set its value either from the server command line prompt or the postgresql.conf configuration file of PostgreSQL.
3. replication_timeout(integer)
If the standby servers crash or if some network issues occur, the connection to that server must terminate. The server command line prompt or the postgresql.conf configuration file of PostgreSQL is where you can modify this parameter. The connection will be terminated if no response is received within a specified time span. The default value for this parameter is 60 seconds.
Master database servers
You should configure the mentioned parameters on the server that serves as the master node or primary node. Additionally, you can also set these parameters on standby servers that may be promoted to the master role in the future.
1. synchronous_standby_names( string )
During replication, there are two modes available: synchronous and asynchronous. In synchronous replication, all participating servers in the replication process must confirm the data modification before it is considered committed. This ensures data consistency across all database servers. On the other hand, asynchronous replication may introduce a slight time difference between the transaction’s commitment and the actual changes being made in the target databases. This can occasionally result in stale results. The “synchronous_standby_names” parameter stores a comma-separated list of standby database servers that support synchronous replication.
2. vacuum_defer_cleanup_age( integer )
This parameter sets the maximum number of dead row version cleanups that the database can defer during vacuum or hot operations.
Slave/Standby database servers
You should configure the mentioned settings on the slave or standby database servers intended to receive the replicated data.
1. hot_standby(boolean)
The default value of this parameter is set to false. It determines whether it is permissible to establish connections and execute queries while data recovery is in progress.
2. max_standby_archive_delay( integer )
The default value for this parameter is 30 seconds. In the context of hot standby servers, PostgreSQL’s streaming replication feature specifies a duration to wait before canceling queries from standby servers. The wait time is crucial for preventing potential conflicts with upcoming Write-Ahead Log (WAL) entries that are about to be applied.
In addition to the above, three more settings must be configured on the slave server: max_standby_streaming_delay, wal_receiver_status_interval, and hot_standby_feedback.
Conclusion- PostgreSQL Replication
We can provide high-availability and balance the load on the database server by maintaining multiple database servers that can work together and implement replication to serve a large number of users without affecting the performance much in the PostgreSQL database. When performing replication, it is crucial to exercise caution and ensure that all settings are properly configured for the respective database servers based on the role they will fulfill in replication and request handling.
Recommended Articles
We hope that this EDUCBA information on “PostgreSQL Replication” was beneficial to you. You can view EDUCBA’s recommended articles for more information.