Updated May 15, 2023

Introduction to PostgreSQL cluster

The main purpose of a PostgreSQL database is to store the data required by the user and make it available for use and manipulation whenever required. Even though this statement seems easy to fulfill, in reality, it is tough to ensure that a single database is available to all the users present in huge numbers. Database engineers often face the challenge of maintaining high availability in their systems. The PostgreSQL cluster technique involves clustering database tables to improve performance and availability. The cluster operation performs once and reorders the data in a table based on a specified index. If you do not specify a particular index for clustering, the clustering operation may consider any index present on the table.

Syntax :

CLUSTER [VERBOSE] nameOfTable [ USING nameOfIndex ]

CLUSTER [VERBOSE]

nameOfTable: It is the name of the table that you wish to cluster.
NameOfIndex: The table named “nameOfTable” already has an index with the specified name that you want to use for clustering. It is an optional parameter. If you do not explicitly specify, the index defined on the table will be used for clustering. Even the name of the table is optional. If you do not specify a table name, the cluster operation will cluster all tables in the current database based on their respective indexes.
Verbose: When you specify this optional parameter, information about the clustering process will be printed on the console after firing the CLUSTER query.

Working on the PostgreSQL cluster

The cluster operation is a one-time process.When you perform the cluster operation on a table, it reorders all records in the table based on the table’s defined index. After the cluster operation, any manipulations, such as inserting, updating, or deleting records in the table, will not maintain the order established during the clustering process. To maintain the clustering order after performing updates on a table, you can either periodically perform the clustering operation or set the FILLFACTOR storage parameter against the table. Setting FILLFACTOR to less than 100% enables maintaining the clustering order during update operations by attempting to store values on the same page if there is available space.

If you perform clustering inside a transaction block, the cluster operation will not be executed.The actions that are executed if you run the cluster command without supplying any parameters depend on the user running it.If you are a superuser, you can cluster all tables in the database and recluster them based on their respective indexes, even if they have already been clustered before. Any tables held by that user will either be clustered for the first time if the action is being performed on a table, or they will be reclustered if a non-superuser gives the command.

When you cluster a particular table, the operation exclusively locks it for access, and it prevents any read or write operations on the table until the clustering operation is complete. After the cluster operation is completed, the table can be accessed again. Also, note that SQL does not support any statement like CLUSTER, while in PostgreSQL cluster operation is compatible with all the latest versions as well as versions previous to PostgreSQL 8.3.

Examples to Implement of PostgreSQL Cluster

Below are the examples of PostgreSQL cluster:

For performing C to your PostgreSQL database psql command shell using the psql command and enter the password if required as follows –

Example #1 – PostgreSQL Database Command Shell

Command:

psql

Output:

Now, let us check the available tables in my database using the command \dt, that results in the following output in my case.

Command:

\dt

Output:

As we can see, we have 5 tables present in my database. Let us describe the table named educba to find out the keys defined on that table using the \d educba command that results in the following output –

Command:

\d educba;

Output:

We can see that the educba table has a primary key constraint named “educba_pkey” defined on the “technical_id” column. If you don’t have a table in your database, you can create one using the CREATE TABLE statement and define the index on that table. This table will be further used for clustering.

Example #2 – Clustering the Table

Now, we will cluster the educba table on the index named educba_pkey using the following query statement –

Command:

CLUSTER educba USING educba_pkey;

Output:

The CLUSTER statement is the only output we receive after successfully clustering the educba table. The successful completion of the cluster operation is indicated by this message. However, the console does not print any information related to the performed cluster operation.We can utilize the VERBOSE parameter to accomplish this. If you specify it, all information related to the cluster operation will be printed. Let’s try to execute the following command.

Command:

CLUSTER VERBOSE educba USING educba_pkey;

Output:

After we use VERBOSE, it prints all the information related to CLUSTER operations on the console, including the number of dead rows removed and the query execution time.

After this, whenever we operate CLUSTER on the educba table, it will consider educba_pkey for reclustering as PostgreSQL remembers all the keys on which the tables were clustered.

Command:

CLUSTER VERBOSE educba;

Output:

We can perform the cluster operation once manually based on a certain key on which we wish to perform reordering. After that, write the script that will recluster the table periodically by performing the CLUSTER tableName command. The reordering that occurs during the clustering process remains maintained even after multiple updates and insertions are performed on the table.

Conclusion

Clustering can greatly improve performance by reducing the query execution time. This is because clustering organizes the records in a single location, which eliminates the need to search for them in a scattered format. As a result, the faster retrieval of the query result set can enhance the database’s availability for users over time. Faster execution time and access result in improved performance and availability of the database for multiple users.