Introduction to PostgreSQL cluster
The database’s main purpose is to store the data required by the user and make it available for use and manipulation whenever required. Even though this statement seems to be easy to say and fulfill, in reality, it is very tough to make sure that the single database is available to all the users present in huge numbers. High availability is always a challenge faced by database engineers. To overcome these, multiple techniques are used. One of the crucial ones is the clustering of the tables of the database. Clustering is an operation that is performed once, and that leads to reordering of the data in the tables based on a certain index that is specified, and if not, any index present on that table is considered for clustering.
Syntax :
CLUSTER [VERBOSE] nameOfTable [ USING nameOfIndex ]
or
CLUSTER [VERBOSE]
- nameOfTable: It is the name of the table that you wish to cluster.
- NameOfIndex: The name of the index is already defined on the table named nameOfTable on which you want to do clustering. It is the optional parameter. When not specified, the index defined on the table is considered for clustering. Even the name of the table is optional. When the table name is not specified, all the current database tables are clustered based on their respective indexes.
- Verbose: This is an optional parameter which, when specified, the information about the clustering process being performed after firing the CLUSTER query is printed on the console.
Working of the PostgreSQL cluster
This is a one-time operation. Once done on the table, all the records are reordered based on the table’s defined index. After that, whenever the manipulations are done on the table like if we insert, update or delete the records in it, the order during clustering is not maintained. For that, you can again perform clustering operation periodically as and when possible or can take the help of the storage parameter named FILLFACTOR that can be set against the table, which, when set to less than 100%, can maintain the clustering order while performing the update operation as the values are tried to store on the same page provided if the space is available on that page.
If clustering is being done inside the transaction block, then it won’t be cluster operation won’t be performed. When none of the parameters is specified while firing the cluster command, then depending on which user is firing, the command steps are taken. If the user is a superuser, then all the tables present in that database are clustered and reclustered in case if they were already clustered based on their respective index. If it is not the superuser who is issuing the command, then the tables that that user owns are reclustered is previously clustered and are clustered if the operation is being performed for the first time on that table.
When we do clustering on a particular table, then during that time, we cannot perform any read or write operation on it as the table is exclusively locked for access and can only be accessed once the cluster operation is completed. Also, note that SQL does not support any statement like CLUSTER, while in PostgreSQL cluster operation is compatible with all the latest versions as well as versions previous to PostgreSQL 8.3.
Examples to Implement of PostgreSQL cluster
Below are the examples of PostgreSQL cluster:
For performing C to your PostgreSQL database psql command shell using the psql command and enter the password if required as follows –
Example #1 – PostgreSQL Database Command Shell
Command: psql
Output:
Now, let us check the available tables in my database using the command \dt that results in the following output in my case.
Command: \dt
Output:
As we can see, we have 5 tables present in my database. Let us describe the table named educba to find out keys defined on that table using the \d educba command that results in the following output –
Command:\d educba;
Output:
We can see that a primary key constraint named the educba_pkey index has been defined on the column technical_id for table educba. If you don’t have any table present in your database, you can create the one using the CREATE TABLE statement and define the index on that table. This table will be further used for clustering.
Example #2 – Clustering the Table
Now, we will cluster the educba table on the index named educba_pkey using the following query statement –
Command: CLUSTER educba USING educba_pkey;
Output:
As we can see, the only CLUSTER is the output we get after clustering the educba table, which is correct if the cluster operation is performed successfully. But none of the information related to the cluster operation that was performed is printed on the console. To do so, we can make use of the VERBOSE parameter. After specifying it, all the cluster operation-related information is printed. Let us try to execute the following command.
Command: CLUSTER VERBOSE educba USING educba_pkey;
Output:
After using VERBOSE, we can see all the CLUSTER operation-related information printed on the console specifying how many dead rows were removed and how much time the query execution took.
After this, whenever we operate CLUSTER on the educba table, it will consider educba_pkey for reclustering as PostgreSQL remembers all the keys on which the tables were clustered.
Command: CLUSTER VERBOSE educba;
Output:
We can perform the cluster operation once manually based on a certain key on which we wish to perform reordering. After that, write the script that will recluster the table periodically by performing the CLUSTER tableName command. The reordering is maintained even after multiple updations and insertions in the table.
Conclusion
Clustering highly increases the performance and retrieves the query resultset very quickly as the execution, i.e. lowers done because the records are now present in the single location after clustering and are not in the distorted format. This gradually affects the availability of the database for users. As execution time decreases, the access time decreases, making the database available for other users, thus achieving high performance and database availability.
Recommended Articles
This is a guide to the PostgreSQL cluster. Here we discuss the Introduction to PostgreSQL cluster and the practical examples and different subquery expressions. You can also go through our suggested articles to learn more –