Updated March 20, 2023

Introduction to Hashing in DBMS

Hashing in a Database Management System is a procedural approach applied to rationally find the position of the required data, instead of creating new indexes or using the existing index structures of the database system, by making use of dedicated methods called the ‘Hash functions’ or ‘Buckets’ to search the specific data from the disk memory, as opposed to the traditional methods of using group by or conditional statements for the Indexes of the database systems.

Types of Hashing in DBMS

There are typically two types of hashing techniques in DBMS:

1. Static Hashing
2. Dynamic Hashing

1. Static Hashing

In the case of static hashing, the data set formed and the bucket address is the same. This means that if we try to generate the address for USER_ID=113 by making use of the hashing function modulus 5 then it always provides us the resultant as 3 with the same looking bucket address. In this case, there will not be any change in the address of the bucket provided. Therefore the number of buckets remains constant throughout the operation.

Operation of Statically typed Hashing

a. Searching for a Record: If there is a need for the record to be found, then the exact same hashing function is used to retrieve the address and path of data bucket with the data being stored.

b. Insertion of a New Record: If a new and fresh record is put in a table, then an address is generated for a fresh record based upon the hashing key thereby storing the record onto that location.

Deletion of the Record: In order for the record to be deleted, first that record needs to be fetched which can be deleted. Once that task is done, then the records need to be deleted for that memory address.
Updation of a Record: In order to update the record, we first search the record by making use of the hash-based function and once that is done, then our data record can be said to be in an updated state. In order for us to insert a fresh record in the file and the address which is generated from the hash-based function and data-bucket is non-empty or if the data is already present in the address provided. This situation which particularly arises in case of static hashing can be better-called bucket overflow and therefore there are some ways used to overcome this problem such as:

(i) Open Hashing: If a hashing function generates the address for which the data can be seen already in the stored state, in that case, the next level of the bucket will automatically get allocated. This mechanism can be termed to be a linear probing technique.

For example, if R3 is the fresh address which is needed to be put, then the hash-based function will generate address as the number 102 for the R3 address. The address generated is in full state and therefore the system is meant to search for the new data bucket which is 113 and assign R3 to that data bucket.

(ii) Closed Hashing: When the buckets are completely full, a new bucket is then allocated for a particular hash result which is linked right after the one completed previously and therefore this method is called to be Overflow chaining technique.

For example, R3 is the fresh address which is required to be put in the new table the hashing function is used to generate address as the number 110 to it. This bucket, in turn, is full and therefore cannot receive new data and therefore a fresh bucket is put in the end after 100.

2. Dynamic Hashing

This kind of hash-based method can be used to solve the basic problems of static based hashing like the ones such as bucket overflow as the data buckets can grow and shrink with the size it is more space optimized technique and therefore it is called as Extendable hash-based method. In this method, the hashing is made dynamic which means that the insertion activity or deletion is allowed without going into providing poor performance.

a. Searching a Key: Calculate the hash-based address of the required key and check the number of bits that are being used in the case of a directory which is known as i. Then the ones which are least significant of the I bits are taken from the directory which gives an idea about the index from the directory. By making use of that index value, go into the directory to find the bucket address to search for the present records.

b. Insertion of a Fresh Record: At first, you are required to follow the exact same retrieval procedure which needs to be ending up somewhere in the bucket. Search for the space in that bucket, then put the records inside it. If that created bucket is complete and full, then the bucket will get split and the records are redistributed.

For example, the last two bits of the digits 2 and 4 are 00. So they will go in the bucket B0 and so on so forth according to the modulus function. Key 9 has an address of 10001 which must be present in the first bucket but will get split and will move to the new bucket B1 and this goes on until all the buckets and keys are dynamically hashed. The hash function is used in a way where the hash function is used to choose the column and its value to generate the address. Maximum times the hash function makes use of the primary key which in turn is used to generate the addresses of the data block. It is a simple mathematical function where the primary key can also be considered as the data block address which means that every row with the same address as that of the primary key will be stored in the data block.