Updated March 27, 2023

Introduction to Informed Search

Informed Search refers to search algorithms which help in navigating large databases with certain available information about the end goal in search and most widely used in large databases where uninformed search algorithms can’t accurately curate precise results.

For example, when searching on google maps you give the search algorithm information like a place you plan to visit from your current location for it to accurately navigate the distance, the time taken and real-time traffic updates on that particular route. This is all driven by complex Informed search algorithms powering google maps search functionality.

Types of Informed Search Algorithms

Before getting started with different types of Informed search Algorithms, it’s important to understand some basic concepts like Search Space which refers to space or the database in which the search is to be performed, the Initial State or Start State from where the search begins and the goal state which is the result of the search like our destination in the earlier example of google maps and goal test to check whether the current state is the destination or goal state. Path cost is a numerical term assigned to measure the numeric cost of the path taken to achieve the goal. Heuristic function, which is a function used to measure how close our current state is to the goal state and uses heuristic properties to find out the best possible path with respect to path cost to achieve the goal state.

In informed search algorithms as discussed, we have information on the goal state which narrows down our results precisely. There may be many possible ways to get to the goal state, but we need to get the best possible outcome or path for our search; this is where informed search shines.

1. Pure Heuristic search

A pure heuristic search algorithm is a simple search performed on the basis of heuristic value denoted y h(n) to a node. In a heuristic search, there are two lost created, open for new but unexpanded nodes and closed for expanded nodes, where for every iteration the node with smallest heuristic value is expanded, and all its ‘child’ nodes are expanded and added to close it. A Heuristic function is then applied to that closed list, and the node with the shortest path is saved, and rest are dispersed.

2. Best First or ‘Greedy’ Search

In order to understand the Greedy search, we need to briefly understand the concept of DFS and FS. They’re both vertex-based techniques to find the shortest path. While DFS uses a stack data structure, BFS uses a queue data structure to find out the shortest route.

Greedy search at its core uses the best path from the current state using a combination of both DFS and FS techniques to find the shortest path. The node that is closest to the goal is expanded in that current state and the closest cost to that point. It’s called the greedy search because it may not find the best solution at a given point of time, but it surely does give you an optimal one within a reasonable amount of time. It can be best explained by the example of the Travelling Salesman problem. A salesman is given a list of places to visit in a city and has to find his optimum route to travel in order to be productive to reduce his travelling time as much as possible. Here he’ll be given a choice of choosing between 2 or more places from his current position to go to, and he’ll choose the one with least distance thereby reducing his travel time albeit not being his best destination but that is a roader optimal solution.

3. A* Tree Search

Simply put, A* search combines elements and strengths from a greedy search and a Uniform cost search. The heuristic of A*search is the summation of Heuristic cost in greedy search and that in uniform cost search denoted here,

F(x)= h(x)+g(x)

Where,

h(x): Forward cost referring cost of a node from the current state to the goal state.
g(x): Backward cost which is the cost of a node from the root state or initial state.

Here, as represented, the idea is to select a node with the shortest f(x).

One of the biggest advantages of A* search algorithm is that it’s a complete search algorithm as when compared to a simple heuristic approach which only gives the shortest paths, it also takes into account the optimality of the operation overall and therefore is the most widely used search algorithm and can solve complex functions with complex search space. However, as discussed, it keeps into its memory all of the expansion and generation of nodes, is resource constraint and can’t be used for very large scale operations.

4. A*Graph Search

In Tree search, the branches or nodes are expanded again to newer iterations and thus wasting time in the process whereas in graph search same nodes which are expanded before aren’t expanded. Here the heuristic is represented by consistency, where the graph search is optimal when the forward cost as represented in A*tree search is equal to or less than backward cost between the two nodes.

One of the advantages of graph search over A* search is that it doesn’t store all the nodes and therefore isn’t resource-constrained and can be used for very large-scale operations.

Conclusion

‘Search’ is a very important aspect of getting the desired result out of a very large, complex and ever-growing database of any organization or institution around the world. It is imperative to have a working knowledge of different types of search, be it uninformed and especially informed search in the study of Data Science in general and Database Management Systems in particular. With the advent of AI and ML, there are astounding opportunities to develop and create new ways to search databases and provide users with answers that help them solve their problems.