Updated April 18, 2023

Introduction to OpenCV OCR

OpenCV OCR refers to optical character recognition carried out with OpenCV, the open-source computer vision library whose functions are designed mainly for programs that do real-time computer vision. Optical character recognition (OCR) reads an image file provided by the user, recognizes the text in the image, and returns that text to the user. In the approach described here, OpenCV prepares the image and the Tesseract engine, called through pytesseract, performs the recognition. The extracted text can then be used for whatever purpose the user needs.

Required Installation to be Made for Using OpenCV OCR

To perform optical character recognition on an image, the system must first have Tesseract version 4 installed. Tesseract 4 includes an accurate deep learning model built specifically for text and character recognition. The Python packages for OpenCV and for the Tesseract wrapper are installed with pip:

pip install opencv-python

pip install pytesseract

How does OpenCV OCR Function Work?

The image provided by the user first needs its color space changed, and the result is stored temporarily in a variable. The OpenCV cvtColor function performs the color conversion. Its second parameter, the flag, determines the kind of conversion applied to the image. The user can choose either grayscale or HSV, which re-expresses the red, green and blue values of the image as hue, saturation and value.
Next, a threshold is applied to the image with the OpenCV threshold function. Three kinds of thresholding can be applied to the converted image: simple thresholding, adaptive thresholding, and Otsu’s thresholding (binarization) method.
To extract rectangular structures, the OpenCV getStructuringElement function is used to define the shape of the structuring element, which can be a rectangle, an ellipse or a cross. In the shape parameter, the user chooses the shape (here a rectangle, cv2.MORPH_RECT), and the kernel size is passed as an additional parameter. A larger kernel may be required to form larger blocks that group neighboring text together. Once the kernel is chosen, dilation is applied to the image with the OpenCV dilate function, which makes detection of the text blocks more reliable.
The next operation performed on the image is finding contours. The OpenCV findContours function returns the contours and their hierarchy from the dilated image. Each contour found in the source image is stored as a NumPy array whose coordinates are the boundary points of an object in the image.
The contouring function finds white objects against a contrasting dark or black background. This process detects the boundary of each block of text in the image. The system then opens a text file in write mode and empties it, so that it is ready to store the text extracted by optical character recognition.
Finally, optical character recognition is applied. The code loops over each contour and obtains its x and y coordinates along with its width and height using the OpenCV boundingRect function. The system then draws a rectangle on the output image with the OpenCV rectangle function, using the obtained coordinates, width and height.
The rectangle function also takes the boundary color, which OpenCV expects as a blue, green, red tuple, and the thickness of the boundary. The rectangular region is then cropped and passed to Tesseract to extract the text from the image that the user has provided. The system opens the text file in append mode, appends the obtained text, and closes the file.
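The crop-and-append loop can be sketched without Tesseract installed by passing the recognizer in as a function. The helper ocr_regions and the stand-in fake_recognize are hypothetical names introduced here for illustration; in a real run the recognizer would be pytesseract.image_to_string.

```python
import os
import tempfile

import numpy as np


def ocr_regions(image, boxes, recognize, out_path):
    """Crop each (x, y, w, h) box, run `recognize` on it, append the text to out_path."""
    # Opening in "w" mode empties any results left from a previous run.
    with open(out_path, "w"):
        pass
    for x, y, w, h in boxes:
        cropped = image[y:y + h, x:x + w]  # rows are y, columns are x
        text = recognize(cropped)
        with open(out_path, "a") as f:     # append mode keeps earlier regions
            f.write(text + "\n")


def fake_recognize(crop):
    # Stand-in for an OCR call: reports the crop size instead of reading text.
    return "region %dx%d" % (crop.shape[1], crop.shape[0])


image = np.zeros((60, 120, 3), dtype=np.uint8)
boxes = [(10, 20, 50, 20), (70, 35, 40, 20)]
out_path = os.path.join(tempfile.mkdtemp(), "recognized.txt")
ocr_regions(image, boxes, fake_recognize, out_path)

with open(out_path) as f:
    lines = f.read().splitlines()
print(lines)  # ['region 50x20', 'region 40x20']
```

Keeping the recognizer as a parameter also makes it easy to test the cropping logic separately from the OCR engine.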

Example of OpenCV OCR

The following example puts these steps together:

Code:

# import required packages for performing OCR
import cv2
import pytesseract
# path to the Tesseract executable (placeholder, replace with the path on your system)
pytesseract.pytesseract.tesseract_cmd = 'System_path_to_tesseract.exe'
# Reading image file from where the text is to be extracted
img1 = cv2.imread("EduCBA logo.jpg")
# Converting the image into a grayscale image
gray1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
# Otsu thresholding, inverted so that dark text becomes white on a black background
ret1, thresh1 = cv2.threshold(gray1, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)
# specifying structure shape and kernel size; a larger kernel merges more text into one block
rect_kernel1 = cv2.getStructuringElement(cv2.MORPH_RECT, (18, 18))
dilation1 = cv2.dilate(thresh1, rect_kernel1, iterations=1)
# finding contours in the dilated image (OpenCV 4.x returns two values, 3.x returns three)
contours1, hierarchy1 = cv2.findContours(dilation1, cv2.RETR_EXTERNAL,
                                         cv2.CHAIN_APPROX_NONE)
# creating a copy to draw the rectangles on
img2 = img1.copy()
# creating the output file, or emptying it if it already exists
file1 = open("recognized.txt", "w+")
file1.write("")
file1.close()
# looping for ocr through the contours found
for cnt in contours1:
    x1, y1, w1, h1 = cv2.boundingRect(cnt)
    # drawing a green rectangle around the text block
    rect1 = cv2.rectangle(img2, (x1, y1), (x1 + w1, y1 + h1), (0, 255, 0), 2)
    # cropping from the original image so the drawn rectangle is not part of the crop
    cropped1 = img1[y1:y1 + h1, x1:x1 + w1]
    # apply ocr and append the recognized text to the file
    text1 = pytesseract.image_to_string(cropped1)
    file1 = open("recognized.txt", "a")
    file1.write(text1)
    file1.close()

Output:

EduCBA file entered as source image:

The output obtained for the code is:

Conclusion

OpenCV OCR, or optical character recognition, reads an image file provided by the user, recognizes the text in the image, and returns it to the user. The extracted text can be used for any purpose the user needs, and the technique is important in computer vision programs that work in real time.