Introduction to Python Regex
The following article provides an outline for Python Regex. Regular expressions are a small, specialized pattern language made available inside Python through the re module. Using this language, you specify a set of rules (a pattern) and then validate a string against it or extract particular groups from it. The pattern might describe e-mail addresses, phone numbers or any other text you want to match, and it is known as a regular expression, regex pattern or RE. With it, you can ask questions like “Does this string match the pattern?” or fetch every matching piece from a larger string.
Syntax:
We are going to see the syntax with an example. For this, we can search a string to see if it starts with “He” and ends with “smart”.
import re
word = "He is very smart"
# ^He anchors the start, .* matches anything in between, smart$ anchors the end
x = re.search(r"^He.*smart$", word)
print(x)
If you look at the syntax, it is very simple: first import the regex package, re, and then call whichever of its functions you need. Running the sample code above in Jupyter prints a match object, because the string starts with “He” and ends with “smart”.
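As a minimal sketch of what can be done with that match object, the snippet below reads the matched text and its position. The variable names matched_text and matched_span are illustrative only and are not part of the original example.

```python
import re

word = "He is very smart"
match = re.search(r"^He.*smart$", word)

# re.search returns None when nothing matches, so check before using the result.
if match is not None:
    matched_text = match.group()  # the full text that matched
    matched_span = match.span()   # (start, end) indices of the match
else:
    matched_text = None
    matched_span = None

print(matched_text)  # He is very smart
print(matched_span)  # (0, 16)
```

Checking for None first avoids an AttributeError when the pattern does not match.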
Regex Functions in Python
There are many regex functions that help us to search a string for a match. Before that, we will first learn about the special characters that we generally see in a regex pattern.
Character | Description
[] | A set of characters.
. | Any character except a newline.
* | Zero or more occurrences of the preceding element.
+ | One or more occurrences of the preceding element.
^ | The start of the string.
$ | The end of the string.
| | Either-or (alternation).
() | Capture and group.
\ | Escapes special characters.
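To make the characters above concrete, here is a short sketch that uses each of them once. The sample strings and variable names are made up for illustration.

```python
import re

# [] with +: one or more characters from the set of lowercase vowels
vowel_runs = re.findall(r"[aeiou]+", "queue zoo")

# . with *: 'a', then any characters, then 'c'
greedy = re.search(r"a.*c", "abbbc").group()

# ^ and $: anchor the pattern to the start and end of the string
starts_with_he = re.search(r"^He", "Hello") is not None
ends_with_lo = re.search(r"lo$", "Hello") is not None

# | with (): either 'cat' or 'dog', captured as group 1
pet = re.search(r"(cat|dog)s", "I like dogs").group(1)

# \ escapes the dot so it matches a literal '.' instead of any character
dots = re.findall(r"\.", "v1.2.3")

print(vowel_runs)                    # ['ueue', 'oo']
print(greedy)                        # abbbc
print(starts_with_he, ends_with_lo)  # True True
print(pet)                           # dog
print(dots)                          # ['.', '.']
```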
Regex also has a few special sequences which will be useful to know, for example:
Sequence | Description
\w | Matches a word character: a digit (0-9), a letter (A-Z or a-z) or an underscore.
\W | Matches any character that is not a word character.
\d | Matches a digit.
\D | Matches any character that is not a digit.
\s | Matches a whitespace character such as a space, tab or newline.
\S | Matches any character that is not a whitespace character.
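The following sketch runs each special sequence against one sample string so the differences are easy to compare. The string "Room 42, Floor_3" is an invented example.

```python
import re

sample = "Room 42, Floor_3"

digits = re.findall(r"\d", sample)           # individual digits
words = re.findall(r"\w+", sample)           # runs of word characters
spaces = re.findall(r"\s", sample)           # whitespace characters
non_words = re.findall(r"\W", sample)        # anything that is not a word character
non_digits = re.findall(r"\D+", sample)      # runs of non-digit characters
non_spaces = re.findall(r"\S+", sample)      # runs of non-whitespace characters

print(digits)      # ['4', '2', '3']
print(words)       # ['Room', '42', 'Floor_3']
print(spaces)      # [' ', ' ']
print(non_words)   # [' ', ',', ' ']
print(non_digits)  # ['Room ', ', Floor_']
print(non_spaces)  # ['Room', '42,', 'Floor_3']
```

Note that the underscore counts as a word character, which is why "Floor_3" stays in one piece.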
Functions Used for Regex Operations
Let us see various functions of the re module that can be used for regex operations in Python.
1. findall() function
This function returns a list of all non-overlapping matches present in the string. It scans the string from left to right, and the matches are returned in the order in which they are found. Suppose we want to find all the numbers present in a string. For this, we will pass a pattern that matches digits to the findall() function. Let us see the code for this now:
Code:
import re
word = "Raju is 22 years old and his mobile number last three-digit is 789"
# Raw string so that \d reaches the regex engine unchanged
rgex = r'\d+'
x = re.findall(rgex, word)
print(x)
If we go through the code, we assign a string containing digits to the variable word and then pass the regex pattern for digits, along with word, as arguments to the findall() function.
Output:
The result is a list of the numbers found, each returned as a string: ['22', '789'].
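Since findall() returns strings, a common follow-up step is converting them to integers. The sketch below shows that, along with the empty list returned when nothing matches. The names numbers and no_digits are illustrative.

```python
import re

word = "Raju is 22 years old and his mobile number last three-digit is 789"

# findall returns strings, so convert each one to get real integers.
numbers = [int(n) for n in re.findall(r"\d+", word)]

# When there is no match, findall returns an empty list rather than None.
no_digits = re.findall(r"\d+", "no digits here")

print(numbers)    # [22, 789]
print(no_digits)  # []
```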
2. search() function
The search() function scans a string for a pattern and, if a match is found, returns a match object. One thing to remember is that if there is more than one match, it returns only the first occurrence. If no match is found, it returns None. As an example, we will check whether a string starts with a particular word, testing both a positive and a negative case. Let us see the code for the same.
Code:
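import re
word = "Raju is 22 years old"
# Positive case: the string does start with "Raju"
rgex = '^Raju'
x = re.search(rgex, word)
print(x)
# Negative case: the string does not start with "Mohan"
regex1 = '^Mohan'
x1 = re.search(regex1, word)
print(x1)
Here the variable ‘rgex’ is used in the positive scenario and the variable ‘regex1’ in the negative scenario.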
Output:
In the first case, a match object for ‘Raju’ is returned, while in the second case, we get None.
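Two points about search() can be sketched in a few lines: turning the result into a True/False check, and the fact that only the first occurrence is returned. The helper starts_with() is hypothetical and written only for this illustration, and the sample string is invented.

```python
import re

def starts_with(text, prefix):
    """Hypothetical helper: True if text starts with prefix."""
    # re.escape keeps any special characters in prefix literal.
    return re.search("^" + re.escape(prefix), text) is not None

word = "Raju is 22 years old and has 3 pets"

# search stops at the first match, so only "22" is returned, not "3".
first_number = re.search(r"\d+", word)
first_number_text = first_number.group() if first_number else None

print(starts_with(word, "Raju"))   # True
print(starts_with(word, "Mohan"))  # False
print(first_number_text)           # 22
```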
3. split() function
This function splits the string at each match: wherever the pattern matches, the string is cut there and the matched text itself is dropped from the result. So, if there are three matches, the string is split in three places, giving a list of four pieces. We will see an example. Suppose we want to split a string at each space. We can put the split() function to good use in that situation.
Code:
import re
word = "Raju is 22 years old"
# Raw string so that \s reaches the regex engine unchanged
rgex = r'\s'
x = re.split(rgex, word)
print(x)
Here the pattern \s represents a whitespace character.
Output:
The string is split at each space, giving the list ['Raju', 'is', '22', 'years', 'old'].
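Two variations are worth knowing, sketched here with invented sample strings: using \s+ so that repeated spaces do not produce empty pieces, and using the maxsplit argument to limit the number of splits.

```python
import re

spaced = "Raju  is 22"  # note the two spaces after "Raju"

# \s splits at every single whitespace character, leaving an empty string
# between the two adjacent spaces.
single = re.split(r"\s", spaced)

# \s+ treats a run of whitespace as one separator.
collapsed = re.split(r"\s+", spaced)

# maxsplit=1 stops after the first split and keeps the rest intact.
limited = re.split(r"\s", "Raju is 22 years old", maxsplit=1)

print(single)     # ['Raju', '', 'is', '22']
print(collapsed)  # ['Raju', 'is', '22']
print(limited)    # ['Raju', 'is 22 years old']
```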
4. sub() function
This function replaces each match with a string or character of the user’s choice and returns the modified string. It takes three required arguments: the pattern, the replacement and the string to search. For example, we will replace each white space with ‘&’ in our string.
Code:
import re
word = "Raju is 22 years old"
# Raw string so that \s reaches the regex engine unchanged
rgex = r'\s'
x = re.sub(rgex, '&', word)
print(x)
Output:
All the spaces are replaced by ‘&’, so the result is Raju&is&22&years&old.
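Beyond the three required arguments, sub() also accepts an optional count, and the replacement can be a function instead of a string. The sketch below shows both. The variable names and the idea of adding one to the age are purely illustrative.

```python
import re

word = "Raju is 22 years old"

# count=1 replaces only the first match and leaves the rest untouched.
first_only = re.sub(r"\s", "&", word, count=1)

# The replacement can be a function that receives each match object
# and returns the text to substitute for it.
one_year_later = re.sub(r"\d+", lambda m: str(int(m.group()) + 1), word)

print(first_only)      # Raju&is 22 years old
print(one_year_later)  # Raju is 23 years old
```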
Conclusion
In this article, we discussed Python’s re module and four of its functions: findall(), search(), split() and sub(). Regular expressions are widely used across programming languages, so the patterns learned here carry over well beyond Python.