I need some help on declaring a regex. My inputs are like the following:
this is a paragraph with<[1> in between</[1> and then there are cases ... where the<[99> number ranges from 1-100</[99>. and there are many other lines in the txt files with<[3> such tags </[3>
The required output is:
this is a paragraph with in between and then there are cases ... where the number ranges from 1-100. and there are many other lines in the txt files with such tags
I've tried this:
    #!/usr/bin/python
    import os, sys, re, glob
    for infile in glob.glob(os.path.join(os.getcwd(), '*.txt')):
        for line in reader:
            line2 = line.replace('<[1> ', '')
            line = line2.replace('</[1> ', '')
            line2 = line.replace('<[1>', '')
            line = line2.replace('</[1>', '')
            print line
I've also tried this (but it seems like I'm using the wrong regex syntax):
    line2 = line.replace('<[*> ', '')
    line = line2.replace('</[*> ', '')
    line2 = line.replace('<[*>', '')
    line = line2.replace('</[*>', '')
I don't want to hard-code the replace from 1 to 99.
To replace every substring that matches a regular expression, rather than one exact literal string, use the sub() function of the re module.
re.sub() is part of Python's standard library. It takes up to five arguments (pattern, repl, string, count and flags) and returns a new string with the matches replaced.
The str.replace() method also replaces substrings, but it matches the old substring only literally; characters such as * and [ have no special meaning in it.
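Applied to the tagged text in the question above, a single pattern that matches both the opening and the closing tag for any number avoids hard-coding 1 to 99. The following is a minimal sketch, not a definitive solution: the name strip_numbered_tags is made up for illustration, and it assumes every tag has the form <[N> or </[N> where N consists only of digits.

```python
import re

# Assumed tag shape: "<[" or "</[" followed by digits and ">".
# "\[" escapes the bracket, "/?" makes the slash optional, "\d+" matches the number.
TAG_PATTERN = re.compile(r'</?\[\d+>')


def strip_numbered_tags(text):
    """Return text with every <[N> and </[N> tag removed (hypothetical helper)."""
    return TAG_PATTERN.sub('', text)


sample = 'this is a paragraph with<[1> in between</[1> and the<[99> number</[99>.'
print(strip_numbered_tags(sample))
```

A space that precedes a closing tag, as in "such tags </[3>", stays in the result; whether to trim it depends on the output you need.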
Python regex sub()
The sub() function in Python's re module replaces substrings that match a pattern.
To use it, first import the re module, then call re.sub().
The syntax of re.sub() is as follows.
Syntax
    re.sub(pattern, repl, string, count=0, flags=0)
See the following code.
    import re
    str = ''
    print(re.sub('[a-z]*@', 'ApD@', str))
In the above example, we replace the run of lowercase letters (a to z) immediately before each @ character with ApD.
See the output.
From the output, we can see that we have successfully updated the email addresses.
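Because the sample string in the listing above is empty, here is a small sketch of the same call on made-up addresses. The addresses and the variable names text and result are assumptions for illustration only.

```python
import re

# Made-up addresses, used only for illustration.
text = 'alice@example.com bob@example.org carol@example.net'

# '[a-z]*@' matches a run of lowercase letters followed by '@';
# every such match is replaced with 'ApD@'.
result = re.sub('[a-z]*@', 'ApD@', text)
print(result)
```

Each local part is replaced while the domain after @ is left alone, because the pattern stops at the @ character.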
Specify the count
The re.sub() function also accepts a count parameter.
It caps the number of replacements: at most count matches are replaced, starting from the left. The default, count=0, replaces every match.
    import re
    str = ''
    print(re.sub('[a-z]*@', 'ApD@', str, count=2))
Output
From the output, you can see that it replaced only two email addresses, and the last address is left unchanged.
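The effect of count can be sketched on made-up addresses as follows. The sample string and the variable names are assumptions for illustration; count is passed by keyword for clarity.

```python
import re

# Made-up addresses, used only for illustration.
text = 'alice@example.com bob@example.org carol@example.net'

# Replace at most two matches, scanning from the left.
limited = re.sub('[a-z]*@', 'ApD@', text, count=2)

# count=0 is the default and means "replace every match".
unlimited = re.sub('[a-z]*@', 'ApD@', text, count=0)

print(limited)
print(unlimited)
```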
Replace multiple substrings with the same string
Enclosing a set of characters in [ ] matches any single character from that set, so one pattern can replace several different characters with the same string.
    import re
    str = ' '
    print(re.sub('[a-z]*@', 'info@', str))
Output
You can see that it replaces multiple different substrings with the same string.
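As a smaller illustration of a character set, the sketch below replaces three different separator characters with a dot. The sample string and variable names are made up for this example.

```python
import re

# Made-up sample containing three different separators: '-', '_' and a space.
text = 'a-b_c d'

# '[-_ ]' matches any one of the three characters; a '-' placed first
# inside [ ] is taken literally rather than as a range.
result = re.sub('[-_ ]', '.', text)
print(result)
```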
Patterns separated by | are alternatives: the regex matches wherever any one of them matches. Each alternative may use regex special characters, but plain literal strings work as well.
This lets a single call replace several different strings with the same string.
    import re
    str = ' '
    print(re.sub('gmail|hotmail|apple', 'appdividend', str))
In the above code, we have a string that contains three substrings to replace.
Every occurrence of any of the three alternatives is replaced with appdividend: if all three appear, all three are replaced; if only one or two appear, only those are replaced.
Output
You can see that all three substrings are replaced with appdividend.
Let’s see a scenario in which only one pattern matches.
    import re
    str = ' '
    print(re.sub('gmail|hotmail|apple', 'appdividend', str))
Output
You can see that the substrings that match none of the alternatives are unchanged.
Only the one substring that matches an alternative is replaced.
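Both cases, every alternative matching and only one matching, can be sketched as follows. The addresses and variable names are made up for illustration.

```python
import re

PATTERN = 'gmail|hotmail|apple'

# Made-up addresses: every domain matches one of the alternatives.
all_match = 'a@gmail.com b@hotmail.com c@apple.com'

# Made-up addresses: only the middle domain matches an alternative.
one_match = 'a@yahoo.com b@gmail.com c@outlook.com'

replaced_all = re.sub(PATTERN, 'appdividend', all_match)
replaced_one = re.sub(PATTERN, 'appdividend', one_match)

print(replaced_all)
print(replaced_one)
```

Note that the alternatives match anywhere in the string, not only in the domain part, so a local part such as "apple" would be replaced too.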
Replace using the matched part
To reuse the matched text in the replacement, enclose part of the pattern in () to form a group, then refer to that group in the new string. See the following code.
    import re
    str = ' '
    print(re.sub('([a-z]*)@', r'\1 19-@', str))
Output
The \1 corresponds to the part that matches (). If there are multiple ( ), use them like \2, \3 …
In a regular string literal enclosed in '' or "", the backslash must be escaped, as in '\\1'. In a raw string with the r prefix, such as r'', you can write \1 directly.
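The difference between the raw, escaped and unescaped forms can be checked with a short sketch. The sample address, the "-19" suffix and the variable names are assumptions made for illustration.

```python
import re

# Made-up address, used only for illustration.
text = 'alice@example.com'

# Raw string: \1 reaches re.sub() intact and refers to group 1.
raw = re.sub(r'([a-z]*)@', r'\1-19@', text)

# Regular string with an escaped backslash: equivalent to the raw form.
escaped = re.sub('([a-z]*)@', '\\1-19@', text)

# Regular string without escaping: Python turns '\1' into the control
# character chr(1) before re.sub() ever sees it, so the group is lost.
unescaped = re.sub('([a-z]*)@', '\1-19@', text)

# Two groups: \1 and \2 can be reordered in the replacement.
swapped = re.sub(r'([a-z]+)@([a-z]+)', r'\2@\1', 'alice@example')

print(raw)
print(escaped)
print(repr(unescaped))
print(swapped)
```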
Replace string by position using a slice
Strings have no method that replaces text at a given position. Instead, slice the string around that position and concatenate the pieces with the new text, which creates a new string in which the specified span is replaced.
    str = 'albertodelrio'
    print(str[:4] + 'HANAYA' + str[7:])
Output
    albeHANAYAdelrio
In the above code example, the characters at indices 4 to 6 ('rto') are replaced with HANAYA.
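The slicing approach can be wrapped in a small helper. This is only a sketch: replace_span is a hypothetical name, not a standard library function, and it assumes 0 <= start <= end <= len(text).

```python
def replace_span(text, start, end, new):
    """Return text with text[start:end] replaced by new (hypothetical helper).

    The end index is exclusive, as in ordinary slicing.
    """
    return text[:start] + new + text[end:]


result = replace_span('albertodelrio', 4, 7, 'HANAYA')
print(result)

# With start == end nothing is removed, so the new text is simply inserted.
inserted = replace_span('albertodelrio', 4, 4, '-')
print(inserted)
```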
Conclusion
In this tutorial, we have seen how to replace a string in Python using regular expressions. We imported the re module and used re.sub() to replace a single pattern, several alternative patterns, and text built from the matched part. We also replaced a string by position using slices.
That’s it for this tutorial.