Python regex remove escape characters

You're looking for a search-and-replace method, which in Python is re.sub().

Simply replace every run of characters that is not a letter, an apostrophe, or a space ([^a-zA-Z' ]+) with '' (nothing).

- Oh well, what about the escaped characters?
R: Inside the string, each escape sequence becomes a single character: \n becomes a newline character, for example, which is neither a letter nor a '. The pattern therefore removes it like any other unwanted character.

If, instead, your string contains an escaped backslash (like "abc\\nefg"), add \\\\.| at the start of your regex, which matches the backslash plus whatever character follows it (so the pattern becomes \\\\.|[^a-zA-Z' ]).

Here is a working example:

import re
s = "aaa\n\t\n asd123asd water's tap413 water blooe's"
replaced = re.sub("[^a-zA-Z' ]+", '', s)
print(replaced)

https://repl.it/repls/ReasonableUtterAnglerfish


Would appreciate it if you can explain what each expression means

So, the explanation:

  • \\\\ - Matches a backslash. (Why four? In a normal Python string literal, each pair \\ becomes one backslash, which leaves \\ for the regex engine, and that is how you match a single backslash in regex.)
  • . - Match any character except for the newline character.
  • | - OR expression, matches what is before OR what is after.
  • [^...] - Must NOT be one of these characters (inside).
  • a-zA-Z'  - Match characters from a to z, A to Z, ' or a space.
  • + - Quantifier meaning "one or more occurrences of the preceding term". It is not needed here, but it reduces the number of matches and hence the execution time.
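
The difference the extra \\\\.| alternative makes can be sketched as follows. This is a minimal illustration, not part of the original answer: the sample string and variable names are made up, and the pattern is written as a raw string, so r"\\." is the same regex as "\\\\." above.

```python
import re

# A string holding a literal backslash followed by 'n' (not a newline).
s = "abc\\nefg water's tap413"

# Without the extra alternative, only the backslash itself is removed.
letters_only = re.sub(r"[^a-zA-Z' ]+", "", s)

# With it, the backslash and the character after it are removed together.
with_escape_branch = re.sub(r"\\.|[^a-zA-Z' ]", "", s)

print(letters_only)        # abcnefg water's tap
print(with_escape_branch)  # abcefg water's tap
```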

If you’re like me, you’ll regularly sit in front of your code and wonder: how to escape a given character?

Challenge: Some characters have a special meaning in Python strings and regular expressions. Say you want to search for the string "(s)" but the regex engine takes the three characters (s) as a matching group. You could manually escape the special symbols and brackets by using \(s\), but this is tedious and error-prone.

Question: How to escape all special regex symbols automatically?

Python Regex - How to Escape Special Characters?

If you have this problem too, you’re in luck. This article is the ultimate guide to escape special characters in Python. Just click on the topic that interests you and learn how to escape the special character you’re currently struggling with!

If you’re impatient, you’re in luck too. Just add a backslash in front of the special character you want to escape: \x escapes the special character x.

Here are a few examples:

>>> import re
>>> re.findall(r'\( \{ \" \. \* \+', r'( { " . * +')
['( { " . * +']

However, you may not want to escape all of those manually. That’s why the re.escape method exists!

  • Python re.escape Method
  • Python Regex Escape Characters
    • Python Regex Escape Parentheses ()
    • Python Regex Escape Square Brackets []
    • Python Regex Escape Curly Brace (Brackets)
    • Python Regex Escape Slash (Backslash and Forward-Slash)
    • Python Regex Escape String Single Quotes
    • Python Regex Escape String Double Quotes
    • Python Regex Escape Dot (Period)
    • Python Regex Escape Plus
    • Python Regex Escape Asterisk
    • Python Regex Escape Question Mark
    • Python Regex Escape Underscore
    • Python Regex Escape Pipe
    • Python Regex Escape Dollar
    • Python Regex Escape Greater Than and Smaller Than
    • Python Regex Escape Hyphen
    • Python Regex Escape Newline
  • Python Regex Bad Escape
  • Where to Go From Here

Python re.escape Method

If you know that your string has a lot of special characters, you can also use the convenience method re.escape(pattern) from Python’s re module.

Specification: re.escape(pattern)

Definition: escapes all special regex meta characters in the given pattern.

Example: you can escape all special symbols in one go:

>>> re.escape('https://www.finxter.com/')
'https://www\\.finxter\\.com/'

The dot is the only character in the string 'https://www.finxter.com/' that has a special meaning in a regular expression. Therefore, only the two dots are escaped and everything else stays unchanged.

Note that “only characters that can have special meaning in a regular expression are escaped. As a result, '!', '"', '%', "'", ',', '/', ':', ';', '<', '=', '>', '@', and "`" are no longer escaped” (source).
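
As a small sketch of how re.escape solves the "(s)" challenge from the introduction (the sample text and variable names are illustrative assumptions):

```python
import re

needle = "(s)"                       # the literal text we want to find
text = "Add the suffix (s) where needed."

pattern = re.escape(needle)          # backslashes are added before ( and )
matches = re.findall(pattern, text)

print(pattern)   # \(s\)
print(matches)   # ['(s)']
```

Without re.escape, the pattern "(s)" would be read as a group containing the letter s and would match every plain "s" in the text instead.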


Python Regex Escape Characters

If you use special characters in strings, they carry a special meaning. Sometimes you don’t need that. The general idea is to escape the special character x with an additional backslash \x to get rid of the special meaning.

In the following, I show how to escape all possible special characters for Python strings and regular expressions:

Python Regex Escape Parentheses ()

How to escape the parentheses ( and ) in Python regular expressions?

Parentheses have a special meaning in Python regular expressions: they open and close matching groups.

You can get rid of the special meaning of parentheses by using the backslash prefix: \( and \). This way, you can match the parentheses characters in a given string. Here’s an example:

>>> import re
>>> re.findall(r'\(.*\)', 'Python is (really) great')
['(really)']

The result shows a string that contains the “special” characters '(' and ')'.

Python Regex Escape Square Brackets []

How to escape the square brackets [ and ] in Python regular expressions?

Square brackets have a special meaning in Python regular expressions: they open and close character sets.

You can get rid of the special meaning of brackets by using the backslash prefix: \[ and \]. This way, you can match the brackets characters in a given string. Here’s an example:

>>> import re
>>> re.findall(r'\[.*\]', 'Is Python [really] easy?')
['[really]']

The result shows a string that contains the “special” characters '[' and ']'.

Python Regex Escape Curly Brace (Brackets)

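How to escape the curly braces { and } in Python regular expressions?

Curly braces have no special meaning in ordinary Python string literals. In regular expressions they are special only when they form a repetition quantifier such as {3} or {2,5}; anywhere else, Python treats them as literal characters. In the example below the braces don’t form a quantifier, so you don’t need to escape them with a leading backslash character \. However, you can do so if you wish, and escaping is the safe choice whenever the braces could be read as a quantifier: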

>>> import re
>>> re.findall(r'\{.*\}', 'if (2==2) { y = 3; }')
['{ y = 3; }']
>>> re.findall(r'{.*}', 'if (2==2) { y = 3; }')
['{ y = 3; }']
>>> re.findall('{.*}', 'if (2==2) { y = 3; }')
['{ y = 3; }']

All three cases match the same string enclosed in curly braces—even though we did not escape them and didn’t use the raw string r'' in the third example.
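
One caveat is worth a quick sketch: when the braces enclose a number, the regex engine reads them as a repetition quantifier rather than as literal characters. The sample strings below are illustrative assumptions.

```python
import re

# {2} after 'a' is a quantifier: exactly two consecutive 'a' characters.
as_quantifier = re.findall(r'a{2}', 'a aa aaa')

# Escaped braces are matched literally.
as_literal = re.findall(r'a\{2\}', 'a{2} aa')

# Unescaped, the same text 'a{2}' is not found; only 'aa' matches.
unescaped = re.findall(r'a{2}', 'a{2} aa')

print(as_quantifier)  # ['aa', 'aa']
print(as_literal)     # ['a{2}']
print(unescaped)      # ['aa']
```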

Python Regex Escape Slash (Backslash and Forward-Slash)

How to escape the slash characters—backslash \ and forward-slash /—in Python regular expressions?

The backslash has a special meaning in Python regular expressions: it escapes special characters and, thus, removes the special meaning. (How meta.) To match a literal backslash, escape it with a second backslash: \\. Here’s an example:

>>> import re
>>> re.findall(r'\\...', r'C:\home\usr\dir\hello\world')
['\\hom', '\\usr', '\\dir', '\\hel', '\\wor']

You can see that the resulting matches are displayed with doubled backslashes. Each matched string contains only a single backslash; the shell prints its repr, in which a backslash is written as \\ because the backslash has a special meaning in normal string literals.

Note that you didn’t need to escape the backslash characters when writing the raw string r'C:\home\usr\dir\hello\world' because a raw string removes the special meaning from the backslashed characters. If you use a normal string instead of a raw string, you need to escape each backslash yourself:

>>> re.findall(r'\\...', 'C:\\home\\usr\\dir\\hello\\world')
['\\hom', '\\usr', '\\dir', '\\hel', '\\wor']

In contrast to the backslash, the forward-slash doesn’t need to be escaped. Why? Because it doesn’t have a special meaning in Python strings and regular expressions. You can see this in the following example:

>>> import re
>>> re.findall('/...', '/home/usr/dir/hello/world')
['/hom', '/usr', '/dir', '/hel', '/wor']

The result shows that even in a non-raw string, you can use the forward-slash without leading escape character.
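
To make the two layers of escaping concrete, here is a minimal sketch (variable names and the sample path are illustrative) showing that the raw and the normal spelling of the backslash pattern are the same two-character string:

```python
import re

raw_pattern = r'\\'        # two characters: backslash, backslash
normal_pattern = '\\\\'    # the same two characters, written without r''
path = r'C:\home\usr'

parts = re.split(raw_pattern, path)

print(raw_pattern == normal_pattern)  # True
print(len(raw_pattern))               # 2
print(parts)                          # ['C:', 'home', 'usr']
```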

Python Regex Escape String Single Quotes

How to escape the single quotes ' in Python regular expressions?

Single quotes have no special meaning in Python regular expressions. They are special in Python string literals, where they open and close strings.

You can get rid of the special meaning of single quotes by using the backslash prefix: \'. This way, you can match the string quote characters in a given string. Here’s an example:

>>> import re
>>> re.findall('\'.*\'', "hello 'world'")
["'world'"]

The result shows a string that contains the “special” single quote characters. The result also shows an alternative that removes the special meaning of the single quotes: enclose them in double quotes: "hello 'world'".

Python Regex Escape String Double Quotes

How to escape the double quotes " in Python regular expressions?

Double quotes have no special meaning in Python regular expressions. They are special in Python string literals, where they open and close strings.

You can get rid of the special meaning of double quotes by using the backslash prefix: \". This way, you can match the string quote characters in a given string. Here’s an example:

>>> import re
>>> re.findall('\".*\"', 'hello "world"')
['"world"']

The result shows a string that contains the “special” double quote characters. The example also shows an alternative that removes the special meaning of the double quotes: enclose them in single quotes: 'hello "world"'.
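
Since quotes only matter to the Python string literal and not to the regex engine, a raw string with the other quote style avoids backslashes entirely. A minimal sketch, with an illustrative sample text:

```python
import re

# Triple quotes let the text contain both quote styles unescaped.
text = '''He said "it's fine"'''

# The pattern itself contains double quotes, so single quotes delimit it.
double_quoted = re.findall(r'"[^"]*"', text)

# And the other way round for a single quote.
single_quotes = re.findall(r"'", text)

print(double_quoted)   # ['"it\'s fine"']
print(single_quotes)   # ["'"]
```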

Python Regex Escape Dot (Period)

How to escape the regex dot (or period) meta character . in Python regular expressions?

The dot character has a special meaning in Python regular expressions: it matches an arbitrary character (except newline).

You can get rid of the special meaning of the dot character by using the backslash prefix: \.. This way, you can match the dot character in a given string. Here’s an example:

>>> import re
>>> re.findall(r'..\.', 'my. name. is. python.')
['my.', 'me.', 'is.', 'on.']

The result shows four strings that contain the “special” characters '.'.

Python Regex Escape Plus

How to escape the plus symbol + in Python regular expressions?

The plus symbol has a special meaning in Python regular expressions: it’s the one-or-more quantifier of the preceding regex.

You can get rid of the special meaning of the regex plus symbol by using the backslash prefix: \+. This way, you can match the plus symbol characters in a given string. Here’s an example:

>>> import re
>>> re.findall(r'\++', '+++python+++rocks')
['+++', '+++']

The result shows both usages: the plus symbol with and without leading escape character. If it is escaped \+, it matches the raw plus character. If it isn’t escaped +, it quantifies the regex pattern just in front of it (in our case the plus symbol itself).

Python Regex Escape Asterisk

How to escape the asterisk symbol * in Python regular expressions?

The asterisk symbol has a special meaning in Python regular expressions: it’s the zero-or-more quantifier of the preceding regex.

You can get rid of the special meaning of the regex asterisk symbol by using the backslash prefix: \*. This way, you can match the asterisk symbol characters in a given string. Here’s an example:

>>> import re
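>>> re.findall(r'\**', '***python***rocks')
['***', '', '', '', '', '', '', '***', '', '', '', '', '', '']

The pattern shows both usages: the asterisk symbol with and without leading escape character. If it is escaped \*, it matches the raw asterisk character. If it isn’t escaped *, it quantifies the regex pattern just in front of it (in our case the asterisk symbol itself). Because the quantifier also allows zero asterisks, the result contains an empty string for every position without an asterisk, in addition to the two '***' runs.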
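
Because the zero-or-more quantifier can also match the empty string, a pattern like \** reports empty matches as well. If only the runs of asterisks are wanted, one option is sketched below (variable names are illustrative): switch to the one-or-more quantifier, or filter out the empty strings.

```python
import re

text = '***python***rocks'

zero_or_more = re.findall(r'\**', text)      # includes empty matches
one_or_more = re.findall(r'\*+', text)       # only real runs of '*'
non_empty = [m for m in zero_or_more if m]   # same result via filtering

print(one_or_more)  # ['***', '***']
print(non_empty)    # ['***', '***']
```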

Python Regex Escape Question Mark

How to escape the question mark symbol ? in Python regular expressions?

The question mark symbol has a special meaning in Python regular expressions: it’s the zero-or-one quantifier of the preceding regex.

You can get rid of the special meaning of the question mark symbol by using the backslash prefix: \?. This way, you can match the question mark symbol characters in a given string. Here’s an example:

>>> import re
>>> re.findall(r'...\?', 'how are you?')
['you?']

The result shows that the question mark symbol was matched in the given string.

Python Regex Escape Underscore

How to escape the underscore character _ in Python regular expressions?

The underscore doesn’t have a special meaning in Python regular expressions or Python strings.

Therefore, you don’t need to escape the underscore character—just use it in your regular expression unescaped.

>>> import re
>>> re.findall('..._', 'i_use_underscore_not_whitespace')
['use_', 'ore_', 'not_']

However, it doesn’t hurt to escape it either:

>>> re.findall(r'...\_', 'i_use_underscore_not_whitespace')
['use_', 'ore_', 'not_']

In both cases, Python finds the underscore characters in the string and matches them in the result.

Python Regex Escape Pipe

How to escape the pipe symbol | (vertical line) in Python regular expressions?

The pipe symbol has a special meaning in Python regular expressions: the regex OR operator.

You can get rid of the special meaning of the pipe symbol by using the backslash prefix: \|. This way, you can match the pipe character in a given string. Here’s an example:

>>> import re
>>> re.findall(r'.\|.', 'a|b|c|d|e')
['a|b', 'c|d']

By escaping the pipe symbol, you get rid of the special meaning. The result is just the matched pipe symbol with leading and trailing arbitrary character.

If you don’t escape the pipe symbol, the result will be quite different:

>>> re.findall('.|.', 'a|b|c|d|e')
['a', '|', 'b', '|', 'c', '|', 'd', '|', 'e']

In this case, the regex .|. matches “an arbitrary character or an arbitrary character”—quite meaningless!

Python Regex Escape Dollar

How to escape the dollar symbol $ in Python regular expressions?

The dollar symbol has a special meaning in Python regular expressions: it matches at the end of the string.

You can get rid of the special meaning by using the backslash prefix: \$. This way, you can match the dollar symbol in a given string. Here’s an example:

>>> import re
>>> re.findall(r'\$\d+', 'Your house is worth $1000000')
['$1000000']

Note that the \d+ regex matches one or more digits between 0 and 9.
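
The contrast between the escaped and the unescaped dollar symbol can be sketched like this (the sample text is an illustrative assumption):

```python
import re

text = 'cost $5 and $10'

# Escaped: \$ matches a literal dollar sign followed by digits.
literal_dollar = re.findall(r'\$\d+', text)

# Unescaped: $ anchors the match to the end of the string.
end_anchor = re.findall(r'\d+$', text)

print(literal_dollar)  # ['$5', '$10']
print(end_anchor)      # ['10']
```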

Python Regex Escape Greater Than and Smaller Than

How to escape the less-than < and greater-than > symbols in Python regular expressions?

The less-than and greater-than symbols don’t have a special meaning of their own in Python regular expressions. Therefore, you don’t need to escape them.

Here’s an example:

>>> import re
>>> re.findall('<.*>.*<.*>', '<div>hello world</div>')
['<div>hello world</div>']

The result shows that, even without escaping the HTML tag symbols, the regex matches the whole string.

Python Regex Escape Hyphen

How to escape the hyphen - in Python regular expressions?

Outside a character set, the hyphen doesn’t have a special meaning and you don’t need to escape it. Here’s an example:

>>> import re
>>> re.findall('..-', 'this is-me')
['is-']

The unescaped hyphen character in the regex matches the hyphen in the string.

However, inside a character set, the hyphen stands for the range symbol (e.g. [0-9]) so you need to escape it if you want to get rid of its special meaning and match the hyphen symbol itself. Here’s an example:

>>> re.findall(r'[a-z\-]+', 'hello-world is one word')
['hello-world', 'is', 'one', 'word']

Note that, in this case, if you don’t escape the hyphen in the character set, you get the same result:

>>> re.findall('[a-z-]+', 'hello-world is one word')
['hello-world', 'is', 'one', 'word']

The reason is that the hyphen appears at the end of the character set where it can have only one meaning: the hyphen symbol itself. The same holds when it is the first character in the set. Between two other characters, however, the hyphen is read as the range operator, which can lead to surprising behavior. A good practice is, thus, to escape the hyphen in the character class by default.
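
A short sketch of the case where the position of the hyphen does matter (the sample string is an illustrative assumption):

```python
import re

text = 'a-b-c'

# Unescaped between two characters: the range a, b, c. No hyphen is matched.
as_range = re.findall(r'[a-c]', text)

# Escaped: the three literal characters 'a', '-' and 'c'.
as_literal = re.findall(r'[a\-c]', text)

print(as_range)    # ['a', 'b', 'c']
print(as_literal)  # ['a', '-', '-', 'c']
```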

Python Regex Escape Newline

In a recent Stack Overflow question, I read the following:

I got a little confused about Python raw string. I know that if we use raw string, then it will treat '\' as a normal backslash (ex. r'\n' would be '\' and 'n'). However, I was wondering what if I want to match a new line character in raw string. I tried r'\n', but it didn’t work. Anybody has some good idea about this?

The coder asking the question has understood that the Python interpreter doesn’t assume that the two characters \ and n do have any special meaning in raw strings (in contrast to normal strings).

However, those two symbols have a special meaning for the regex engine! So if you use them as a regular expression pattern, they will indeed match the newline character:

>>> import re
>>> text = '''This
is
a
multiline
string'''
>>> re.findall(r'[a-z]+\n', text)
['his\n', 'is\n', 'a\n', 'multiline\n']

Therefore, you don’t need to escape the newline character again to match it in a given string.
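
The two levels of interpretation can be compared side by side. In this minimal sketch (sample strings are illustrative), the raw pattern reaches the regex engine as backslash plus n, the normal pattern as an actual newline, and both match the newline character:

```python
import re

text = 'one\ntwo'

with_raw = re.findall(r'\n', text)     # regex engine interprets \n
with_normal = re.findall('\n', text)   # Python already produced a newline

# To match a literal backslash followed by 'n', escape the backslash.
literal_backslash_n = re.findall(r'\\n', r'one\ntwo')

print(with_raw == with_normal)   # True
print(literal_backslash_n)       # ['\\n']
```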

Python Regex Bad Escape

There are some common errors in relation to escaping in Python regular expressions.

If you try to escape an ASCII letter that has no defined escape sequence in regular expressions, Python will raise a “bad escape” error:

>>> re.findall('\m', 'hello {world}')
Traceback (most recent call last):
  File "<pyshell#61>", line 1, in <module>
    re.findall('\m', 'hello {world}')
  File "C:\Users\xcent\AppData\Local\Programs\Python\Python37\lib\re.py", line 223, in findall
    return _compile(pattern, flags).findall(string)
  File "C:\Users\xcent\AppData\Local\Programs\Python\Python37\lib\re.py", line 286, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\xcent\AppData\Local\Programs\Python\Python37\lib\sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\xcent\AppData\Local\Programs\Python\Python37\lib\sre_parse.py", line 930, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Users\xcent\AppData\Local\Programs\Python\Python37\lib\sre_parse.py", line 426, in _parse_sub
    not nested and not items))
  File "C:\Users\xcent\AppData\Local\Programs\Python\Python37\lib\sre_parse.py", line 507, in _parse
    code = _escape(source, this, state)
  File "C:\Users\xcent\AppData\Local\Programs\Python\Python37\lib\sre_parse.py", line 402, in _escape
    raise source.error("bad escape %s" % escape, len(escape))
re.error: bad escape \m at position 0

As the error message suggests, there’s no escape sequence \m so you need to get rid of it to avoid the error.
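
If a pattern comes from user input or a config file, the error can be caught instead of crashing the program. The helper below is a hypothetical sketch (the name compiles is made up for illustration); it relies only on re.compile and the re.error exception.

```python
import re

def compiles(pattern):
    """Return True if the pattern is a valid regex, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error as exc:
        # For r'\m' the message starts with "bad escape \m".
        print(f'invalid pattern {pattern!r}: {exc}')
        return False

print(compiles(r'\m'))   # False: \m is not a defined escape
print(compiles(r'\_'))   # True: escaping punctuation is allowed
print(compiles(r'm'))    # True: plain letter, nothing to escape
```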

Where to Go From Here

Wow, you either have read about a lot of escaped character sequences or you did a lot of scrolling to reach this point.

In both cases, you have a great advantage over other coders: you’re a persistent guy or gal!


While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.

To help students reach higher levels of Python success, he founded the programming education website Finxter.com. He’s author of the popular programming book Python One-Liners (NoStarch 2020), coauthor of the Coffee Break Python series of self-published books, computer science enthusiast, freelancer, and owner of one of the top 10 largest Python blogs worldwide.

His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them to boost their skills.

Do I need to escape in regex Python?

To match a character that has a special meaning in regex, you need to prefix it with a backslash ( \ ) to form an escape sequence. E.g., regex \. matches "."; regex \+ matches "+"; and regex \( matches "(". You also need to use regex \\ to match "\" (backslash).

How do you escape special characters in regex Python?

In Python 3.7, re.escape() was changed to escape only characters which are meaningful to regex operations. Note that re.escape will turn e.g. a newline into a backslash followed by a newline; one might well instead want a backslash followed by a lowercase n.
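
The newline behaviour can be checked directly. This is a minimal sketch assuming Python 3.7 or later; the sample string is illustrative.

```python
import re

escaped = re.escape('a.b\n')

# The dot and the newline each get a backslash in front of them;
# the newline stays a real newline character rather than the letter n.
print(repr(escaped))   # 'a\\.b\\\n'

# The escaped pattern still matches the original text exactly.
round_trip = re.fullmatch(escaped, 'a.b\n') is not None
print(round_trip)      # True
```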

How do I escape a character in regex?

The backslash character ( \ ) is the escaping character. It can be used to denote an escaped character, a string literal, or one of the set of supported special characters. Use a double backslash ( \\ ) to match a literal backslash.

How do I remove all special characters from a string in Python?

Using the re module:
  • "[^A-Za-z0-9]" matches every character except letters and digits.
  • All of the matched characters are replaced with an empty string.
  • As a result, all characters except letters and digits are removed.
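
Put together, the steps above can be sketched with re.sub (the sample string and variable names are illustrative assumptions):

```python
import re

s = 'Hello, World! 123'

# Replace every character that is not a letter or digit with nothing.
cleaned = re.sub(r'[^A-Za-z0-9]', '', s)

print(cleaned)  # HelloWorld123
```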