Timing with random strings of ASCII printables: from inspect import getsource
from random import sample
import re
from string import printable
from timeit import timeit
pattern_single = re.compile(r'[\W]')
pattern_repeat = re.compile(r'[\W]+')
translation_tb = str.maketrans('', '', ''.join(c for c in map(chr, range(256)) if not c.isalnum()))
def generate_test_string(length):
return ''.join(sample(printable, length))
def main():
for i in range(0, 60, 10):
for test in [
lambda: ''.join(c for c in generate_test_string(i) if c.isalnum()),
lambda: ''.join(filter(str.isalnum, generate_test_string(i))),
lambda: re.sub(r'[\W]', '', generate_test_string(i)),
lambda: re.sub(r'[\W]+', '', generate_test_string(i)),
lambda: pattern_single.sub('', generate_test_string(i)),
lambda: pattern_repeat.sub('', generate_test_string(i)),
lambda: generate_test_string(i).translate(translation_tb),
]:
print(timeit(test), i, getsource(test).lstrip(' lambda: ').rstrip(',\n'), sep='\t')
if __name__ == '__main__':
main()
Result (Python 3.7): Time Length Code
6.3716264850008880 00 ''.join(c for c in generate_test_string(i) if c.isalnum())
5.7285426190064750 00 ''.join(filter(str.isalnum, generate_test_string(i)))
8.1875841680011940 00 re.sub(r'[\W]', '', generate_test_string(i))
8.0002205439959650 00 re.sub(r'[\W]+', '', generate_test_string(i))
5.5290945199958510 00 pattern_single.sub('', generate_test_string(i))
5.4417179649972240 00 pattern_repeat.sub('', generate_test_string(i))
4.6772285089973590 00 generate_test_string(i).translate(translation_tb)
23.574712151996210 10 ''.join(c for c in generate_test_string(i) if c.isalnum())
22.829975890002970 10 ''.join(filter(str.isalnum, generate_test_string(i)))
27.210196289997840 10 re.sub(r'[\W]', '', generate_test_string(i))
27.203713296003116 10 re.sub(r'[\W]+', '', generate_test_string(i))
24.008979928999906 10 pattern_single.sub('', generate_test_string(i))
23.945240008994006 10 pattern_repeat.sub('', generate_test_string(i))
21.830899796994345 10 generate_test_string(i).translate(translation_tb)
38.731336012999236 20 ''.join(c for c in generate_test_string(i) if c.isalnum())
37.942474347000825 20 ''.join(filter(str.isalnum, generate_test_string(i)))
42.169366310001350 20 re.sub(r'[\W]', '', generate_test_string(i))
41.933375883003464 20 re.sub(r'[\W]+', '', generate_test_string(i))
38.899814646996674 20 pattern_single.sub('', generate_test_string(i))
38.636144253003295 20 pattern_repeat.sub('', generate_test_string(i))
36.201238164998360 20 generate_test_string(i).translate(translation_tb)
49.377356811004574 30 ''.join(c for c in generate_test_string(i) if c.isalnum())
48.408927293996385 30 ''.join(filter(str.isalnum, generate_test_string(i)))
53.901889764994850 30 re.sub(r'[\W]', '', generate_test_string(i))
52.130339455994545 30 re.sub(r'[\W]+', '', generate_test_string(i))
50.061149017004940 30 pattern_single.sub('', generate_test_string(i))
49.366573111998150 30 pattern_repeat.sub('', generate_test_string(i))
46.649754120997386 30 generate_test_string(i).translate(translation_tb)
63.107938601999194 40 ''.join(c for c in generate_test_string(i) if c.isalnum())
65.116287978999030 40 ''.join(filter(str.isalnum, generate_test_string(i)))
71.477421126997800 40 re.sub(r'[\W]', '', generate_test_string(i))
66.027950693998720 40 re.sub(r'[\W]+', '', generate_test_string(i))
63.315361931003280 40 pattern_single.sub('', generate_test_string(i))
62.342320287003530 40 pattern_repeat.sub('', generate_test_string(i))
58.249303059004890 40 generate_test_string(i).translate(translation_tb)
73.810345625002810 50 ''.join(c for c in generate_test_string(i) if c.isalnum())
72.593953348005020 50 ''.join(filter(str.isalnum, generate_test_string(i)))
76.048324580995540 50 re.sub(r'[\W]', '', generate_test_string(i))
75.106637657001560 50 re.sub(r'[\W]+', '', generate_test_string(i))
74.681338128997600 50 pattern_single.sub('', generate_test_string(i))
72.430461594005460 50 pattern_repeat.sub('', generate_test_string(i))
69.394243567003290 50 generate_test_string(i).translate(translation_tb)
str.maketrans & str.translate is fastest, but includes all non-ASCII characters. re.compile & pattern.sub is slower, but is somehow faster than ''.join & filter . This post will discuss how to remove non-alphanumeric characters from a string in Python. 1. Using regular expressionsA simple solution is to use regular expressions for removing non-alphanumeric characters from a string. The idea is to use the special character \W , which matches any character which is not a word character.
| importre if__name__=='__main__': input="Welcome, User_12!!" s=
re.sub(r'\W+','',input)
print(s) # WelcomeUser_12
|
Download Run Code The \W is equivalent of [^a-zA-Z0-9_] , which excludes all numbers and letters along with underscores. If you need to remove underscores as well, you can do like:
| importre if__name__=='__main__': input="Welcome, User_12!!" s=
re.sub(r'[^a-zA-Z0-9]','',
input) print(s) # WelcomeUser12
|
Download Run Code If the expression is used several times in a single program, you should compile and save the resulting regular expression object for reuse.
| importre if__name__=='__main__': input="Welcome, User_12!!" pattern=re.compile('\W') s=
re.sub(pattern,'',input) print(s) # WelcomeUser_12
|
Download Run Code 2. Using isalnum() functionAnother option is to filter the string that matches with the isalnum() function. It returns true if all characters in the string are alphanumeric, false otherwise.
| if__name__=='__main__': input
="Welcome, User_12!!" s=
''.join(filter(str.isalnum,
input)) print(s) # WelcomeUser12
|
Download Run Code This is equivalent to:
| if__name__=='__main__': input
="Welcome, User_12!!" s=
''.join(cforcin
inputifc.isalnum()) print(s) # WelcomeUser12
|
Download Run Code That’s all about removing non-alphanumeric characters from a string in Python. Also See: Remove
specific characters from a Python string
Remove Punctuations from a Python string
Thanks for reading. Please use our online compiler to post code in comments using C,
C++, Java, Python, JavaScript, C#, PHP, and many more popular programming languages. Like us? Refer us to your friends and help us grow. Happy coding 🙂
How do you remove an alphanumeric character from a string?
You can also use [^\w] regular expression, which is equivalent to [^a-zA-Z_0-9] . It will replace characters that are not present in the character range A-Z , a-z , 0-9 , _ . Alternatively, you can use the character class \W that directly matches with any non-word character, i.e., [a-zA-Z_0-9] .
How do you delete alphanumeric words in Python?
“remove alphanumeric words in a pandas column python” Code Answer. import pandas as pd.. ded = pd. DataFrame({'strings': ['a#bc1!', ' a(b$c']}). ded. strings. str. replace('[^a-zA-Z0-9]', '').
How do you get rid of alphanumeric?
Non-alphanumeric characters can be remove by using preg_replace() function. This function perform regular expression search and replace. The function preg_replace() searches for string specified by pattern and replaces pattern with replacement if found.
How do you only find alphanumeric characters in Python?
Python string isalnum() function returns True if it's made of alphanumeric characters only. A character is alphanumeric if it's either an alpha or a number. If the string is empty, then isalnum() returns False .
|