I want something like [A-z] that matches all alphabetic characters plus characters like ö, ä, ü, etc. If I use [A-ü], I probably get all the special characters used by Latin languages, but it also allows other characters like ¿¿]|{}[¢§ø欰µ©¥. Example:
https://regex101.com/r/tN9gA5/2

Edit: I need this in Python 2.

asked Apr 8, 2015 at 7:29 by yamm

Depending on which regular expression engine you are using, you could use the regular expression ^\p{L}+$. \p{L} denotes a Unicode letter:

In addition to complications, Unicode also brings new possibilities. One is that each Unicode character belongs to a certain category. You can match a single character belonging to the "letter" category with \p{L}.
(Source)

This example should illustrate what I mean. The regex engine on Regex101 does appear to support this; you just need to select PCRE (PHP) from the top left.

answered Apr 8, 2015 at 7:33 by npinti
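Python's built-in re module does not support \p{L}, in either Python 2 or Python 3 (the third-party regex package does). As a stdlib-only sketch of the same idea, assuming the goal is to check that a whole string consists of letters, each character's Unicode general category can be inspected with unicodedata.category(). The helper names is_unicode_letter and only_letters are hypothetical.

```python
# -*- coding: utf-8 -*-
import unicodedata


def is_unicode_letter(ch):
    # General categories Lu, Ll, Lt, Lm and Lo all start with "L";
    # together they form the "letter" category that \p{L} refers to.
    return unicodedata.category(ch).startswith('L')


def only_letters(text):
    # Rough equivalent of ^\p{L}+$: non-empty, and every character is a letter.
    return len(text) > 0 and all(is_unicode_letter(ch) for ch in text)


print(only_letters(u'TestöäüéàèÉÀÈ'))  # True
print(only_letters(u'Test¿§'))         # False
```

In Python 2 the input must be a unicode string (hence the u'' literals and the coding declaration).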
When you use [A-z], you are not only capturing letters from "A" to "z"; you also capture some non-letter characters: [ \ ] ^ _ `. In Python, you can use [^\W\d_] with the re.U flag to match Unicode letters (see this post). Here is a sample based on your input string.

Python example:

# -*- coding: utf-8 -*-
import re

# [^\W\d_] = word characters minus digits and underscore, i.e. letters.
# re.U makes \W and \d Unicode-aware (required in Python 2).
r = re.search(
    r'(?P<unicode_word>[^\W\d_]*)',
    u'TestöäüéàèÉÀÈéàè',
    re.U
)
print(r.group('unicode_word'))

Output:

TestöäüéàèÉÀÈéàè
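The sketch below makes both points concrete: a character range like [A-z] or [A-ü] is a range of code points, so it also admits punctuation and symbols whose code points fall inside it, while [^\W\d_] rejects them. The variable names are illustrative only. Note that [^\W\d_] may also admit a few non-letter characters that Python treats as alphanumeric (for example superscript digits), so it approximates "letters" rather than matching the Unicode letter category exactly.

```python
# -*- coding: utf-8 -*-
import re

# [A-z] covers code points 65..122, including the six ASCII
# characters that sit between 'Z' (90) and 'a' (97).
leaked = [chr(c) for c in range(ord('A'), ord('z') + 1) if not chr(c).isalpha()]
print(leaked)  # ['[', '\\', ']', '^', '_', '`']

# [A-ü] likewise spans U+0041..U+00FC, which includes the symbols from the question.
print(bool(re.match(u'^[A-ü]+$', u'¿]|{}[¢§¬°©¥')))  # True

# [^\W\d_] keeps letters and rejects that punctuation;
# re.U is needed in Python 2 and harmless in Python 3.
letters_only = re.compile(r'^[^\W\d_]+$', re.U)
print(bool(letters_only.match(u'TestöäüéàèÉÀÈ')))  # True
print(bool(letters_only.match(u'Test¿§')))          # False
```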
answered Apr 8, 2015 at 8:10 by Wiktor Stribiżew

Timing with random strings of ASCII printables:
from inspect import getsource
from random import sample
import re
from string import printable
from timeit import timeit

pattern_single = re.compile(r'[\W]')
pattern_repeat = re.compile(r'[\W]+')
# Deletion table for str.translate(): every non-alphanumeric code point below 256.
translation_tb = str.maketrans('', '', ''.join(c for c in map(chr, range(256)) if not c.isalnum()))


def generate_test_string(length):
    # Random string of `length` distinct characters from string.printable.
    return ''.join(sample(printable, length))


def main():
    for i in range(0, 60, 10):
        for test in [
            lambda: ''.join(c for c in generate_test_string(i) if c.isalnum()),
            lambda: ''.join(filter(str.isalnum, generate_test_string(i))),
            lambda: re.sub(r'[\W]', '', generate_test_string(i)),
            lambda: re.sub(r'[\W]+', '', generate_test_string(i)),
            lambda: pattern_single.sub('', generate_test_string(i)),
            lambda: pattern_repeat.sub('', generate_test_string(i)),
            lambda: generate_test_string(i).translate(translation_tb),
        ]:
            # timeit() calls each lambda 1,000,000 times by default; every call
            # also generates a fresh test string, so that cost is included.
            # getsource() recovers the lambda's source line for the report.
            print(timeit(test), i, getsource(test).lstrip(' lambda: ').rstrip(',\n'), sep='\t')


if __name__ == '__main__':
    main()
Result (Python 3.7):

Time (s)  Length  Code
6.372     0       ''.join(c for c in generate_test_string(i) if c.isalnum())
5.729     0       ''.join(filter(str.isalnum, generate_test_string(i)))
8.188     0       re.sub(r'[\W]', '', generate_test_string(i))
8.000     0       re.sub(r'[\W]+', '', generate_test_string(i))
5.529     0       pattern_single.sub('', generate_test_string(i))
5.442     0       pattern_repeat.sub('', generate_test_string(i))
4.677     0       generate_test_string(i).translate(translation_tb)
23.575    10      ''.join(c for c in generate_test_string(i) if c.isalnum())
22.830    10      ''.join(filter(str.isalnum, generate_test_string(i)))
27.210    10      re.sub(r'[\W]', '', generate_test_string(i))
27.204    10      re.sub(r'[\W]+', '', generate_test_string(i))
24.009    10      pattern_single.sub('', generate_test_string(i))
23.945    10      pattern_repeat.sub('', generate_test_string(i))
21.831    10      generate_test_string(i).translate(translation_tb)
38.731    20      ''.join(c for c in generate_test_string(i) if c.isalnum())
37.942    20      ''.join(filter(str.isalnum, generate_test_string(i)))
42.169    20      re.sub(r'[\W]', '', generate_test_string(i))
41.933    20      re.sub(r'[\W]+', '', generate_test_string(i))
38.900    20      pattern_single.sub('', generate_test_string(i))
38.636    20      pattern_repeat.sub('', generate_test_string(i))
36.201    20      generate_test_string(i).translate(translation_tb)
49.377    30      ''.join(c for c in generate_test_string(i) if c.isalnum())
48.409    30      ''.join(filter(str.isalnum, generate_test_string(i)))
53.902    30      re.sub(r'[\W]', '', generate_test_string(i))
52.130    30      re.sub(r'[\W]+', '', generate_test_string(i))
50.061    30      pattern_single.sub('', generate_test_string(i))
49.367    30      pattern_repeat.sub('', generate_test_string(i))
46.650    30      generate_test_string(i).translate(translation_tb)
63.108    40      ''.join(c for c in generate_test_string(i) if c.isalnum())
65.116    40      ''.join(filter(str.isalnum, generate_test_string(i)))
71.477    40      re.sub(r'[\W]', '', generate_test_string(i))
66.028    40      re.sub(r'[\W]+', '', generate_test_string(i))
63.315    40      pattern_single.sub('', generate_test_string(i))
62.342    40      pattern_repeat.sub('', generate_test_string(i))
58.249    40      generate_test_string(i).translate(translation_tb)
73.810    50      ''.join(c for c in generate_test_string(i) if c.isalnum())
72.594    50      ''.join(filter(str.isalnum, generate_test_string(i)))
76.048    50      re.sub(r'[\W]', '', generate_test_string(i))
75.107    50      re.sub(r'[\W]+', '', generate_test_string(i))
74.681    50      pattern_single.sub('', generate_test_string(i))
72.430    50      pattern_repeat.sub('', generate_test_string(i))
69.394    50      generate_test_string(i).translate(translation_tb)
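str.maketrans with str.translate is the fastest at every length, but its table only covers code points below 256, so any character above U+00FF passes through unchanged, whether or not it is alphanumeric. Precompiled patterns (re.compile plus pattern.sub) are roughly on par with ''.join and filter, while calling re.sub with an uncompiled pattern is consistently the slowest. Note also that the \W-based patterns keep the underscore, because \w counts _ as a word character, so their output is not identical to the isalnum-based versions.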
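A quick check of how the approaches differ in output, reusing the benchmark's translation_tb definition on a hypothetical sample string 'a_b-c€1' (Python 3, since str.maketrans is Python 3 only):

```python
import re

# Same table as in the benchmark: delete every non-alphanumeric code point below 256.
translation_tb = str.maketrans('', '', ''.join(c for c in map(chr, range(256)) if not c.isalnum()))

sample_text = 'a_b-c€1'  # hypothetical input

by_translate = sample_text.translate(translation_tb)   # '€' (U+20AC) is not in the table, so it survives
by_filter = ''.join(filter(str.isalnum, sample_text))  # drops '_', '-' and '€'
by_regex = re.sub(r'[\W]+', '', sample_text)           # \W does not match '_', so it is kept

print(by_translate)  # abc€1
print(by_filter)     # abc1
print(by_regex)      # a_bc1
```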
Remove Non-Alphanumeric Characters From Python String

Created: May-28, 2021
Alphanumeric characters are a mix of the 26 letters of the alphabet (upper and lower case) and the digits 0 to 9.
Non-alphanumeric characters are characters that are not letters or digits, such as + and @. In this tutorial, we will discuss how to remove non-alphanumeric characters from a string in Python.

Use the isalnum() Method to Remove All Non-Alphanumeric Characters in Python String

We can use the isalnum() method to check whether a given character or string is alphanumeric. We check each character of the string individually and combine the alphanumeric ones with the join() function. For example:

string_value = "alpha&*numeric@123__"  # hypothetical sample input
s = ''.join(ch for ch in string_value if ch.isalnum())
print(s)

Output:

alphanumeric123
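In Python 3, str.isalnum() is Unicode-aware, so this approach also keeps accented letters like the ones in the question at the top. A small sketch with a hypothetical input string:

```python
text = 'Grüße, 2021! (café)'  # hypothetical input

# Keep every character that is a letter or digit in the Unicode sense.
cleaned = ''.join(ch for ch in text if ch.isalnum())
print(cleaned)  # Grüße2021café
```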
Use the filter() Function to Remove All Non-Alphanumeric Characters in Python String

The filter() function builds an iterator from the elements of an iterable for which a given function returns True. Here the string is the iterable, and str.isalnum is the function: applied to a single character, it returns True if that character is a letter or a digit. The join() function then combines the remaining characters into a string. For example:

string_value = "alpha&*numeric@123__"  # hypothetical sample input
s = ''.join(filter(str.isalnum, string_value))
print(s)

Output:

alphanumeric123
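This works in both Python 2 and Python 3. In Python 2, filter() applied to a string already returns a string, so the join() call is optional there; in Python 3, filter() returns an iterator, so join() is required.

Use Regular Expressions to Remove All Non-Alphanumeric Characters in Python String

A regular expression is a special sequence of characters that lets you match strings or sets of strings using a specific pattern syntax. To use regular expressions, we import the re module. Its sub() function replaces every substring that matches a pattern with a replacement string, here the empty string. The pattern [\W_]+ matches runs of non-alphanumeric characters: \W matches any non-word character, and the underscore is listed explicitly because \w counts it as a word character. For example:

import re

string_value = "alpha&*numeric@123__"  # hypothetical sample input
s = re.sub(r'[\W_]+', '', string_value)
print(s)

Output:

alphanumeric123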
Alternatively, we can use the following pattern, which keeps only ASCII letters and digits:

import re

string_value = "alpha&*numeric@123__"  # hypothetical sample input
s = re.sub(r'[^a-zA-Z0-9]', '', string_value)
print(s)

Output:

alphanumeric123
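The two regex patterns give the same result for ASCII input but differ on non-ASCII text. In Python 3, \W is Unicode-aware for str patterns, so [\W_]+ keeps accented letters, whereas [^a-zA-Z0-9] removes everything outside the ASCII ranges. A quick comparison on a hypothetical string (in Python 2 the first pattern would additionally need a unicode string and the re.U flag):

```python
import re

text = 'café_42!'  # hypothetical input

unicode_aware = re.sub(r'[\W_]+', '', text)     # removes '_' and '!'; 'é' is a word character
ascii_only = re.sub(r'[^a-zA-Z0-9]', '', text)  # also removes 'é', which is outside a-z/A-Z

print(unicode_aware)  # café42
print(ascii_only)     # caf42
```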
How do you only find alphanumeric characters in Python?

The str.isalnum() method returns True if the string is made of alphanumeric characters only. A character is alphanumeric if it is either a letter or a digit. If the string is empty, isalnum() returns False.

How do you find alphanumeric values in Python?

The isalnum() method returns True if all the characters are alphanumeric, meaning letters (a-z, A-Z and, in Python 3, non-ASCII letters such as é or ü) and digits (0-9). Examples of characters that are not alphanumeric: (space) and !.

How do you make a string only alphanumeric in Python?

Use str.isalnum() together with str.join() to build a new string that contains only the alphanumeric characters; the resulting string has no non-alphanumeric characters left.

How do I ignore non-alphabetic characters in Python?

Check each character with isalpha() and join the ones that pass. If digits should be kept as well, use isalnum() instead, as in the isalnum() method described above.
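A short sketch of the isalnum() and isalpha() behaviour described in these answers (Python 3; the inputs are hypothetical):

```python
print('abc123'.isalnum())   # True
print('abc 123'.isalnum())  # False: the space is not alphanumeric
print(''.isalnum())         # False: empty string
print('abc123'.isalpha())   # False: digits are not alphabetic

# Keep only the alphabetic characters.
letters = ''.join(ch for ch in 'a1-b2!' if ch.isalpha())
print(letters)              # ab
```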