I have a dataframe which looks like this:
A B C 1 red78 square big235 2 green circle small123 3 blue45 triangle big657I need to be able to remove the non-numeric characters from all the rows in column C so that my dataframe looks like:
A B C 1 red78 square 235 2 green circle 123 3 blue45 triangle 657I tried using the following but get the error expected string or buffer:
import re dfOutput.imgID = dfOutput.imgID.apply(re.sub('[^0-9]','', dfOutput.imgID), axis = 0)What should I do instead?
Code to create dataframe:
dfObject = pd.DataFrame() dfObject.set_value(1, 'A', 'red78') dfObject.set_value(1, 'B', 'square') dfObject.set_value(1, 'C', 'big235') dfObject.set_value(2, 'A', 'green') dfObject.set_value(2, 'B', 'circle') dfObject.set_value(2, 'C', 'small123') dfObject.set_value(3, 'A', 'blue45') dfObject.set_value(3, 'B', 'triangle') dfObject.set_value(3, 'C', 'big657')Question :
How can I remove all non-numeric characters from all the values in a particular column in pandas dataframe?
I have a dataframe which looks like this:
A B C 1 red78 square big235 2 green circle small123 3 blue45 triangle big657I need to be able to remove the non-numeric characters from all the rows in column C so that my dataframe looks like:
A B C 1 red78 square 235 2 green circle 123 3 blue45 triangle 657I tried using the following but get the error expected string or buffer:
import re dfOutput.imgID = dfOutput.imgID.apply(re.sub('[^0-9]','', dfOutput.imgID), axis = 0)What should I do instead?
Code to create dataframe:
dfObject = pd.DataFrame() dfObject.set_value(1, 'A', 'red78') dfObject.set_value(1, 'B', 'square') dfObject.set_value(1, 'C', 'big235') dfObject.set_value(2, 'A', 'green') dfObject.set_value(2, 'B', 'circle') dfObject.set_value(2, 'C', 'small123') dfObject.set_value(3, 'A', 'blue45') dfObject.set_value(3, 'B', 'triangle') dfObject.set_value(3, 'C', 'big657')Answer #1:
Use str.extract and pass a regex pattern to extract just the numeric parts:
In[40]: dfObject['C'] = dfObject['C'].str.extract('(d+)', expand=False) dfObject Out[40]: A B C 1 red78 square 235 2 green circle 123 3 blue45 triangle 657If needed you can cast to int:
dfObject['C'] = dfObject['C'].astype(int)Answer #2:
You can use .str.replace with a regex:
dfObject['C'] = dfObject.C.str.replace(r"[a-zA-Z]",'')output:
A B C 1 red78 square 235 2 green circle 123 3 blue45 triangle 657Answer #3:
To remove all non-digit characters from strings in a Pandas column you should use str.replace with D+ or [^0-9]+ patterns:
dfObject['C'] = dfObject['C'].str.replace(r'D+', '')Or, since in Python 3, D is fully Unicode-aware by default and thus does not match non-ASCII digits (like ?????????, see proof) you should consider
dfObject['C'] = dfObject['C'].str.replace(r'[^0-9]+', '')So,
import re print ( re.sub( r'D+', '', '1?????????0') ) # => 1?????????0 print ( re.sub( r'[^0-9]+', '', '1?????????0') ) # => 10Answer #4:
You can also do this via a lambda function with str.isdigit:
import pandas as pd df = pd.DataFrame({'Name': ['John5', 'Tom 8', 'Ron 722']}) df['Name'] = df['Name'].map(lambda x: ''.join([i for i in x if i.isdigit()])) # Name # 0 5 # 1 8 # 2 722Answered By: jpp
Answer #5:
After 2 years, to help others, I actually think that you were very close to the answer. I have used your logic but made it work. basically you create a function that does the clean up and then apply it to the column C.
import pandas as pd import re df = pd.DataFrame({ 'A': ['red78', 'green', 'blue45'], 'B': ['square', 'circle', 'triangle'], 'C': ['big235', 'small123', 'big657'] }) def remove_chars(s): return re.sub('[^0-9]+', '', s) df['C'] = df['C'].apply(remove_chars) dfResult below:
A B C 0 red78 square 235 1 green circle 123 2 blue45 triangle 657