I have a dataframe which looks like this: A B C
1 red78 square big235
2 green circle small123
3 blue45 triangle big657
I need to be able to remove the non-numeric characters from all the rows in column C so that my dataframe looks like: A B C
1 red78 square 235
2 green circle 123
3 blue45 triangle 657
I tried using the following but get the error expected string or buffer: import re
dfOutput.imgID = dfOutput.imgID.apply(re.sub('[^0-9]','', dfOutput.imgID), axis = 0)
What should I do instead? Code to create dataframe: dfObject = pd.DataFrame()
dfObject.set_value(1, 'A', 'red78')
dfObject.set_value(1, 'B', 'square')
dfObject.set_value(1, 'C', 'big235')
dfObject.set_value(2, 'A', 'green')
dfObject.set_value(2, 'B', 'circle')
dfObject.set_value(2, 'C', 'small123')
dfObject.set_value(3, 'A', 'blue45')
dfObject.set_value(3, 'B', 'triangle')
dfObject.set_value(3, 'C', 'big657')
Question :How can I remove all non-numeric characters from all the values in a particular column in pandas dataframe? I have a dataframe which looks like this: A B C
1 red78 square big235
2 green circle small123
3 blue45 triangle big657
I need to be able to remove the non-numeric characters from all the rows in column C so that my dataframe looks like: A B C
1 red78 square 235
2 green circle 123
3 blue45 triangle 657
I tried using the following but get the error expected
string or buffer: import re
dfOutput.imgID = dfOutput.imgID.apply(re.sub('[^0-9]','', dfOutput.imgID), axis = 0)
What should I do instead? Code to create dataframe: dfObject = pd.DataFrame()
dfObject.set_value(1, 'A', 'red78')
dfObject.set_value(1, 'B', 'square')
dfObject.set_value(1, 'C', 'big235')
dfObject.set_value(2, 'A', 'green')
dfObject.set_value(2, 'B', 'circle')
dfObject.set_value(2, 'C', 'small123')
dfObject.set_value(3, 'A', 'blue45')
dfObject.set_value(3, 'B', 'triangle')
dfObject.set_value(3, 'C', 'big657')
Answer #1: Use str.extract and pass a regex pattern to extract just the numeric parts: In[40]:
dfObject['C'] = dfObject['C'].str.extract('(d+)', expand=False)
dfObject
Out[40]:
A B C
1 red78 square 235
2 green circle 123
3 blue45 triangle 657
If needed you can cast to int : dfObject['C'] = dfObject['C'].astype(int)
Answer #2: You can use .str.replace with a regex: dfObject['C'] = dfObject.C.str.replace(r"[a-zA-Z]",'')
output: A B C
1 red78 square 235
2 green circle 123
3 blue45 triangle 657
Answer #3: To remove all non-digit characters from strings in a Pandas column you
should use str.replace with D+ or [^0-9]+ patterns: dfObject['C'] = dfObject['C'].str.replace(r'D+', '')
Or, since in Python 3, D is fully Unicode-aware by default and thus does not match non-ASCII digits (like ????????? , see proof) you should consider dfObject['C'] = dfObject['C'].str.replace(r'[^0-9]+', '')
So, import re
print ( re.sub( r'D+', '', '1?????????0') ) # => 1?????????0
print ( re.sub( r'[^0-9]+', '', '1?????????0') ) # => 10
Answer #4:
You can also do this via a lambda function with str.isdigit : import pandas as pd
df = pd.DataFrame({'Name': ['John5', 'Tom 8', 'Ron 722']})
df['Name'] = df['Name'].map(lambda x: ''.join([i for i in x if i.isdigit()]))
# Name
# 0 5
# 1 8
# 2 722
Answered By: jpp Answer #5: After 2 years, to help others, I actually think that you were very close to the answer. I have used your logic but made it work. basically you create a function that does the clean up and then apply it to the column C . import pandas as pd
import re
df = pd.DataFrame({
'A': ['red78', 'green', 'blue45'],
'B': ['square', 'circle', 'triangle'],
'C': ['big235', 'small123', 'big657']
})
def remove_chars(s):
return re.sub('[^0-9]+', '', s)
df['C'] = df['C'].apply(remove_chars)
df
Result below: A B C
0 red78 square 235
1 green circle 123
2 blue45 triangle 657
How do you remove non
sub() method to remove all non-numeric characters from a string, e.g. result = re. sub(r'[^0-9]', '', my_str) . The re. sub() method will remove all non-numeric characters from the string by replacing them with empty strings.
How do I remove a character from a DataFrame?
To remove characters from columns in Pandas DataFrame, use the replace(~) method.. DataFrame({"A":["a","ab","cc"]}) df. A. 0 a. 1 ab. 2 cc.. df["A"]. str. replace("a","") 1 b. 2 cc. Name: A, dtype: object.. df["A"]. str. replace("[ab]","") 2 cc. Name: A, dtype: object..
How do I get rid of NaN and INF in pandas?
replace() , replace the infinite values with the NaN values and then use the pandas. DataFrame. dropna() method to remove the rows with NaN, Null/None values.
How can NaN pandas replace non
Process I follow. Step 1: Try to import the data and let pandas infer data types. Check if the data types are as expected. ... . Step 2: If data types are not as expected, explicitly set the data types on read using dtypes parameter. ... . Step 3: Create an explicit list of values pandas cannot convert and cast them to NaN on read..
|