Remove all non numeric characters python dataframe

I have a dataframe which looks like this:

     A       B           C
1   red78   square    big235
2   green   circle    small123
3   blue45  triangle  big657

I need to be able to remove the non-numeric characters from all the rows in column C so that my dataframe looks like:

     A       B           C
1   red78   square    235
2   green   circle    123
3   blue45  triangle  657

I tried using the following but get the error expected string or buffer:

import re
dfOutput.imgID = dfOutput.imgID.apply(re.sub('[^0-9]','', dfOutput.imgID), axis = 0)

What should I do instead?

Code to create dataframe:

dfObject = pd.DataFrame()
dfObject.set_value(1, 'A', 'red78')
dfObject.set_value(1, 'B', 'square')
dfObject.set_value(1, 'C', 'big235')
dfObject.set_value(2, 'A', 'green')
dfObject.set_value(2, 'B', 'circle')
dfObject.set_value(2, 'C', 'small123')
dfObject.set_value(3, 'A', 'blue45')
dfObject.set_value(3, 'B', 'triangle')
dfObject.set_value(3, 'C', 'big657')

Question :

How can I remove all non-numeric characters from all the values in a particular column in pandas dataframe?

I have a dataframe which looks like this:

     A       B           C
1   red78   square    big235
2   green   circle    small123
3   blue45  triangle  big657

I need to be able to remove the non-numeric characters from all the rows in column C so that my dataframe looks like:

     A       B           C
1   red78   square    235
2   green   circle    123
3   blue45  triangle  657

I tried using the following but get the error expected string or buffer:

import re
dfOutput.imgID = dfOutput.imgID.apply(re.sub('[^0-9]','', dfOutput.imgID), axis = 0)

What should I do instead?

Code to create dataframe:

dfObject = pd.DataFrame()
dfObject.set_value(1, 'A', 'red78')
dfObject.set_value(1, 'B', 'square')
dfObject.set_value(1, 'C', 'big235')
dfObject.set_value(2, 'A', 'green')
dfObject.set_value(2, 'B', 'circle')
dfObject.set_value(2, 'C', 'small123')
dfObject.set_value(3, 'A', 'blue45')
dfObject.set_value(3, 'B', 'triangle')
dfObject.set_value(3, 'C', 'big657')

Answer #1:

Use str.extract and pass a regex pattern to extract just the numeric parts:

In[40]:
dfObject['C'] = dfObject['C'].str.extract('(d+)', expand=False)
dfObject

Out[40]: 
        A         B    C
1   red78    square  235
2   green    circle  123
3  blue45  triangle  657

If needed you can cast to int:

dfObject['C'] = dfObject['C'].astype(int)

Answer #2:

You can use .str.replace with a regex:

dfObject['C'] = dfObject.C.str.replace(r"[a-zA-Z]",'')

output:

        A         B    C
1   red78    square  235
2   green    circle  123
3  blue45  triangle  657

Answer #3:

To remove all non-digit characters from strings in a Pandas column you should use str.replace with D+ or [^0-9]+ patterns:

dfObject['C'] = dfObject['C'].str.replace(r'D+', '')

Or, since in Python 3, D is fully Unicode-aware by default and thus does not match non-ASCII digits (like ?????????, see proof) you should consider

dfObject['C'] = dfObject['C'].str.replace(r'[^0-9]+', '')

So,

import re
print ( re.sub( r'D+', '', '1?????????0') )         # => 1?????????0
print ( re.sub( r'[^0-9]+', '', '1?????????0') )     # => 10

Answer #4:

You can also do this via a lambda function with str.isdigit:

import pandas as pd

df = pd.DataFrame({'Name': ['John5', 'Tom 8', 'Ron 722']})

df['Name'] = df['Name'].map(lambda x: ''.join([i for i in x if i.isdigit()]))

#   Name
# 0    5
# 1    8
# 2  722

Answered By: jpp

Answer #5:

After 2 years, to help others, I actually think that you were very close to the answer. I have used your logic but made it work. basically you create a function that does the clean up and then apply it to the column C.

import pandas as pd
import re

df = pd.DataFrame({
     'A': ['red78', 'green', 'blue45'],
     'B': ['square', 'circle', 'triangle'],
    'C': ['big235', 'small123',  'big657']
})

def remove_chars(s):
    return re.sub('[^0-9]+', '', s) 

df['C'] = df['C'].apply(remove_chars)
df

Result below:

A   B   C
0   red78   square  235
1   green   circle  123
2   blue45  triangle    657

How do you remove non

sub() method to remove all non-numeric characters from a string, e.g. result = re. sub(r'[^0-9]', '', my_str) . The re. sub() method will remove all non-numeric characters from the string by replacing them with empty strings.

How do I get rid of NaN values in pandas?

By using dropna() method you can drop rows with NaN (Not a Number) and None values from pandas DataFrame. Note that by default it returns the copy of the DataFrame after removing rows. If you wanted to remove from the existing DataFrame, you should use inplace=True .

How do I get rid of NaN and INF in pandas?

Using pandas replace() & dropna() To Drop Infinite Values nan and use pd. DataFrame. dropna(axis=0) . to drop rows. This ideally drops all infinite values from pandas DataFrame.

Does Dropna remove none?

dropna() is used to drop/remove columns with NaN / None values. Python doesn't support Null hence any missing data is represented as None or NaN values.