Remove non numeric characters python dataframe

I have a dataframe which looks like this:

     A       B           C
1   red78   square    big235
2   green   circle    small123
3   blue45  triangle  big657

I need to be able to remove the non-numeric characters from all the rows in column C so that my dataframe looks like:

     A       B           C
1   red78   square    235
2   green   circle    123
3   blue45  triangle  657

I tried using the following but get the error expected string or buffer:

import re
dfOutput.imgID = dfOutput.imgID.apply(re.sub('[^0-9]','', dfOutput.imgID), axis = 0)

What should I do instead?

Code to create dataframe:

dfObject = pd.DataFrame()
dfObject.set_value(1, 'A', 'red78')
dfObject.set_value(1, 'B', 'square')
dfObject.set_value(1, 'C', 'big235')
dfObject.set_value(2, 'A', 'green')
dfObject.set_value(2, 'B', 'circle')
dfObject.set_value(2, 'C', 'small123')
dfObject.set_value(3, 'A', 'blue45')
dfObject.set_value(3, 'B', 'triangle')
dfObject.set_value(3, 'C', 'big657')

Question :

How can I remove all non-numeric characters from all the values in a particular column in pandas dataframe?

I have a dataframe which looks like this:

     A       B           C
1   red78   square    big235
2   green   circle    small123
3   blue45  triangle  big657

I need to be able to remove the non-numeric characters from all the rows in column C so that my dataframe looks like:

     A       B           C
1   red78   square    235
2   green   circle    123
3   blue45  triangle  657

I tried using the following but get the error expected string or buffer:

import re
dfOutput.imgID = dfOutput.imgID.apply(re.sub('[^0-9]','', dfOutput.imgID), axis = 0)

What should I do instead?

Code to create dataframe:

dfObject = pd.DataFrame()
dfObject.set_value(1, 'A', 'red78')
dfObject.set_value(1, 'B', 'square')
dfObject.set_value(1, 'C', 'big235')
dfObject.set_value(2, 'A', 'green')
dfObject.set_value(2, 'B', 'circle')
dfObject.set_value(2, 'C', 'small123')
dfObject.set_value(3, 'A', 'blue45')
dfObject.set_value(3, 'B', 'triangle')
dfObject.set_value(3, 'C', 'big657')

Answer #1:

Use str.extract and pass a regex pattern to extract just the numeric parts:

In[40]:
dfObject['C'] = dfObject['C'].str.extract('(d+)', expand=False)
dfObject

Out[40]: 
        A         B    C
1   red78    square  235
2   green    circle  123
3  blue45  triangle  657

If needed you can cast to int:

dfObject['C'] = dfObject['C'].astype(int)

Answer #2:

You can use .str.replace with a regex:

dfObject['C'] = dfObject.C.str.replace(r"[a-zA-Z]",'')

output:

        A         B    C
1   red78    square  235
2   green    circle  123
3  blue45  triangle  657

Answer #3:

To remove all non-digit characters from strings in a Pandas column you should use str.replace with D+ or [^0-9]+ patterns:

dfObject['C'] = dfObject['C'].str.replace(r'D+', '')

Or, since in Python 3, D is fully Unicode-aware by default and thus does not match non-ASCII digits (like ?????????, see proof) you should consider

dfObject['C'] = dfObject['C'].str.replace(r'[^0-9]+', '')

So,

import re
print ( re.sub( r'D+', '', '1?????????0') )         # => 1?????????0
print ( re.sub( r'[^0-9]+', '', '1?????????0') )     # => 10

Answer #4:

You can also do this via a lambda function with str.isdigit:

import pandas as pd

df = pd.DataFrame({'Name': ['John5', 'Tom 8', 'Ron 722']})

df['Name'] = df['Name'].map(lambda x: ''.join([i for i in x if i.isdigit()]))

#   Name
# 0    5
# 1    8
# 2  722

Answered By: jpp

Answer #5:

After 2 years, to help others, I actually think that you were very close to the answer. I have used your logic but made it work. basically you create a function that does the clean up and then apply it to the column C.

import pandas as pd
import re

df = pd.DataFrame({
     'A': ['red78', 'green', 'blue45'],
     'B': ['square', 'circle', 'triangle'],
    'C': ['big235', 'small123',  'big657']
})

def remove_chars(s):
    return re.sub('[^0-9]+', '', s) 

df['C'] = df['C'].apply(remove_chars)
df

Result below:

A   B   C
0   red78   square  235
1   green   circle  123
2   blue45  triangle    657

How do you remove non

sub() method to remove all non-numeric characters from a string, e.g. result = re. sub(r'[^0-9]', '', my_str) . The re. sub() method will remove all non-numeric characters from the string by replacing them with empty strings.

How do I remove a character from a DataFrame?

To remove characters from columns in Pandas DataFrame, use the replace(~) method..
DataFrame({"A":["a","ab","cc"]}) df. A. 0 a. 1 ab. 2 cc..
df["A"]. str. replace("a","") 1 b. 2 cc. Name: A, dtype: object..
df["A"]. str. replace("[ab]","") 2 cc. Name: A, dtype: object..

How do I get rid of NaN and INF in pandas?

replace() , replace the infinite values with the NaN values and then use the pandas. DataFrame. dropna() method to remove the rows with NaN, Null/None values.

How can NaN pandas replace non

Process I follow.
Step 1: Try to import the data and let pandas infer data types. Check if the data types are as expected. ... .
Step 2: If data types are not as expected, explicitly set the data types on read using dtypes parameter. ... .
Step 3: Create an explicit list of values pandas cannot convert and cast them to NaN on read..