Remove non-numeric characters from a column in a pandas DataFrame

I have a dataframe which looks like this:

   A       B         C
1  red78   square    big235
2  green   circle    small123
3  blue45  triangle  big657

I need to be able to remove the non-numeric characters from all the rows in column C so that my dataframe looks like:

   A       B         C
1  red78   square    235
2  green   circle    123
3  blue45  triangle  657

I tried using the following but get the error expected string or buffer:

import re
dfOutput.imgID = dfOutput.imgID.apply(re.sub('[^0-9]','', dfOutput.imgID), axis = 0)

What should I do instead?

Code to create dataframe:

# Note: DataFrame.set_value was removed in pandas 1.0; on current versions
# use dfObject.loc[1, 'A'] = 'red78' (and so on) or build the frame from a dict.
dfObject = pd.DataFrame()
dfObject.set_value(1, 'A', 'red78')
dfObject.set_value(1, 'B', 'square')
dfObject.set_value(1, 'C', 'big235')
dfObject.set_value(2, 'A', 'green')
dfObject.set_value(2, 'B', 'circle')
dfObject.set_value(2, 'C', 'small123')
dfObject.set_value(3, 'A', 'blue45')
dfObject.set_value(3, 'B', 'triangle')
dfObject.set_value(3, 'C', 'big657')

Question:

How can I remove all non-numeric characters from all the values in a particular column of a pandas DataFrame?

Answer #1:

Use str.extract and pass a regex pattern to extract just the numeric parts:

In [40]: dfObject['C'] = dfObject['C'].str.extract(r'(\d+)', expand=False)
         dfObject
Out[40]:
   A       B         C
1  red78   square    235
2  green   circle    123
3  blue45  triangle  657

If needed you can cast to int:

dfObject['C'] = dfObject['C'].astype(int)
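A caveat worth illustrating: str.extract returns only the first run of digits and yields NaN when a value contains no digits, in which case astype(int) raises. The sketch below uses a made-up Series (not the asker's data) and converts through pd.to_numeric to the nullable Int64 dtype instead; treat it as one possible approach rather than the answer's own method.

```python
import pandas as pd

# Hypothetical sample: one value has no digits, one has two separate digit runs.
s = pd.Series(["big235", "none", "a1b22"])

# str.extract keeps the first match of the capture group and gives NaN when nothing matches.
digits = s.str.extract(r"(\d+)", expand=False)

# astype(int) would fail on the NaN, so go through to_numeric and a nullable integer dtype.
numbers = pd.to_numeric(digits).astype("Int64")

print(digits.tolist())
print(numbers.tolist())
```

If every digit in a value should be kept (so "a1b22" becomes 122), a str.replace-based approach is the better fit.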

Answer #2:

You can use .str.replace with a regex:

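# regex=True is required since pandas 2.0, where str.replace treats the pattern as a literal string by default
dfObject['C'] = dfObject.C.str.replace(r"[a-zA-Z]", '', regex=True)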

output:

   A       B         C
1  red78   square    235
2  green   circle    123
3  blue45  triangle  657
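Note that a letters-only character class works for the sample data but leaves punctuation and whitespace in place. The following sketch, on invented values, compares it with a pattern that removes every non-digit.

```python
import pandas as pd

# Hypothetical values containing a hyphen and spaces in addition to letters.
s = pd.Series(["big235", "small-123", "x 657 px"])

# Strips ASCII letters only; the hyphen and spaces survive.
letters_removed = s.str.replace(r"[a-zA-Z]", "", regex=True)

# Strips everything that is not a digit.
non_digits_removed = s.str.replace(r"\D", "", regex=True)

print(letters_removed.tolist())
print(non_digits_removed.tolist())
```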

Answer #3:

To remove all non-digit characters from strings in a pandas column, use str.replace with a \D+ or [^0-9]+ pattern:

dfObject['C'] = dfObject['C'].str.replace(r'\D+', '', regex=True)

Or, since \D is Unicode-aware by default in Python 3 and therefore does not match non-ASCII digits (like ۱۲۳۴۵۶۷۸۹), consider an explicit ASCII character class:

dfObject['C'] = dfObject['C'].str.replace(r'[^0-9]+', '', regex=True)

So,

import re
print(re.sub(r'\D+', '', '1۱۲۳۴۵۶۷۸۹0'))      # => 1۱۲۳۴۵۶۷۸۹0
print(re.sub(r'[^0-9]+', '', '1۱۲۳۴۵۶۷۸۹0'))  # => 10
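Another way to get ASCII-only matching, not mentioned in the answer, is the re.ASCII flag, which Series.str.replace accepts through its flags parameter. A minimal sketch, assuming a small made-up Series in which one value contains Persian digits (written as escapes):

```python
import re
import pandas as pd

# Second value is "1", three Persian digits (U+06F1..U+06F3), then "0".
s = pd.Series(["big235", "1\u06f1\u06f2\u06f30"])

# Default Unicode matching: the Persian digits count as digits and are kept.
unicode_aware = s.str.replace(r"\D+", "", regex=True)

# With re.ASCII, \D matches anything outside 0-9, so the Persian digits are removed.
ascii_only = s.str.replace(r"\D+", "", regex=True, flags=re.ASCII)

print(unicode_aware.tolist())
print(ascii_only.tolist())
```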

Answer #4:

You can also do this via a lambda function with str.isdigit:

import pandas as pd

df = pd.DataFrame({'Name': ['John5', 'Tom 8', 'Ron 722']})
df['Name'] = df['Name'].map(lambda x: ''.join([i for i in x if i.isdigit()]))

#   Name
# 0    5
# 1    8
# 2  722

Answered By: jpp
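One thing to keep in mind with str.isdigit: it also returns True for some non-ASCII characters such as the superscript "²", which int() cannot parse. The sketch below contrasts it with an explicit ASCII check; keep_ascii_digits is a hypothetical helper introduced only for this illustration.

```python
def keep_ascii_digits(text):
    """Keep only the characters 0-9 (illustrative helper, not part of the answer)."""
    return "".join(ch for ch in text if ch in "0123456789")

sample = "area 25m²"

# isdigit() treats the superscript two as a digit, so it is kept.
with_isdigit = "".join(ch for ch in sample if ch.isdigit())

# The explicit membership test keeps ASCII digits only.
ascii_only = keep_ascii_digits(sample)

print(with_isdigit)
print(ascii_only)
```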

Answer #5:

Two years on, to help others: I actually think you were very close to the answer. I have used your logic but made it work. Basically, you create a function that does the cleanup and then apply it to column C.

import pandas as pd
import re

df = pd.DataFrame({
    'A': ['red78', 'green', 'blue45'],
    'B': ['square', 'circle', 'triangle'],
    'C': ['big235', 'small123', 'big657']
})

def remove_chars(s):
    return re.sub('[^0-9]+', '', s)

df['C'] = df['C'].apply(remove_chars)
df

Result below:

   A       B         C
0  red78   square    235
1  green   circle    123
2  blue45  triangle  657
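To make the cause of the original error concrete: re.sub expects a single string as its third argument, so calling it with the whole column fails before apply ever runs, whereas apply needs a callable that it can invoke per element. A short sketch of both, using only column C of the sample data:

```python
import re
import pandas as pd

df = pd.DataFrame({"C": ["big235", "small123", "big657"]})

# Calling re.sub directly on the Series raises TypeError, because a Series is not a string.
error_message = None
try:
    re.sub("[^0-9]", "", df["C"])
except TypeError as exc:
    error_message = str(exc)

# Passing a callable lets apply run re.sub once for each value in the column.
df["C"] = df["C"].apply(lambda s: re.sub("[^0-9]", "", s))

print(error_message)
print(df["C"].tolist())
```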

How do you remove non-numeric characters from a string?

Use the re.sub() method to remove all non-numeric characters from a string, e.g. result = re.sub(r'[^0-9]', '', my_str). The re.sub() method removes all non-numeric characters from the string by replacing them with empty strings.
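As a quick illustration of the call described above, on an invented input string:

```python
import re

# Hypothetical input mixing letters, punctuation and digits.
my_str = "order #A-1042, qty 3"

# Every character that is not 0-9 is replaced with an empty string.
result = re.sub(r"[^0-9]", "", my_str)

print(result)  # 10423
```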

How do I remove a character from a DataFrame?

To remove characters from a column in a pandas DataFrame, use the Series.str.replace() method.
df = pd.DataFrame({"A": ["a", "ab", "cc"]})
df["A"].str.replace("a", "")                 # removes every "a": "", "b", "cc"
df["A"].str.replace("[ab]", "", regex=True)  # removes every "a" or "b": "", "", "cc"

How do I get rid of NaN and INF in pandas?

Use DataFrame.replace() to replace the infinite values with NaN, and then use pandas.DataFrame.dropna() to remove the rows with NaN or None values.
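A minimal sketch of that two-step cleanup, assuming a small made-up frame with both infinite and missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column x holds +inf and -inf, column y holds one NaN.
df = pd.DataFrame({
    "x": [1.0, np.inf, 3.0, -np.inf],
    "y": [1.0, 2.0, np.nan, 4.0],
})

# Turn both infinities into NaN, then drop every row that contains a NaN.
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna()

print(cleaned)
```

Only the first row survives here, since each of the other rows contains an infinity or a NaN.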

How can NaN pandas replace non

Process I follow:
Step 1: Try to import the data and let pandas infer data types. Check if the data types are as expected. ...
Step 2: If data types are not as expected, explicitly set the data types on read using the dtype parameter. ...
Step 3: Create an explicit list of values pandas cannot convert and cast them to NaN on read.
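The three steps above might look like the following with pandas.read_csv, using its dtype and na_values parameters. The CSV text and the placeholder strings "unknown" and "missing" are invented for this sketch.

```python
import io
import pandas as pd

# Hypothetical CSV in which two amounts are recorded as placeholder text.
csv_text = "id,amount\n1,10.5\n2,unknown\n3,missing\n4,7\n"

# Step 1: let pandas infer the types; the placeholders keep "amount" from being numeric.
inferred = pd.read_csv(io.StringIO(csv_text))

# Steps 2 and 3: set the dtypes explicitly and list the strings that should be read as NaN.
typed = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"id": "int64", "amount": "float64"},
    na_values=["unknown", "missing"],
)

print(inferred.dtypes)
print(typed.dtypes)
print(typed)
```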
