Pandas read excel skip first column

Question

I know beforehand what columns I don't need from an excel file and I'd like to avoid them when reading the file to improve the performance. Something like this:

Nội dung chính Show

2 . How to skip rows when reading an excel file in pandas
3 . How to skip columns when reading an excel file in pandas.
4 . How to read an excel file in pandas with header
How do I skip a column in pandas DataFrame?
How do you skip first row in Excel pandas?
How do I make one column read only in pandas Excel?
How do I drop the first column in a DataFrame?

import pandas as pd
df = pd.read_excel('large_excel_file.xlsx', skip_cols=['col_a', 'col_b',...,'col_zz'])

There is nothing related to this in the documentation. is there any workaround for this?

Aran-Fey

36.5k10 gold badges96 silver badges141 bronze badges

asked Apr 5, 2018 at 16:32

5

If your version of pandas allows (check first if you can pass a function to usecols), I would try something like:

import pandas as pd
df = pd.read_excel('large_excel_file.xlsx', usecols=lambda x: 'Unnamed' not in x,)

This should skip all columns without header names. You could substitute 'Unnamed' with a list of column names you do not want.

answered May 22, 2019 at 8:28

MarMatMarMat

6545 silver badges12 bronze badges

2

You can use the following technique. Let the columns we don't want(want to skip) are 2 5 8, then find all reamining columns we DO WANT TO KEEP as cols such that:

In [7]: cols2skip = [2,5,8]  
In [8]: cols = [i for i in range(10) if i not in cols2skip]

In [9]: cols
Out[9]: [0, 1, 3, 4, 6, 7, 9]

and then we can use those remaining columns(which we DO WANT TO KEEP) using usecols:

df = pd.read_excel(filename, usecols=cols)

answered Apr 5, 2018 at 17:14

1

In this post you will learn –

1 . How to read an Excel file in pandas.

2 . How to skip rows when reading an excel file in pandas

3 . How to skip columns when reading an excel file in pandas.

4 . How to read an excel file in pandas with header.

Before we read an excel file in pandas, we need to install the openpyxl library. This is required dependency library that is need to read and write excel file in pandas.

In the Terminal on Mac or Command Line in windows, run the following command first.

pip install openpyxl

And In this post, we will work with this a sample file which you can download from here – data download link

Here, we have some sales data in multiple sheets and we want to read these data using pandas. let’s see how to do that.

First to get the name of all the sheets in the excel file, we can use the pd.ExcelFile()

sheets_names = pd.ExcelFile('reading_excel_file.xlsx').sheet_names
sheets_names

let’s say that we want to read the Purchase orders 1 data. To do that we need to pass the sheet name to the sheet_name parameter in pandas.

df = pd.read_excel('reading_excel_file.xlsx', sheet_name='Purchase Orders 1')

2 . How to skip rows when reading an excel file in pandas

If you want to skip the number of n rows from the top, you need to use the skiprows parameter.

Let’s say we want to skip the first 2 rows when reading the file.

df = pd.read_excel('reading_excel_file.xlsx', 
                   sheet_name='Purchase Orders 1',
                  skiprows=2)

We can see that top two rows has been skipped from the result.

And if you want to skip n rows from the end you can use the skipfooter parameter.

Let’s say we want to skip 3 rows from the end.

df = pd.read_excel('reading_excel_file.xlsx', 
                   sheet_name='Purchase Orders 1',
                  skipfooter=3)

3 . How to skip columns when reading an excel file in pandas.

To skip columns when reading an excel file, you can use the usecols parameter of read_excel.

let’s say I am only interested in reading the columns from Product ID to Ordered Quantity.

df = pd.read_excel('reading_excel_file.xlsx', 
                   sheet_name='Purchase Orders 1',
                  usecols='C:F')

Now, Suppose you want to select the first two columns and the last two columns. To do that you will write

df = pd.read_excel('reading_excel_file.xlsx', 
                   sheet_name='Purchase Orders 1',
                  usecols='A:B, H:I')

4 . How to read an excel file in pandas with header

Sometimes it could happen that the data that you want to work with does not starts with the first row. let’s copy the sales data to a new excel file and add some blank lines before the data to demonstrate that.

Now, if you look at the pic below, you can see that the data does not start from the first row. There are lots of blank rows which pandas fills with NaN (Not a number), and also the column names are be named as Unnamed.

If you want to avoid this then you can use the header parameter of the read_excel file.

You can tell pandas from where the header starts.

df = pd.read_excel("sales_excel.xlsx", 
                   sheet_name='Sheet1',
                  header=5)

1. Pandas read_csv() – read a csv file in Python.

2 . Pandas to_csv – write a dataframe to a csv file.

How do I skip a column in pandas DataFrame?

We can exclude one column from the pandas dataframe by using the loc function. This function removes the column based on the location. Here we will be using the loc() function with the given data frame to exclude columns with name,city, and cost in python.

How do you skip first row in Excel pandas?

Method 1: Using iloc() function Here this function is used to drop the first row by using row index.

How do I make one column read only in pandas Excel?

pandas read excel certain columns.

import pandas as pd..

import numpy as np..

file_loc = "path.xlsx".

df = pd. read_excel(file_loc, index_col=None, na_values=['NA'], usecols = "A,C:AA").

print(df).

How do I drop the first column in a DataFrame?

Use drop() to remove first column of pandas dataframe. Use del keyword to remove first column of pandas dataframe. Use pop() to remove first column of pandas dataframe.

programming