How do I view multiple Excel files in pandas?

I have a large spreadsheet file (.xlsx) that I'm processing using python pandas. It happens that I need data from two tabs (sheets) in that large file. One of the tabs has a ton of data and the other is just a few square cells.

When I use pd.read_excel() on any worksheet, it looks to me like the whole file is loaded (not just the worksheet I'm interested in). So when I use the method twice (once for each sheet), I effectively have to suffer the whole workbook being read in twice (even though we're only using the specified sheet).

How do I only load specific sheet(s) with pd.read_excel()?

smci


asked Oct 23, 2014 at 4:21


Try pd.ExcelFile:

xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, 'Sheet1')
df2 = pd.read_excel(xls, 'Sheet2')

As noted by @HaPsantran, the entire Excel file is read in during the ExcelFile() call (there doesn't appear to be a way around this). This merely saves you from having to read the same file in each time you want to access a new sheet.

Note that the sheet_name argument to pd.read_excel() can be the name of the sheet (as above), an integer specifying the sheet number (e.g. 0, 1, etc.), a list of sheet names or indices, or None. If a list is provided, it returns a dictionary where the keys are the sheet names/indices and the values are the data frames. The default is to simply return the first sheet (i.e. sheet_name=0).

If None is specified, all sheets are returned, as a {sheet_name:dataframe} dictionary.
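A minimal sketch of the sheet_name variants described above, assuming pandas is installed together with the openpyxl engine for .xlsx files. The workbook name demo_sheets.xlsx and the sheet names "big" and "small" are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook to experiment with (hypothetical names).
path = os.path.join(tempfile.mkdtemp(), "demo_sheets.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": [1, 2, 3]}).to_excel(writer, sheet_name="big", index=False)
    pd.DataFrame({"b": [10]}).to_excel(writer, sheet_name="small", index=False)

# Open the workbook once and reuse the handle for every sheet.
with pd.ExcelFile(path) as xls:
    by_name = pd.read_excel(xls, sheet_name="big")     # a single DataFrame
    by_index = pd.read_excel(xls, sheet_name=1)        # second sheet, a single DataFrame
    some = pd.read_excel(xls, sheet_name=["big", 1])   # dict keyed by "big" and 1
    everything = pd.read_excel(xls, sheet_name=None)   # dict keyed by sheet name

print(list(some.keys()))
print(list(everything.keys()))
```

Using ExcelFile as a context manager also closes the file handle once the sheets have been parsed.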

answered Oct 23, 2014 at 5:16

Noah

There are a few options:

Read all sheets directly into an ordered dictionary.

import pandas as pd

# for pandas version >= 0.21.0
sheet_to_df_map = pd.read_excel(file_name, sheet_name=None)

# for pandas version < 0.21.0
sheet_to_df_map = pd.read_excel(file_name, sheetname=None)

Read the first sheet directly into dataframe

df = pd.read_excel('excel_file_path.xls')
# this will read the first sheet into df

Read the Excel file and get a list of sheets. Then choose and load the sheets.

xls = pd.ExcelFile('excel_file_path.xls')

# Now you can list all sheets in the file
xls.sheet_names
# ['house', 'house_extra', ...]

# to read just one sheet to dataframe:
df = pd.read_excel(xls, sheet_name="house")

Read all sheets and store them in a dictionary. This is the same as the first option, but more explicit.

# to read all sheets to a map
sheet_to_df_map = {}
for sheet_name in xls.sheet_names:
    sheet_to_df_map[sheet_name] = xls.parse(sheet_name)
    # you can also use sheet_index [0,1,2..] instead of sheet name.

Thanks to @ihightower for pointing out the way to read all sheets, and to @toto_tico and @red-headphone for pointing out the version issue.

sheetname : string, int, mixed list of strings/ints, or None, default 0. Deprecated since version 0.21.0: use sheet_name instead.


answered Sep 6, 2017 at 18:16

Vikash Singh

You can also use the index for the sheet:

xls = pd.ExcelFile('path_to_file.xls')
sheet1 = xls.parse(0)

This gives the first worksheet. For the second worksheet:

sheet2 = xls.parse(1)

answered Feb 25, 2015 at 22:58

Elliott

You could also specify the sheet name as a parameter:

data_file = pd.read_excel('path_to_file.xls', sheet_name="sheet_name")

This returns only the sheet named "sheet_name".


answered Feb 11, 2017 at 19:37

Mat0kan

Option 1

If one doesn't know the sheet names:

# Read all sheets in your File
df = pd.read_excel('FILENAME.xlsm', sheet_name=None)
    
# Print all the sheet names (the keys of the returned dictionary)
print(df.keys())

Then, depending on which sheets one wants, one can read each of them into its own DataFrame, such as

sheet1_df = pd.read_excel('FILENAME.xlsm', sheet_name=SHEET1NAME)
sheet2_df = pd.read_excel('FILENAME.xlsm', sheet_name=SHEET2NAME)

Option 2

If the name is not relevant and all one cares about is the position of the sheet, say one wants only the first sheet:

# Read all sheets in your File
df = pd.read_excel('FILENAME.xlsm', sheet_name=None)

sheet1 = list(df.keys())[0]

Then one can read that sheet into its own DataFrame, using the name just retrieved:

sheet1_df = pd.read_excel('FILENAME.xlsm', sheet_name=sheet1)

answered Nov 24, 2020 at 12:15

Gonçalo Peres

pd.read_excel('filename.xlsx')

reads the first sheet of the workbook by default.

pd.read_excel('filename.xlsx', sheet_name='sheetname')

reads the named sheet of the workbook, and

pd.read_excel('filename.xlsx', sheet_name=None)

reads all the worksheets into a dictionary-like object (reported here as an OrderedDict) that maps each sheet name to its own DataFrame. The result is a collection of DataFrames, not a single DataFrame.

answered Aug 1, 2019 at 17:01

Ashu007

If you are interested in reading all sheets and merging them together, a concise way to do it is:

sheet_to_df_map = pd.read_excel('path_to_file.xls', sheet_name=None)
mdf = pd.concat(sheet_to_df_map, axis=0, ignore_index=True)

This combines all the sheets into a single DataFrame, mdf.
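With ignore_index=True the information about which sheet each row came from is lost. A minimal sketch of one way to keep it, assuming pandas with the openpyxl engine; the file name demo_concat.xlsx and the sheet names "jan" and "feb" are invented for the example.

```python
import os
import tempfile

import pandas as pd

# Build a small workbook with two sheets sharing the same column (hypothetical data).
path = os.path.join(tempfile.mkdtemp(), "demo_concat.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"x": [1, 2]}).to_excel(writer, sheet_name="jan", index=False)
    pd.DataFrame({"x": [3]}).to_excel(writer, sheet_name="feb", index=False)

sheets = pd.read_excel(path, sheet_name=None)

# The dict keys (sheet names) become the outer index level, named "sheet" here.
# Moving that level into a column keeps the origin of every row.
combined = (
    pd.concat(sheets, names=["sheet", "row"])
    .reset_index(level="sheet")
    .reset_index(drop=True)
)
print(combined)
```

The result has a "sheet" column next to the data columns, so rows can still be filtered or grouped by their source sheet.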

answered Aug 11, 2020 at 3:02

anujsyal

You can read all the sheets using the following lines

import pandas as pd
file_instance = pd.ExcelFile('your_file.xlsx')

# parse each sheet from the already-opened workbook instead of re-reading the file
main_df = pd.concat([file_instance.parse(name) for name in file_instance.sheet_names], axis=0)

answered Sep 1, 2021 at 13:09


If:

  • you want multiple, but not all, worksheets, and
  • you want a single df as an output

Then you can pass a list of worksheet names, which you could populate manually:

import pandas as pd
    
path = "C:\\Path\\To\\Your\\Data\\"
file = "data.xlsx"
sheet_lst_wanted = ["01_SomeName","05_SomeName","12_SomeName"] # tab names from Excel

### import and compile data ###
    
# read all sheets from list into an ordered dictionary    
dict_temp = pd.read_excel(path+file, sheet_name= sheet_lst_wanted)

# concatenate the ordered dict items into a dataframe
df = pd.concat(dict_temp, axis=0, ignore_index=True)

OR

A bit of automation is possible if your desired worksheets have a common naming convention that also allows you to differentiate from unwanted sheets:

# substitute the following block for the sheet_lst_wanted line in the block above

# string common to only the worksheets you want
str_like = "SomeName"

### create list of sheet names in Excel file ###
# xlrd 2.0+ reads only .xls files, so list the sheet names via pandas instead
with pd.ExcelFile(path+file) as xls:
    sheet_lst = xls.sheet_names

### create list of sheets meeting criteria ###
# note: this condition assumes the wanted sheet names end with str_like
sheet_lst_wanted = [s for s in sheet_lst if s.endswith(str_like)]

answered Aug 17, 2020 at 21:32

CreekGeek

df = pd.read_excel('FileName.xlsx', 'SheetName')

This will read sheet SheetName from file FileName.xlsx

answered Jun 27, 2021 at 10:32

Tajinder

Yes, unfortunately it will always load the full file. If you're doing this repeatedly, it is probably best to extract the sheets to separate CSVs and then load them separately. You can automate that process with d6tstack, which also adds features like checking whether all the columns are equal across all sheets or multiple Excel files.

import d6tstack
c = d6tstack.convert_xls.XLStoCSVMultiSheet('multisheet.xlsx')
c.convert_all() # ['multisheet-Sheet1.csv','multisheet-Sheet2.csv']

See d6tstack Excel examples
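If installing an extra package is not an option, the same extraction can be sketched with pandas alone. This is an illustration and not how d6tstack works internally; the helper export_sheets_to_csv and the demo workbook are hypothetical, and the openpyxl engine is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Build a small demo workbook (hypothetical data).
workdir = tempfile.mkdtemp()
xlsx_path = os.path.join(workdir, "multisheet.xlsx")
with pd.ExcelWriter(xlsx_path) as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"b": [3]}).to_excel(writer, sheet_name="Sheet2", index=False)


def export_sheets_to_csv(xlsx_path, out_dir):
    """Write every sheet to its own CSV file and return the CSV paths."""
    stem = os.path.splitext(os.path.basename(xlsx_path))[0]
    csv_paths = []
    # The workbook is read once here; later runs can load only the CSV they need.
    for sheet_name, frame in pd.read_excel(xlsx_path, sheet_name=None).items():
        csv_path = os.path.join(out_dir, f"{stem}-{sheet_name}.csv")
        frame.to_csv(csv_path, index=False)
        csv_paths.append(csv_path)
    return csv_paths


csv_paths = export_sheets_to_csv(xlsx_path, workdir)

# Load just the small sheet without touching the workbook again.
small = pd.read_csv(csv_paths[1])
print(small)
```

The one-time conversion still pays the cost of reading the whole workbook, but every later load only touches the CSV for the sheet that is needed.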

answered Dec 17, 2018 at 3:10

citynorman

If you have saved the Excel file in the same folder as your Python program (so that a relative path works), you just need to pass the sheet name or number along with the file name.

Example:

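import pandas as pd
import matplotlib.pyplot as plt

# read the sheet named "Sheet2" and plot weight against height
data = pd.read_excel("wt_vs_ht.xlsx", sheet_name="Sheet2")
print(data)
x = data.Height
y = data.Weight
plt.plot(x, y, 'x')
plt.show()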

albert

answered Jan 13, 2020 at 7:25

How do you read all excel files in a folder in pandas?

To read all Excel files in a directory, use the glob module together with the read_excel() method.

How do I view excel files in pandas?

To read an Excel file as a DataFrame, use the pandas read_excel() method. You can read the first sheet, specific sheets, multiple sheets or all sheets. pandas converts the data into a DataFrame, which is a table-like structure.

How do I read multiple Excel files in Python?

Approach:

  • Import the necessary Python packages, such as pandas, glob, and os.
  • Use the glob package to retrieve files/pathnames matching a specified pattern, i.e. '.xlsx'.
  • Loop over the list of Excel files and read each file using pandas. ...
  • Convert each Excel file into a DataFrame.
  • Display its location, name, and content.
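The steps above can be sketched as follows, assuming pandas with the openpyxl engine. The folder is a temporary directory and the file names first.xlsx and second.xlsx are invented so that the example runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd

# Create a folder with two small workbooks to read back (hypothetical files).
folder = tempfile.mkdtemp()
for name, values in [("first.xlsx", [1, 2]), ("second.xlsx", [3])]:
    pd.DataFrame({"value": values}).to_excel(os.path.join(folder, name), index=False)

# Collect every .xlsx path in the folder and read the first sheet of each.
frames = {}
for file_path in sorted(glob.glob(os.path.join(folder, "*.xlsx"))):
    frames[os.path.basename(file_path)] = pd.read_excel(file_path)

# Display the location, name, and content of each file.
for file_name, frame in frames.items():
    print(folder, file_name)
    print(frame)
```

Sorting the glob result makes the order deterministic, since glob itself does not guarantee one.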

How do I view data in multiple excel files?

Highlight each worksheet tab you want to search by holding down the Ctrl key and clicking each tab you would like to search. Once each worksheet you want to search is highlighted, perform a Find, and all highlighted worksheets will be searched.