Làm cách nào để thay đổi độ rộng của cột trong excel bằng openpyxl?

Tôi đang sử dụng OpenPyXL để tạo một số bản tải xuống Excel từ một trong các ứng dụng Django của mình. In general, I'm pretty happy with how it works. Đặt chiều rộng cột thật dễ dàng. Tìm ra cách để làm điều đó từ tài liệu là khó. Vì vậy, đây là, chỉ trong trường hợp người khác cần biết (hoặc tôi một tuần kể từ bây giờ khi tôi quên)

Show

Vì một số lý do, việc đặt cài đặt chiều rộng bên trong

pip install openpyxl
647 hoạt động

using PyCall
xl = pyimport("openpyxl")
wb = xl.Workbook()
ws = wb.active
# ws.column_dimensions["A"].width = 25  ## ERROR: KeyError: key "A" not found
py"""
$ws.column_dimensions["A"].width = 25
"""
wb.save("foo.xlsx")

Worksheet objects have

pip install openpyxl
648 and
pip install openpyxl
649 attributes that control row heights and column widths. A sheet’s
pip install openpyxl
648 and
pip install openpyxl
649are dictionary-like values; row_dimensions contains RowDimension objects and column_dimensions contains ColumnDimension objects. Trong row_dimensions, người ta có thể truy cập một trong các đối tượng bằng cách sử dụng số của hàng (trong trường hợp này là 1 hoặc 2). Trong column_dimensions, người ta có thể truy cập một trong các đối tượng bằng chữ cái của cột (trong trường hợp này là A hoặc B)

Mã số 1. Chương trình thiết lập kích thước của các ô




pip install openpyxl
652

pip install openpyxl
653
pip install openpyxl
654

pip install openpyxl
655

pip install openpyxl
656

pip install openpyxl
657

pip install openpyxl
6490
pip install openpyxl
6491
pip install openpyxl
6492

pip install openpyxl
655

pip install openpyxl
6494

pip install openpyxl
6495

pip install openpyxl
6496
pip install openpyxl
6491
pip install openpyxl
6498

pip install openpyxl
655

pip install openpyxl
6480

pip install openpyxl
6481
pip install openpyxl
6491
pip install openpyxl
64823
pip install openpyxl
6484
pip install openpyxl
6491
pip install openpyxl
64823
pip install openpyxl
64960
pip install openpyxl
6491
pip install openpyxl
64962
pip install openpyxl
6491
pip install openpyxl
64964
pip install openpyxl
64843
pip install openpyxl
6491
pip install openpyxl
64845
pip install openpyxl
6538

First of all, you must make sure to install the openpyxl library. You can do the same by running the below command on your terminal

pip install openpyxl

How to install openpyxl in Python

To change or modify column width size

In order to change the column width size, you can make use of the column_dimesnsions method of the worksheet class.
Syntax. worksheet. column_dimensions[tên cột]. width=size

Hãy để chúng tôi xem xét tương tự với ví dụ dưới đây

Consider an existing excel file codespeedy. xlsx as shown below;

Làm cách nào để thay đổi độ rộng của cột trong excel bằng openpyxl?
Làm cách nào để thay đổi độ rộng của cột trong excel bằng openpyxl?

So, now let us change the column size of column A;

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

As you can see, we have modified the column size of A to 20 and saved the file after modification as codespeedy1. xlsx

Làm cách nào để thay đổi độ rộng của cột trong excel bằng openpyxl?
Làm cách nào để thay đổi độ rộng của cột trong excel bằng openpyxl?

Tương tự, bạn cũng có thể sửa đổi độ rộng cột của nhiều hàng như hình minh họa;

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
sheet.column_dimensions['C'].width = 20
sheet.column_dimensions['E'].width = 30
worksheet.save("codespeedy1.xlsx")

Làm cách nào để thay đổi độ rộng của cột trong excel bằng openpyxl?
Làm cách nào để thay đổi độ rộng của cột trong excel bằng openpyxl?

Well, isn’t it amazing how you can manage such significant changes with such simple, small lines of code? Well, that in itself is the beauty of Python

The pandas I/O API is a set of top level

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
05 functions accessed like
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
06 that generally return a pandas object. Các hàm
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
07 tương ứng là các phương thức đối tượng được truy cập như
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
08. Below is a table containing available
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
09 and
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
10

Format Type

Data Description

Reader

Writer

chữ

CSV

read_csv

to_csv

chữ

Tệp văn bản có chiều rộng cố định

read_fwf

chữ

JSON

read_json

to_json

chữ

HTML

read_html

to_html

chữ

LaTeX

Styler. to_latex

chữ

XML

read_xml

to_xml

chữ

Bảng tạm cục bộ

read_clipboard

to_clipboard

nhị phân

MS Excel

read_excel

to_excel

nhị phân

tài liệu mở

read_excel

nhị phân

Định dạng HDF5

read_hdf

to_hdf

nhị phân

định dạng lông vũ

read_feather

to_feather

nhị phân

định dạng sàn gỗ

read_parquet

to_parquet

nhị phân

Định dạng ORC

read_orc

to_orc

nhị phân

trạng thái

read_stata

to_stata

nhị phân

SAS

read_sas

nhị phân

SPSS

read_spss

nhị phân

Định dạng dưa chua Python

read_pickle

to_pickle

SQL

SQL

read_sql

to_sql

SQL

Google BigQuery

read_gbq

to_gbq

Đây là so sánh hiệu suất không chính thức đối với một số phương thức IO này.

Ghi chú

Đối với các ví dụ sử dụng lớp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
11, hãy đảm bảo bạn nhập lớp đó bằng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
12 cho Python 3

Tệp CSV & văn bản#

Hàm đặc biệt để đọc tệp văn bản (a. k. a. tệp phẳng) là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
13. Xem sách dạy nấu ăn để biết một số chiến lược nâng cao.

Tùy chọn phân tích cú pháp #

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
13 chấp nhận các đối số phổ biến sau

Căn bản#

filepath_or_buffer khác nhau

Đường dẫn đến tệp (một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
15,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
16 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
17), URL (bao gồm các vị trí http, ftp và S3) hoặc bất kỳ đối tượng nào có phương thức
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
18 (chẳng hạn như tệp đang mở hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
11)

sep str, mặc định là
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
20 cho
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
13,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
22 cho
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
23

Dấu phân cách để sử dụng. Nếu sep là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24, công cụ C không thể tự động phát hiện dấu phân cách, nhưng công cụ phân tích cú pháp Python thì có thể, nghĩa là công cụ phân tích cú pháp sau sẽ được sử dụng và tự động phát hiện dấu phân cách bằng công cụ trình thám thính dựng sẵn của Python,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
25. Ngoài ra, dấu phân cách dài hơn 1 ký tự và khác với
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
26 sẽ được hiểu là biểu thức chính quy và cũng sẽ buộc sử dụng công cụ phân tích cú pháp Python. Lưu ý rằng các dấu phân cách regex có xu hướng bỏ qua dữ liệu được trích dẫn. Ví dụ về biểu thức chính quy.
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
27

dấu phân cách str, mặc định là
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Tên đối số thay thế cho sep

delim_whitespace boolean, default False

Chỉ định có hay không khoảng trắng (e. g.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
29 or
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
30) will be used as the delimiter. Equivalent to setting
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
31. If this option is set to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32, nothing should be passed in for the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
33 parameter

Column and index locations and names#

header int or list of ints, default
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
34

Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names. if no names are passed the behavior is identical to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
35 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
36. Explicitly pass
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
35 to be able to replace existing names

The header can be a list of ints that specify row locations for a MultiIndex on the columns e. g.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
38. Intervening rows that are not specified will be skipped (e. g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
39, so header=0 denotes the first line of data rather than the first line of the file

tên dạng mảng, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

List of column names to use. Nếu tệp không chứa hàng tiêu đề, thì bạn nên chuyển rõ ràng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
36. Bản sao trong danh sách này không được phép

index_col int, str, chuỗi int / str hoặc Sai, tùy chọn, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

(Các) cột để sử dụng làm nhãn hàng của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43, được cung cấp dưới dạng tên chuỗi hoặc chỉ mục cột. Nếu một chuỗi int / str được đưa ra, Multi Index được sử dụng

Ghi chú

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
44 có thể được sử dụng để buộc gấu trúc không sử dụng cột đầu tiên làm chỉ mục, e. g. khi bạn có tệp không đúng định dạng với dấu phân cách ở cuối mỗi dòng

Giá trị mặc định của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24 hướng dẫn gấu trúc đoán. Nếu số trường trong hàng tiêu đề cột bằng với số trường trong phần thân của tệp dữ liệu thì chỉ mục mặc định được sử dụng. Nếu nó lớn hơn, thì các cột đầu tiên được sử dụng làm chỉ mục sao cho số trường còn lại trong phần nội dung bằng với số trường trong tiêu đề

Hàng đầu tiên sau tiêu đề được sử dụng để xác định số lượng cột sẽ được đưa vào chỉ mục. If the subsequent rows contain less columns than the first row, they are filled with

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
46

Điều này có thể tránh được thông qua

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
47. Điều này đảm bảo rằng các cột được lấy nguyên trạng và dữ liệu theo sau bị bỏ qua

usecols giống như danh sách hoặc có thể gọi được, mặc định là
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Return a subset of the columns. If list-like, all elements must either be positional (i. e. integer indices into the document columns) or strings that correspond to column names provided either by the user in

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
49 or inferred from the document header row(s). Nếu
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
49 được cung cấp, (các) hàng tiêu đề tài liệu không được tính đến. For example, a valid list-like
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
47 parameter would be
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
52 or
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
53

Element order is ignored, so

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
54 is the same as
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
55. Để khởi tạo một DataFrame từ
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
56 với thứ tự phần tử được giữ nguyên, hãy sử dụng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
57 cho các cột theo thứ tự
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
58 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
59 cho thứ tự
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
60

Nếu có thể gọi được, hàm có thể gọi được sẽ được đánh giá dựa trên tên cột, trả về các tên mà hàm có thể gọi được đánh giá là True

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
0

Sử dụng tham số này dẫn đến thời gian phân tích cú pháp nhanh hơn nhiều và sử dụng bộ nhớ thấp hơn khi sử dụng công cụ c. The Python engine loads the data first before deciding which columns to drop

squeeze boolean, default
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61

Nếu dữ liệu được phân tích cú pháp chỉ chứa một cột thì hãy trả về

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
62

Không dùng nữa kể từ phiên bản 1. 4. 0. Nối

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
63 vào lệnh gọi tới
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
64 để nén dữ liệu.

tiền tố str, mặc định là
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Tiền tố để thêm vào số cột khi không có tiêu đề, e. g. 'X' cho X0, X1, ...

Không dùng nữa kể từ phiên bản 1. 4. 0. Sử dụng cách hiểu danh sách trên các cột của DataFrame sau khi gọi

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
66.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7

mangle_dupe_cols boolean, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32

Các cột trùng lặp sẽ được chỉ định là 'X', 'X. 1’…’X. N’, rather than ‘X’…’X’. Passing in

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61 will cause data to be overwritten if there are duplicate names in the columns

Không dùng nữa kể từ phiên bản 1. 5. 0. Đối số chưa bao giờ được triển khai và thay vào đó, một đối số mới có thể chỉ định mẫu đổi tên sẽ được thêm vào.

Cấu hình phân tích cú pháp chung#

dtype Nhập tên hoặc chính tả của cột -> loại, mặc định là
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Kiểu dữ liệu cho dữ liệu hoặc cột. e. g.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
70 Sử dụng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
15 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
72 cùng với cài đặt
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
73 phù hợp để giữ nguyên và không diễn giải dtype. Nếu bộ chuyển đổi được chỉ định, chúng sẽ được áp dụng THAY THẾ cho chuyển đổi dtype

Mới trong phiên bản 1. 5. 0. Đã thêm hỗ trợ cho defaultdict. Chỉ định một defaultdict làm đầu vào trong đó mặc định xác định dtype của các cột không được liệt kê rõ ràng.

công cụ {
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
74,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
75,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
76}

Công cụ phân tích cú pháp để sử dụng. The C and pyarrow engines are faster, while the python engine is currently more feature-complete. Đa luồng hiện chỉ được hỗ trợ bởi công cụ pyarrow

New in version 1. 4. 0. Công cụ “pyarrow” đã được thêm làm công cụ thử nghiệm và một số tính năng không được hỗ trợ hoặc có thể không hoạt động chính xác với công cụ này.

bộ chuyển đổi chính tả, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Dict của các chức năng để chuyển đổi các giá trị trong các cột nhất định. Các khóa có thể là số nguyên hoặc nhãn cột

true_values danh sách, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Các giá trị được coi là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32

false_values danh sách, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Values to consider as

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61

skipinitialspace boolean, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61

Bỏ qua khoảng trắng sau dấu phân cách

skiprows dạng danh sách hoặc số nguyên, mặc định là
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Số dòng cần bỏ qua (được lập chỉ mục 0) hoặc số dòng cần bỏ qua (int) ở đầu tệp

Nếu có thể gọi được, hàm có thể gọi được sẽ được đánh giá dựa trên các chỉ số hàng, trả về True nếu hàng sẽ bị bỏ qua và Sai nếu không

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
5

skipfooter int, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
84

Number of lines at bottom of file to skip (unsupported with engine=’c’)

nrows int, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Số hàng của tập tin để đọc. Hữu ích để đọc các phần của tệp lớn

low_memory boolean, default
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32

Xử lý nội bộ tệp theo khối, dẫn đến việc sử dụng bộ nhớ thấp hơn trong khi phân tích cú pháp, nhưng có thể suy luận kiểu hỗn hợp. Để đảm bảo không có loại hỗn hợp, hãy đặt

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61 hoặc chỉ định loại bằng tham số
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
88. Note that the entire file is read into a single
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 regardless, use the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
90 or
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
91 parameter to return the data in chunks. (Chỉ hợp lệ với trình phân tích cú pháp C)

memory_map boolean, mặc định Sai

Nếu đường dẫn tệp được cung cấp cho ______ 492, ánh xạ đối tượng tệp trực tiếp vào bộ nhớ và truy cập dữ liệu trực tiếp từ đó. Sử dụng tùy chọn này có thể cải thiện hiệu suất vì không còn bất kỳ chi phí I/O nào nữa

NA và xử lý dữ liệu bị thiếu#

na_values vô hướng, str, dạng danh sách hoặc dict, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Các chuỗi bổ sung để nhận dạng là NA/NaN. Nếu dict được thông qua, các giá trị NA cụ thể trên mỗi cột. Xem na values ​​const bên dưới để biết danh sách các giá trị được hiểu là NaN theo mặc định.

keep_default_na boolean, default
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32

Whether or not to include the default NaN values when parsing the data. Depending on whether

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
73 is passed in, the behavior is as follows

  • Nếu

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    96 là
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    32 và
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    73 được chỉ định, thì
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    73 được thêm vào các giá trị NaN mặc định được sử dụng để phân tích cú pháp

  • If

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    96 is
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    32, and
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    73 are not specified, only the default NaN values are used for parsing

  • Nếu

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    96 là
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    61 và
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    73 được chỉ định, thì chỉ các giá trị NaN được chỉ định
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    73 được sử dụng để phân tích cú pháp

  • Nếu

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    96 là
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    61 và
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    73 không được chỉ định, sẽ không có chuỗi nào được phân tích thành NaN

Note that if

pip install openpyxl
1210 is passed in as
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61, the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
96 and
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
73 parameters will be ignored

na_filter boolean, default
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32

Phát hiện các điểm đánh dấu giá trị bị thiếu (chuỗi trống và giá trị của na_values). Trong dữ liệu không có bất kỳ NA nào, việc chuyển

pip install openpyxl
1215 có thể cải thiện hiệu suất đọc một tệp lớn

verbose boolean, default
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61

Cho biết số lượng giá trị NA được đặt trong các cột không phải là số

skip_blank_lines boolean, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32

Nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32, hãy bỏ qua các dòng trống thay vì diễn giải dưới dạng giá trị NaN

Xử lý ngày giờ #

parse_dates boolean or list of ints or names or list of lists or dict, default
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61.
  • Nếu

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    32 -> thử phân tích cú pháp chỉ mục

  • Nếu

    pip install openpyxl
    1221 -> thử phân tích từng cột 1, 2, 3 thành một cột ngày riêng biệt

  • Nếu

    pip install openpyxl
    1222 -> kết hợp cột 1 và 3 và phân tích dưới dạng một cột ngày

  • Nếu

    pip install openpyxl
    1223 -> phân tích cột 1, 3 thành ngày và gọi kết quả là 'foo'

Ghi chú

Đường dẫn nhanh tồn tại cho các ngày có định dạng iso8601

infer_datetime_format boolean, default
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61

Nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32 và parse_dates được bật cho một cột, hãy thử suy ra định dạng ngày giờ để tăng tốc độ xử lý

keep_date_col boolean, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61

If

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32 and parse_dates specifies combining multiple columns then keep the original columns

date_parser , mặc định là
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Hàm sử dụng để chuyển đổi một chuỗi các cột chuỗi thành một mảng các thể hiện thời gian. Mặc định sử dụng

pip install openpyxl
1229 để thực hiện chuyển đổi. gấu trúc sẽ cố gắng gọi date_parser theo ba cách khác nhau, chuyển sang cách tiếp theo nếu xảy ra ngoại lệ. 1) Chuyển một hoặc nhiều mảng (như được định nghĩa bởi parse_dates) làm đối số;

dayfirst boolean, default
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61

Ngày định dạng DD/MM, định dạng quốc tế và châu Âu

cache_dates boolean, default True

Nếu Đúng, hãy sử dụng bộ nhớ cache của các ngày đã chuyển đổi, duy nhất để áp dụng chuyển đổi ngày giờ. Có thể tạo ra tốc độ tăng đáng kể khi phân tích chuỗi ngày trùng lặp, đặc biệt là các chuỗi có chênh lệch múi giờ

Mới trong phiên bản 0. 25. 0

Lần lặp #

trình lặp boolean, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61

Trả về đối tượng

pip install openpyxl
1232 để lặp lại hoặc nhận khối với
pip install openpyxl
1233

chunksize int, default
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Trả về đối tượng

pip install openpyxl
1232 để lặp lại. Xem lặp lại và phân đoạn bên dưới.

Định dạng trích dẫn, nén và tệp #

nén {
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
34,
pip install openpyxl
1237,
pip install openpyxl
1238,
pip install openpyxl
1239,
pip install openpyxl
1240,
pip install openpyxl
1241,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24,
pip install openpyxl
1243}, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
34

Để giải nén dữ liệu trên đĩa nhanh chóng. Nếu 'suy ra', thì hãy sử dụng gzip, bz2, zip, xz hoặc zstandard nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
92 giống như đường dẫn kết thúc bằng '. gz', '. bz2', '. nén', '. xz', '. zst', tương ứng và không giải nén nếu không. Nếu sử dụng 'zip', tệp ZIP chỉ được chứa một tệp dữ liệu để đọc trong. Đặt thành
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24 để không giải nén. Can also be a dict with key
pip install openpyxl
1247 set to one of {
pip install openpyxl
1239,
pip install openpyxl
1237,
pip install openpyxl
1238,
pip install openpyxl
1241} and other key-value pairs are forwarded to
pip install openpyxl
1252,
pip install openpyxl
1253,
pip install openpyxl
1254, or
pip install openpyxl
1255. As an example, the following could be passed for faster compression and to create a reproducible gzip archive.
pip install openpyxl
1256

Đã thay đổi trong phiên bản 1. 1. 0. tùy chọn dict được mở rộng để hỗ trợ

pip install openpyxl
1257 và
pip install openpyxl
1258.

Đã thay đổi trong phiên bản 1. 2. 0. Các phiên bản trước đã chuyển tiếp các mục nhập chính tả cho ‘gzip’ tới

pip install openpyxl
1259.

nghìn str, mặc định là
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Dấu phân cách hàng nghìn

thập phân str, mặc định
pip install openpyxl
1261

Ký tự để nhận dạng là dấu thập phân. e. g. sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
20 cho dữ liệu châu Âu

float_precision string, default None

Specifies which converter the C engine should use for floating-point values. Các tùy chọn là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24 cho bộ chuyển đổi thông thường,
pip install openpyxl
1264 cho bộ chuyển đổi có độ chính xác cao và
pip install openpyxl
1265 cho bộ chuyển đổi khứ hồi

lineterminator str (độ dài 1), mặc định là
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Ký tự để chia tệp thành các dòng. Only valid with C parser

quotechar str (độ dài 1)

The character used to denote the start and end of a quoted item. Các mục được trích dẫn có thể bao gồm dấu phân cách và nó sẽ bị bỏ qua

quoting int or
pip install openpyxl
1267 instance, default
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
84

Control field quoting behavior per

pip install openpyxl
1267 constants. Sử dụng một trong số
pip install openpyxl
1270 (0),
pip install openpyxl
1271 (1),
pip install openpyxl
1272 (2) hoặc
pip install openpyxl
1273 (3)

trích dẫn kép boolean, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32

When

pip install openpyxl
1275 is specified and
pip install openpyxl
1276 is not
pip install openpyxl
1273, indicate whether or not to interpret two consecutive
pip install openpyxl
1275 elements inside a field as a single
pip install openpyxl
1275 element

escapechar str (độ dài 1), mặc định là
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Chuỗi một ký tự được sử dụng để thoát khỏi dấu phân cách khi trích dẫn là

pip install openpyxl
1273

nhận xét str, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Cho biết phần còn lại của dòng không nên được phân tích cú pháp. Nếu được tìm thấy ở đầu dòng, dòng đó sẽ bị bỏ qua hoàn toàn. Tham số này phải là một ký tự đơn. Giống như các dòng trống (miễn là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
39), các dòng nhận xét đầy đủ bị bỏ qua bởi tham số
pip install openpyxl
1284 chứ không phải bởi
pip install openpyxl
1285. For example, if
pip install openpyxl
1286, parsing ‘#empty\na,b,c\n1,2,3’ with
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
35 will result in ‘a,b,c’ being treated as the header

encoding str, default
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Mã hóa để sử dụng cho UTF khi đọc/ghi (e. g.

pip install openpyxl
1289). Danh sách mã hóa tiêu chuẩn Python

phương ngữ str hoặc ví dụ
pip install openpyxl
1290, mặc định là
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Nếu được cung cấp, thông số này sẽ ghi đè giá trị (mặc định hoặc không) cho các thông số sau.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
33,
pip install openpyxl
1293,
pip install openpyxl
1294,
pip install openpyxl
1295,
pip install openpyxl
1275 và
pip install openpyxl
1276. Nếu cần ghi đè các giá trị, Cảnh báo phân tích cú pháp sẽ được đưa ra. Xem tài liệu
pip install openpyxl
1290 để biết thêm chi tiết

Xử lý lỗi#

error_bad_lines boolean, tùy chọn, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Dòng có quá nhiều trường (e. g. một dòng csv có quá nhiều dấu phẩy) theo mặc định sẽ gây ra một ngoại lệ và không có

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 nào được trả về. Nếu
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61, thì những "dòng xấu" này sẽ bị loại bỏ khỏi
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 được trả về. Xem đường xấu bên dưới.

Không dùng nữa kể từ phiên bản 1. 3. 0. Thay vào đó, nên sử dụng tham số

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0503 để chỉ định hành vi khi gặp phải một dòng xấu.

warn_bad_lines boolean, tùy chọn, mặc định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24

Nếu error_bad_lines là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61, vàWarner_bad_lines là
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32, một cảnh báo cho mỗi “dòng xấu” sẽ được xuất ra

Không dùng nữa kể từ phiên bản 1. 3. 0. Thay vào đó, nên sử dụng tham số

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0503 để chỉ định hành vi khi gặp phải một dòng xấu.

on_bad_lines ('lỗi', 'cảnh báo', 'bỏ ​​qua'), 'lỗi' mặc định

Chỉ định những việc cần làm khi gặp phải một dòng xấu (một dòng có quá nhiều trường). Các giá trị được phép là

  • 'lỗi', tăng ParserError khi gặp phải một dòng xấu

  • ‘warn’, in cảnh báo khi gặp dòng xấu và bỏ qua dòng đó

  • 'bỏ qua', bỏ qua các dòng xấu mà không báo trước hoặc cảnh báo khi gặp phải

Mới trong phiên bản 1. 3. 0

Chỉ định kiểu dữ liệu cột#

Bạn có thể chỉ định loại dữ liệu cho toàn bộ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 hoặc từng cột riêng lẻ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

May mắn thay, pandas cung cấp nhiều cách để đảm bảo rằng (các) cột của bạn chỉ chứa một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
88. Nếu chưa quen với những khái niệm này, bạn có thể xem tại đây để tìm hiểu thêm về dtypes và tại đâytại đây . tại đây . tại đây . tại đây . tại đây . tại đây . tại đây . tại đây . tại đây . to learn more about
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
72 conversion in pandas.

Chẳng hạn, bạn có thể sử dụng đối số

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0511 của
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
13

pip install openpyxl
12

Hoặc bạn có thể sử dụng hàm

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0513 để ép buộc các dtypes sau khi đọc dữ liệu,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
05

sẽ chuyển đổi tất cả phân tích cú pháp hợp lệ thành float, để lại phân tích cú pháp không hợp lệ là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
46

Cuối cùng, cách bạn xử lý việc đọc trong các cột có chứa các kiểu dữ liệu hỗn hợp tùy thuộc vào nhu cầu cụ thể của bạn. Trong trường hợp trên, nếu bạn muốn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
46 loại bỏ các điểm bất thường về dữ liệu, thì
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0513 có lẽ là lựa chọn tốt nhất của bạn. Tuy nhiên, nếu bạn muốn tất cả dữ liệu được ép buộc, bất kể loại nào, thì việc sử dụng đối số
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0511 của
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
13 chắc chắn sẽ đáng để thử

Ghi chú

Trong một số trường hợp, việc đọc dữ liệu bất thường với các cột chứa các kiểu dữ liệu hỗn hợp sẽ dẫn đến tập dữ liệu không nhất quán. Nếu bạn dựa vào gấu trúc để suy ra các kiểu dữ liệu của các cột, công cụ phân tích cú pháp sẽ đi và suy ra các kiểu dữ liệu cho các khối dữ liệu khác nhau, thay vì toàn bộ tập dữ liệu cùng một lúc. Do đó, bạn có thể kết thúc với (các) cột có các kiểu dữ liệu hỗn hợp. Ví dụ,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
24

sẽ dẫn đến

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0519 chứa một kiểu dữ liệu
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0520 cho một số đoạn nhất định của cột và
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
15 cho những cột khác do các kiểu dữ liệu hỗn hợp từ dữ liệu được đọc trong. Điều quan trọng cần lưu ý là toàn bộ cột sẽ được đánh dấu bằng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
88 của
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
72, được sử dụng cho các cột có kiểu chữ hỗn hợp

Chỉ định phân loại dtype#

Các cột

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0524 có thể được phân tích cú pháp trực tiếp bằng cách chỉ định
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0525 hoặc
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0526

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
33

Các cột riêng lẻ có thể được phân tích cú pháp dưới dạng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0524 bằng cách sử dụng đặc tả chính tả

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
35

Việc chỉ định

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0525 sẽ dẫn đến một
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0524 không có thứ tự có
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0530 là các giá trị duy nhất được quan sát thấy trong dữ liệu. Để kiểm soát nhiều hơn đối với các danh mục và thứ tự, hãy tạo trước một
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0531 và chuyển mã đó cho
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
88 của cột đó

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
41

Khi sử dụng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0533, các giá trị "bất ngờ" bên ngoài
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0534 được coi là giá trị bị thiếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
70

Điều này phù hợp với hành vi của

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0535

Ghi chú

Với

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0525, các danh mục kết quả sẽ luôn được phân tích thành chuỗi (đối tượng dtype). If the categories are numeric they can be converted using the
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0513 function, or as appropriate, another converter such as
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0538

Khi

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
88 là một
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0531 với đồng nhất
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0530 ( tất cả là số, tất cả ngày giờ, v.v. ), quá trình chuyển đổi được thực hiện tự động

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
71

Đặt tên và sử dụng các cột#

Xử lý tên cột#

Một tệp có thể có hoặc không có hàng tiêu đề. gấu trúc giả sử hàng đầu tiên nên được sử dụng làm tên cột

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
72

Bằng cách chỉ định đối số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
49 kết hợp với
pip install openpyxl
1284, bạn có thể chỉ ra các tên khác sẽ sử dụng và có nên loại bỏ hàng tiêu đề hay không (nếu có)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
73

Nếu tiêu đề nằm trong một hàng khác với hàng đầu tiên, hãy chuyển số hàng cho

pip install openpyxl
1284. Điều này sẽ bỏ qua các hàng trước

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
74

Ghi chú

Hành vi mặc định là suy ra tên cột. nếu không có tên nào được chuyển thì hành vi giống hệt với

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
35 và tên cột được suy ra từ dòng không trống đầu tiên của tệp, nếu tên cột được chuyển rõ ràng thì hành vi giống hệt với
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
36

Phân tích cú pháp tên trùng lặp#

Không dùng nữa kể từ phiên bản 1. 5. 0. ______20547 chưa bao giờ được triển khai và thay vào đó, một đối số mới trong đó mẫu đổi tên có thể được chỉ định sẽ được thêm vào.

Nếu tệp hoặc tiêu đề chứa tên trùng lặp, theo mặc định, gấu trúc sẽ phân biệt giữa chúng để ngăn ghi đè dữ liệu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
75

Không còn dữ liệu trùng lặp vì theo mặc định,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0548 sẽ sửa đổi một loạt các cột trùng lặp 'X', ..., 'X' thành 'X', 'X. 1’, …, ‘X. N'

Lọc cột (
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
47)#

Đối số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
47 cho phép bạn chọn bất kỳ tập hợp con nào của các cột trong một tệp, bằng cách sử dụng tên cột, số vị trí hoặc có thể gọi được

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
76

Đối số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
47 cũng có thể được sử dụng để chỉ định cột nào không được sử dụng trong kết quả cuối cùng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
77

Trong trường hợp này, khả năng gọi được chỉ định rằng chúng tôi loại trừ các cột “a” và “c” khỏi đầu ra

Nhận xét và dòng trống#

Bỏ qua chú thích dòng và dòng trống#

Nếu tham số

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0552 được chỉ định, thì các dòng nhận xét hoàn toàn sẽ bị bỏ qua. Theo mặc định, các dòng hoàn toàn trống cũng sẽ bị bỏ qua

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
78

Nếu

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0553, thì
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
66 sẽ không bỏ qua các dòng trống

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
79

Cảnh báo

Sự hiện diện của các dòng bị bỏ qua có thể tạo ra sự mơ hồ liên quan đến số dòng;

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
50

Nếu cả

pip install openpyxl
1284 và
pip install openpyxl
1285 đều được chỉ định, thì
pip install openpyxl
1284 sẽ liên quan đến phần cuối của
pip install openpyxl
1285. Ví dụ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
51

Bình luận#

Đôi khi nhận xét hoặc dữ liệu meta có thể được bao gồm trong một tệp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
52

Theo mặc định, trình phân tích cú pháp bao gồm các nhận xét trong đầu ra

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
53

We can suppress the comments using the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0552 keyword

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
54

Xử lý dữ liệu Unicode#

Đối số

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0562 nên được sử dụng cho dữ liệu unicode được mã hóa, điều này sẽ dẫn đến kết quả là các chuỗi byte được giải mã thành unicode

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
55

Một số định dạng mã hóa tất cả các ký tự dưới dạng nhiều byte, chẳng hạn như UTF-16, sẽ không phân tích cú pháp chính xác nếu không chỉ định mã hóa. Danh sách đầy đủ các bảng mã tiêu chuẩn của Python

Cột chỉ mục và dấu phân cách #

Nếu một tệp có nhiều hơn một cột dữ liệu so với số lượng tên cột, thì cột đầu tiên sẽ được sử dụng làm tên hàng của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
56

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
57

Thông thường, bạn có thể đạt được hành vi này bằng cách sử dụng tùy chọn

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0564

Có một số trường hợp ngoại lệ khi một tệp đã được chuẩn bị với các dấu phân cách ở cuối mỗi dòng dữ liệu, gây nhầm lẫn cho trình phân tích cú pháp. Để vô hiệu hóa rõ ràng suy luận cột chỉ mục và loại bỏ cột cuối cùng, hãy vượt qua

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
44

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
58

Nếu một tập hợp con dữ liệu đang được phân tích cú pháp bằng tùy chọn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
47, thì thông số kỹ thuật của
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0564 dựa trên tập hợp con đó, không phải dữ liệu gốc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
59

Xử lý ngày#

Chỉ định cột ngày #

Để tạo điều kiện thuận lợi hơn khi làm việc với dữ liệu ngày giờ,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
13 sử dụng các đối số từ khóa
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0569 và
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0570 để cho phép người dùng chỉ định nhiều cột và định dạng ngày/giờ để biến dữ liệu văn bản đầu vào thành các đối tượng
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0571

Trường hợp đơn giản nhất là chỉ cần vượt qua trong

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0572

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
0

Thông thường, chúng tôi có thể muốn lưu trữ dữ liệu ngày và giờ riêng biệt hoặc lưu trữ các trường ngày khác nhau một cách riêng biệt. từ khóa

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0569 có thể được sử dụng để chỉ định tổ hợp các cột để phân tích ngày và/hoặc thời gian từ

Bạn có thể chỉ định một danh sách các danh sách cột thành

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0569, các cột ngày kết quả sẽ được thêm vào đầu ra (để không ảnh hưởng đến thứ tự cột hiện có) và các tên cột mới sẽ là phần nối của các tên cột thành phần

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
1

Theo mặc định, trình phân tích cú pháp loại bỏ các cột ngày của thành phần, nhưng bạn có thể chọn giữ lại chúng thông qua từ khóa

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0575

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
2

Lưu ý rằng nếu bạn muốn kết hợp nhiều cột thành một cột ngày, thì phải sử dụng danh sách lồng nhau. Nói cách khác,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0576 chỉ ra rằng mỗi cột thứ hai và thứ ba phải được phân tích cú pháp thành các cột ngày riêng biệt trong khi
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0577 có nghĩa là hai cột phải được phân tích cú pháp thành một cột

Bạn cũng có thể sử dụng lệnh để chỉ định các cột tên tùy chỉnh

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
3

Điều quan trọng cần nhớ là nếu nhiều cột văn bản được phân tích thành một cột ngày, thì một cột mới sẽ được thêm vào trước dữ liệu. Thông số kỹ thuật của

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0564 dựa trên tập hợp cột mới này thay vì các cột dữ liệu ban đầu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
4

Ghi chú

Nếu một cột hoặc chỉ mục chứa ngày không thể phân tích cú pháp, thì toàn bộ cột hoặc chỉ mục đó sẽ được trả về không thay đổi dưới dạng kiểu dữ liệu đối tượng. Để phân tích cú pháp ngày giờ không chuẩn, hãy sử dụng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0538 sau
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0580

Ghi chú

read_csv có fast_path để phân tích chuỗi ngày giờ ở định dạng iso8601, e. g “2000-01-01T00. 01. 02+00. 00” và các biến thể tương tự. Nếu bạn có thể sắp xếp dữ liệu của mình để lưu trữ thời gian ở định dạng này, thì thời gian tải sẽ nhanh hơn đáng kể, đã quan sát được ~20 lần

Chức năng phân tích ngày #

Finally, the parser allows you to specify a custom

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0570 function to take full advantage of the flexibility of the date parsing API

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
5

gấu trúc sẽ cố gắng gọi hàm

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0570 theo ba cách khác nhau. Nếu một ngoại lệ được đưa ra, ngoại lệ tiếp theo sẽ được thử

  1. import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    0570 được gọi đầu tiên với một hoặc nhiều mảng làm đối số, như được định nghĩa bằng cách sử dụng
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    0569 (e. g. ,
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    0585)

  2. Nếu #1 không thành công, thì

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    0570 được gọi với tất cả các cột được nối theo hàng thành một mảng duy nhất (e. g. ,
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    0587)

Lưu ý rằng về hiệu suất, bạn nên thử các phương pháp phân tích ngày này theo thứ tự

  1. Cố gắng suy ra định dạng bằng cách sử dụng

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    0588 (xem phần bên dưới)

  2. Nếu bạn biết định dạng, hãy sử dụng

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    0589.
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    0590

  3. Nếu bạn có định dạng thực sự không chuẩn, hãy sử dụng hàm

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    0570 tùy chỉnh. Để có hiệu suất tối ưu, điều này nên được vector hóa, tôi. e. , nó sẽ chấp nhận mảng làm đối số

Phân tích cú pháp CSV với các múi giờ hỗn hợp#

pandas không thể đại diện cho một cột hoặc chỉ mục với các múi giờ hỗn hợp. Nếu tệp CSV của bạn chứa các cột có nhiều múi giờ khác nhau, thì kết quả mặc định sẽ là cột kiểu đối tượng có chuỗi, ngay cả với

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0569

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
6

Để phân tích cú pháp các giá trị múi giờ hỗn hợp dưới dạng cột ngày giờ, hãy chuyển một

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0538 được áp dụng một phần với
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0594 là
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0570

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7

Suy ra định dạng ngày giờ #

If you have

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0569 enabled for some or all of your columns, and your datetime strings are all formatted the same way, you may get a large speed up by setting
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0588. If set, pandas will attempt to guess the format of your datetime strings, and then use a faster means of parsing the strings. 5-10x parsing speeds have been observed. pandas will fallback to the usual parsing if either the format cannot be guessed or the format that was guessed cannot properly parse the entire column of strings. So in general,
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0598 should not have any negative consequences if enabled

Here are some examples of datetime strings that can be guessed (All representing December 30th, 2011 at 00. 00. 00)

  • “20111230”

  • “2011/12/30”

  • “20111230 00. 00. 00”

  • “12/30/2011 00. 00. 00”

  • “30/Dec/2011 00. 00. 00”

  • “30/December/2011 00. 00. 00”

Note that

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0598 is sensitive to
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2400. With
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2401, it will guess “01/12/2011” to be December 1st. With
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2402 (default) it will guess “01/12/2011” to be January 12th

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
8

International date formats#

While US date formats tend to be MM/DD/YYYY, many international formats use DD/MM/YYYY instead. For convenience, a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2400 keyword is provided

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
9

Writing CSVs to binary file objects#

New in version 1. 2. 0

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2404 allows writing a CSV to a file object opened binary mode. In most cases, it is not necessary to specify
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2405 as Pandas will auto-detect whether the file object is opened in text or binary mode

pip install openpyxl
120

Specifying method for floating-point conversion#

The parameter

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2406 can be specified in order to use a specific floating-point converter during parsing with the C engine. The options are the ordinary converter, the high-precision converter, and the round-trip converter (which is guaranteed to round-trip values after writing to a file). For example

pip install openpyxl
121

Thousand separators#

For large numbers that have been written with a thousands separator, you can set the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2407 keyword to a string of length 1 so that integers will be parsed correctly

By default, numbers with a thousands separator will be parsed as strings

pip install openpyxl
122

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2407 keyword allows integers to be parsed correctly

pip install openpyxl
123

NA values#

To control which values are parsed as missing values (which are signified by

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
46), specify a string in
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
73. If you specify a list of strings, then all values in it are considered to be missing values. If you specify a number (a
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2411, like
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2412 or an
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2413 like
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2414), the corresponding equivalent values will also imply a missing value (in this case effectively
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2415 are recognized as
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
46)

To completely override the default values that are recognized as missing, specify

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2417

The default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
46 recognized values are
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2419

Let us consider some examples

pip install openpyxl
124

In the example above

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2414 and
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2412 will be recognized as
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
46, in addition to the defaults. A string will first be interpreted as a numerical
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2414, then as a
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
46

pip install openpyxl
125

Above, only an empty field will be recognized as

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
46

pip install openpyxl
126

Above, both

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2426 and
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
84 as strings are
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
46

pip install openpyxl
127

The default values, in addition to the string

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2429 are recognized as
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
46

Infinity#

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2431 like values will be parsed as
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2432 (positive infinity), and
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2433 as
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2434 (negative infinity). These will ignore the case of the value, meaning
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2435, will also be parsed as
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2432

Returning Series#

Using the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2437 keyword, the parser will return output with a single column as a
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
62

Deprecated since version 1. 4. 0. Users should append

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
63 to the DataFrame returned by
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
66 instead.

pip install openpyxl
128

Boolean values#

The common values

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61,
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2443, and
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2444 are all recognized as boolean. Occasionally you might want to recognize other values as being boolean. To do this, use the
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2445 and
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2446 options as follows

pip install openpyxl
129

Handling “bad” lines#

Some files may have malformed lines with too few fields or too many. Lines with too few fields will have NA values filled in the trailing fields. Lines with too many fields will raise an error by default

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
050

You can elect to skip bad lines

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
051

Or pass a callable function to handle the bad line if

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2447. The bad line will be a list of strings that was split by the
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2448

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
052

You can also use the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
47 parameter to eliminate extraneous column data that appear in some lines but not others

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
053

In case you want to keep all data including the lines with too many fields, you can specify a sufficient number of

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
49. This ensures that lines with not enough fields are filled with
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
46

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
054

Dialect#

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2452 keyword gives greater flexibility in specifying the file format. By default it uses the Excel dialect but you can specify either the dialect name or a
pip install openpyxl
1290 instance

Suppose you had data with unenclosed quotes

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
055

By default,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
66 uses the Excel dialect and treats the double quote as the quote character, which causes it to fail when it finds a newline before it finds the closing double quote

We can get around this using

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2452

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
056

All of the dialect options can be specified separately by keyword arguments

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
057

Another common dialect option is

pip install openpyxl
1295, to skip any whitespace after a delimiter

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
058

The parsers make every attempt to “do the right thing” and not be fragile. Type inference is a pretty big deal. If a column can be coerced to integer dtype without altering the contents, the parser will do so. Any non-numeric columns will come through as object dtype as with the rest of pandas objects

Quoting and Escape Characters#

Quotes (and other escape characters) in embedded fields can be handled in any number of ways. One way is to use backslashes; to properly parse this data, you should pass the

pip install openpyxl
1294 option

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
059

Files with fixed width columns#

While

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
13 reads delimited data, the
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2459 function works with data files that have known and fixed column widths. The function parameters to
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2460 are largely the same as
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
66 with two extra parameters, and a different usage of the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
33 parameter

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2463. A list of pairs (tuples) giving the extents of the fixed-width fields of each line as half-open intervals (i. e. , [from, to[ ). String value ‘infer’ can be used to instruct the parser to try detecting the column specifications from the first 100 rows of the data. Default behavior, if not specified, is to infer

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2464. A list of field widths which can be used instead of ‘colspecs’ if the intervals are contiguous

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    33. Characters to consider as filler characters in the fixed-width file. Can be used to specify the filler character of the fields if it is not spaces (e. g. , ‘~’)

Consider a typical fixed-width data file

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
240

In order to parse this file into a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43, we simply need to supply the column specifications to the
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2460 function along with the file name

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
241

Note how the parser automatically picks column names X. when

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
36 argument is specified. Alternatively, you can supply just the column widths for contiguous columns:

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
242

The parser will take care of extra white spaces around the columns so it’s ok to have extra separation between the columns in the file

By default,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2460 will try to infer the file’s
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2463 by using the first 100 rows of the file. It can do it only in cases when the columns are aligned and correctly separated by the provided
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
33 (default delimiter is whitespace)

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
243

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2460 supports the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
88 parameter for specifying the types of parsed columns to be different from the inferred type

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
244

Indexes#

Files with an “implicit” index column#

Consider a file with one less entry in the header than the number of data column

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
245

In this special case,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
66 assumes that the first column is to be used as the index of the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
246

Note that the dates weren’t automatically parsed. In that case you would need to do as before

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
247

Reading an index with a
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2476#

Suppose you have data indexed by two columns

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
248

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0564 argument to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
66 can take a list of column numbers to turn multiple columns into a
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2476 for the index of the returned object

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
249

Reading columns with a
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2476#

By specifying list of row locations for the

pip install openpyxl
1284 argument, you can read in a
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2476 for the columns. Specifying non-consecutive rows will skip the intervening rows

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
330

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
66 is also able to interpret a more common format of multi-columns indices

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
331

Ghi chú

If an

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0564 is not specified (e. g. you don’t have an index, or wrote it with
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2485, then any
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
49 on the columns index will be lost

Automatically “sniffing” the delimiter#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
66 is capable of inferring delimited (not necessarily comma-separated) files, as pandas uses the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
25 class of the csv module. For this, you have to specify
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2489

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
332

Reading multiple files to create a single DataFrame#

It’s best to use

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2490 to combine multiple files. See the cookbook for an example.

Iterating through files chunk by chunk#

Suppose you wish to iterate through a (potentially very large) file lazily rather than reading the entire file into memory, such as the following

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
333

By specifying a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
90 to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
66, the return value will be an iterable object of type
pip install openpyxl
1232

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
334

Changed in version 1. 2.

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2494 return a context-manager when iterating through a file.

Specifying

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2495 will also return the
pip install openpyxl
1232 object

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
335

Specifying the parser engine#

Pandas currently supports three engines, the C engine, the python engine, and an experimental pyarrow engine (requires the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2497 package). In general, the pyarrow engine is fastest on larger workloads and is equivalent in speed to the C engine on most other workloads. The python engine tends to be slower than the pyarrow and C engines on most workloads. However, the pyarrow engine is much less robust than the C engine, which lacks a few features compared to the Python engine

Where possible, pandas uses the C parser (specified as

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2498), but it may fall back to Python if C-unsupported options are specified

Currently, options unsupported by the C and pyarrow engines include

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2448 other than a single character (e. g. regex separators)

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3300

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2489 with
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3302

Specifying any of the above options will produce a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3303 unless the python engine is selected explicitly using
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3304

Options that are unsupported by the pyarrow engine which are not covered by the list above include

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2406

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    90

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    0552

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3308

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2407

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3310

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2452

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3312

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3313

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    0503

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3315

  • pip install openpyxl
    1276

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3317

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    0511

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3319

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    91

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2400

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    0598

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3323

  • pip install openpyxl
    1295

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3325

Specifying these options with

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3326 will raise a
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3327

Reading/writing remote files#

You can pass in a URL to read or write remote files to many of pandas’ IO functions - the following example shows reading a CSV file

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
336

Mới trong phiên bản 1. 3. 0

A custom header can be sent alongside HTTP(s) requests by passing a dictionary of header key value mappings to the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3328 keyword argument as shown below

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
337

All URLs which are not local files or HTTP(s) are handled by fsspec, if installed, and its various filesystem implementations (including Amazon S3, Google Cloud, SSH, FTP, webHDFS…). Some of these implementations will require additional packages to be installed, for example S3 URLs require the s3fs library

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
338

When dealing with remote storage systems, you might need extra configuration with environment variables or config files in special locations. For example, to access data in your S3 bucket, you will need to define credentials in one of the several ways listed in the S3Fs documentation. The same is true for several of the storage backends, and you should follow the links at fsimpl1 for implementations built into

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3329 and fsimpl2 for those not included in the main
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3329 distribution

You can also pass parameters directly to the backend driver. For example, if you do not have S3 credentials, you can still access public data by specifying an anonymous connection, such as

New in version 1. 2. 0

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
339

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3329 also allows complex URLs, for accessing data in compressed archives, local caching of files, and more. To locally cache the above example, you would modify the call to

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
350

where we specify that the “anon” parameter is meant for the “s3” part of the implementation, not to the caching implementation. Note that this caches to a temporary directory for the duration of the session only, but you can also specify a permanent store

Writing out data#

Writing to CSV format#

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
62 and
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 objects have an instance method
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3334 which allows storing the contents of the object as a comma-separated-values file. The function takes a number of arguments. Only the first is required

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3335. A string path to the file to write or a file object. If a file object it must be opened with
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3336

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2448 . Field delimiter for the output file (default “,”)

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3338. A string representation of a missing value (default ‘’)

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3339. Format string for floating point numbers

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3340. Columns to write (default None)

  • pip install openpyxl
    1284. Whether to write out the column names (default True)

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3342. whether to write row (index) names (default True)

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3343. Column label(s) for index column(s) if desired. Nếu Không có (mặc định) và
    pip install openpyxl
    1284 và
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3342 là Đúng, thì tên chỉ mục được sử dụng. (A sequence should be given if the
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    43 uses MultiIndex)

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2405 . Python write mode, default ‘w’

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    0562. a string representing the encoding to use if the contents are non-ASCII, for Python versions prior to 3

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3317. Character sequence denoting line end (default
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3350)

  • pip install openpyxl
    1276. Set quoting rules as in csv module (default csv. QUOTE_MINIMAL). Note that if you have set a
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3339 then floats are converted to strings and csv. QUOTE_NONNUMERIC will treat them as non-numeric

  • pip install openpyxl
    1275. Character used to quote fields (default ‘”’)

  • pip install openpyxl
    1293. Control quoting of
    pip install openpyxl
    1275 in fields (default True)

  • pip install openpyxl
    1294. Character used to escape
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2448 and
    pip install openpyxl
    1275 when appropriate (default None)

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    90. Number of rows to write at a time

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3360. Format string for datetime objects

Writing a formatted string#

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 object has an instance method
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3362 which allows control over the string representation of the object. All arguments are optional

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3363 default None, for example a StringIO object

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3340 default None, which columns to write

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3365 default None, minimum width of each column

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3338 default
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    46, representation of NA value

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3368 default None, a dictionary (by column) of functions each of which takes a single argument and returns a formatted string

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3339 default None, a function which takes a single (float) argument and returns a formatted string; to be applied to floats in the
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    43

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3371 default True, set to False for a
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    43 with a hierarchical index to print every MultiIndex key at each row

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3373 default True, will print the names of the indices

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3342 default True, will print the index (ie, row labels)

  • pip install openpyxl
    1284 default True, will print the column labels

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3376 default
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3377, will print column headers left- or right-justified

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
62 object also has a
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3362 method, but with only the
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3363,
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3338,
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3339 arguments. Ngoài ra còn có một đối số
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3383, nếu được đặt thành
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32, sẽ xuất thêm độ dài của Sê-ri

JSON#

Read and write

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3385 format files and strings

Writing JSON#

A

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
62 or
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 can be converted to a valid JSON string. Use
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3388 with optional parameters

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3335 . the pathname or buffer to write the output This can be
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    24 in which case a JSON string is returned

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3391

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    62
    • default is

      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3342

    • allowed values are {

      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3394,
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3395,
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3342}

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    43
    • default is

      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3340

    • allowed values are {

      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3394,
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3395,
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3342,
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3340,
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3503,
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3504}

    The format of the JSON string

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3394

    dict like {index -> [index], columns -> [columns], data -> [values]}

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3395

    list like [{column -> value}, … , {column -> value}]

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3342

    dict like {index -> {column -> value}}

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3340

    dict like {column -> {index -> value}}

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3503

    just the values array

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3504

    adhering to the JSON Table Schema

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3360 . string, type of date conversion, ‘epoch’ for timestamp, ‘iso’ for ISO8601

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3512 . The number of decimal places to use when encoding floating point values, default 10

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3513 . force encoded string to be ASCII, default True

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3514 . The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’ or ‘ns’ for seconds, milliseconds, microseconds and nanoseconds respectively. Default ‘ms’

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3515 . The handler to call if an object cannot otherwise be converted to a suitable format for JSON. Takes a single argument, which is the object to convert, and returns a serializable object

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3516 . If
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3395 orient, then will write each record per line as json

Note

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
46’s,
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3519’s and
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24 will be converted to
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3521 and
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0571 objects will be converted based on the
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3360 and
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3514 parameters

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
351

Orient options#

There are a number of different options for the format of the resulting JSON file / string. Consider the following

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 and
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
62

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
352

Column oriented (the default for

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43) serializes the data as nested JSON objects with column labels acting as the primary index

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
353

Index oriented (the default for

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
62) similar to column oriented but the index labels are now primary

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
354

Record oriented serializes the data to a JSON array of column -> value records, index labels are not included. This is useful for passing

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 data to plotting libraries, for example the JavaScript library
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3530

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
355

Value oriented is a bare-bones option which serializes to nested JSON arrays of values only, column and index labels are not included

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
356

Split oriented serializes to a JSON object containing separate entries for values, index and columns. Name is also included for

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
62

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
357

Table oriented serializes to the JSON Table Schema, allowing for the preservation of metadata including but not limited to dtypes and index names

Ghi chú

Any orient option that encodes to a JSON object will not preserve the ordering of index and column labels during round-trip serialization. If you wish to preserve label ordering use the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3394 option as it uses ordered containers

Date handling#

Writing in ISO date format

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
358

Writing in ISO date format, with microseconds

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
359

Epoch timestamps, in seconds

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
410

Writing to a file, with a date index and a date column

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
411

Fallback behavior#

If the JSON serializer cannot handle the container contents directly it will fall back in the following manner

  • if the dtype is unsupported (e. g.

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3533) then the
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3515, if provided, will be called for each value, otherwise an exception is raised

  • if an object is unsupported it will attempt the following

    • check if the object has defined a

      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3535 method and call it. A
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3535 method should return a
      pip install openpyxl
      1243 which will then be JSON serialized

    • invoke the

      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3515 if one was provided

    • convert the object to a

      pip install openpyxl
      1243 by traversing its contents. However this will often fail with an
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3540 or give unexpected results

In general the best approach for unsupported objects or dtypes is to provide a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3515. For example

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
412

can be dealt with by specifying a simple

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3515

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
413

Reading JSON#

Reading a JSON string to pandas object can take a number of parameters. The parser will try to parse a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 if
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3544 is not supplied or is
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24. To explicitly force
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
62 parsing, pass
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3547

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    92 . a VALID JSON string or file handle / StringIO. The string could be a URL. Valid URL schemes include http, ftp, S3, and file. For file URLs, a host is expected. For instance, a local file could be file . //localhost/path/to/table. json

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3544 . type of object to recover (series or frame), default ‘frame’

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3391

    Series
    • default is

      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3342

    • allowed values are {

      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3394,
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3395,
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3342}

    DataFrame
    • default is

      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3340

    • allowed values are {

      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3394,
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3395,
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3342,
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3340,
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3503,
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3504}

    The format of the JSON string

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3394

    dict like {index -> [index], columns -> [columns], data -> [values]}

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3395

    list like [{column -> value}, … , {column -> value}]

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3342

    dict like {index -> {column -> value}}

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3340

    dict like {column -> {index -> value}}

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3503

    just the values array

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3504

    adhering to the JSON Table Schema

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    88 . if True, infer dtypes, if a dict of column to dtype, then use those, if
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    61, then don’t infer dtypes at all, default is True, apply only to the data

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3570 . boolean, try to convert the axes to the proper dtypes, default is
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    32

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3572. một danh sách các cột để phân tích ngày tháng;

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3575 . boolean, default
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    32. If parsing dates, then parse the default date-like columns

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3577 . direct decoding to NumPy arrays. default is
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    61; Supports numeric data only, although labels may be non-numeric. Also note that the JSON ordering MUST be the same for each term if
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3579

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3580 . boolean, default
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    61. Set to enable usage of higher precision (strtod) function when decoding string to double values. Default (
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    61) is to use fast but less precise builtin functionality

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3514 . string, the timestamp unit to detect if converting dates. Default None. By default the timestamp precision will be detected, if this is not desired then pass one of ‘s’, ‘ms’, ‘us’ or ‘ns’ to force timestamp precision to seconds, milliseconds, microseconds or nanoseconds respectively

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3516 . reads file as one json object per line

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    0562 . The encoding to use to decode py3 bytes

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    90 . when used in combination with
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3587, return a JsonReader which reads in
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    90 lines per iteration

The parser will raise one of

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3589 if the JSON is not parseable

If a non-default

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3391 was used when encoding to JSON be sure to pass the same option here so that decoding produces sensible results, see Orient Options for an overview

Data conversion#

The default of

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3591,
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3592, and
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3593 will try to parse the axes, and all of the data into appropriate types, including dates. If you need to override specific dtypes, pass a dict to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
88.
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3570 should only be set to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61 if you need to preserve string-like numbers (e. g. ‘1’, ‘2’) in an axes

Ghi chú

Large integer values may be converted to dates if

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3593 and the data and / or column labels appear ‘date-like’. The exact threshold depends on the
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3514 specified. ‘date-like’ means that the column label meets one of the following criteria

  • it ends with

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3599

  • it ends with

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4100

  • it begins with

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4101

  • it is

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4102

  • it is

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4103

Cảnh báo

When reading JSON data, automatic coercing into dtypes has some quirks

  • an index can be reconstructed in a different order from serialization, that is, the returned order is not guaranteed to be the same as before serialization

  • a column that was

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2411 data will be converted to
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2413 if it can be done safely, e. g. a column of
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4106

  • bool columns will be converted to

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2413 on reconstruction

Thus there are times where you may want to specify specific dtypes via the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
88 keyword argument

Reading from a JSON string

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
414

Reading from a file

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
415

Don’t convert any data (but still convert axes and dates)

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
416

Specify dtypes for conversion

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
417

Preserve string indices

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
418

Dates written in nanoseconds need to be read back in nanoseconds

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
419

The Numpy parameter#

Ghi chú

This param has been deprecated as of version 1. 0. 0 and will raise a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4109

This supports numeric data only. Index and columns labels may be non-numeric, e. g. strings, dates etc

If

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3579 is passed to
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4111 an attempt will be made to sniff an appropriate dtype during deserialization and to subsequently decode directly to NumPy arrays, bypassing the need for intermediate Python objects

This can provide speedups if you are deserialising a large amount of numeric data

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
700

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
701

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
702

The speedup is less noticeable for smaller datasets

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
703

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
704

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
705

Cảnh báo

Direct NumPy decoding makes a number of assumptions and may fail or produce unexpected output if these assumptions are not satisfied

  • data is numeric

  • data is uniform. The dtype is sniffed from the first value decoded. A

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3327 may be raised, or incorrect output may be produced if this condition is not satisfied

  • labels are ordered. Labels are only read from the first container, it is assumed that each subsequent row / column has been encoded in the same order. This should be satisfied if the data was encoded using

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3388 but may not be the case if the JSON is from another source

Normalization#

pandas provides a utility function to take a dict or list of dicts and normalize this semi-structured data into a flat table

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
706

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
707

The max_level parameter provides more control over which level to end normalization. With max_level=1 the following snippet normalizes until 1st nesting level of the provided dict

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
708

Line delimited json#

pandas is able to read and write line-delimited json files that are common in data processing pipelines using Hadoop or Spark

For line-delimited json files, pandas can also return an iterator which reads in

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
90 lines at a time. This can be useful for large files or to read from a stream

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
709

Table schema#

Table Schema is a spec for describing tabular datasets as a JSON object. The JSON includes information on the field names, types, and other attributes. You can use the orient

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3504 to build a JSON string with two fields,
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4116 and
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
56

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
710

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4116 field contains the
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4119 key, which itself contains a list of column name to type pairs, including the
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4120 or
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2476 (see below for a list of types). The
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4116 field also contains a
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4123 field if the (Multi)index is unique

The second field,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
56, contains the serialized data with the
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3395 orient. The index is included, and any datetimes are ISO 8601 formatted, as required by the Table Schema spec

The full list of types supported are described in the Table Schema spec. This table shows the mapping from pandas types

pandas type

Table Schema type

int64

integer

float64

number

bool

boolean

datetime64[ns]

datetime

timedelta64[ns]

duration

categorical

any

object

str

A few notes on the generated table schema

  • The

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4116 object contains a
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4127 field. This contains the version of pandas’ dialect of the schema, and will be incremented with each revision

  • All dates are converted to UTC when serializing. Even timezone naive values, which are treated as UTC with an offset of 0

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    711

  • datetimes with a timezone (before serializing), include an additional field

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4128 with the time zone name (e. g.
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4129)

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    712

  • Periods are converted to timestamps before serialization, and so have the same behavior of being converted to UTC. In addition, periods will contain and additional field

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4130 with the period’s frequency, e. g.
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4131

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    713

  • Categoricals use the

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4132 type and an
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4133 constraint listing the set of possible values. Additionally, an
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4134 field is included

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    714

  • A

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4123 field, containing an array of labels, is included if the index is unique

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    715

  • The

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4123 behavior is the same with MultiIndexes, but in this case the
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4123 is an array

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    716

  • The default naming roughly follows these rules

    • For series, the

      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      4138 is used. If that’s none, then the name is
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3503

    • For

      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      4140, the stringified version of the column name is used

    • For

      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      4120 (not
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      2476),
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      4143 is used, with a fallback to
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      3342 if that is None

    • For

      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      2476,
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      4146 is used. If any level has no name, then
      import openpyxl
      worksheet = openpyxl.load_workbook("codespeedy.xlsx")
      sheet = worksheet.active
      sheet.column_dimensions['A'].width = 20
      worksheet.save("codespeedy1.xlsx")
      4147 is used

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4111 also accepts
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4149 as an argument. This allows for the preservation of metadata such as dtypes and index names in a round-trippable manner

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
717

Please note that the literal string ‘index’ as the name of an

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4120 is not round-trippable, nor are any names beginning with
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4151 within a
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2476. These are used by default in
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4153 to indicate missing values and the subsequent read cannot distinguish the intent

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
718

When using

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4149 along with user-defined
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4155, the generated schema will contain an additional
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4156 key in the respective
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4119 element. This extra key is not standard but does enable JSON roundtrips for extension types (e. g.
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4158)

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4156 key carries the name of the extension, if you have properly registered the
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4160, pandas will use said name to perform a lookup into the registry and re-convert the serialized data into your custom dtype

HTML#

Reading HTML content#

Cảnh báo

We highly encourage you to read the HTML Table Parsing gotchas below regarding the issues surrounding the BeautifulSoup4/html5lib/lxml parsers.

The top-level

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4161 function can accept an HTML string/file/URL and will parse HTML tables into list of pandas
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4140. Let’s look at a few examples

Ghi chú

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4163 returns a
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4164 of
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 objects, even if there is only a single table contained in the HTML content

Read a URL with no options

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
719

Ghi chú

The data from the above URL changes every Monday so the resulting data above may be slightly different

Read in the content of the file from the above URL and pass it to

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4163 as a string

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
720

You can even pass in an instance of

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
11 if you so desire

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
721

Ghi chú

The following examples are not run by the IPython evaluator due to the fact that having so many network-accessing functions slows down the documentation build. If you spot an error or an example that doesn’t run, please do not hesitate to report it over on pandas GitHub issues page

Read a URL and match a table that contains specific text

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
722

Specify a header row (by default

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4168 or
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4169 elements located within a
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4170 are used to form the column index, if multiple rows are contained within
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4170 then a MultiIndex is created); if specified, the header row is taken from the data minus the parsed header elements (
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4168 elements)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
723

Specify an index column

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
724

Specify a number of rows to skip

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
725

Specify a number of rows to skip using a list (

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4173 works as well)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
726

Specify an HTML attribute

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
727

Specify values that should be converted to NaN

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
728

Specify whether to keep the default set of NaN values

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
729

Specify converters for columns. This is useful for numerical text data that has leading zeros. By default columns that are numerical are cast to numeric types and the leading zeros are lost. To avoid this, we can convert these columns to strings

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
730

Use some combination of the above

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
731

Read in pandas

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4174 output (with some loss of floating point precision)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
732

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4175 backend will raise an error on a failed parse if that is the only parser you provide. If you only have a single parser you can provide just a string, but it is considered good practice to pass a list with one string if, for example, the function expects a sequence of strings. You may use

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
733

Or you could pass

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4176 without a list

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
734

However, if you have bs4 and html5lib installed and pass

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24 or
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4178 then the parse will most likely succeed. Note that as soon as a parse succeeds, the function will return

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
735

Links can be extracted from cells along with the text using

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4179

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
736

New in version 1. 5. 0

Writing to HTML files#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 objects have an instance method
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4174 which renders the contents of the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 as an HTML table. Các đối số của hàm như trong phương thức
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3362 được mô tả ở trên

Ghi chú

Không phải tất cả các tùy chọn có thể có cho

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4184 đều được hiển thị ở đây vì lý do ngắn gọn. See
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4185 for the full set of options

Ghi chú

In an HTML-rendering supported environment like a Jupyter Notebook,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4186 will render the raw HTML into the environment

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
737

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3340 argument will limit the columns shown

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
738

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3339 takes a Python callable to control the precision of floating point values

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
739

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4189 will make the row labels bold by default, but you can turn that off

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
740

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4190 argument provides the ability to give the resulting HTML table CSS classes. Note that these classes are appended to the existing
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4191 class

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
741

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4192 argument provides the ability to add hyperlinks to cells that contain URLs

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
742

Finally, the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4193 argument allows you to control whether the “<”, “>” and “&” characters escaped in the resulting HTML (by default it is
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32). So to get the HTML without escaped characters pass
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4195

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
743

Escaped

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
744

Not escaped

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
745

Ghi chú

Some browsers may not show a difference in the rendering of the previous two HTML tables

HTML Table Parsing Gotchas#

There are some versioning issues surrounding the libraries that are used to parse HTML tables in the top-level pandas io function

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4163

Issues with lxml

  • Benefits

    • lxml is very fast

    • lxml requires Cython to install correctly

  • Drawbacks

    • lxml does not make any guarantees about the results of its parse unless it is given strictly valid markup

    • In light of the above, we have chosen to allow you, the user, to use the lxml backend, but this backend will use html5lib if lxml fails to parse

    • It is therefore highly recommended that you install both BeautifulSoup4 and html5lib, so that you will still get a valid result (provided everything else is valid) even if lxml fails

Issues with BeautifulSoup4 using lxml as a backend

  • The above issues hold here as well since BeautifulSoup4 is essentially just a wrapper around a parser backend

Issues with BeautifulSoup4 using html5lib as a backend

  • Benefits

    • html5lib is far more lenient than lxml and consequently deals with real-life markup in a much saner way rather than just, e. g. , dropping an element without notifying you

    • html5lib generates valid HTML5 markup from invalid markup automatically. This is extremely important for parsing HTML tables, since it guarantees a valid document. However, that does NOT mean that it is “correct”, since the process of fixing markup does not have a single definition

    • html5lib is pure Python and requires no additional build steps beyond its own installation

  • Drawbacks

    • The biggest drawback to using html5lib is that it is slow as molasses. However consider the fact that many tables on the web are not big enough for the parsing algorithm runtime to matter. It is more likely that the bottleneck will be in the process of reading the raw text from the URL over the web, i. e. , IO (input-output). For very large tables, this might not be true

LaTeX#

Mới trong phiên bản 1. 3. 0

Currently there are no methods to read from LaTeX, only output methods

Writing to LaTeX files#

Ghi chú

DataFrame and Styler objects currently have a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4197 method. We recommend using the Styler. to_latex() method over DataFrame. to_latex() due to the former’s greater flexibility with conditional styling, and the latter’s possible future deprecation.

Review the documentation for Styler. to_latex , which gives examples of conditional styling and explains the operation of its keyword arguments.

For simple application the following pattern is sufficient

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
746

To format values before output, chain the Styler. format method.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
747

XML#

Đọc XML#

Mới trong phiên bản 1. 3. 0

Hàm

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4198 cấp cao nhất có thể chấp nhận một chuỗi/tệp/URL XML và sẽ phân tích cú pháp các nút và thuộc tính thành một gấu trúc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43

Ghi chú

Since there is no standard XML structure where design types can vary in many ways,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7000 works best with flatter, shallow versions. If an XML document is deeply nested, use the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7001 feature to transform XML into a flatter version

Let’s look at a few examples

Read an XML string

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
748

Read a URL with no options

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
749

Read in the content of the “books. xml” file and pass it to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7000 as a string

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
750

Read in the content of the “books. xml” as instance of

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
11 or
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7004 and pass it to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7000

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
751

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
752

Even read XML from AWS S3 buckets such as NIH NCBI PMC Article Datasets providing Biomedical and Life Science Jorurnals

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
753

With lxml as default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7006, you access the full-featured XML library that extends Python’s ElementTree API. One powerful tool is ability to query nodes selectively or conditionally with more expressive XPath

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
754

Specify only elements or only attributes to parse

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
755

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
756

XML documents can have namespaces with prefixes and default namespaces without prefixes both of which are denoted with a special attribute

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7007. In order to parse by node under a namespace context,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7008 must reference a prefix

For example, below XML contains a namespace with prefix,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7009, and URI at
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7010. In order to parse
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7011 nodes,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7012 must be used

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
757

Similarly, an XML document can have a default namespace without prefix. Failing to assign a temporary prefix will return no nodes and raise a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3327. But assigning any temporary name to correct URI allows parsing by nodes

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
758

However, if XPath does not reference node names such as default,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7014, then
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7012 is not required

With lxml as parser, you can flatten nested XML documents with an XSLT script which also can be string/file/URL types. As background, XSLT is a special-purpose language written in a special XML file that can transform original XML documents into other XML, HTML, even text (CSV, JSON, etc. ) using an XSLT processor

For example, consider this somewhat nested structure of Chicago “L” Rides where station and rides elements encapsulate data in their own sections. With below XSLT,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4175 can transform original nested document into a flatter output (as shown below for demonstration) for easier parse into
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
759

For very large XML files that can range in hundreds of megabytes to gigabytes,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7018 supports parsing such sizeable files using lxml’s iterparse and etree’s iterparse which are memory-efficient methods to iterate through an XML tree and extract specific elements and attributes. without holding entire tree in memory

New in version 1. 5. 0

To use this feature, you must pass a physical XML file path into

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7000 and use the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7020 argument. Files should not be compressed or point to online sources but stored on local disk. Also,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7020 should be a dictionary where the key is the repeating nodes in document (which become the rows) and the value is a list of any element or attribute that is a descendant (i. e. , child, grandchild) of repeating node. Since XPath is not used in this method, descendants do not need to share same relationship with one another. Below shows example of reading in Wikipedia’s very large (12 GB+) latest article data dump

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
760

Writing XML#

Mới trong phiên bản 1. 3. 0

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 objects have an instance method
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7023 which renders the contents of the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 as an XML document

Ghi chú

This method does not support special properties of XML including DTD, CData, XSD schemas, processing instructions, comments, and others. Only namespaces at the root level is supported. However,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7001 allows design changes after initial output

Let’s look at a few examples

Write an XML without options

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
761

Write an XML with new root and row name

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
762

Write an attribute-centric XML

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
763

Write a mix of elements and attributes

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
764

Any

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4140 with hierarchical columns will be flattened for XML element names with levels delimited by underscores

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
765

Write an XML with default namespace

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
766

Write an XML with namespace prefix

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
767

Write an XML without declaration or pretty print

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
768

Write an XML and transform with stylesheet

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
769

XML Final Notes#

  • All XML documents adhere to W3C specifications. Both

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7027 and
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4175 parsers will fail to parse any markup document that is not well-formed or follows XML syntax rules. Do be aware HTML is not an XML document unless it follows XHTML specs. However, other popular markup types including KML, XAML, RSS, MusicML, MathML are compliant XML schemas

  • For above reason, if your application builds XML prior to pandas operations, use appropriate DOM libraries like

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7027 and
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4175 to build the necessary document and not by string concatenation or regex adjustments. Always remember XML is a special text file with markup rules

  • With very large XML files (several hundred MBs to GBs), XPath and XSLT can become memory-intensive operations. Be sure to have enough available RAM for reading and writing to large XML files (roughly about 5 times the size of text)

  • Because XSLT is a programming language, use it with caution since such scripts can pose a security risk in your environment and can run large or infinite recursive operations. Always test scripts on small fragments before full run

  • The etree parser supports all functionality of both

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7000 and
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7023 except for complex XPath and any XSLT. Though limited in features,
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7027 is still a reliable and capable parser and tree builder. Its performance may trail
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4175 to a certain degree for larger files but relatively unnoticeable on small to medium size files

Excel files#

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7035 method can read Excel 2007+ (
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7036) files using the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7037 Python module. Excel 2003 (
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7038) files can be read using
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7039. Binary Excel (
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7040) files can be read using
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7041. The
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7042 instance method is used for saving a
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 to Excel. Generally the semantics are similar to working with csv data. See the cookbook for some advanced strategies.

Cảnh báo

The xlwt package for writing old-style

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7038 excel files is no longer maintained. The xlrd package is now only for reading old-style
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7038 files

Before pandas 1. 3. 0, the default argument

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7046 to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7035 would result in using the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7039 engine in many cases, including new Excel 2007+ (
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7036) files. pandas will now default to using the openpyxl engine

It is strongly encouraged to install

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7037 to read Excel 2007+ (
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7036) files. Please do not report issues when using ``xlrd`` to read ``. xlsx`` files. This is no longer supported, switch to using
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7037 instead

Cố gắng sử dụng công cụ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7053 sẽ tăng
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4109 trừ khi tùy chọn
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7055 được đặt thành
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7056. While this option is now deprecated and will also raise a
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4109, it can be globally set and the warning suppressed. Users are recommended to write
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7036 files using the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7037 engine instead

Reading Excel files#

In the most basic use-case,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7060 takes a path to an Excel file, and the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7061 indicating which sheet to parse

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
770

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7062 class#

To facilitate working with multiple sheets from the same file, the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7062 class can be used to wrap the file and can be passed into
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7060 There will be a performance benefit for reading multiple sheets as the file is read into memory only once

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
771

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7062 class can also be used as a context manager

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
772

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7066 property will generate a list of the sheet names in the file

The primary use-case for an

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7062 is parsing multiple sheets with different parameters

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
773

Note that if the same parsing parameters are used for all sheets, a list of sheet names can simply be passed to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7060 with no loss in performance

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
774

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7062 can also be called with a
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7070 object as a parameter. This allows the user to control how the excel file is read. For example, sheets can be loaded on demand by calling
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7071 with
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7072

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
775

Specifying sheets#

Ghi chú

The second argument is

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7061, not to be confused with
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7074

Ghi chú

An ExcelFile’s attribute

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7066 provides access to a list of sheets

  • The arguments

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7061 allows specifying the sheet or sheets to read

  • The default value for

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7061 is 0, indicating to read the first sheet

  • Pass a string to refer to the name of a particular sheet in the workbook

  • Pass an integer to refer to the index of a sheet. Indices follow Python convention, beginning at 0

  • Pass a list of either strings or integers, to return a dictionary of specified sheets

  • Pass a

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    24 to return a dictionary of all available sheets

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
776

Using the sheet index

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
777

Using all default values

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
778

Using None to get all sheets

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
779

Using a list to get multiple sheets

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
780

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7060 can read more than one sheet, by setting
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7061 to either a list of sheet names, a list of sheet positions, or
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24 to read all sheets. Sheets can be specified by sheet index or sheet name, using an integer or string, respectively

Reading a
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2476#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7060 can read a
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2476 index, by passing a list of columns to
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0564 and a
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2476 column by passing a list of rows to
pip install openpyxl
1284. If either the
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3342 or
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3340 have serialized level names those will be read in as well by specifying the rows/columns that make up the levels

For example, to read in a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2476 index without names

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
781

If the index has level names, they will parsed as well, using the same parameters

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
782

If the source file has both

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2476 index and columns, lists specifying each should be passed to
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0564 and
pip install openpyxl
1284

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
783

Các giá trị bị thiếu trong các cột được chỉ định trong

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0564 sẽ được điền chuyển tiếp để cho phép thực hiện vòng lặp với
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7095 cho
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7096. To avoid forward filling the missing values use
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7097 after reading the data instead of
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0564

Parsing specific columns#

It is often the case that users will insert columns to do temporary computations in Excel and you may not want to read in those columns.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7060 takes a
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
47 keyword to allow you to specify a subset of columns to parse

Changed in version 1. 0. 0

Passing in an integer for

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
47 will no longer work. Please pass in a list of ints from 0 to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
47 inclusive instead

You can specify a comma-delimited set of Excel columns and ranges as a string

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
784

If

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
47 is a list of integers, then it is assumed to be the file column indices to be parsed

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
785

Element order is ignored, so

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
54 is the same as
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
55

If

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
47 is a list of strings, it is assumed that each string corresponds to a column name provided either by the user in
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
49 or inferred from the document header row(s). Those strings define which columns will be parsed

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
786

Element order is ignored, so

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7108 is the same as
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7109

If

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
47 is callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
787

Parsing dates#

Datetime-like values are normally automatically converted to the appropriate dtype when reading the excel file. But if you have a column of strings that look like dates (but are not actually formatted as dates in excel), you can use the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0569 keyword to parse those strings to datetimes

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
788

Cell converters#

It is possible to transform the contents of Excel cells via the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0511 option. For instance, to convert a column to boolean

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
789

This options handles missing values and treats exceptions in the converters as missing data. Transformations are applied cell by cell rather than to the column as a whole, so the array dtype is not guaranteed. For instance, a column of integers with missing values cannot be transformed to an array with integer dtype, because NaN is strictly a float. You can manually mask missing data to recover integer dtype

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
790

Dtype specifications#

As an alternative to converters, the type for an entire column can be specified using the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
88 keyword, which takes a dictionary mapping column names to types. To interpret data with no type inference, use the type
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
15 or
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
72

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
791

Writing Excel files#

Writing Excel files to disk#

To write a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 object to a sheet of an Excel file, you can use the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7095 instance method. The arguments are largely the same as
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3334 described above, the first argument being the name of the excel file, and the optional second argument the name of the sheet to which the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 should be written. For example

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
792

Files with a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7038 extension will be written using
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7053 and those with a
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7036 extension will be written using
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7124 (if available) or
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7037

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 will be written in a way that tries to mimic the REPL output. The
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3343 will be placed in the second row instead of the first. You can place it in the first row by setting the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7128 option in
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7042 to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
793

In order to write separate

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4140 to separate sheets in a single Excel file, one can pass an
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7132

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
794

Writing Excel files to memory#

pandas supports writing Excel files to buffer-like objects such as

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
11 or
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7004 using
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7132

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
795

Ghi chú

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7136 is optional but recommended. Setting the engine determines the version of workbook produced. Đặt
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7137 sẽ tạo sổ làm việc định dạng Excel 2003 (xls). Sử dụng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7138 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7139 sẽ tạo sổ làm việc định dạng Excel 2007 (xlsx). Nếu bỏ qua, sổ làm việc có định dạng Excel 2007 sẽ được tạo

Công cụ soạn thảo Excel#

Không dùng nữa kể từ phiên bản 1. 2. 0. Vì gói xlwt không còn được duy trì, công cụ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7053 sẽ bị xóa khỏi phiên bản pandas trong tương lai. Đây là công cụ duy nhất trong gấu trúc hỗ trợ ghi vào tệp
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7038.

gấu trúc chọn một trình soạn thảo Excel thông qua hai phương pháp

  1. đối số từ khóa

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7136

  2. phần mở rộng tên tệp (thông qua mặc định được chỉ định trong tùy chọn cấu hình)

Theo mặc định, gấu trúc sử dụng XlsxWriter cho tệp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7036, openpyxl cho tệp
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7144 và xlwt cho tệp
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7038. Nếu bạn đã cài đặt nhiều công cụ, bạn có thể đặt công cụ mặc định thông qua đặt tùy chọn cấu hình
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7146 và
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7055. pandas will fall back on openpyxl for
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7036 files if Xlsxwriter is not available.

Để chỉ định bạn muốn sử dụng trình soạn thảo nào, bạn có thể chuyển đối số từ khóa công cụ tới

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7095 và tới
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7132. The built-in engines are

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7037. phiên bản 2. 4 hoặc cao hơn là bắt buộc

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7124

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7053

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
796

Phong cách và định dạng#

Giao diện của bảng tính Excel được tạo từ gấu trúc có thể được sửa đổi bằng cách sử dụng các tham số sau trên phương pháp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7095 của
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3339. Chuỗi định dạng cho số dấu phẩy động (mặc định là
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    24)

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7158. Một bộ gồm hai số nguyên đại diện cho hàng dưới cùng và cột ngoài cùng bên phải để đóng băng. Mỗi tham số này đều dựa trên một tham số, vì vậy (1, 1) sẽ đóng băng hàng đầu tiên và cột đầu tiên (mặc định là
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    24)

Using the Xlsxwriter engine provides many options for controlling the format of an Excel worksheet created with the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7095 method. Các ví dụ tuyệt vời có thể được tìm thấy trong tài liệu Xlsxwriter tại đây. https. //xlsxwriter. đọcthedocs. io/working_with_pandas. html

Bảng tính OpenDocument#

Mới trong phiên bản 0. 25

Phương pháp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7035 cũng có thể đọc bảng tính OpenDocument bằng mô-đun
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7162. Ngữ nghĩa và các tính năng để đọc bảng tính OpenDocument phù hợp với những gì có thể thực hiện được đối với các tệp Excel bằng cách sử dụng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7163

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
797

Ghi chú

Hiện tại pandas chỉ hỗ trợ đọc bảng tính OpenDocument. Viết không được thực hiện

Excel nhị phân (. tệp xlsb) #

Mới trong phiên bản 1. 0. 0

Phương pháp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7035 cũng có thể đọc các tệp Excel nhị phân bằng cách sử dụng mô-đun
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7041. Ngữ nghĩa và các tính năng để đọc tệp Excel nhị phân hầu hết khớp với những gì có thể thực hiện được đối với tệp Excel bằng cách sử dụng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7166.
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7041 không nhận ra các loại ngày giờ trong tệp và thay vào đó sẽ trả về số float

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
798

Ghi chú

Hiện tại pandas chỉ hỗ trợ đọc các tệp Excel nhị phân. Viết không được thực hiện

Bảng nhớ tạm #

Một cách thuận tiện để lấy dữ liệu là sử dụng phương thức

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7168, lấy nội dung của bộ đệm clipboard và chuyển chúng đến phương thức
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
66. Chẳng hạn, bạn có thể sao chép văn bản sau vào khay nhớ tạm (CTRL-C trên nhiều hệ điều hành)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
799

Và sau đó nhập dữ liệu trực tiếp vào

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 bằng cách gọi

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
500

Phương thức

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7171 có thể được sử dụng để ghi nội dung của
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 vào khay nhớ tạm. Sau đó, bạn có thể dán nội dung khay nhớ tạm vào các ứng dụng khác (CTRL-V trên nhiều hệ điều hành). Ở đây chúng tôi minh họa việc viết một
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 vào khay nhớ tạm và đọc lại

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
501

Chúng tôi có thể thấy rằng chúng tôi đã lấy lại cùng một nội dung mà chúng tôi đã ghi vào khay nhớ tạm trước đó

Ghi chú

Bạn có thể cần cài đặt xclip hoặc xsel (với PyQt5, PyQt4 hoặc qtpy) trên Linux để sử dụng các phương pháp này

dưa muối#

Tất cả các đối tượng pandas đều được trang bị các phương thức

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7174 sử dụng mô-đun
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7175 của Python để lưu cấu trúc dữ liệu vào đĩa bằng định dạng dưa chua

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
502

Hàm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7176 trong không gian tên
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7177 có thể được sử dụng để tải bất kỳ đối tượng pickled pandas nào (hoặc bất kỳ đối tượng được ngâm nào khác) từ tệp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
503

Cảnh báo

Loading pickled data received from untrusted sources can be unsafe

Nhìn thấy. https. // tài liệu. con trăn. org/3/library/dưa chua. html

Cảnh báo

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7178 chỉ được đảm bảo tương thích ngược với pandas phiên bản 0. 20. 3

Tập tin dưa nén #

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7178,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7180 và
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7181 có thể đọc và ghi các tệp dưa nén. Các kiểu nén của ________ 11257, ________ 11258, ________ 67184, _______ 67185 được hỗ trợ để đọc và ghi. Định dạng tệp
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7186 chỉ hỗ trợ đọc và chỉ được chứa một tệp dữ liệu để đọc

Loại nén có thể là một tham số rõ ràng hoặc được suy ra từ phần mở rộng tệp. Nếu 'suy ra', thì lần lượt sử dụng

pip install openpyxl
1257,
pip install openpyxl
1258,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7186,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7184,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7185 nếu tên tệp kết thúc bằng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7192,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7193,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7194,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7195 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7196

Tham số nén cũng có thể là một

pip install openpyxl
1243 để chuyển các tùy chọn cho giao thức nén. Nó phải có khóa
pip install openpyxl
1247 được đặt thành tên của giao thức nén, phải là một trong {
pip install openpyxl
1239,
pip install openpyxl
1237,
pip install openpyxl
1238,
pip install openpyxl
1240,
pip install openpyxl
1241}. Tất cả các cặp khóa-giá trị khác được chuyển đến thư viện nén cơ bản

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
504

Using an explicit compression type

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
505

Suy ra kiểu nén từ tiện ích mở rộng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
506

Mặc định là 'suy luận'

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
507

Chuyển các tùy chọn cho giao thức nén để tăng tốc độ nén

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
508

msgpack#

hỗ trợ gấu trúc cho

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7204 đã bị xóa trong phiên bản 1. 0. 0. Bạn nên sử dụng dưa chua để thay thế.

Ngoài ra, bạn cũng có thể định dạng tuần tự hóa Arrow IPC để truyền trực tuyến các đối tượng gấu trúc. Để biết tài liệu về pyarrow, xem tại đây

HDF5 (PyTables)#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7205 là một đối tượng giống như dict đọc và ghi pandas bằng định dạng HDF5 hiệu suất cao bằng thư viện PyTables xuất sắc. Xem sách dạy nấu ăn để biết một số chiến lược nâng cao

Cảnh báo

gấu trúc sử dụng PyTables để đọc và ghi các tệp HDF5, cho phép tuần tự hóa dữ liệu kiểu đối tượng bằng dưa chua. Loading pickled data received from untrusted sources can be unsafe

Nhìn thấy. https. // tài liệu. con trăn. org/3/library/dưa chua. html để biết thêm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
509

Các đối tượng có thể được ghi vào tệp giống như thêm các cặp khóa-giá trị vào một lệnh

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
510

Trong phiên Python hiện tại hoặc mới hơn, bạn có thể truy xuất các đối tượng được lưu trữ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
511

Xóa đối tượng được chỉ định bởi khóa

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
512

Đóng Cửa hàng và sử dụng trình quản lý ngữ cảnh

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
513

Đọc/ghi API#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7205 hỗ trợ API cấp cao nhất sử dụng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7207 để đọc và
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7208 để viết, tương tự như cách hoạt động của
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
66 và
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3334

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
514

HDFStore theo mặc định sẽ không loại bỏ các hàng bị thiếu. Hành vi này có thể được thay đổi bằng cách đặt

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7211

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
515

Định dạng cố định #

Các ví dụ trên cho thấy việc lưu trữ bằng cách sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7212, viết HDF5 thành
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7213 ở định dạng mảng cố định, được gọi là định dạng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7214. Các loại cửa hàng này không thể nối thêm sau khi được viết (mặc dù bạn có thể chỉ cần xóa chúng và viết lại). Chúng cũng không thể truy vấn được; . Họ cũng không hỗ trợ các khung dữ liệu có tên cột không phải là duy nhất. The
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7214 format stores offer very fast writing and slightly faster reading than
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3504 stores. Định dạng này được chỉ định theo mặc định khi sử dụng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7212 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7208 hoặc bởi
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7219 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7220

Cảnh báo

Định dạng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7214 sẽ tăng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7222 nếu bạn cố truy xuất bằng cách sử dụng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7223

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
516

Định dạng bảng #

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7205 hỗ trợ định dạng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7213 khác trên đĩa, định dạng
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3504. Về mặt khái niệm, một
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3504 có hình dạng rất giống một DataFrame, với các hàng và cột. Một
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3504 có thể được thêm vào trong cùng một phiên hoặc các phiên khác. Ngoài ra, các hoạt động loại truy vấn và xóa được hỗ trợ. Định dạng này được chỉ định bởi
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7229 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7230 đến
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7231 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7212 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7208

Định dạng này cũng có thể được đặt làm tùy chọn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7234 để cho phép
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7235 lưu trữ theo mặc định ở định dạng
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3504

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
517

Ghi chú

Bạn cũng có thể tạo một

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3504 bằng cách chuyển
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7229 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7230 cho một hoạt động
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7212

Hierarchical keys#

Keys to a store can be specified as a string. Chúng có thể ở định dạng giống như tên đường dẫn phân cấp (e. g.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7241), sẽ tạo ra một hệ thống phân cấp các cửa hàng phụ (hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7242 theo cách nói của PyTables). Keys can be specified without the leading ‘/’ and are always absolute (e. g. ‘foo’ refers to ‘/foo’). Removal operations can remove everything in the sub-store and below, so be careful

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
518

You can walk through the group hierarchy using the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7243 method which will yield a tuple for each group key along with the relative keys of its contents

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
519

Cảnh báo

Hierarchical keys cannot be retrieved as dotted (attribute) access as described above for items stored under the root node

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
520

Instead, use explicit string based keys

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
521

Storing types#

Storing mixed types in a table#

Storing mixed-dtype data is supported. Các chuỗi được lưu trữ dưới dạng chiều rộng cố định bằng cách sử dụng kích thước tối đa của cột được nối thêm. Subsequent attempts at appending longer strings will raise a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3327

Chuyển

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7245 làm tham số để nối thêm sẽ đặt giá trị tối thiểu lớn hơn cho các cột chuỗi. Storing
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7246 are currently supported. For string columns, passing
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7247 to append will change the default nan representation on disk (which converts to/from
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7248), this defaults to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7249

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
522

Storing MultiIndex DataFrames#

Storing MultiIndex

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4140 as tables is very similar to storing/selecting from homogeneous index
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4140

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
523

Ghi chú

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3342 keyword is reserved and cannot be use as a level name

Querying#

Truy vấn một bảng#

Các hoạt động của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7253 và
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7254 có một tiêu chí tùy chọn có thể được chỉ định để chỉ chọn/xóa một tập hợp con của dữ liệu. Điều này cho phép một người có một bảng trên đĩa rất lớn và chỉ truy xuất một phần dữ liệu

A query is specified using the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7255 class under the hood, as a boolean expression

  • import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3342 and
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3340 are supported indexers of
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4140

  • if

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7259 are specified, these can be used as additional indexers

  • tên cấp độ trong MultiIndex, với tên mặc định là

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7260,
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7261, … nếu không được cung cấp

Các toán tử so sánh hợp lệ là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7262

Biểu thức boolean hợp lệ được kết hợp với

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7263. or

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7264. và

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7265 and
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7266 . để nhóm

These rules are similar to how boolean expressions are used in pandas for indexing

Ghi chú

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7267 sẽ được tự động mở rộng thành toán tử so sánh
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7268

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7269 là toán tử not, nhưng chỉ có thể được sử dụng trong một số trường hợp rất hạn chế

  • Nếu một danh sách/bộ biểu thức được thông qua, chúng sẽ được kết hợp thông qua

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7264

The following are valid expressions

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7271

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7272

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7273

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7274

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7275

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7276

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7277

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7278

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7279

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7280

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7281 nằm ở vế trái của biểu thức con

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3340,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7283,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7284

Vế phải của biểu thức con (sau toán tử so sánh) có thể là

  • functions that will be evaluated, e. g.

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7285

  • strings, e. g.

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7286

  • giống như ngày, e. g.

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7287, or
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7288

  • lists, e. g.

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7289

  • các biến được xác định trong không gian tên cục bộ, e. g.

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7290

Ghi chú

Passing a string to a query by interpolating it into the query expression is not recommended. Chỉ cần gán chuỗi quan tâm cho một biến và sử dụng biến đó trong một biểu thức. For example, do this

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
524

instead of this

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
525

The latter will not work and will raise a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7291. Note that there’s a single quote followed by a double quote in the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7292 variable

Nếu bạn phải nội suy, hãy sử dụng công cụ xác định định dạng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7293

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
526

sẽ trích dẫn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7292

Here are some examples

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
527

Use boolean expressions, with in-line function evaluation

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
528

Sử dụng tham chiếu cột nội tuyến

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
529

Từ khóa

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3340 có thể được cung cấp để chọn danh sách các cột sẽ được trả về, điều này tương đương với việc chuyển một số
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7296

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
530

Các tham số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7297 và
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7298 có thể được chỉ định để giới hạn tổng không gian tìm kiếm. Đây là về tổng số hàng trong một bảng

Ghi chú

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7253 will raise a
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3327 if the query expression has an unknown variable reference. Usually this means that you are trying to select on a column that is not a data_column

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7253 sẽ tăng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7291 nếu biểu thức truy vấn không hợp lệ

Query timedelta64[ns]#

Bạn có thể lưu trữ và truy vấn bằng cách sử dụng loại

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7303. Điều khoản có thể được chỉ định trong định dạng.
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7304, where float may be signed (and fractional), and unit can be
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7305 for the timedelta. Đây là một ví dụ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
531

Truy vấn MultiIndex#

Selecting from a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2476 can be achieved by using the name of the level

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
532

If the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2476 levels names are
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24, the levels are automatically made available via the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7309 keyword with
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7310 the level of the
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2476 you want to select from

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
533

Lập chỉ mục #

You can create/modify an index for a table with

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7312 after data is already in the table (after and
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7313 operation). Creating a table index is highly encouraged. This will speed your queries a great deal when you use a
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7253 with the indexed dimension as the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7223

Ghi chú

Indexes are automagically created on the indexables and any data columns you specify. This behavior can be turned off by passing

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7316 to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7231

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
534

Oftentimes when appending large amounts of data to a store, it is useful to turn off index creation for each append, then recreate at the end

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
535

Then create the index when finished appending

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
536

See here for how to create a completely-sorted-index (CSI) on an existing store

Query via data columns#

You can designate (and index) certain columns that you want to be able to perform queries (other than the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7318 columns, which you can always query). For instance say you want to perform this common operation, on-disk, and return just the frame that matches this query. You can specify
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7319 to force all columns to be
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7259

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
537

There is some performance degradation by making lots of columns into

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7321, so it is up to the user to designate these. Ngoài ra, bạn không thể thay đổi các cột dữ liệu (cũng như không thể lập chỉ mục) sau thao tác thêm/đặt đầu tiên (Tất nhiên bạn có thể chỉ cần đọc dữ liệu và tạo một bảng mới. )

Iterator#

You can pass

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2495 or
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7323 to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7253 and
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7325 to return an iterator on the results. The default is 50,000 rows returned in a chunk

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
538

Ghi chú

You can also use the iterator with

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7207 which will open, then automatically close the store when finished iterating

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
539

Note, that the chunksize keyword applies to the source rows. So if you are doing a query, then the chunksize will subdivide the total rows in the table and the query applied, returning an iterator on potentially unequal sized chunks

Đây là một công thức để tạo một truy vấn và sử dụng nó để tạo các khối trả về có kích thước bằng nhau

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
540

Truy vấn nâng cao#

Chọn một cột #

Để truy xuất một cột dữ liệu hoặc có thể lập chỉ mục, hãy sử dụng phương thức

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7327. This will, for example, enable you to get the index very quickly. These return a
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
62 of the result, indexed by the row number. Chúng hiện không chấp nhận bộ chọn
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7223

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
541

Đang chọn tọa độ#

Đôi khi bạn muốn lấy tọa độ (a. k. a the index locations) of your query. This returns an

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7330 of the resulting locations. Các tọa độ này cũng có thể được chuyển cho các hoạt động tiếp theo của
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7223

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
542

Chọn bằng cách sử dụng mặt nạ #

Đôi khi truy vấn của bạn có thể liên quan đến việc tạo danh sách các hàng để chọn. Thông thường,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7332 này sẽ là kết quả của
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3342 từ thao tác lập chỉ mục. Ví dụ này chọn các tháng của datetimeindex là 5

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
543

Đối tượng lưu trữ #

Nếu bạn muốn kiểm tra đối tượng được lưu trữ, hãy truy xuất qua

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7334. Bạn có thể sử dụng điều này theo lập trình để nói lấy số lượng hàng trong một đối tượng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
544

Nhiều truy vấn bảng#

The methods

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7335 and
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7325 can perform appending/selecting from multiple tables at once. Ý tưởng là có một bảng (gọi nó là bảng chọn) mà bạn lập chỉ mục cho hầu hết/tất cả các cột và thực hiện các truy vấn của mình. The other table(s) are data tables with an index matching the selector table’s index. Sau đó, bạn có thể thực hiện một truy vấn rất nhanh trên bảng bộ chọn nhưng vẫn nhận được nhiều dữ liệu. This method is similar to having a very wide table, but enables more efficient queries

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7335 method splits a given single DataFrame into multiple tables according to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7338, a dictionary that maps the table names to a list of ‘columns’ you want in that table. Nếu
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
24 được sử dụng thay cho danh sách, bảng đó sẽ có các cột không xác định còn lại của Khung dữ liệu đã cho. Đối số
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7340 xác định bảng nào là bảng chọn (bạn có thể thực hiện truy vấn từ đó). The argument
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7341 will drop rows from the input
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 to ensure tables are synchronized. This means that if a row for one of the tables being written to is entirely
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7343, that row will be dropped from all tables

If

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7341 is False, THE USER IS RESPONSIBLE FOR SYNCHRONIZING THE TABLES. Remember that entirely
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7345 rows are not written to the HDFStore, so if you choose to call
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7346, some tables may have more rows than others, and therefore
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7325 may not work or it may return unexpected results

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
545

Delete from a table#

Bạn có thể xóa khỏi bảng một cách có chọn lọc bằng cách chỉ định một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7223. In deleting rows, it is important to understand the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7213 deletes rows by erasing the rows, then moving the following data. Thus deleting can potentially be a very expensive operation depending on the orientation of your data. To get optimal performance, it’s worthwhile to have the dimension you are deleting be the first of the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7350

Data is ordered (on the disk) in terms of the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7350. Đây là một trường hợp sử dụng đơn giản. You store panel-type data, with dates in the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7283 and ids in the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7353. The data is then interleaved like this

  • date_1
    • id_1

    • id_2

    • .

    • id_n

  • ngày_2
    • id_1

    • .

    • id_n

Rõ ràng là thao tác xóa trên

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7283 sẽ khá nhanh, vì một đoạn được xóa, sau đó dữ liệu sau sẽ được di chuyển. Mặt khác, thao tác xóa trên
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7353 sẽ rất tốn kém. In this case it would almost certainly be faster to rewrite the table using a
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7223 that selects all but the missing data

Cảnh báo

Xin lưu ý rằng HDF5 KHÔNG TỰ ĐỘNG ĐẶT LẠI KHÔNG GIAN trong các tệp h5. Do đó, liên tục xóa (hoặc loại bỏ các nút) và thêm lại, SẼ CÓ XU HƯỚNG TĂNG KÍCH THƯỚC TẬP TIN

Để đóng gói lại và xóa tệp, hãy sử dụng ptrepack .

Notes & caveats#

Nén#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7213 allows the stored data to be compressed. Điều này áp dụng cho tất cả các loại cửa hàng, không chỉ bàn. Hai tham số được sử dụng để kiểm soát nén.
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7358 and
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7359

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7358 chỉ định nếu và mức độ cứng của dữ liệu được nén.
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7361 and
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7362 disables compression and
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7363 enables compression

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7359 chỉ định sử dụng thư viện nén nào. Nếu không có gì được chỉ định, thư viện mặc định
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7365 được sử dụng. Thư viện nén thường tối ưu hóa để có tốc độ hoặc tốc độ nén tốt và kết quả sẽ phụ thuộc vào loại dữ liệu. Lựa chọn kiểu nén nào tùy thuộc vào nhu cầu và dữ liệu cụ thể của bạn. Danh sách các thư viện nén được hỗ trợ

    • zlib. The default compression library. Cổ điển về mặt nén, đạt tốc độ nén tốt nhưng hơi chậm

    • lzo. Fast compression and decompression

    • bzip2. Tỷ lệ nén tốt

    • blosc. Fast compression and decompression

      Support for alternative blosc compressors

      • khối. blosclz This is the default compressor for

        In [13]: import numpy as np
        
        In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
        
        In [15]: print(data)
        a,b,c,d
        1,2,3,4
        5,6,7,8
        9,10,11
        
        In [16]: df = pd.read_csv(StringIO(data), dtype=object)
        
        In [17]: df
        Out[17]: 
           a   b   c    d
        0  1   2   3    4
        1  5   6   7    8
        2  9  10  11  NaN
        
        In [18]: df["a"][0]
        Out[18]: '1'
        
        In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
        
        In [20]: df.dtypes
        Out[20]: 
        a      int64
        b     object
        c    float64
        d      Int64
        dtype: object
        
        7366

      • khối. lz4. A compact, very popular and fast compressor

      • khối. lz4hc. Phiên bản tinh chỉnh của LZ4, tạo ra tỷ lệ nén tốt hơn với chi phí tốc độ

      • blosc. snappy. A popular compressor used in many places

      • blosc. zlib. A classic; somewhat slower than the previous ones, but achieving better compression ratios

      • blosc. zstd. An extremely well balanced codec; it provides the best compression ratios among the others above, and at reasonably fast speed

    Nếu

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7359 được định nghĩa là một cái gì đó khác với các thư viện được liệt kê, một ngoại lệ
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3327 sẽ được ban hành

Ghi chú

If the library specified with the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7359 option is missing on your platform, compression defaults to
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7365 without further ado

Kích hoạt tính năng nén cho tất cả các đối tượng trong tệp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
546

Hoặc tính năng nén nhanh (điều này chỉ áp dụng cho các bảng) trong các cửa hàng không bật tính năng nén

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
547

ptrepack#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7213 offers better write performance when tables are compressed after they are written, as opposed to turning on compression at the very beginning. Bạn có thể sử dụng tiện ích
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7213 được cung cấp
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7373. In addition,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7373 can change compression levels after the fact

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
548

Furthermore

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7375 will repack the file to allow you to reuse previously deleted space. Alternatively, one can simply remove the file and write again, or use the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7376 method

Hãy cẩn thận #

Cảnh báo

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7205 is not-threadsafe for writing. The underlying
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7213 only supports concurrent reads (via threading or processes). Nếu bạn cần đọc và ghi đồng thời, bạn cần tuần tự hóa các hoạt động này trong một chuỗi trong một quy trình duy nhất. Bạn sẽ làm hỏng dữ liệu của mình nếu không. See the (GH2397) for more information

  • Nếu bạn sử dụng khóa để quản lý quyền ghi giữa nhiều quy trình, bạn có thể muốn sử dụng

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7379 trước khi giải phóng khóa ghi. For convenience you can use
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7380 to do this for you

  • Khi một

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3504 được tạo, các cột (DataFrame) được cố định;

  • Hãy nhận biết rằng múi giờ (e. g. ,

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7382) không nhất thiết phải bằng nhau giữa các phiên bản múi giờ. So if data is localized to a specific timezone in the HDFStore using one version of a timezone library and that data is updated with another version, the data will be converted to UTC since these timezones are not considered equal. Sử dụng cùng một phiên bản thư viện múi giờ hoặc sử dụng
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7383 với định nghĩa múi giờ được cập nhật

Cảnh báo

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7213 sẽ hiển thị
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7385 nếu không thể sử dụng tên cột làm bộ chọn thuộc tính. Định danh tự nhiên chỉ chứa các chữ cái, số và dấu gạch dưới và không được bắt đầu bằng số. Other identifiers cannot be used in a
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7223 clause and are generally a bad idea

Loại dữ liệu#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7205 sẽ ánh xạ một dtype đối tượng tới dtype bên dưới của
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7213. Điều này có nghĩa là các loại sau được biết là hoạt động

Loại

Represents missing values

floating .

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7389

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7248

số nguyên.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7391

boolean

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7392

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3519

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7303

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3519

categorical . see the section below

object .

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7396

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7248

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7398 columns are not supported, and WILL FAIL

Dữ liệu phân loại#

You can write data that contains

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7399 dtypes to a
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7205. Queries work the same as if it was an object array. Tuy nhiên, dữ liệu dtyped
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7399 được lưu trữ theo cách hiệu quả hơn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
549

String columns#

min_itemsize

Việc triển khai cơ bản của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7205 sử dụng chiều rộng cột cố định (kích thước vật phẩm) cho các cột chuỗi. Kích thước cột chuỗi được tính bằng độ dài tối đa của dữ liệu (đối với cột đó) được chuyển đến
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7205, trong phần nối thêm đầu tiên. Các phần bổ sung tiếp theo, có thể giới thiệu một chuỗi cho một cột lớn hơn cột có thể chứa, một Ngoại lệ sẽ được đưa ra (nếu không, bạn có thể cắt ngắn các cột này, dẫn đến mất thông tin). In the future we may relax this and allow a user-specified truncation to occur

Vượt qua

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7404 trong lần tạo bảng đầu tiên để a-priori chỉ định độ dài tối thiểu của một cột chuỗi cụ thể.
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7404 can be an integer, or a dict mapping a column name to an integer. Bạn có thể chuyển
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3503 làm khóa để cho phép tất cả các mục có thể lập chỉ mục hoặc cột dữ liệu có kích thước tối thiểu này

Passing a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7404 dict will cause all passed columns to be created as data_columns automatically

Ghi chú

If you are not passing any

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7259, then the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7404 will be the maximum of the length of any string passed

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
550

nan_rep

Các cột chuỗi sẽ tuần tự hóa một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7248 (một giá trị bị thiếu) với biểu diễn chuỗi
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7411. Giá trị này mặc định là giá trị chuỗi
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7249. Bạn có thể vô tình biến một giá trị thực tế của
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7249 thành một giá trị bị thiếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
551

Khả năng tương thích bên ngoài#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7205 viết các đối tượng định dạng
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3504 ở các định dạng cụ thể phù hợp để tạo các chuyến khứ hồi không mất dữ liệu tới các đối tượng gấu trúc. Để tương thích với bên ngoài, ________ 67205 có thể đọc các bảng định dạng gốc của ________ 67213

Có thể viết một đối tượng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7205 để có thể dễ dàng nhập vào
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7419 bằng thư viện
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7420 (Trang web gói). Tạo một cửa hàng định dạng bảng như thế này

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
552

Trong R, tệp này có thể được đọc thành đối tượng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7421 bằng thư viện
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7420. Hàm ví dụ sau đọc tên cột và giá trị dữ liệu tương ứng từ các giá trị và tập hợp chúng thành một
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7421

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
553

Bây giờ bạn có thể nhập

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 vào R

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
554

Ghi chú

Hàm R liệt kê toàn bộ nội dung của tệp HDF5 và tập hợp đối tượng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7421 từ tất cả các nút phù hợp, vì vậy chỉ sử dụng hàm này làm điểm bắt đầu nếu bạn đã lưu trữ nhiều đối tượng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 vào một tệp HDF5

Performance#

  • Định dạng

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7427 đi kèm với hình phạt về hiệu suất viết so với các cửa hàng
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7214. Lợi ích là khả năng nối thêm/xóa và truy vấn (có thể là lượng dữ liệu rất lớn). Thời gian viết thường dài hơn so với các cửa hàng thông thường. Thời gian truy vấn có thể khá nhanh, đặc biệt là trên trục được lập chỉ mục

  • Bạn có thể chuyển

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7429 đến
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7231, chỉ định khối lượng ghi (mặc định là 50000). Điều này sẽ làm giảm đáng kể mức sử dụng bộ nhớ của bạn khi viết

  • Bạn có thể chuyển

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7431 cho
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7231 đầu tiên, để đặt TỔNG số hàng mà
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7213 sẽ mong đợi. Điều này sẽ tối ưu hóa hiệu suất đọc/ghi

  • Các hàng trùng lặp có thể được ghi vào bảng, nhưng được lọc ra trong vùng chọn (với các mục cuối cùng được chọn; do đó, một bảng là duy nhất trên các cặp chính, phụ)

  • Một

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7434 sẽ được nâng lên nếu bạn đang cố lưu trữ các loại sẽ được PyTables chọn (chứ không phải được lưu trữ dưới dạng các loại đặc hữu). Xem tại đây để biết thêm thông tin và một số giải pháp

Lông vũ#

Feather cung cấp tuần tự hóa cột nhị phân cho các khung dữ liệu. Nó được thiết kế để làm cho việc đọc và ghi các khung dữ liệu trở nên hiệu quả và giúp việc chia sẻ dữ liệu giữa các ngôn ngữ phân tích dữ liệu trở nên dễ dàng

Feather được thiết kế để tuần tự hóa và hủy tuần tự hóa DataFrames một cách trung thực, hỗ trợ tất cả các kiểu dữ liệu gấu trúc, bao gồm cả các kiểu mở rộng như phân loại và thời gian với tz

Một số lưu ý

  • The format will NOT write an

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4120, or
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2476 for the
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    43 and will raise an error if a non-default one is provided. Bạn có thể
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7438 để lưu trữ chỉ mục hoặc
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7439 để bỏ qua nó

  • Tên cột trùng lặp và tên cột không phải chuỗi không được hỗ trợ

  • Các đối tượng Python thực tế trong các cột dtype đối tượng không được hỗ trợ. Những điều này sẽ đưa ra một thông báo lỗi hữu ích khi cố gắng tuần tự hóa

Xem tài liệu đầy đủ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
555

Ghi vào một tập tin lông vũ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
556

Đọc từ tệp lông vũ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
557

Sàn gỗ #

Apache Parquet cung cấp tuần tự hóa cột nhị phân được phân vùng cho các khung dữ liệu. Nó được thiết kế để làm cho việc đọc và ghi các khung dữ liệu trở nên hiệu quả và giúp việc chia sẻ dữ liệu giữa các ngôn ngữ phân tích dữ liệu trở nên dễ dàng. Sàn gỗ có thể sử dụng nhiều kỹ thuật nén khác nhau để thu nhỏ kích thước tệp càng nhiều càng tốt trong khi vẫn duy trì hiệu suất đọc tốt

Parquet được thiết kế để tuần tự hóa và hủy tuần tự hóa một cách trung thực

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 s, hỗ trợ tất cả các kiểu dữ liệu pandas, bao gồm các kiểu mở rộng như datetime với tz

Một số lưu ý

  • Tên cột trùng lặp và tên cột không phải chuỗi không được hỗ trợ

  • The

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2497 engine always writes the index to the output, but
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7442 only writes non-default indexes. Cột bổ sung này có thể gây ra sự cố cho những người tiêu dùng không phải là pandas không mong đợi điều đó. Bạn có thể buộc bao gồm hoặc bỏ qua các chỉ mục bằng đối số
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    3342, bất kể công cụ cơ bản là gì

  • Tên cấp chỉ mục, nếu được chỉ định, phải là chuỗi

  • Trong công cụ

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2497, các kiểu dữ liệu phân loại cho các loại không phải chuỗi có thể được đánh số thứ tự thành sàn gỗ, nhưng sẽ hủy đánh số thứ tự như kiểu dữ liệu nguyên thủy của chúng

  • Công cụ

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2497 duy trì cờ
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4134 của các kiểu dữ liệu phân loại với các loại chuỗi.
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7442 không giữ cờ
    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    4134

  • Các loại không được hỗ trợ bao gồm

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7449 và các loại đối tượng Python thực tế. Những điều này sẽ đưa ra một thông báo lỗi hữu ích khi cố gắng tuần tự hóa. Loại
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7450 được hỗ trợ với pyarrow >= 0. 16. 0

  • Công cụ

    import openpyxl
    worksheet = openpyxl.load_workbook("codespeedy.xlsx")
    sheet = worksheet.active
    sheet.column_dimensions['A'].width = 20
    worksheet.save("codespeedy1.xlsx")
    2497 bảo tồn các loại dữ liệu mở rộng, chẳng hạn như loại dữ liệu chuỗi và số nguyên có thể null (yêu cầu pyarrow >= 0. 16. 0 và yêu cầu loại tiện ích mở rộng triển khai các giao thức cần thiết, hãy xem tài liệu về loại tiện ích mở rộng ).

Bạn có thể chỉ định một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7136 để điều khiển quá trình lập số sê-ri. Đây có thể là một trong số
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2497 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7442 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7455. Nếu động cơ KHÔNG được chỉ định, thì tùy chọn
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7456 sẽ được chọn;

Xem tài liệu về pyarrow và fastparquet

Ghi chú

These engines are very similar and should read/write nearly identical parquet format files.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7460 supports timedelta data,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7461 supports timezone aware datetimes. Các thư viện này khác nhau do có các phụ thuộc cơ bản khác nhau (
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7442 bằng cách sử dụng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7463, trong khi
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2497 sử dụng thư viện c)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
558

Ghi vào một tập tin sàn gỗ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
559

Đọc từ một tập tin sàn gỗ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
560

Chỉ đọc một số cột nhất định của tệp sàn gỗ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
561

Xử lý chỉ mục#

Nối tiếp một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 thành sàn gỗ có thể bao gồm chỉ mục ẩn dưới dạng một hoặc nhiều cột trong tệp đầu ra. Như vậy, mã này

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
562

tạo một tệp sàn gỗ có ba cột nếu bạn sử dụng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2497 để tuần tự hóa.
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7467,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7468 và
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7469. Nếu bạn đang sử dụng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7442, chỉ mục có thể được ghi hoặc không vào tệp

Cột bổ sung không mong muốn này khiến một số cơ sở dữ liệu như Amazon Redshift từ chối tệp vì cột đó không tồn tại trong bảng đích

Nếu bạn muốn bỏ qua các chỉ mục của khung dữ liệu khi viết, hãy chuyển

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7316 đến
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7472

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
563

Điều này tạo ra một tệp sàn gỗ chỉ với hai cột dự kiến,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7467 và
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7468. Nếu
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 của bạn có một chỉ mục tùy chỉnh, bạn sẽ không lấy lại được nó khi tải tệp này vào một
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43

Vượt qua

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7477 sẽ luôn ghi chỉ mục, ngay cả khi đó không phải là hành vi mặc định của công cụ cơ bản

Phân vùng tệp Parquet#

Sàn gỗ hỗ trợ phân vùng dữ liệu dựa trên giá trị của một hoặc nhiều cột

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
564

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7478 chỉ định thư mục mẹ mà dữ liệu sẽ được lưu.
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7479 là các tên cột mà tập dữ liệu sẽ được phân vùng. Các cột được phân vùng theo thứ tự chúng được cung cấp. Sự phân chia phân vùng được xác định bởi các giá trị duy nhất trong các cột phân vùng. Ví dụ trên tạo một tập dữ liệu được phân vùng có thể trông giống như

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
565

ORC#

Mới trong phiên bản 1. 0. 0

Tương tự như định dạng sàn gỗ , Định dạng ORC là tuần tự hóa cột nhị phân cho khung dữ liệu. Nó được thiết kế để làm cho việc đọc khung dữ liệu hiệu quả. gấu trúc cung cấp cả người đọc và người viết cho định dạng ORC,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7480 và
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7481. This requires the pyarrow library.

Cảnh báo

  • Rất nên cài đặt pyarrow bằng conda do một số sự cố xảy ra bởi pyarrow

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7481 yêu cầu pyarrow>=7. 0. 0

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7480 và
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7481 chưa được hỗ trợ trên Windows, bạn có thể tìm thấy các môi trường hợp lệ trên cài đặt các phần phụ thuộc tùy chọn .

  • Đối với các loại được hỗ trợ, vui lòng tham khảo các tính năng ORC được hỗ trợ trong Mũi tên

  • Các múi giờ hiện tại trong các cột ngày giờ không được giữ nguyên khi khung dữ liệu được chuyển đổi thành tệp ORC

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
566

Ghi vào tệp orc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
567

Đọc từ tệp orc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
568

Chỉ đọc một số cột nhất định của tệp orc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
569

truy vấn SQL#

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7485 module provides a collection of query wrappers to both facilitate data retrieval and to reduce dependency on DB-specific API. Trừu tượng hóa cơ sở dữ liệu được cung cấp bởi SQLAlchemy nếu được cài đặt. Ngoài ra, bạn sẽ cần một thư viện trình điều khiển cho cơ sở dữ liệu của mình. Ví dụ về các trình điều khiển như vậy là psycopg2 cho PostgreSQL hoặc pymysql cho MySQL. Đối với SQLite, điều này được bao gồm trong thư viện chuẩn của Python theo mặc định. Bạn có thể tìm thấy tổng quan về các trình điều khiển được hỗ trợ cho từng phương ngữ SQL trong tài liệu SQLAlchemy

Nếu SQLAlchemy chưa được cài đặt, dự phòng chỉ được cung cấp cho sqlite (và cho mysql để tương thích ngược, nhưng điều này không được dùng nữa và sẽ bị xóa trong phiên bản tương lai). Chế độ này yêu cầu bộ điều hợp cơ sở dữ liệu Python tôn trọng Python DB-API

Xem thêm một số ví dụ về sách dạy nấu ăn để biết một số chiến lược nâng cao.

Các chức năng chính là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7486(tên_bảng, con[, lược đồ,. ])

Đọc bảng cơ sở dữ liệu SQL vào DataFrame

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7487(sql, con[, index_col,. ])

Đọc truy vấn SQL vào DataFrame

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7488(sql, con[, index_col,. ])

Đọc truy vấn SQL hoặc bảng cơ sở dữ liệu vào DataFrame

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7489(tên, con[, sơ đồ,. ])

Ghi các bản ghi được lưu trữ trong DataFrame vào cơ sở dữ liệu SQL

Ghi chú

The function

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7490 is a convenience wrapper around
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7491 and
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7492 (and for backward compatibility) and will delegate to specific function depending on the provided input (database table name or sql query). Tên bảng không cần trích dẫn nếu có ký tự đặc biệt

Trong ví dụ sau, chúng tôi sử dụng công cụ cơ sở dữ liệu SQLite SQL. Bạn có thể sử dụng cơ sở dữ liệu SQLite tạm thời nơi dữ liệu được lưu trữ trong “bộ nhớ”

Để kết nối với SQLAlchemy, bạn sử dụng hàm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7493 để tạo đối tượng công cụ từ URI cơ sở dữ liệu. Bạn chỉ cần tạo công cụ một lần cho mỗi cơ sở dữ liệu mà bạn đang kết nối. Để biết thêm thông tin về
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7493 và định dạng URI, hãy xem các ví dụ bên dưới và tài liệu SQLAlchemy

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
570

Nếu bạn muốn quản lý các kết nối của riêng mình, bạn có thể chuyển một trong các kết nối đó. Ví dụ bên dưới mở một kết nối tới cơ sở dữ liệu bằng trình quản lý bối cảnh Python tự động đóng kết nối sau khi khối hoàn thành. Xem tài liệu SQLAlchemy để được giải thích về cách xử lý kết nối cơ sở dữ liệu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
571

Cảnh báo

Khi bạn mở một kết nối tới cơ sở dữ liệu, bạn cũng chịu trách nhiệm đóng nó. Tác dụng phụ của việc mở kết nối có thể bao gồm khóa cơ sở dữ liệu hoặc hành vi vi phạm khác

Viết DataFrames#

Giả sử dữ liệu sau nằm trong một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
56, chúng ta có thể chèn nó vào cơ sở dữ liệu bằng cách sử dụng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7497

Tôi

Ngày tháng

Cột_1

Cột_2

Cột_3

26

2012-10-18

X

25. 7

Thật

42

2012-10-19

Y

-12. 4

Sai

63

2012-10-20

Z

5. 73

Thật

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
572

Với một số cơ sở dữ liệu, việc ghi DataFrames lớn có thể dẫn đến lỗi do vượt quá giới hạn kích thước gói. Điều này có thể tránh được bằng cách đặt tham số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
90 khi gọi
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7499. Ví dụ: phần sau ghi
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
56 vào cơ sở dữ liệu theo lô 1000 hàng cùng một lúc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
573

Các kiểu dữ liệu SQL#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7497 sẽ cố gắng ánh xạ dữ liệu của bạn sang loại dữ liệu SQL thích hợp dựa trên loại dữ liệu. Khi bạn có các cột dtype
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
72, gấu trúc sẽ cố gắng suy ra kiểu dữ liệu

Bạn luôn có thể ghi đè loại mặc định bằng cách chỉ định loại SQL mong muốn của bất kỳ cột nào bằng cách sử dụng đối số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
88. Đối số này cần tên cột ánh xạ từ điển tới các loại SQLAlchemy (hoặc chuỗi cho chế độ dự phòng sqlite3). Ví dụ: chỉ định sử dụng loại sqlalchemy
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7504 thay vì loại
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7505 mặc định cho các cột chuỗi

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
574

Ghi chú

Do hỗ trợ hạn chế cho timedelta trong các hương vị cơ sở dữ liệu khác nhau, các cột có loại

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7506 sẽ được ghi dưới dạng giá trị số nguyên dưới dạng nano giây vào cơ sở dữ liệu và cảnh báo sẽ được đưa ra

Ghi chú

Các cột của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7399 dtype sẽ được chuyển thành biểu diễn dày đặc như bạn sẽ nhận được với
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7508 (e. g. đối với các danh mục chuỗi, điều này mang lại một chuỗi các chuỗi). Do đó, việc đọc lại bảng cơ sở dữ liệu không tạo ra một phân loại

Kiểu dữ liệu ngày giờ#

Sử dụng SQLAlchemy,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7497 có khả năng ghi dữ liệu ngày giờ không biết múi giờ hoặc nhận biết múi giờ. Tuy nhiên, dữ liệu kết quả được lưu trữ trong cơ sở dữ liệu cuối cùng phụ thuộc vào loại dữ liệu được hỗ trợ cho dữ liệu ngày giờ của hệ thống cơ sở dữ liệu đang được sử dụng

Bảng sau đây liệt kê các kiểu dữ liệu được hỗ trợ cho dữ liệu ngày giờ đối với một số cơ sở dữ liệu phổ biến. Các phương ngữ cơ sở dữ liệu khác có thể có các loại dữ liệu khác nhau cho dữ liệu ngày giờ

cơ sở dữ liệu

SQL Datetime Types

Hỗ trợ múi giờ

SQLite

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7510

Không

mysql

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7511 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7512

Không

PostgreSQL

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7511 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7514

Đúng

Khi ghi dữ liệu nhận biết múi giờ vào cơ sở dữ liệu không hỗ trợ múi giờ, dữ liệu sẽ được ghi dưới dạng dấu thời gian ngây thơ múi giờ theo giờ địa phương đối với múi giờ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7491 cũng có khả năng đọc dữ liệu ngày giờ nhận biết múi giờ hoặc ngây thơ. Khi đọc các loại
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7514, gấu trúc sẽ chuyển đổi dữ liệu sang UTC

Phương pháp chèn #

Tham số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7517 kiểm soát mệnh đề chèn SQL được sử dụng. giá trị có thể là

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    24. Sử dụng mệnh đề SQL
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7519 tiêu chuẩn (mỗi hàng một cái)

  • In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7520. Truyền nhiều giá trị trong một mệnh đề
    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7519. Nó sử dụng một cú pháp SQL đặc biệt không được hỗ trợ bởi tất cả các chương trình phụ trợ. Điều này thường mang lại hiệu suất tốt hơn cho các cơ sở dữ liệu phân tích như Presto và Redshift, nhưng lại có hiệu suất kém hơn đối với phần phụ trợ SQL truyền thống nếu bảng chứa nhiều cột. Để biết thêm thông tin, hãy kiểm tra tài liệu SQLAlchemy

  • có thể gọi được với chữ ký

    In [13]: import numpy as np
    
    In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"
    
    In [15]: print(data)
    a,b,c,d
    1,2,3,4
    5,6,7,8
    9,10,11
    
    In [16]: df = pd.read_csv(StringIO(data), dtype=object)
    
    In [17]: df
    Out[17]: 
       a   b   c    d
    0  1   2   3    4
    1  5   6   7    8
    2  9  10  11  NaN
    
    In [18]: df["a"][0]
    Out[18]: '1'
    
    In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})
    
    In [20]: df.dtypes
    Out[20]: 
    a      int64
    b     object
    c    float64
    d      Int64
    dtype: object
    
    7522. Điều này có thể được sử dụng để triển khai phương thức chèn hiệu quả hơn dựa trên các tính năng phương ngữ phụ trợ cụ thể

Ví dụ về một mệnh đề có thể gọi được bằng PostgreSQL COPY

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
575

Bảng đọc #

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7491 sẽ đọc một bảng cơ sở dữ liệu được cung cấp tên bảng và tùy chọn một tập hợp con các cột để đọc

Ghi chú

Để sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7491, bạn phải cài đặt phần phụ thuộc tùy chọn SQLAlchemy

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
576

Ghi chú

Lưu ý rằng gấu trúc suy ra các kiểu cột từ đầu ra truy vấn chứ không phải bằng cách tra cứu các loại dữ liệu trong lược đồ cơ sở dữ liệu vật lý. Ví dụ: giả sử

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7525 là một cột số nguyên trong bảng. Then, intuitively,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7526 will return integer-valued series, while
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7527 will return object-valued (str) series. Theo đó, nếu đầu ra truy vấn trống, thì tất cả các cột kết quả sẽ được trả về dưới dạng giá trị đối tượng (vì chúng là tổng quát nhất). Nếu bạn thấy trước rằng truy vấn của mình đôi khi sẽ tạo ra một kết quả trống, thì bạn có thể muốn đánh máy rõ ràng sau đó để đảm bảo tính toàn vẹn của dtype

Bạn cũng có thể chỉ định tên của cột là chỉ mục

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 và chỉ định một tập hợp con các cột sẽ được đọc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
577

Và bạn rõ ràng có thể buộc các cột được phân tích thành ngày

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
578

Nếu cần, bạn có thể chỉ định rõ ràng một chuỗi định dạng hoặc một lệnh của các đối số để chuyển tới

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7529

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
579

Bạn có thể kiểm tra xem một bảng có tồn tại hay không bằng cách sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7530

Hỗ trợ lược đồ #

Việc đọc và ghi vào các lược đồ khác nhau được hỗ trợ thông qua từ khóa

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4116 trong các hàm
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7491 và
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7497. Tuy nhiên, lưu ý rằng điều này phụ thuộc vào hương vị cơ sở dữ liệu (sqlite không có lược đồ). Ví dụ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
580

Querying#

Bạn có thể truy vấn bằng SQL thô trong hàm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7492. Trong trường hợp này, bạn phải sử dụng biến thể SQL phù hợp với cơ sở dữ liệu của mình. Khi sử dụng SQLAlchemy, bạn cũng có thể chuyển các cấu trúc ngôn ngữ Biểu thức SQLAlchemy, không liên quan đến cơ sở dữ liệu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
581

Tất nhiên, bạn có thể chỉ định một truy vấn “phức tạp” hơn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
582

Hàm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7492 hỗ trợ đối số
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
90. Việc chỉ định điều này sẽ trả về một trình vòng lặp thông qua các đoạn kết quả truy vấn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
583

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
584

Bạn cũng có thể chạy một truy vấn đơn giản mà không cần tạo một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 với
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7538. Điều này hữu ích cho các truy vấn không trả về giá trị, chẳng hạn như INSERT. Điều này có chức năng tương đương với việc gọi
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7539 trên công cụ SQLAlchemy hoặc đối tượng kết nối db. Một lần nữa, bạn phải sử dụng biến thể cú pháp SQL phù hợp với cơ sở dữ liệu của mình

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
585

Ví dụ về kết nối động cơ#

Để kết nối với SQLAlchemy, bạn sử dụng hàm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7493 để tạo đối tượng công cụ từ URI cơ sở dữ liệu. Bạn chỉ cần tạo công cụ một lần cho mỗi cơ sở dữ liệu mà bạn đang kết nối

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
586

Để biết thêm thông tin, hãy xem các ví dụ về tài liệu SQLAlchemy

Advanced SQLAlchemy queries#

You can use SQLAlchemy constructs to describe your query

Sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7541 để chỉ định các tham số truy vấn theo cách trung lập với phụ trợ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
587

Nếu bạn có một mô tả SQLAlchemy về cơ sở dữ liệu của mình, bạn có thể biểu thị các điều kiện ở đâu bằng các biểu thức SQLAlchemy

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
588

Bạn có thể kết hợp các biểu thức SQLAlchemy với các tham số được truyền tới

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7490 bằng cách sử dụng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7543

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
589

Dự phòng Sqlite#

Việc sử dụng sqlite được hỗ trợ mà không cần sử dụng SQLAlchemy. Chế độ này yêu cầu bộ điều hợp cơ sở dữ liệu Python tôn trọng Python DB-API

Bạn có thể tạo các kết nối như vậy

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
590

Và sau đó đưa ra các truy vấn sau

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
591

Google BigQuery#

Cảnh báo

bắt đầu bằng 0. 20. 0, pandas đã tách hỗ trợ Google BigQuery thành gói riêng biệt

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7544. Bạn có thể
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7545 để lấy nó

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7544 package provides functionality to read/write from Google BigQuery

gấu trúc tích hợp với gói bên ngoài này. nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7544 được cài đặt, bạn có thể sử dụng các phương thức pandas
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7548 và
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7549, sẽ gọi các hàm tương ứng từ
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7544

Tài liệu đầy đủ có thể được tìm thấy ở đây

định dạng thống kê #

Ghi vào định dạng stata#

The method

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7551 will write a DataFrame into a . dta file. Phiên bản định dạng của tệp này luôn là 115 (Stata 12)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
592

Các tệp dữ liệu Stata có hỗ trợ loại dữ liệu hạn chế; . Ngoài ra, Stata dự trữ các giá trị nhất định để biểu thị dữ liệu bị thiếu. Xuất một giá trị không bị thiếu nằm ngoài phạm vi cho phép trong Stata cho một loại dữ liệu cụ thể sẽ nhập lại biến có kích thước lớn hơn tiếp theo. Ví dụ: các giá trị

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7552 bị hạn chế nằm trong khoảng từ -127 đến 100 trong Stata và do đó, các biến có giá trị trên 100 sẽ kích hoạt chuyển đổi thành
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7553. Các giá trị
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7249 trong kiểu dữ liệu dấu phẩy động được lưu trữ dưới dạng kiểu dữ liệu bị thiếu cơ bản (
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7561 trong Stata)

Ghi chú

Không thể xuất giá trị dữ liệu bị thiếu cho kiểu dữ liệu số nguyên

Người viết Stata xử lý một cách duyên dáng các loại dữ liệu khác bao gồm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7562,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7563,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7564,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7565,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7566 bằng cách chuyển sang loại được hỗ trợ nhỏ nhất có thể biểu thị dữ liệu. Ví dụ: dữ liệu có loại
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7564 sẽ được chuyển thành
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7552 nếu tất cả các giá trị nhỏ hơn 100 (giới hạn trên đối với dữ liệu
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7552 không bị thiếu trong Stata) hoặc, nếu các giá trị nằm ngoài phạm vi này, biến sẽ được chuyển thành

Cảnh báo

Chuyển đổi từ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7562 sang
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7556 có thể dẫn đến mất độ chính xác nếu giá trị
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7562 lớn hơn 2**53

Cảnh báo

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7574 và
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7551 chỉ hỗ trợ các chuỗi có chiều rộng cố định chứa tối đa 244 ký tự, giới hạn do định dạng tệp dta phiên bản 115 áp đặt. Cố gắng ghi các tệp Stata dta với các chuỗi dài hơn 244 ký tự sẽ gây ra lỗi
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3327

Đọc từ định dạng Stata#

Hàm cấp cao nhất

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7577 sẽ đọc tệp dta và trả về
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7579 có thể được sử dụng để đọc tệp tăng dần

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
593

Specifying a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
90 yields a
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7579 instance that can be used to read
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
90 lines from the file at a time. The
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7579 object can be used as an iterator

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
594

Để kiểm soát chi tiết hơn, hãy sử dụng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2495 và chỉ định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
90 với mỗi lệnh gọi tới
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
18

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
595

Hiện tại,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3342 được truy xuất dưới dạng một cột

Tham số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7588 cho biết có nên đọc và sử dụng nhãn giá trị để tạo biến
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0524 từ chúng hay không. Các nhãn giá trị cũng có thể được truy xuất bằng hàm
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7590, hàm này yêu cầu gọi ____
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
18 trước khi sử dụng

Tham số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7592 cho biết liệu các biểu diễn giá trị bị thiếu trong Stata có nên được giữ lại hay không. Nếu
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
61 (mặc định), các giá trị bị thiếu được biểu thị dưới dạng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7248. Nếu
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32, các giá trị bị thiếu được biểu diễn bằng các đối tượng
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7596 và các cột chứa các giá trị bị thiếu sẽ có kiểu dữ liệu
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
72

Ghi chú

Hỗ trợ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7598 và
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7579. định dạng dta 113-115 (Stata 10-12), 117 (Stata 13) và 118 (Stata 14)

Ghi chú

Cài đặt

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7600 sẽ upcast lên kiểu dữ liệu pandas tiêu chuẩn.
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7562 cho tất cả các loại số nguyên và
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7556 cho dữ liệu dấu chấm động. Theo mặc định, kiểu dữ liệu Stata được giữ nguyên khi nhập

Dữ liệu phân loại#

Dữ liệu

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0524 có thể được xuất sang tệp dữ liệu Stata dưới dạng dữ liệu được gắn nhãn giá trị. Dữ liệu đã xuất bao gồm các mã danh mục cơ bản dưới dạng giá trị dữ liệu số nguyên và danh mục dưới dạng nhãn giá trị. Stata không có tương đương rõ ràng với
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0524 và thông tin về việc biến có được sắp xếp hay không bị mất khi xuất

Cảnh báo

Stata chỉ hỗ trợ các nhãn giá trị chuỗi và do đó,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
15 được gọi trên các danh mục khi xuất dữ liệu. Exporting
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0524 variables with non-string categories produces a warning, and can result a loss of information if the
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
15 representations of the categories are not unique

Tương tự, dữ liệu được gắn nhãn có thể được nhập từ các tệp dữ liệu Stata dưới dạng các biến

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0524 bằng cách sử dụng đối số từ khóa
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7588 (
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32 theo mặc định). Đối số từ khóa
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7611 (
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
32 theo mặc định) xác định xem các biến
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0524 đã nhập có được sắp xếp hay không

Ghi chú

Khi nhập dữ liệu phân loại, giá trị của các biến trong tệp dữ liệu Stata không được bảo toàn do các biến

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0524 luôn sử dụng các kiểu dữ liệu số nguyên trong khoảng từ
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7615 đến
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7616 trong đó
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7310 là số lượng phân loại. Nếu các giá trị gốc trong tệp dữ liệu Stata là bắt buộc, thì có thể nhập các giá trị này bằng cách đặt ____67618, thao tác này sẽ nhập dữ liệu gốc (chứ không phải nhãn biến). Các giá trị ban đầu có thể khớp với dữ liệu phân loại đã nhập vì có một ánh xạ đơn giản giữa các giá trị dữ liệu Stata ban đầu và mã danh mục của các biến Phân loại đã nhập. các giá trị bị thiếu được gán mã
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7615 và giá trị ban đầu nhỏ nhất được gán là
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
84, giá trị nhỏ thứ hai được gán là
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7621, v.v. cho đến khi giá trị ban đầu lớn nhất được gán mã
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7616

Ghi chú

Stata hỗ trợ sê-ri được dán nhãn một phần. Các chuỗi này có nhãn giá trị cho một số nhưng không phải tất cả các giá trị dữ liệu. Nhập chuỗi được gắn nhãn một phần sẽ tạo ra một

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
0524 với các danh mục chuỗi cho các giá trị được gắn nhãn và danh mục số cho các giá trị không có nhãn

định dạng SAS #

Hàm cấp cao nhất

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7624 có thể đọc (nhưng không ghi) SAS XPORT (. xpt) và (kể từ v0. 18. 0) SAS7BDAT (. sas7bdat) định dạng tập tin

Tệp SAS chỉ chứa hai loại giá trị. ASCII text and floating point values (usually 8 bytes but sometimes truncated). Đối với tệp xuất, không có chuyển đổi loại tự động thành số nguyên, ngày hoặc phân loại. Đối với các tệp SAS7BDAT, mã định dạng có thể cho phép các biến ngày được tự động chuyển đổi thành ngày. Theo mặc định, toàn bộ tệp được đọc và trả về dưới dạng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43

Chỉ định một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
90 hoặc sử dụng
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2495 để lấy các đối tượng người đọc (
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7628 hoặc
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7629) để đọc tệp dần dần. Các đối tượng người đọc cũng có các thuộc tính chứa thông tin bổ sung về tệp và các biến của nó

Đọc tệp SAS7BDAT

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
596

Lấy một trình vòng lặp và đọc một tệp XPORT 100.000 dòng cùng một lúc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
597

Thông số kỹ thuật cho định dạng tệp xport có sẵn trên trang web của SAS

Không có tài liệu chính thức nào cho định dạng SAS7BDAT

định dạng SPSS#

Mới trong phiên bản 0. 25. 0

Hàm cấp cao nhất

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7630 có thể đọc (nhưng không ghi) SPSS SAV (. sav) và ZSAV (. tệp định dạng zsav)

Tệp SPSS chứa tên cột. By default the whole file is read, categorical columns are converted into

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7631, and a
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 with all columns is returned

Chỉ định tham số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
47 để có được một tập hợp con các cột. Chỉ định
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7618 để tránh chuyển đổi các cột phân loại thành
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7631

Đọc một tệp SPSS

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
598

Trích xuất một tập hợp con các cột có trong

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
47 từ tệp SPSS và tránh chuyển đổi các cột phân loại thành
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7631

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
599

Thông tin thêm về các định dạng tệp SAV và ZSAV có tại đây

Các định dạng tệp khác#

bản thân gấu trúc chỉ hỗ trợ IO với một bộ định dạng tệp giới hạn ánh xạ rõ ràng tới mô hình dữ liệu dạng bảng của nó. Để đọc và ghi các định dạng tệp khác vào và từ gấu trúc, chúng tôi khuyên dùng các gói này từ cộng đồng rộng lớn hơn

netCDF#

xarray cung cấp cấu trúc dữ liệu lấy cảm hứng từ gấu trúc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
43 để làm việc với bộ dữ liệu đa chiều, tập trung vào định dạng tệp netCDF và chuyển đổi dễ dàng sang và từ gấu trúc

Cân nhắc về hiệu suất#

Đây là một so sánh không chính thức của các phương pháp IO khác nhau, sử dụng pandas 0. 24. 2. Thời gian phụ thuộc vào máy và nên bỏ qua những khác biệt nhỏ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
00

Các chức năng kiểm tra sau đây sẽ được sử dụng bên dưới để so sánh hiệu suất của một số phương pháp IO

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
01

Khi viết, ba chức năng hàng đầu về tốc độ là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7639,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7640 và
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7641

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
02

Khi đọc, ba chức năng hàng đầu về tốc độ là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7642,
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7643 và
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
7644