Làm cách nào để thay đổi độ rộng của cột trong excel bằng openpyxl?

Question

Tôi đang sử dụng OpenPyXL để tạo một số bản tải xuống Excel từ một trong các ứng dụng Django của mình. In general, I'm pretty happy with how it works. Đặt chiều rộng cột thật dễ dàng. Tìm ra cách để làm điều đó từ tài liệu là khó. Vì vậy, đây là, chỉ trong trường hợp người khác cần biết (hoặc tôi một tuần kể từ bây giờ khi tôi quên)

Nội dung chính Show

To change or modify column width size
Tệp CSV & văn bản#
Tùy chọn phân tích cú pháp #
Chỉ định kiểu dữ liệu cột#
Chỉ định phân loại dtype#
Đặt tên và sử dụng các cột#
Phân tích cú pháp tên trùng lặp#
Nhận xét và dòng trống#
Xử lý dữ liệu Unicode#
Cột chỉ mục và dấu phân cách #
Xử lý ngày#
Specifying method for floating-point conversion#
Thousand separators#
Returning Series#
Boolean values#
Handling “bad” lines#
Quoting and Escape Characters#
Files with fixed width columns#
Automatically “sniffing” the delimiter#
Reading multiple files to create a single DataFrame#
Iterating through files chunk by chunk#
Specifying the parser engine#
Reading/writing remote files#
Writing out data#
Writing JSON#
Reading JSON#
Normalization#
Line delimited json#
Table schema#
Reading HTML content#
Writing to HTML files#
HTML Table Parsing Gotchas#
Writing to LaTeX files#
Đọc XML#
Writing XML#
XML Final Notes#
Excel files#
Reading Excel files#
Writing Excel files#
Công cụ soạn thảo Excel#
Phong cách và định dạng#
Bảng tính OpenDocument#
Excel nhị phân (. tệp xlsb) #
Bảng nhớ tạm #
dưa muối#
Tập tin dưa nén #
HDF5 (PyTables)#
Đọc/ghi API#
Định dạng cố định #
Định dạng bảng #
Hierarchical keys#
Storing types#
Delete from a table#
Notes & caveats#
Loại dữ liệu#
Khả năng tương thích bên ngoài#
Performance#
Sàn gỗ #
Xử lý chỉ mục#
Phân vùng tệp Parquet#
truy vấn SQL#
Viết DataFrames#
Kiểu dữ liệu ngày giờ#
Bảng đọc #
Hỗ trợ lược đồ #
Ví dụ về kết nối động cơ#
Advanced SQLAlchemy queries#
Dự phòng Sqlite#
Google BigQuery#
định dạng thống kê #
Ghi vào định dạng stata#
Đọc từ định dạng Stata#
định dạng SAS #
định dạng SPSS#
Các định dạng tệp khác#
Cân nhắc về hiệu suất#

Vì một số lý do, việc đặt cài đặt chiều rộng bên trong

pip install openpyxl

647 hoạt động

using PyCall
xl = pyimport("openpyxl")
wb = xl.Workbook()
ws = wb.active
# ws.column_dimensions["A"].width = 25  ## ERROR: KeyError: key "A" not found
py"""
$ws.column_dimensions["A"].width = 25
"""
wb.save("foo.xlsx")

Worksheet objects have

pip install openpyxl

648 and

pip install openpyxl

649 attributes that control row heights and column widths. A sheet’s

pip install openpyxl

648 and

pip install openpyxl

649are dictionary-like values; row_dimensions contains RowDimension objects and column_dimensions contains ColumnDimension objects. Trong row_dimensions, người ta có thể truy cập một trong các đối tượng bằng cách sử dụng số của hàng (trong trường hợp này là 1 hoặc 2). Trong column_dimensions, người ta có thể truy cập một trong các đối tượng bằng chữ cái của cột (trong trường hợp này là A hoặc B)

Mã số 1. Chương trình thiết lập kích thước của các ô

pip install openpyxl

652

pip install openpyxl

653

pip install openpyxl

654

pip install openpyxl

655

pip install openpyxl

656

pip install openpyxl

657

pip install openpyxl

6490

pip install openpyxl

6491

pip install openpyxl

6492

pip install openpyxl

655

pip install openpyxl

6494

pip install openpyxl

6495

pip install openpyxl

6496

pip install openpyxl

6491

pip install openpyxl

6498

pip install openpyxl

655

pip install openpyxl

6480

pip install openpyxl

6481

pip install openpyxl

6491

pip install openpyxl

64823

pip install openpyxl

6484

pip install openpyxl

6491

pip install openpyxl

64823

pip install openpyxl

64960

pip install openpyxl

6491

pip install openpyxl

64962

pip install openpyxl

6491

pip install openpyxl

64964

pip install openpyxl

64843

pip install openpyxl

6491

pip install openpyxl

64845

pip install openpyxl

6538

First of all, you must make sure to install the openpyxl library. You can do the same by running the below command on your terminal

pip install openpyxl

How to install openpyxl in Python

To change or modify column width size

In order to change the column width size, you can make use of the column_dimesnsions method of the worksheet class.
Syntax. worksheet. column_dimensions[tên cột]. width=size

Hãy để chúng tôi xem xét tương tự với ví dụ dưới đây

Consider an existing excel file codespeedy. xlsx as shown below;

So, now let us change the column size of column A;

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

As you can see, we have modified the column size of A to 20 and saved the file after modification as codespeedy1. xlsx

Tương tự, bạn cũng có thể sửa đổi độ rộng cột của nhiều hàng như hình minh họa;

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
sheet.column_dimensions['C'].width = 20
sheet.column_dimensions['E'].width = 30
worksheet.save("codespeedy1.xlsx")

Well, isn’t it amazing how you can manage such significant changes with such simple, small lines of code? Well, that in itself is the beauty of Python

The pandas I/O API is a set of top level

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

05 functions accessed like

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

06 that generally return a pandas object. Các hàm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

07 tương ứng là các phương thức đối tượng được truy cập như

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

08. Below is a table containing available

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

09 and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

10

Format Type

Data Description

Reader

Writer

chữ

CSV

read_csv

to_csv

chữ

Tệp văn bản có chiều rộng cố định

read_fwf

chữ

JSON

read_json

to_json

chữ

HTML

read_html

to_html

chữ

LaTeX

Styler. to_latex

chữ

XML

read_xml

to_xml

chữ

Bảng tạm cục bộ

read_clipboard

to_clipboard

nhị phân

MS Excel

read_excel

to_excel

nhị phân

tài liệu mở

read_excel

nhị phân

Định dạng HDF5

read_hdf

to_hdf

nhị phân

định dạng lông vũ

read_feather

to_feather

nhị phân

định dạng sàn gỗ

read_parquet

to_parquet

nhị phân

Định dạng ORC

read_orc

to_orc

nhị phân

trạng thái

read_stata

to_stata

nhị phân

SAS

read_sas

nhị phân

SPSS

read_spss

nhị phân

Định dạng dưa chua Python

read_pickle

to_pickle

SQL

read_sql

to_sql

SQL

Google BigQuery

read_gbq

to_gbq

Đây là so sánh hiệu suất không chính thức đối với một số phương thức IO này.

Ghi chú

Đối với các ví dụ sử dụng lớp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

11, hãy đảm bảo bạn nhập lớp đó bằng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

12 cho Python 3

Tệp CSV & văn bản#

Hàm đặc biệt để đọc tệp văn bản (a. k. a. tệp phẳng) là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

13. Xem sách dạy nấu ăn để biết một số chiến lược nâng cao.

Tùy chọn phân tích cú pháp #

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

13 chấp nhận các đối số phổ biến sau

Căn bản#

filepath_or_buffer khác nhau

Đường dẫn đến tệp (một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

15,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

16 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

17), URL (bao gồm các vị trí http, ftp và S3) hoặc bất kỳ đối tượng nào có phương thức

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

18 (chẳng hạn như tệp đang mở hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

11)

sep str, mặc định là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

20 cho

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

13,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

22 cho

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

23

Dấu phân cách để sử dụng. Nếu sep là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24, công cụ C không thể tự động phát hiện dấu phân cách, nhưng công cụ phân tích cú pháp Python thì có thể, nghĩa là công cụ phân tích cú pháp sau sẽ được sử dụng và tự động phát hiện dấu phân cách bằng công cụ trình thám thính dựng sẵn của Python,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

25. Ngoài ra, dấu phân cách dài hơn 1 ký tự và khác với

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

26 sẽ được hiểu là biểu thức chính quy và cũng sẽ buộc sử dụng công cụ phân tích cú pháp Python. Lưu ý rằng các dấu phân cách regex có xu hướng bỏ qua dữ liệu được trích dẫn. Ví dụ về biểu thức chính quy.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

27

dấu phân cách str, mặc định là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Tên đối số thay thế cho sep

delim_whitespace boolean, default False

Chỉ định có hay không khoảng trắng (e. g.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

29 or

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

30) will be used as the delimiter. Equivalent to setting

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

31. If this option is set to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32, nothing should be passed in for the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

33 parameter

Column and index locations and names#

header int or list of ints, default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

34

Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names. if no names are passed the behavior is identical to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

35 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

36. Explicitly pass

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

35 to be able to replace existing names

The header can be a list of ints that specify row locations for a MultiIndex on the columns e. g.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

38. Intervening rows that are not specified will be skipped (e. g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

39, so header=0 denotes the first line of data rather than the first line of the file

tên dạng mảng, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

List of column names to use. Nếu tệp không chứa hàng tiêu đề, thì bạn nên chuyển rõ ràng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

36. Bản sao trong danh sách này không được phép

index_col int, str, chuỗi int / str hoặc Sai, tùy chọn, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

(Các) cột để sử dụng làm nhãn hàng của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43, được cung cấp dưới dạng tên chuỗi hoặc chỉ mục cột. Nếu một chuỗi int / str được đưa ra, Multi Index được sử dụng

Ghi chú

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

44 có thể được sử dụng để buộc gấu trúc không sử dụng cột đầu tiên làm chỉ mục, e. g. khi bạn có tệp không đúng định dạng với dấu phân cách ở cuối mỗi dòng

Giá trị mặc định của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24 hướng dẫn gấu trúc đoán. Nếu số trường trong hàng tiêu đề cột bằng với số trường trong phần thân của tệp dữ liệu thì chỉ mục mặc định được sử dụng. Nếu nó lớn hơn, thì các cột đầu tiên được sử dụng làm chỉ mục sao cho số trường còn lại trong phần nội dung bằng với số trường trong tiêu đề

Hàng đầu tiên sau tiêu đề được sử dụng để xác định số lượng cột sẽ được đưa vào chỉ mục. If the subsequent rows contain less columns than the first row, they are filled with

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

46

Điều này có thể tránh được thông qua

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

47. Điều này đảm bảo rằng các cột được lấy nguyên trạng và dữ liệu theo sau bị bỏ qua

usecols giống như danh sách hoặc có thể gọi được, mặc định là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Return a subset of the columns. If list-like, all elements must either be positional (i. e. integer indices into the document columns) or strings that correspond to column names provided either by the user in

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

49 or inferred from the document header row(s). Nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

49 được cung cấp, (các) hàng tiêu đề tài liệu không được tính đến. For example, a valid list-like

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

47 parameter would be

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

52 or

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

53

Element order is ignored, so

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

54 is the same as

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

55. Để khởi tạo một DataFrame từ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

56 với thứ tự phần tử được giữ nguyên, hãy sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

57 cho các cột theo thứ tự

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

58 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

59 cho thứ tự

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

60

Nếu có thể gọi được, hàm có thể gọi được sẽ được đánh giá dựa trên tên cột, trả về các tên mà hàm có thể gọi được đánh giá là True

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

0

Sử dụng tham số này dẫn đến thời gian phân tích cú pháp nhanh hơn nhiều và sử dụng bộ nhớ thấp hơn khi sử dụng công cụ c. The Python engine loads the data first before deciding which columns to drop

squeeze boolean, default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61

Nếu dữ liệu được phân tích cú pháp chỉ chứa một cột thì hãy trả về

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

62

Không dùng nữa kể từ phiên bản 1. 4. 0. Nối

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

63 vào lệnh gọi tới

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

64 để nén dữ liệu.

tiền tố str, mặc định là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Tiền tố để thêm vào số cột khi không có tiêu đề, e. g. 'X' cho X0, X1, ...

Không dùng nữa kể từ phiên bản 1. 4. 0. Sử dụng cách hiểu danh sách trên các cột của DataFrame sau khi gọi

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

66.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7

mangle_dupe_cols boolean, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32

Các cột trùng lặp sẽ được chỉ định là 'X', 'X. 1’…’X. N’, rather than ‘X’…’X’. Passing in

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61 will cause data to be overwritten if there are duplicate names in the columns

Không dùng nữa kể từ phiên bản 1. 5. 0. Đối số chưa bao giờ được triển khai và thay vào đó, một đối số mới có thể chỉ định mẫu đổi tên sẽ được thêm vào.

Cấu hình phân tích cú pháp chung#

dtype Nhập tên hoặc chính tả của cột -> loại, mặc định là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Kiểu dữ liệu cho dữ liệu hoặc cột. e. g.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

70 Sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

15 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

72 cùng với cài đặt

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

73 phù hợp để giữ nguyên và không diễn giải dtype. Nếu bộ chuyển đổi được chỉ định, chúng sẽ được áp dụng THAY THẾ cho chuyển đổi dtype

Mới trong phiên bản 1. 5. 0. Đã thêm hỗ trợ cho defaultdict. Chỉ định một defaultdict làm đầu vào trong đó mặc định xác định dtype của các cột không được liệt kê rõ ràng.

công cụ {

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

74,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

75,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

76}

Công cụ phân tích cú pháp để sử dụng. The C and pyarrow engines are faster, while the python engine is currently more feature-complete. Đa luồng hiện chỉ được hỗ trợ bởi công cụ pyarrow

New in version 1. 4. 0. Công cụ “pyarrow” đã được thêm làm công cụ thử nghiệm và một số tính năng không được hỗ trợ hoặc có thể không hoạt động chính xác với công cụ này.

bộ chuyển đổi chính tả, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Dict của các chức năng để chuyển đổi các giá trị trong các cột nhất định. Các khóa có thể là số nguyên hoặc nhãn cột

true_values danh sách, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Các giá trị được coi là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32

false_values danh sách, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Values to consider as

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61

skipinitialspace boolean, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61

Bỏ qua khoảng trắng sau dấu phân cách

skiprows dạng danh sách hoặc số nguyên, mặc định là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Số dòng cần bỏ qua (được lập chỉ mục 0) hoặc số dòng cần bỏ qua (int) ở đầu tệp

Nếu có thể gọi được, hàm có thể gọi được sẽ được đánh giá dựa trên các chỉ số hàng, trả về True nếu hàng sẽ bị bỏ qua và Sai nếu không

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

5

skipfooter int, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

84

Number of lines at bottom of file to skip (unsupported with engine=’c’)

nrows int, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Số hàng của tập tin để đọc. Hữu ích để đọc các phần của tệp lớn

low_memory boolean, default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32

Xử lý nội bộ tệp theo khối, dẫn đến việc sử dụng bộ nhớ thấp hơn trong khi phân tích cú pháp, nhưng có thể suy luận kiểu hỗn hợp. Để đảm bảo không có loại hỗn hợp, hãy đặt

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61 hoặc chỉ định loại bằng tham số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

88. Note that the entire file is read into a single

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 regardless, use the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

90 or

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

91 parameter to return the data in chunks. (Chỉ hợp lệ với trình phân tích cú pháp C)

memory_map boolean, mặc định Sai

Nếu đường dẫn tệp được cung cấp cho ______ 492, ánh xạ đối tượng tệp trực tiếp vào bộ nhớ và truy cập dữ liệu trực tiếp từ đó. Sử dụng tùy chọn này có thể cải thiện hiệu suất vì không còn bất kỳ chi phí I/O nào nữa

NA và xử lý dữ liệu bị thiếu#

na_values vô hướng, str, dạng danh sách hoặc dict, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Các chuỗi bổ sung để nhận dạng là NA/NaN. Nếu dict được thông qua, các giá trị NA cụ thể trên mỗi cột. Xem na values const bên dưới để biết danh sách các giá trị được hiểu là NaN theo mặc định.

keep_default_na boolean, default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32

Whether or not to include the default NaN values when parsing the data. Depending on whether

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

73 is passed in, the behavior is as follows

Nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

96 là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

73 được chỉ định, thì

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

73 được thêm vào các giá trị NaN mặc định được sử dụng để phân tích cú pháp

If

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

96 is

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32, and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

73 are not specified, only the default NaN values are used for parsing

Nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

96 là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

73 được chỉ định, thì chỉ các giá trị NaN được chỉ định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

73 được sử dụng để phân tích cú pháp

Nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

96 là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

73 không được chỉ định, sẽ không có chuỗi nào được phân tích thành NaN

Note that if

pip install openpyxl

1210 is passed in as

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61, the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

96 and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

73 parameters will be ignored

na_filter boolean, default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32

Phát hiện các điểm đánh dấu giá trị bị thiếu (chuỗi trống và giá trị của na_values). Trong dữ liệu không có bất kỳ NA nào, việc chuyển

pip install openpyxl

1215 có thể cải thiện hiệu suất đọc một tệp lớn

verbose boolean, default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61

Cho biết số lượng giá trị NA được đặt trong các cột không phải là số

skip_blank_lines boolean, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32

Nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32, hãy bỏ qua các dòng trống thay vì diễn giải dưới dạng giá trị NaN

Xử lý ngày giờ #

parse_dates boolean or list of ints or names or list of lists or dict, default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61.

Nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32 -> thử phân tích cú pháp chỉ mục

Nếu
```
pip install openpyxl
```
1221 -> thử phân tích từng cột 1, 2, 3 thành một cột ngày riêng biệt
Nếu
```
pip install openpyxl
```
1222 -> kết hợp cột 1 và 3 và phân tích dưới dạng một cột ngày
Nếu
```
pip install openpyxl
```
1223 -> phân tích cột 1, 3 thành ngày và gọi kết quả là 'foo'

Ghi chú

Đường dẫn nhanh tồn tại cho các ngày có định dạng iso8601

infer_datetime_format boolean, default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61

Nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32 và parse_dates được bật cho một cột, hãy thử suy ra định dạng ngày giờ để tăng tốc độ xử lý

keep_date_col boolean, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61

If

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32 and parse_dates specifies combining multiple columns then keep the original columns

date_parser , mặc định là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Hàm sử dụng để chuyển đổi một chuỗi các cột chuỗi thành một mảng các thể hiện thời gian. Mặc định sử dụng

pip install openpyxl

1229 để thực hiện chuyển đổi. gấu trúc sẽ cố gắng gọi date_parser theo ba cách khác nhau, chuyển sang cách tiếp theo nếu xảy ra ngoại lệ. 1) Chuyển một hoặc nhiều mảng (như được định nghĩa bởi parse_dates) làm đối số;

dayfirst boolean, default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61

Ngày định dạng DD/MM, định dạng quốc tế và châu Âu

cache_dates boolean, default True

Nếu Đúng, hãy sử dụng bộ nhớ cache của các ngày đã chuyển đổi, duy nhất để áp dụng chuyển đổi ngày giờ. Có thể tạo ra tốc độ tăng đáng kể khi phân tích chuỗi ngày trùng lặp, đặc biệt là các chuỗi có chênh lệch múi giờ

Mới trong phiên bản 0. 25. 0

Lần lặp #

trình lặp boolean, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61

Trả về đối tượng

pip install openpyxl

1232 để lặp lại hoặc nhận khối với

pip install openpyxl

1233

chunksize int, default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Trả về đối tượng

pip install openpyxl

1232 để lặp lại. Xem lặp lại và phân đoạn bên dưới.

Định dạng trích dẫn, nén và tệp #

nén {

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

34,

pip install openpyxl

1237,

pip install openpyxl

1238,

pip install openpyxl

1239,

pip install openpyxl

1240,

pip install openpyxl

1241,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24,

pip install openpyxl

1243}, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

34

Để giải nén dữ liệu trên đĩa nhanh chóng. Nếu 'suy ra', thì hãy sử dụng gzip, bz2, zip, xz hoặc zstandard nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

92 giống như đường dẫn kết thúc bằng '. gz', '. bz2', '. nén', '. xz', '. zst', tương ứng và không giải nén nếu không. Nếu sử dụng 'zip', tệp ZIP chỉ được chứa một tệp dữ liệu để đọc trong. Đặt thành

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24 để không giải nén. Can also be a dict with key

pip install openpyxl

1247 set to one of {

pip install openpyxl

1239,

pip install openpyxl

1237,

pip install openpyxl

1238,

pip install openpyxl

1241} and other key-value pairs are forwarded to

pip install openpyxl

1252,

pip install openpyxl

1253,

pip install openpyxl

1254, or

pip install openpyxl

1255. As an example, the following could be passed for faster compression and to create a reproducible gzip archive.

pip install openpyxl

1256

Đã thay đổi trong phiên bản 1. 1. 0. tùy chọn dict được mở rộng để hỗ trợ

pip install openpyxl

1257 và

pip install openpyxl

1258.

Đã thay đổi trong phiên bản 1. 2. 0. Các phiên bản trước đã chuyển tiếp các mục nhập chính tả cho ‘gzip’ tới

pip install openpyxl

1259.

nghìn str, mặc định là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Dấu phân cách hàng nghìn

thập phân str, mặc định

pip install openpyxl

1261

Ký tự để nhận dạng là dấu thập phân. e. g. sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

20 cho dữ liệu châu Âu

float_precision string, default None

Specifies which converter the C engine should use for floating-point values. Các tùy chọn là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24 cho bộ chuyển đổi thông thường,

pip install openpyxl

1264 cho bộ chuyển đổi có độ chính xác cao và

pip install openpyxl

1265 cho bộ chuyển đổi khứ hồi

lineterminator str (độ dài 1), mặc định là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Ký tự để chia tệp thành các dòng. Only valid with C parser

quotechar str (độ dài 1)

The character used to denote the start and end of a quoted item. Các mục được trích dẫn có thể bao gồm dấu phân cách và nó sẽ bị bỏ qua

quoting int or

pip install openpyxl

1267 instance, default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

84

Control field quoting behavior per

pip install openpyxl

1267 constants. Sử dụng một trong số

pip install openpyxl

1270 (0),

pip install openpyxl

1271 (1),

pip install openpyxl

1272 (2) hoặc

pip install openpyxl

1273 (3)

trích dẫn kép boolean, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32

When

pip install openpyxl

1275 is specified and

pip install openpyxl

1276 is not

pip install openpyxl

1273, indicate whether or not to interpret two consecutive

pip install openpyxl

1275 elements inside a field as a single

pip install openpyxl

1275 element

escapechar str (độ dài 1), mặc định là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Chuỗi một ký tự được sử dụng để thoát khỏi dấu phân cách khi trích dẫn là

pip install openpyxl

1273

nhận xét str, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Cho biết phần còn lại của dòng không nên được phân tích cú pháp. Nếu được tìm thấy ở đầu dòng, dòng đó sẽ bị bỏ qua hoàn toàn. Tham số này phải là một ký tự đơn. Giống như các dòng trống (miễn là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

39), các dòng nhận xét đầy đủ bị bỏ qua bởi tham số

pip install openpyxl

1284 chứ không phải bởi

pip install openpyxl

1285. For example, if

pip install openpyxl

1286, parsing ‘#empty\na,b,c\n1,2,3’ with

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

35 will result in ‘a,b,c’ being treated as the header

encoding str, default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Mã hóa để sử dụng cho UTF khi đọc/ghi (e. g.

pip install openpyxl

1289). Danh sách mã hóa tiêu chuẩn Python

phương ngữ str hoặc ví dụ

pip install openpyxl

1290, mặc định là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Nếu được cung cấp, thông số này sẽ ghi đè giá trị (mặc định hoặc không) cho các thông số sau.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

33,

pip install openpyxl

1293,

pip install openpyxl

1294,

pip install openpyxl

1295,

pip install openpyxl

1275 và

pip install openpyxl

1276. Nếu cần ghi đè các giá trị, Cảnh báo phân tích cú pháp sẽ được đưa ra. Xem tài liệu

pip install openpyxl

1290 để biết thêm chi tiết

Xử lý lỗi#

error_bad_lines boolean, tùy chọn, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Dòng có quá nhiều trường (e. g. một dòng csv có quá nhiều dấu phẩy) theo mặc định sẽ gây ra một ngoại lệ và không có

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 nào được trả về. Nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61, thì những "dòng xấu" này sẽ bị loại bỏ khỏi

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 được trả về. Xem đường xấu bên dưới.

Không dùng nữa kể từ phiên bản 1. 3. 0. Thay vào đó, nên sử dụng tham số

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0503 để chỉ định hành vi khi gặp phải một dòng xấu.

warn_bad_lines boolean, tùy chọn, mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24

Nếu error_bad_lines là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61, vàWarner_bad_lines là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32, một cảnh báo cho mỗi “dòng xấu” sẽ được xuất ra

Không dùng nữa kể từ phiên bản 1. 3. 0. Thay vào đó, nên sử dụng tham số

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0503 để chỉ định hành vi khi gặp phải một dòng xấu.

on_bad_lines ('lỗi', 'cảnh báo', 'bỏ qua'), 'lỗi' mặc định

Chỉ định những việc cần làm khi gặp phải một dòng xấu (một dòng có quá nhiều trường). Các giá trị được phép là

'lỗi', tăng ParserError khi gặp phải một dòng xấu
‘warn’, in cảnh báo khi gặp dòng xấu và bỏ qua dòng đó
'bỏ qua', bỏ qua các dòng xấu mà không báo trước hoặc cảnh báo khi gặp phải

Mới trong phiên bản 1. 3. 0

Chỉ định kiểu dữ liệu cột#

Bạn có thể chỉ định loại dữ liệu cho toàn bộ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 hoặc từng cột riêng lẻ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

May mắn thay, pandas cung cấp nhiều cách để đảm bảo rằng (các) cột của bạn chỉ chứa một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

88. Nếu chưa quen với những khái niệm này, bạn có thể xem tại đây để tìm hiểu thêm về dtypes và tại đâytại đây . tại đây . tại đây . tại đây . tại đây . tại đây . tại đây . tại đây . tại đây . to learn more about

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

72 conversion in pandas.

Chẳng hạn, bạn có thể sử dụng đối số

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0511 của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

13

pip install openpyxl

12

Hoặc bạn có thể sử dụng hàm

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0513 để ép buộc các dtypes sau khi đọc dữ liệu,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

05

sẽ chuyển đổi tất cả phân tích cú pháp hợp lệ thành float, để lại phân tích cú pháp không hợp lệ là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

46

Cuối cùng, cách bạn xử lý việc đọc trong các cột có chứa các kiểu dữ liệu hỗn hợp tùy thuộc vào nhu cầu cụ thể của bạn. Trong trường hợp trên, nếu bạn muốn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

46 loại bỏ các điểm bất thường về dữ liệu, thì

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0513 có lẽ là lựa chọn tốt nhất của bạn. Tuy nhiên, nếu bạn muốn tất cả dữ liệu được ép buộc, bất kể loại nào, thì việc sử dụng đối số

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0511 của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

13 chắc chắn sẽ đáng để thử

Ghi chú

Trong một số trường hợp, việc đọc dữ liệu bất thường với các cột chứa các kiểu dữ liệu hỗn hợp sẽ dẫn đến tập dữ liệu không nhất quán. Nếu bạn dựa vào gấu trúc để suy ra các kiểu dữ liệu của các cột, công cụ phân tích cú pháp sẽ đi và suy ra các kiểu dữ liệu cho các khối dữ liệu khác nhau, thay vì toàn bộ tập dữ liệu cùng một lúc. Do đó, bạn có thể kết thúc với (các) cột có các kiểu dữ liệu hỗn hợp. Ví dụ,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

24

sẽ dẫn đến

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0519 chứa một kiểu dữ liệu

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0520 cho một số đoạn nhất định của cột và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

15 cho những cột khác do các kiểu dữ liệu hỗn hợp từ dữ liệu được đọc trong. Điều quan trọng cần lưu ý là toàn bộ cột sẽ được đánh dấu bằng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

88 của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

72, được sử dụng cho các cột có kiểu chữ hỗn hợp

Chỉ định phân loại dtype#

Các cột

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0524 có thể được phân tích cú pháp trực tiếp bằng cách chỉ định

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0525 hoặc

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0526

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

33

Các cột riêng lẻ có thể được phân tích cú pháp dưới dạng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0524 bằng cách sử dụng đặc tả chính tả

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

35

Việc chỉ định

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0525 sẽ dẫn đến một

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0524 không có thứ tự có

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0530 là các giá trị duy nhất được quan sát thấy trong dữ liệu. Để kiểm soát nhiều hơn đối với các danh mục và thứ tự, hãy tạo trước một

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0531 và chuyển mã đó cho

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

88 của cột đó

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

41

Khi sử dụng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0533, các giá trị "bất ngờ" bên ngoài

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0534 được coi là giá trị bị thiếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

70

Điều này phù hợp với hành vi của

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0535

Ghi chú

Với

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0525, các danh mục kết quả sẽ luôn được phân tích thành chuỗi (đối tượng dtype). If the categories are numeric they can be converted using the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0513 function, or as appropriate, another converter such as

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0538

Khi

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

88 là một

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0531 với đồng nhất

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0530 ( tất cả là số, tất cả ngày giờ, v.v. ), quá trình chuyển đổi được thực hiện tự động

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

71

Đặt tên và sử dụng các cột#

Xử lý tên cột#

Một tệp có thể có hoặc không có hàng tiêu đề. gấu trúc giả sử hàng đầu tiên nên được sử dụng làm tên cột

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

72

Bằng cách chỉ định đối số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

49 kết hợp với

pip install openpyxl

1284, bạn có thể chỉ ra các tên khác sẽ sử dụng và có nên loại bỏ hàng tiêu đề hay không (nếu có)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

73

Nếu tiêu đề nằm trong một hàng khác với hàng đầu tiên, hãy chuyển số hàng cho

pip install openpyxl

1284. Điều này sẽ bỏ qua các hàng trước

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

74

Ghi chú

Hành vi mặc định là suy ra tên cột. nếu không có tên nào được chuyển thì hành vi giống hệt với

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

35 và tên cột được suy ra từ dòng không trống đầu tiên của tệp, nếu tên cột được chuyển rõ ràng thì hành vi giống hệt với

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

36

Phân tích cú pháp tên trùng lặp#

Không dùng nữa kể từ phiên bản 1. 5. 0. ______20547 chưa bao giờ được triển khai và thay vào đó, một đối số mới trong đó mẫu đổi tên có thể được chỉ định sẽ được thêm vào.

Nếu tệp hoặc tiêu đề chứa tên trùng lặp, theo mặc định, gấu trúc sẽ phân biệt giữa chúng để ngăn ghi đè dữ liệu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

75

Không còn dữ liệu trùng lặp vì theo mặc định,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0548 sẽ sửa đổi một loạt các cột trùng lặp 'X', ..., 'X' thành 'X', 'X. 1’, …, ‘X. N'

Lọc cột (

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

47)#

Đối số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

47 cho phép bạn chọn bất kỳ tập hợp con nào của các cột trong một tệp, bằng cách sử dụng tên cột, số vị trí hoặc có thể gọi được

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

76

Đối số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

47 cũng có thể được sử dụng để chỉ định cột nào không được sử dụng trong kết quả cuối cùng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

77

Trong trường hợp này, khả năng gọi được chỉ định rằng chúng tôi loại trừ các cột “a” và “c” khỏi đầu ra

Nhận xét và dòng trống#

Bỏ qua chú thích dòng và dòng trống#

Nếu tham số

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0552 được chỉ định, thì các dòng nhận xét hoàn toàn sẽ bị bỏ qua. Theo mặc định, các dòng hoàn toàn trống cũng sẽ bị bỏ qua

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

78

Nếu

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0553, thì

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

66 sẽ không bỏ qua các dòng trống

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

79

Cảnh báo

Sự hiện diện của các dòng bị bỏ qua có thể tạo ra sự mơ hồ liên quan đến số dòng;

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

50

Nếu cả

pip install openpyxl

1284 và

pip install openpyxl

1285 đều được chỉ định, thì

pip install openpyxl

1284 sẽ liên quan đến phần cuối của

pip install openpyxl

1285. Ví dụ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

51

Bình luận#

Đôi khi nhận xét hoặc dữ liệu meta có thể được bao gồm trong một tệp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

52

Theo mặc định, trình phân tích cú pháp bao gồm các nhận xét trong đầu ra

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

53

We can suppress the comments using the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0552 keyword

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

54

Xử lý dữ liệu Unicode#

Đối số

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0562 nên được sử dụng cho dữ liệu unicode được mã hóa, điều này sẽ dẫn đến kết quả là các chuỗi byte được giải mã thành unicode

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

55

Một số định dạng mã hóa tất cả các ký tự dưới dạng nhiều byte, chẳng hạn như UTF-16, sẽ không phân tích cú pháp chính xác nếu không chỉ định mã hóa. Danh sách đầy đủ các bảng mã tiêu chuẩn của Python

Cột chỉ mục và dấu phân cách #

Nếu một tệp có nhiều hơn một cột dữ liệu so với số lượng tên cột, thì cột đầu tiên sẽ được sử dụng làm tên hàng của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

56

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

57

Thông thường, bạn có thể đạt được hành vi này bằng cách sử dụng tùy chọn

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0564

Có một số trường hợp ngoại lệ khi một tệp đã được chuẩn bị với các dấu phân cách ở cuối mỗi dòng dữ liệu, gây nhầm lẫn cho trình phân tích cú pháp. Để vô hiệu hóa rõ ràng suy luận cột chỉ mục và loại bỏ cột cuối cùng, hãy vượt qua

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

44

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

58

Nếu một tập hợp con dữ liệu đang được phân tích cú pháp bằng tùy chọn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

47, thì thông số kỹ thuật của

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0564 dựa trên tập hợp con đó, không phải dữ liệu gốc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

59

Xử lý ngày#

Chỉ định cột ngày #

Để tạo điều kiện thuận lợi hơn khi làm việc với dữ liệu ngày giờ,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

13 sử dụng các đối số từ khóa

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0569 và

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0570 để cho phép người dùng chỉ định nhiều cột và định dạng ngày/giờ để biến dữ liệu văn bản đầu vào thành các đối tượng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0571

Trường hợp đơn giản nhất là chỉ cần vượt qua trong

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0572

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

0

Thông thường, chúng tôi có thể muốn lưu trữ dữ liệu ngày và giờ riêng biệt hoặc lưu trữ các trường ngày khác nhau một cách riêng biệt. từ khóa

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0569 có thể được sử dụng để chỉ định tổ hợp các cột để phân tích ngày và/hoặc thời gian từ

Bạn có thể chỉ định một danh sách các danh sách cột thành

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0569, các cột ngày kết quả sẽ được thêm vào đầu ra (để không ảnh hưởng đến thứ tự cột hiện có) và các tên cột mới sẽ là phần nối của các tên cột thành phần

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

1

Theo mặc định, trình phân tích cú pháp loại bỏ các cột ngày của thành phần, nhưng bạn có thể chọn giữ lại chúng thông qua từ khóa

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0575

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

2

Lưu ý rằng nếu bạn muốn kết hợp nhiều cột thành một cột ngày, thì phải sử dụng danh sách lồng nhau. Nói cách khác,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0576 chỉ ra rằng mỗi cột thứ hai và thứ ba phải được phân tích cú pháp thành các cột ngày riêng biệt trong khi

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0577 có nghĩa là hai cột phải được phân tích cú pháp thành một cột

Bạn cũng có thể sử dụng lệnh để chỉ định các cột tên tùy chỉnh

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

3

Điều quan trọng cần nhớ là nếu nhiều cột văn bản được phân tích thành một cột ngày, thì một cột mới sẽ được thêm vào trước dữ liệu. Thông số kỹ thuật của

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0564 dựa trên tập hợp cột mới này thay vì các cột dữ liệu ban đầu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

4

Ghi chú

Nếu một cột hoặc chỉ mục chứa ngày không thể phân tích cú pháp, thì toàn bộ cột hoặc chỉ mục đó sẽ được trả về không thay đổi dưới dạng kiểu dữ liệu đối tượng. Để phân tích cú pháp ngày giờ không chuẩn, hãy sử dụng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0538 sau

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0580

Ghi chú

read_csv có fast_path để phân tích chuỗi ngày giờ ở định dạng iso8601, e. g “2000-01-01T00. 01. 02+00. 00” và các biến thể tương tự. Nếu bạn có thể sắp xếp dữ liệu của mình để lưu trữ thời gian ở định dạng này, thì thời gian tải sẽ nhanh hơn đáng kể, đã quan sát được ~20 lần

Chức năng phân tích ngày #

Finally, the parser allows you to specify a custom

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0570 function to take full advantage of the flexibility of the date parsing API

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

5

gấu trúc sẽ cố gắng gọi hàm

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0570 theo ba cách khác nhau. Nếu một ngoại lệ được đưa ra, ngoại lệ tiếp theo sẽ được thử

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0570 được gọi đầu tiên với một hoặc nhiều mảng làm đối số, như được định nghĩa bằng cách sử dụng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0569 (e. g. ,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0585)

Nếu #1 không thành công, thì

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0570 được gọi với tất cả các cột được nối theo hàng thành một mảng duy nhất (e. g. ,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0587)

Lưu ý rằng về hiệu suất, bạn nên thử các phương pháp phân tích ngày này theo thứ tự

Cố gắng suy ra định dạng bằng cách sử dụng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0588 (xem phần bên dưới)

Nếu bạn biết định dạng, hãy sử dụng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0589.

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0590

Nếu bạn có định dạng thực sự không chuẩn, hãy sử dụng hàm
```
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
```
0570 tùy chỉnh. Để có hiệu suất tối ưu, điều này nên được vector hóa, tôi. e. , nó sẽ chấp nhận mảng làm đối số

Phân tích cú pháp CSV với các múi giờ hỗn hợp#

pandas không thể đại diện cho một cột hoặc chỉ mục với các múi giờ hỗn hợp. Nếu tệp CSV của bạn chứa các cột có nhiều múi giờ khác nhau, thì kết quả mặc định sẽ là cột kiểu đối tượng có chuỗi, ngay cả với

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0569

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

6

Để phân tích cú pháp các giá trị múi giờ hỗn hợp dưới dạng cột ngày giờ, hãy chuyển một

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0538 được áp dụng một phần với

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0594 là

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0570

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7

Suy ra định dạng ngày giờ #

If you have

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0569 enabled for some or all of your columns, and your datetime strings are all formatted the same way, you may get a large speed up by setting

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0588. If set, pandas will attempt to guess the format of your datetime strings, and then use a faster means of parsing the strings. 5-10x parsing speeds have been observed. pandas will fallback to the usual parsing if either the format cannot be guessed or the format that was guessed cannot properly parse the entire column of strings. So in general,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0598 should not have any negative consequences if enabled

Here are some examples of datetime strings that can be guessed (All representing December 30th, 2011 at 00. 00. 00)

“20111230”
“2011/12/30”
“20111230 00. 00. 00”
“12/30/2011 00. 00. 00”
“30/Dec/2011 00. 00. 00”
“30/December/2011 00. 00. 00”

Note that

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0598 is sensitive to

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2400. With

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2401, it will guess “01/12/2011” to be December 1st. With

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2402 (default) it will guess “01/12/2011” to be January 12th

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

8

International date formats#

While US date formats tend to be MM/DD/YYYY, many international formats use DD/MM/YYYY instead. For convenience, a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2400 keyword is provided

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

9

Writing CSVs to binary file objects#

New in version 1. 2. 0

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2404 allows writing a CSV to a file object opened binary mode. In most cases, it is not necessary to specify

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2405 as Pandas will auto-detect whether the file object is opened in text or binary mode

pip install openpyxl

120

Specifying method for floating-point conversion#

The parameter

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2406 can be specified in order to use a specific floating-point converter during parsing with the C engine. The options are the ordinary converter, the high-precision converter, and the round-trip converter (which is guaranteed to round-trip values after writing to a file). For example

pip install openpyxl

121

Thousand separators#

For large numbers that have been written with a thousands separator, you can set the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2407 keyword to a string of length 1 so that integers will be parsed correctly

By default, numbers with a thousands separator will be parsed as strings

pip install openpyxl

122

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2407 keyword allows integers to be parsed correctly

pip install openpyxl

123

NA values#

To control which values are parsed as missing values (which are signified by

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

46), specify a string in

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

73. If you specify a list of strings, then all values in it are considered to be missing values. If you specify a number (a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2411, like

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2412 or an

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2413 like

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2414), the corresponding equivalent values will also imply a missing value (in this case effectively

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2415 are recognized as

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

46)

To completely override the default values that are recognized as missing, specify

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2417

The default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

46 recognized values are

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2419

Let us consider some examples

pip install openpyxl

124

In the example above

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2414 and

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2412 will be recognized as

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

46, in addition to the defaults. A string will first be interpreted as a numerical

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2414, then as a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

46

pip install openpyxl

125

Above, only an empty field will be recognized as

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

46

pip install openpyxl

126

Above, both

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2426 and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

84 as strings are

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

46

pip install openpyxl

127

The default values, in addition to the string

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2429 are recognized as

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

46

Infinity#

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2431 like values will be parsed as

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2432 (positive infinity), and

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2433 as

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2434 (negative infinity). These will ignore the case of the value, meaning

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2435, will also be parsed as

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2432

Returning Series#

Using the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2437 keyword, the parser will return output with a single column as a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

62

Deprecated since version 1. 4. 0. Users should append

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

63 to the DataFrame returned by

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

66 instead.

pip install openpyxl

128

Boolean values#

The common values

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2443, and

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2444 are all recognized as boolean. Occasionally you might want to recognize other values as being boolean. To do this, use the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2445 and

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2446 options as follows

pip install openpyxl

129

Handling “bad” lines#

Some files may have malformed lines with too few fields or too many. Lines with too few fields will have NA values filled in the trailing fields. Lines with too many fields will raise an error by default

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

050

You can elect to skip bad lines

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

051

Or pass a callable function to handle the bad line if

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2447. The bad line will be a list of strings that was split by the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2448

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

052

You can also use the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

47 parameter to eliminate extraneous column data that appear in some lines but not others

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

053

In case you want to keep all data including the lines with too many fields, you can specify a sufficient number of

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

49. This ensures that lines with not enough fields are filled with

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

46

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

054

Dialect#

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2452 keyword gives greater flexibility in specifying the file format. By default it uses the Excel dialect but you can specify either the dialect name or a

pip install openpyxl

1290 instance

Suppose you had data with unenclosed quotes

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

055

By default,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

66 uses the Excel dialect and treats the double quote as the quote character, which causes it to fail when it finds a newline before it finds the closing double quote

We can get around this using

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2452

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

056

All of the dialect options can be specified separately by keyword arguments

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

057

Another common dialect option is

pip install openpyxl

1295, to skip any whitespace after a delimiter

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

058

The parsers make every attempt to “do the right thing” and not be fragile. Type inference is a pretty big deal. If a column can be coerced to integer dtype without altering the contents, the parser will do so. Any non-numeric columns will come through as object dtype as with the rest of pandas objects

Quoting and Escape Characters#

Quotes (and other escape characters) in embedded fields can be handled in any number of ways. One way is to use backslashes; to properly parse this data, you should pass the

pip install openpyxl

1294 option

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

059

Files with fixed width columns#

While

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

13 reads delimited data, the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2459 function works with data files that have known and fixed column widths. The function parameters to

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2460 are largely the same as

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

66 with two extra parameters, and a different usage of the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

33 parameter

```
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
```
2463. A list of pairs (tuples) giving the extents of the fixed-width fields of each line as half-open intervals (i. e. , [from, to[ ). String value ‘infer’ can be used to instruct the parser to try detecting the column specifications from the first 100 rows of the data. Default behavior, if not specified, is to infer

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2464. A list of field widths which can be used instead of ‘colspecs’ if the intervals are contiguous

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

33. Characters to consider as filler characters in the fixed-width file. Can be used to specify the filler character of the fields if it is not spaces (e. g. , ‘~’)

Consider a typical fixed-width data file

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

240

In order to parse this file into a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43, we simply need to supply the column specifications to the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2460 function along with the file name

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

241

Note how the parser automatically picks column names X. when

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

36 argument is specified. Alternatively, you can supply just the column widths for contiguous columns:

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

242

The parser will take care of extra white spaces around the columns so it’s ok to have extra separation between the columns in the file

By default,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2460 will try to infer the file’s

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2463 by using the first 100 rows of the file. It can do it only in cases when the columns are aligned and correctly separated by the provided

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

33 (default delimiter is whitespace)

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

243

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2460 supports the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

88 parameter for specifying the types of parsed columns to be different from the inferred type

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

244

Indexes#

Files with an “implicit” index column#

Consider a file with one less entry in the header than the number of data column

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

245

In this special case,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

66 assumes that the first column is to be used as the index of the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

246

Note that the dates weren’t automatically parsed. In that case you would need to do as before

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

247

Reading an index with a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476#

Suppose you have data indexed by two columns

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

248

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0564 argument to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

66 can take a list of column numbers to turn multiple columns into a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476 for the index of the returned object

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

249

Reading columns with a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476#

By specifying list of row locations for the

pip install openpyxl

1284 argument, you can read in a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476 for the columns. Specifying non-consecutive rows will skip the intervening rows

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

330

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

66 is also able to interpret a more common format of multi-columns indices

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

331

Ghi chú

If an

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0564 is not specified (e. g. you don’t have an index, or wrote it with

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2485, then any

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

49 on the columns index will be lost

Automatically “sniffing” the delimiter#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

66 is capable of inferring delimited (not necessarily comma-separated) files, as pandas uses the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

25 class of the csv module. For this, you have to specify

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2489

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

332

Reading multiple files to create a single DataFrame#

It’s best to use

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2490 to combine multiple files. See the cookbook for an example.

Iterating through files chunk by chunk#

Suppose you wish to iterate through a (potentially very large) file lazily rather than reading the entire file into memory, such as the following

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

333

By specifying a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

90 to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

66, the return value will be an iterable object of type

pip install openpyxl

1232

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

334

Changed in version 1. 2.

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2494 return a context-manager when iterating through a file.

Specifying

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2495 will also return the

pip install openpyxl

1232 object

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

335

Specifying the parser engine#

Pandas currently supports three engines, the C engine, the python engine, and an experimental pyarrow engine (requires the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2497 package). In general, the pyarrow engine is fastest on larger workloads and is equivalent in speed to the C engine on most other workloads. The python engine tends to be slower than the pyarrow and C engines on most workloads. However, the pyarrow engine is much less robust than the C engine, which lacks a few features compared to the Python engine

Where possible, pandas uses the C parser (specified as

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2498), but it may fall back to Python if C-unsupported options are specified

Currently, options unsupported by the C and pyarrow engines include

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2448 other than a single character (e. g. regex separators)

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3300

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2489 with

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3302

Specifying any of the above options will produce a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3303 unless the python engine is selected explicitly using

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3304

Options that are unsupported by the pyarrow engine which are not covered by the list above include

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2406

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

90

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0552

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3308

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2407

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3310

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2452

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3312

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3313

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0503

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3315

```
pip install openpyxl
```
1276

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3317

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0511

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3319

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

91

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2400

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0598

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3323

```
pip install openpyxl
```
1295

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3325

Specifying these options with

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3326 will raise a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3327

Reading/writing remote files#

You can pass in a URL to read or write remote files to many of pandas’ IO functions - the following example shows reading a CSV file

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

336

Mới trong phiên bản 1. 3. 0

A custom header can be sent alongside HTTP(s) requests by passing a dictionary of header key value mappings to the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3328 keyword argument as shown below

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

337

All URLs which are not local files or HTTP(s) are handled by fsspec, if installed, and its various filesystem implementations (including Amazon S3, Google Cloud, SSH, FTP, webHDFS…). Some of these implementations will require additional packages to be installed, for example S3 URLs require the s3fs library

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

338

When dealing with remote storage systems, you might need extra configuration with environment variables or config files in special locations. For example, to access data in your S3 bucket, you will need to define credentials in one of the several ways listed in the S3Fs documentation. The same is true for several of the storage backends, and you should follow the links at fsimpl1 for implementations built into

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3329 and fsimpl2 for those not included in the main

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3329 distribution

You can also pass parameters directly to the backend driver. For example, if you do not have S3 credentials, you can still access public data by specifying an anonymous connection, such as

New in version 1. 2. 0

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

339

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3329 also allows complex URLs, for accessing data in compressed archives, local caching of files, and more. To locally cache the above example, you would modify the call to

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

350

where we specify that the “anon” parameter is meant for the “s3” part of the implementation, not to the caching implementation. Note that this caches to a temporary directory for the duration of the session only, but you can also specify a permanent store

Writing out data#

Writing to CSV format#

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

62 and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 objects have an instance method

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3334 which allows storing the contents of the object as a comma-separated-values file. The function takes a number of arguments. Only the first is required

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3335. A string path to the file to write or a file object. If a file object it must be opened with

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3336

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2448 . Field delimiter for the output file (default “,”)

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3338. A string representation of a missing value (default ‘’)

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3339. Format string for floating point numbers

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3340. Columns to write (default None)

```
pip install openpyxl
```
1284. Whether to write out the column names (default True)

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342. whether to write row (index) names (default True)

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3343. Column label(s) for index column(s) if desired. Nếu Không có (mặc định) và

pip install openpyxl

1284 và

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342 là Đúng, thì tên chỉ mục được sử dụng. (A sequence should be given if the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 uses MultiIndex)

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2405 . Python write mode, default ‘w’

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0562. a string representing the encoding to use if the contents are non-ASCII, for Python versions prior to 3

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3317. Character sequence denoting line end (default

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3350)

```
pip install openpyxl
```
1276. Set quoting rules as in csv module (default csv. QUOTE_MINIMAL). Note that if you have set a
```
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
```
3339 then floats are converted to strings and csv. QUOTE_NONNUMERIC will treat them as non-numeric
```
pip install openpyxl
```
1275. Character used to quote fields (default ‘”’)
```
pip install openpyxl
```
1293. Control quoting of
```
pip install openpyxl
```
1275 in fields (default True)

pip install openpyxl

1294. Character used to escape

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2448 and

pip install openpyxl

1275 when appropriate (default None)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

90. Number of rows to write at a time

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3360. Format string for datetime objects

Writing a formatted string#

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 object has an instance method

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3362 which allows control over the string representation of the object. All arguments are optional

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3363 default None, for example a StringIO object

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3340 default None, which columns to write

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3365 default None, minimum width of each column

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3338 default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

46, representation of NA value

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3368 default None, a dictionary (by column) of functions each of which takes a single argument and returns a formatted string

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3339 default None, a function which takes a single (float) argument and returns a formatted string; to be applied to floats in the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3371 default True, set to False for a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 with a hierarchical index to print every MultiIndex key at each row

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3373 default True, will print the names of the indices

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342 default True, will print the index (ie, row labels)

```
pip install openpyxl
```
1284 default True, will print the column labels

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3376 default

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3377, will print column headers left- or right-justified

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

62 object also has a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3362 method, but with only the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3363,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3338,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3339 arguments. Ngoài ra còn có một đối số

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3383, nếu được đặt thành

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32, sẽ xuất thêm độ dài của Sê-ri

JSON#

Read and write

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3385 format files and strings

Writing JSON#

A

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

62 or

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 can be converted to a valid JSON string. Use

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3388 with optional parameters

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3335 . the pathname or buffer to write the output This can be

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24 in which case a JSON string is returned

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3391

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

62

default is

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342

allowed values are {

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3394,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3395,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342}

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43

default is

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3340

allowed values are {

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3394,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3395,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3340,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3503,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3504}

The format of the JSON string

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3394

dict like {index -> [index], columns -> [columns], data -> [values]}

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3395

list like [{column -> value}, … , {column -> value}]

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342

dict like {index -> {column -> value}}

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3340

dict like {column -> {index -> value}}

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3503

just the values array

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3504

adhering to the JSON Table Schema

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3360 . string, type of date conversion, ‘epoch’ for timestamp, ‘iso’ for ISO8601

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3512 . The number of decimal places to use when encoding floating point values, default 10

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3513 . force encoded string to be ASCII, default True

```
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
```
3514 . The time unit to encode to, governs timestamp and ISO8601 precision. One of ‘s’, ‘ms’, ‘us’ or ‘ns’ for seconds, milliseconds, microseconds and nanoseconds respectively. Default ‘ms’
```
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
```
3515 . The handler to call if an object cannot otherwise be converted to a suitable format for JSON. Takes a single argument, which is the object to convert, and returns a serializable object

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3516 . If

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3395 orient, then will write each record per line as json

Note

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

46’s,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3519’s and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24 will be converted to

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3521 and

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0571 objects will be converted based on the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3360 and

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3514 parameters

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

351

Orient options#

There are a number of different options for the format of the resulting JSON file / string. Consider the following

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

62

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

352

Column oriented (the default for

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43) serializes the data as nested JSON objects with column labels acting as the primary index

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

353

Index oriented (the default for

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

62) similar to column oriented but the index labels are now primary

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

354

Record oriented serializes the data to a JSON array of column -> value records, index labels are not included. This is useful for passing

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 data to plotting libraries, for example the JavaScript library

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3530

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

355

Value oriented is a bare-bones option which serializes to nested JSON arrays of values only, column and index labels are not included

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

356

Split oriented serializes to a JSON object containing separate entries for values, index and columns. Name is also included for

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

62

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

357

Table oriented serializes to the JSON Table Schema, allowing for the preservation of metadata including but not limited to dtypes and index names

Ghi chú

Any orient option that encodes to a JSON object will not preserve the ordering of index and column labels during round-trip serialization. If you wish to preserve label ordering use the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3394 option as it uses ordered containers

Date handling#

Writing in ISO date format

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

358

Writing in ISO date format, with microseconds

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

359

Epoch timestamps, in seconds

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

410

Writing to a file, with a date index and a date column

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

411

Fallback behavior#

If the JSON serializer cannot handle the container contents directly it will fall back in the following manner

if the dtype is unsupported (e. g.

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3533) then the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3515, if provided, will be called for each value, otherwise an exception is raised

if an object is unsupported it will attempt the following

check if the object has defined a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3535 method and call it. A

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3535 method should return a

pip install openpyxl

1243 which will then be JSON serialized

invoke the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3515 if one was provided

convert the object to a

pip install openpyxl

1243 by traversing its contents. However this will often fail with an

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3540 or give unexpected results

In general the best approach for unsupported objects or dtypes is to provide a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3515. For example

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

412

can be dealt with by specifying a simple

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3515

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

413

Reading JSON#

Reading a JSON string to pandas object can take a number of parameters. The parser will try to parse a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 if

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3544 is not supplied or is

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24. To explicitly force

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

62 parsing, pass

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3547

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

92 . a VALID JSON string or file handle / StringIO. The string could be a URL. Valid URL schemes include http, ftp, S3, and file. For file URLs, a host is expected. For instance, a local file could be file . //localhost/path/to/table. json

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3544 . type of object to recover (series or frame), default ‘frame’

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3391

Series

default is

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342

allowed values are {

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3394,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3395,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342}

DataFrame

default is

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3340

allowed values are {

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3394,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3395,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3340,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3503,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3504}

The format of the JSON string

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3394

dict like {index -> [index], columns -> [columns], data -> [values]}

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3395

list like [{column -> value}, … , {column -> value}]

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342

dict like {index -> {column -> value}}

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3340

dict like {column -> {index -> value}}

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3503

just the values array

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3504

adhering to the JSON Table Schema

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

88 . if True, infer dtypes, if a dict of column to dtype, then use those, if

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61, then don’t infer dtypes at all, default is True, apply only to the data

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3570 . boolean, try to convert the axes to the proper dtypes, default is

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3572. một danh sách các cột để phân tích ngày tháng;

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3575 . boolean, default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32. If parsing dates, then parse the default date-like columns

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3577 . direct decoding to NumPy arrays. default is

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61; Supports numeric data only, although labels may be non-numeric. Also note that the JSON ordering MUST be the same for each term if

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3579

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3580 . boolean, default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61. Set to enable usage of higher precision (strtod) function when decoding string to double values. Default (

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61) is to use fast but less precise builtin functionality

```
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
```
3514 . string, the timestamp unit to detect if converting dates. Default None. By default the timestamp precision will be detected, if this is not desired then pass one of ‘s’, ‘ms’, ‘us’ or ‘ns’ to force timestamp precision to seconds, milliseconds, microseconds or nanoseconds respectively

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3516 . reads file as one json object per line

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0562 . The encoding to use to decode py3 bytes

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

90 . when used in combination with

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3587, return a JsonReader which reads in

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

90 lines per iteration

The parser will raise one of

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3589 if the JSON is not parseable

If a non-default

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3391 was used when encoding to JSON be sure to pass the same option here so that decoding produces sensible results, see Orient Options for an overview

Data conversion#

The default of

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3591,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3592, and

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3593 will try to parse the axes, and all of the data into appropriate types, including dates. If you need to override specific dtypes, pass a dict to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

88.

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3570 should only be set to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61 if you need to preserve string-like numbers (e. g. ‘1’, ‘2’) in an axes

Ghi chú

Large integer values may be converted to dates if

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3593 and the data and / or column labels appear ‘date-like’. The exact threshold depends on the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3514 specified. ‘date-like’ means that the column label meets one of the following criteria

it ends with

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3599

it ends with

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4100

it begins with

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4101

it is

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4102

it is

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4103

Cảnh báo

When reading JSON data, automatic coercing into dtypes has some quirks

an index can be reconstructed in a different order from serialization, that is, the returned order is not guaranteed to be the same as before serialization
a column that was
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2411 data will be converted to
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2413 if it can be done safely, e. g. a column of
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
4106
bool columns will be converted to
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
2413 on reconstruction

Thus there are times where you may want to specify specific dtypes via the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

88 keyword argument

Reading from a JSON string

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

414

Reading from a file

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

415

Don’t convert any data (but still convert axes and dates)

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

416

Specify dtypes for conversion

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

417

Preserve string indices

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

418

Dates written in nanoseconds need to be read back in nanoseconds

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

419

The Numpy parameter#

Ghi chú

This param has been deprecated as of version 1. 0. 0 and will raise a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4109

This supports numeric data only. Index and columns labels may be non-numeric, e. g. strings, dates etc

If

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3579 is passed to

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4111 an attempt will be made to sniff an appropriate dtype during deserialization and to subsequently decode directly to NumPy arrays, bypassing the need for intermediate Python objects

This can provide speedups if you are deserialising a large amount of numeric data

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

700

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

701

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

702

The speedup is less noticeable for smaller datasets

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

703

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

704

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

705

Cảnh báo

Direct NumPy decoding makes a number of assumptions and may fail or produce unexpected output if these assumptions are not satisfied

data is numeric
data is uniform. The dtype is sniffed from the first value decoded. A
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3327 may be raised, or incorrect output may be produced if this condition is not satisfied
labels are ordered. Labels are only read from the first container, it is assumed that each subsequent row / column has been encoded in the same order. This should be satisfied if the data was encoded using
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
3388 but may not be the case if the JSON is from another source

Normalization#

pandas provides a utility function to take a dict or list of dicts and normalize this semi-structured data into a flat table

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

706

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

707

The max_level parameter provides more control over which level to end normalization. With max_level=1 the following snippet normalizes until 1st nesting level of the provided dict

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

708

Line delimited json#

pandas is able to read and write line-delimited json files that are common in data processing pipelines using Hadoop or Spark

For line-delimited json files, pandas can also return an iterator which reads in

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

90 lines at a time. This can be useful for large files or to read from a stream

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

709

Table schema#

Table Schema is a spec for describing tabular datasets as a JSON object. The JSON includes information on the field names, types, and other attributes. You can use the orient

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3504 to build a JSON string with two fields,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4116 and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

56

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

710

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4116 field contains the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4119 key, which itself contains a list of column name to type pairs, including the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4120 or

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476 (see below for a list of types). The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4116 field also contains a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4123 field if the (Multi)index is unique

The second field,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

56, contains the serialized data with the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3395 orient. The index is included, and any datetimes are ISO 8601 formatted, as required by the Table Schema spec

The full list of types supported are described in the Table Schema spec. This table shows the mapping from pandas types

pandas type

Table Schema type

int64

integer

float64

number

bool

boolean

datetime64[ns]

datetime

timedelta64[ns]

duration

categorical

any

object

str

A few notes on the generated table schema

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4116 object contains a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4127 field. This contains the version of pandas’ dialect of the schema, and will be incremented with each revision

All dates are converted to UTC when serializing. Even timezone naive values, which are treated as UTC with an offset of 0

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

711

datetimes with a timezone (before serializing), include an additional field

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4128 with the time zone name (e. g.

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4129)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

712

Periods are converted to timestamps before serialization, and so have the same behavior of being converted to UTC. In addition, periods will contain and additional field

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4130 with the period’s frequency, e. g.

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4131

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

713

Categoricals use the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4132 type and an

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4133 constraint listing the set of possible values. Additionally, an

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4134 field is included

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

714

A

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4123 field, containing an array of labels, is included if the index is unique

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

715

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4123 behavior is the same with MultiIndexes, but in this case the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4123 is an array

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

716

The default naming roughly follows these rules

For series, the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4138 is used. If that’s none, then the name is

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3503

For

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4140, the stringified version of the column name is used

For

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4120 (not

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476),

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4143 is used, with a fallback to

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342 if that is None

For

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4146 is used. If any level has no name, then

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4147 is used

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4111 also accepts

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4149 as an argument. This allows for the preservation of metadata such as dtypes and index names in a round-trippable manner

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

717

Please note that the literal string ‘index’ as the name of an

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4120 is not round-trippable, nor are any names beginning with

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4151 within a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476. These are used by default in

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4153 to indicate missing values and the subsequent read cannot distinguish the intent

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

718

When using

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4149 along with user-defined

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4155, the generated schema will contain an additional

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4156 key in the respective

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4119 element. This extra key is not standard but does enable JSON roundtrips for extension types (e. g.

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4158)

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4156 key carries the name of the extension, if you have properly registered the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4160, pandas will use said name to perform a lookup into the registry and re-convert the serialized data into your custom dtype

HTML#

Reading HTML content#

Cảnh báo

We highly encourage you to read the HTML Table Parsing gotchas below regarding the issues surrounding the BeautifulSoup4/html5lib/lxml parsers.

The top-level

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4161 function can accept an HTML string/file/URL and will parse HTML tables into list of pandas

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4140. Let’s look at a few examples

Ghi chú

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4163 returns a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4164 of

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 objects, even if there is only a single table contained in the HTML content

Read a URL with no options

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

719

Ghi chú

The data from the above URL changes every Monday so the resulting data above may be slightly different

Read in the content of the file from the above URL and pass it to

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4163 as a string

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

720

You can even pass in an instance of

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

11 if you so desire

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

721

Ghi chú

The following examples are not run by the IPython evaluator due to the fact that having so many network-accessing functions slows down the documentation build. If you spot an error or an example that doesn’t run, please do not hesitate to report it over on pandas GitHub issues page

Read a URL and match a table that contains specific text

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

722

Specify a header row (by default

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4168 or

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4169 elements located within a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4170 are used to form the column index, if multiple rows are contained within

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4170 then a MultiIndex is created); if specified, the header row is taken from the data minus the parsed header elements (

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4168 elements)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

723

Specify an index column

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

724

Specify a number of rows to skip

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

725

Specify a number of rows to skip using a list (

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4173 works as well)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

726

Specify an HTML attribute

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

727

Specify values that should be converted to NaN

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

728

Specify whether to keep the default set of NaN values

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

729

Specify converters for columns. This is useful for numerical text data that has leading zeros. By default columns that are numerical are cast to numeric types and the leading zeros are lost. To avoid this, we can convert these columns to strings

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

730

Use some combination of the above

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

731

Read in pandas

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4174 output (with some loss of floating point precision)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

732

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4175 backend will raise an error on a failed parse if that is the only parser you provide. If you only have a single parser you can provide just a string, but it is considered good practice to pass a list with one string if, for example, the function expects a sequence of strings. You may use

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

733

Or you could pass

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4176 without a list

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

734

However, if you have bs4 and html5lib installed and pass

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24 or

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4178 then the parse will most likely succeed. Note that as soon as a parse succeeds, the function will return

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

735

Links can be extracted from cells along with the text using

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4179

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

736

New in version 1. 5. 0

Writing to HTML files#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 objects have an instance method

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4174 which renders the contents of the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 as an HTML table. Các đối số của hàm như trong phương thức

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3362 được mô tả ở trên

Ghi chú

Không phải tất cả các tùy chọn có thể có cho

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4184 đều được hiển thị ở đây vì lý do ngắn gọn. See

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4185 for the full set of options

Ghi chú

In an HTML-rendering supported environment like a Jupyter Notebook,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4186 will render the raw HTML into the environment

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

737

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3340 argument will limit the columns shown

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

738

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3339 takes a Python callable to control the precision of floating point values

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

739

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4189 will make the row labels bold by default, but you can turn that off

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

740

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4190 argument provides the ability to give the resulting HTML table CSS classes. Note that these classes are appended to the existing

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4191 class

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

741

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4192 argument provides the ability to add hyperlinks to cells that contain URLs

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

742

Finally, the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4193 argument allows you to control whether the “<”, “>” and “&” characters escaped in the resulting HTML (by default it is

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32). So to get the HTML without escaped characters pass

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4195

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

743

Escaped

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

744

Not escaped

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

745

Ghi chú

Some browsers may not show a difference in the rendering of the previous two HTML tables

HTML Table Parsing Gotchas#

There are some versioning issues surrounding the libraries that are used to parse HTML tables in the top-level pandas io function

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4163

Issues with lxml

Benefits
- lxml is very fast
- lxml requires Cython to install correctly
Drawbacks
- lxml does not make any guarantees about the results of its parse unless it is given strictly valid markup
- In light of the above, we have chosen to allow you, the user, to use the lxml backend, but this backend will use html5lib if lxml fails to parse
- It is therefore highly recommended that you install both BeautifulSoup4 and html5lib, so that you will still get a valid result (provided everything else is valid) even if lxml fails

Issues with BeautifulSoup4 using lxml as a backend

The above issues hold here as well since BeautifulSoup4 is essentially just a wrapper around a parser backend

Issues with BeautifulSoup4 using html5lib as a backend

Benefits
- html5lib is far more lenient than lxml and consequently deals with real-life markup in a much saner way rather than just, e. g. , dropping an element without notifying you
- html5lib generates valid HTML5 markup from invalid markup automatically. This is extremely important for parsing HTML tables, since it guarantees a valid document. However, that does NOT mean that it is “correct”, since the process of fixing markup does not have a single definition
- html5lib is pure Python and requires no additional build steps beyond its own installation
Drawbacks
- The biggest drawback to using html5lib is that it is slow as molasses. However consider the fact that many tables on the web are not big enough for the parsing algorithm runtime to matter. It is more likely that the bottleneck will be in the process of reading the raw text from the URL over the web, i. e. , IO (input-output). For very large tables, this might not be true

LaTeX#

Mới trong phiên bản 1. 3. 0

Currently there are no methods to read from LaTeX, only output methods

Writing to LaTeX files#

Ghi chú

DataFrame and Styler objects currently have a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4197 method. We recommend using the Styler. to_latex() method over DataFrame. to_latex() due to the former’s greater flexibility with conditional styling, and the latter’s possible future deprecation.

Review the documentation for Styler. to_latex , which gives examples of conditional styling and explains the operation of its keyword arguments.

For simple application the following pattern is sufficient

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

746

To format values before output, chain the Styler. format method.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

747

XML#

Đọc XML#

Mới trong phiên bản 1. 3. 0

Hàm

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4198 cấp cao nhất có thể chấp nhận một chuỗi/tệp/URL XML và sẽ phân tích cú pháp các nút và thuộc tính thành một gấu trúc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43

Ghi chú

Since there is no standard XML structure where design types can vary in many ways,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7000 works best with flatter, shallow versions. If an XML document is deeply nested, use the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7001 feature to transform XML into a flatter version

Let’s look at a few examples

Read an XML string

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

748

Read a URL with no options

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

749

Read in the content of the “books. xml” file and pass it to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7000 as a string

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

750

Read in the content of the “books. xml” as instance of

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

11 or

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7004 and pass it to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7000

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

751

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

752

Even read XML from AWS S3 buckets such as NIH NCBI PMC Article Datasets providing Biomedical and Life Science Jorurnals

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

753

With lxml as default

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7006, you access the full-featured XML library that extends Python’s ElementTree API. One powerful tool is ability to query nodes selectively or conditionally with more expressive XPath

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

754

Specify only elements or only attributes to parse

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

755

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

756

XML documents can have namespaces with prefixes and default namespaces without prefixes both of which are denoted with a special attribute

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7007. In order to parse by node under a namespace context,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7008 must reference a prefix

For example, below XML contains a namespace with prefix,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7009, and URI at

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7010. In order to parse

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7011 nodes,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7012 must be used

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

757

Similarly, an XML document can have a default namespace without prefix. Failing to assign a temporary prefix will return no nodes and raise a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3327. But assigning any temporary name to correct URI allows parsing by nodes

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

758

However, if XPath does not reference node names such as default,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7014, then

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7012 is not required

With lxml as parser, you can flatten nested XML documents with an XSLT script which also can be string/file/URL types. As background, XSLT is a special-purpose language written in a special XML file that can transform original XML documents into other XML, HTML, even text (CSV, JSON, etc. ) using an XSLT processor

For example, consider this somewhat nested structure of Chicago “L” Rides where station and rides elements encapsulate data in their own sections. With below XSLT,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4175 can transform original nested document into a flatter output (as shown below for demonstration) for easier parse into

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

759

For very large XML files that can range in hundreds of megabytes to gigabytes,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7018 supports parsing such sizeable files using lxml’s iterparse and etree’s iterparse which are memory-efficient methods to iterate through an XML tree and extract specific elements and attributes. without holding entire tree in memory

New in version 1. 5. 0

To use this feature, you must pass a physical XML file path into

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7000 and use the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7020 argument. Files should not be compressed or point to online sources but stored on local disk. Also,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7020 should be a dictionary where the key is the repeating nodes in document (which become the rows) and the value is a list of any element or attribute that is a descendant (i. e. , child, grandchild) of repeating node. Since XPath is not used in this method, descendants do not need to share same relationship with one another. Below shows example of reading in Wikipedia’s very large (12 GB+) latest article data dump

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

760

Writing XML#

Mới trong phiên bản 1. 3. 0

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 objects have an instance method

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7023 which renders the contents of the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 as an XML document

Ghi chú

This method does not support special properties of XML including DTD, CData, XSD schemas, processing instructions, comments, and others. Only namespaces at the root level is supported. However,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7001 allows design changes after initial output

Let’s look at a few examples

Write an XML without options

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

761

Write an XML with new root and row name

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

762

Write an attribute-centric XML

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

763

Write a mix of elements and attributes

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

764

Any

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4140 with hierarchical columns will be flattened for XML element names with levels delimited by underscores

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

765

Write an XML with default namespace

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

766

Write an XML with namespace prefix

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

767

Write an XML without declaration or pretty print

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

768

Write an XML and transform with stylesheet

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

769

XML Final Notes#

All XML documents adhere to W3C specifications. Both

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7027 and

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4175 parsers will fail to parse any markup document that is not well-formed or follows XML syntax rules. Do be aware HTML is not an XML document unless it follows XHTML specs. However, other popular markup types including KML, XAML, RSS, MusicML, MathML are compliant XML schemas

For above reason, if your application builds XML prior to pandas operations, use appropriate DOM libraries like

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7027 and

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4175 to build the necessary document and not by string concatenation or regex adjustments. Always remember XML is a special text file with markup rules

With very large XML files (several hundred MBs to GBs), XPath and XSLT can become memory-intensive operations. Be sure to have enough available RAM for reading and writing to large XML files (roughly about 5 times the size of text)
Because XSLT is a programming language, use it with caution since such scripts can pose a security risk in your environment and can run large or infinite recursive operations. Always test scripts on small fragments before full run

The etree parser supports all functionality of both

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7000 and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7023 except for complex XPath and any XSLT. Though limited in features,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7027 is still a reliable and capable parser and tree builder. Its performance may trail

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4175 to a certain degree for larger files but relatively unnoticeable on small to medium size files

Excel files#

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7035 method can read Excel 2007+ (

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7036) files using the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7037 Python module. Excel 2003 (

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7038) files can be read using

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7039. Binary Excel (

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7040) files can be read using

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7041. The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7042 instance method is used for saving a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 to Excel. Generally the semantics are similar to working with csv data. See the cookbook for some advanced strategies.

Cảnh báo

The xlwt package for writing old-style

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7038 excel files is no longer maintained. The xlrd package is now only for reading old-style

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7038 files

Before pandas 1. 3. 0, the default argument

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7046 to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7035 would result in using the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7039 engine in many cases, including new Excel 2007+ (

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7036) files. pandas will now default to using the openpyxl engine

It is strongly encouraged to install

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7037 to read Excel 2007+ (

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7036) files. Please do not report issues when using ``xlrd`` to read ``. xlsx`` files. This is no longer supported, switch to using

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7037 instead

Cố gắng sử dụng công cụ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7053 sẽ tăng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4109 trừ khi tùy chọn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7055 được đặt thành

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7056. While this option is now deprecated and will also raise a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4109, it can be globally set and the warning suppressed. Users are recommended to write

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7036 files using the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7037 engine instead

Reading Excel files#

In the most basic use-case,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7060 takes a path to an Excel file, and the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7061 indicating which sheet to parse

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

770

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7062 class#

To facilitate working with multiple sheets from the same file, the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7062 class can be used to wrap the file and can be passed into

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7060 There will be a performance benefit for reading multiple sheets as the file is read into memory only once

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

771

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7062 class can also be used as a context manager

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

772

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7066 property will generate a list of the sheet names in the file

The primary use-case for an

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7062 is parsing multiple sheets with different parameters

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

773

Note that if the same parsing parameters are used for all sheets, a list of sheet names can simply be passed to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7060 with no loss in performance

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

774

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7062 can also be called with a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7070 object as a parameter. This allows the user to control how the excel file is read. For example, sheets can be loaded on demand by calling

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7071 with

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7072

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

775

Specifying sheets#

Ghi chú

The second argument is

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7061, not to be confused with

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7074

Ghi chú

An ExcelFile’s attribute

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7066 provides access to a list of sheets

The arguments

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7061 allows specifying the sheet or sheets to read

The default value for

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7061 is 0, indicating to read the first sheet

Pass a string to refer to the name of a particular sheet in the workbook
Pass an integer to refer to the index of a sheet. Indices follow Python convention, beginning at 0
Pass a list of either strings or integers, to return a dictionary of specified sheets

Pass a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24 to return a dictionary of all available sheets

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

776

Using the sheet index

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

777

Using all default values

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

778

Using None to get all sheets

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

779

Using a list to get multiple sheets

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

780

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7060 can read more than one sheet, by setting

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7061 to either a list of sheet names, a list of sheet positions, or

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24 to read all sheets. Sheets can be specified by sheet index or sheet name, using an integer or string, respectively

Reading a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7060 can read a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476 index, by passing a list of columns to

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0564 and a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476 column by passing a list of rows to

pip install openpyxl

1284. If either the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342 or

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3340 have serialized level names those will be read in as well by specifying the rows/columns that make up the levels

For example, to read in a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476 index without names

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

781

If the index has level names, they will parsed as well, using the same parameters

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

782

If the source file has both

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476 index and columns, lists specifying each should be passed to

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0564 and

pip install openpyxl

1284

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

783

Các giá trị bị thiếu trong các cột được chỉ định trong

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0564 sẽ được điền chuyển tiếp để cho phép thực hiện vòng lặp với

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7095 cho

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7096. To avoid forward filling the missing values use

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7097 after reading the data instead of

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0564

Parsing specific columns#

It is often the case that users will insert columns to do temporary computations in Excel and you may not want to read in those columns.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7060 takes a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

47 keyword to allow you to specify a subset of columns to parse

Changed in version 1. 0. 0

Passing in an integer for

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

47 will no longer work. Please pass in a list of ints from 0 to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

47 inclusive instead

You can specify a comma-delimited set of Excel columns and ranges as a string

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

784

If

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

47 is a list of integers, then it is assumed to be the file column indices to be parsed

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

785

Element order is ignored, so

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

54 is the same as

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

55

If

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

47 is a list of strings, it is assumed that each string corresponds to a column name provided either by the user in

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

49 or inferred from the document header row(s). Those strings define which columns will be parsed

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

786

Element order is ignored, so

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7108 is the same as

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7109

If

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

47 is callable, the callable function will be evaluated against the column names, returning names where the callable function evaluates to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

787

Parsing dates#

Datetime-like values are normally automatically converted to the appropriate dtype when reading the excel file. But if you have a column of strings that look like dates (but are not actually formatted as dates in excel), you can use the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0569 keyword to parse those strings to datetimes

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

788

Cell converters#

It is possible to transform the contents of Excel cells via the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0511 option. For instance, to convert a column to boolean

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

789

This options handles missing values and treats exceptions in the converters as missing data. Transformations are applied cell by cell rather than to the column as a whole, so the array dtype is not guaranteed. For instance, a column of integers with missing values cannot be transformed to an array with integer dtype, because NaN is strictly a float. You can manually mask missing data to recover integer dtype

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

790

Dtype specifications#

As an alternative to converters, the type for an entire column can be specified using the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

88 keyword, which takes a dictionary mapping column names to types. To interpret data with no type inference, use the type

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

15 or

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

72

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

791

Writing Excel files#

Writing Excel files to disk#

To write a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 object to a sheet of an Excel file, you can use the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7095 instance method. The arguments are largely the same as

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3334 described above, the first argument being the name of the excel file, and the optional second argument the name of the sheet to which the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 should be written. For example

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

792

Files with a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7038 extension will be written using

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7053 and those with a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7036 extension will be written using

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7124 (if available) or

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7037

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 will be written in a way that tries to mimic the REPL output. The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3343 will be placed in the second row instead of the first. You can place it in the first row by setting the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7128 option in

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7042 to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

793

In order to write separate

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4140 to separate sheets in a single Excel file, one can pass an

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7132

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

794

Writing Excel files to memory#

pandas supports writing Excel files to buffer-like objects such as

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

11 or

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7004 using

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7132

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

795

Ghi chú

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7136 is optional but recommended. Setting the engine determines the version of workbook produced. Đặt

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7137 sẽ tạo sổ làm việc định dạng Excel 2003 (xls). Sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7138 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7139 sẽ tạo sổ làm việc định dạng Excel 2007 (xlsx). Nếu bỏ qua, sổ làm việc có định dạng Excel 2007 sẽ được tạo

Công cụ soạn thảo Excel#

Không dùng nữa kể từ phiên bản 1. 2. 0. Vì gói xlwt không còn được duy trì, công cụ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7053 sẽ bị xóa khỏi phiên bản pandas trong tương lai. Đây là công cụ duy nhất trong gấu trúc hỗ trợ ghi vào tệp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7038.

gấu trúc chọn một trình soạn thảo Excel thông qua hai phương pháp

đối số từ khóa

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7136

phần mở rộng tên tệp (thông qua mặc định được chỉ định trong tùy chọn cấu hình)

Theo mặc định, gấu trúc sử dụng XlsxWriter cho tệp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7036, openpyxl cho tệp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7144 và xlwt cho tệp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7038. Nếu bạn đã cài đặt nhiều công cụ, bạn có thể đặt công cụ mặc định thông qua đặt tùy chọn cấu hình

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7146 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7055. pandas will fall back on openpyxl for

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7036 files if Xlsxwriter is not available.

Để chỉ định bạn muốn sử dụng trình soạn thảo nào, bạn có thể chuyển đối số từ khóa công cụ tới

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7095 và tới

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7132. The built-in engines are

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7037. phiên bản 2. 4 hoặc cao hơn là bắt buộc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7124

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7053

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

796

Phong cách và định dạng#

Giao diện của bảng tính Excel được tạo từ gấu trúc có thể được sửa đổi bằng cách sử dụng các tham số sau trên phương pháp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7095 của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3339. Chuỗi định dạng cho số dấu phẩy động (mặc định là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7158. Một bộ gồm hai số nguyên đại diện cho hàng dưới cùng và cột ngoài cùng bên phải để đóng băng. Mỗi tham số này đều dựa trên một tham số, vì vậy (1, 1) sẽ đóng băng hàng đầu tiên và cột đầu tiên (mặc định là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24)

Using the Xlsxwriter engine provides many options for controlling the format of an Excel worksheet created with the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7095 method. Các ví dụ tuyệt vời có thể được tìm thấy trong tài liệu Xlsxwriter tại đây. https. //xlsxwriter. đọcthedocs. io/working_with_pandas. html

Bảng tính OpenDocument#

Mới trong phiên bản 0. 25

Phương pháp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7035 cũng có thể đọc bảng tính OpenDocument bằng mô-đun

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7162. Ngữ nghĩa và các tính năng để đọc bảng tính OpenDocument phù hợp với những gì có thể thực hiện được đối với các tệp Excel bằng cách sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7163

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

797

Ghi chú

Hiện tại pandas chỉ hỗ trợ đọc bảng tính OpenDocument. Viết không được thực hiện

Excel nhị phân (. tệp xlsb) #

Mới trong phiên bản 1. 0. 0

Phương pháp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7035 cũng có thể đọc các tệp Excel nhị phân bằng cách sử dụng mô-đun

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7041. Ngữ nghĩa và các tính năng để đọc tệp Excel nhị phân hầu hết khớp với những gì có thể thực hiện được đối với tệp Excel bằng cách sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7166.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7041 không nhận ra các loại ngày giờ trong tệp và thay vào đó sẽ trả về số float

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

798

Ghi chú

Hiện tại pandas chỉ hỗ trợ đọc các tệp Excel nhị phân. Viết không được thực hiện

Bảng nhớ tạm #

Một cách thuận tiện để lấy dữ liệu là sử dụng phương thức

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7168, lấy nội dung của bộ đệm clipboard và chuyển chúng đến phương thức

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

66. Chẳng hạn, bạn có thể sao chép văn bản sau vào khay nhớ tạm (CTRL-C trên nhiều hệ điều hành)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

799

Và sau đó nhập dữ liệu trực tiếp vào

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 bằng cách gọi

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

500

Phương thức

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7171 có thể được sử dụng để ghi nội dung của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 vào khay nhớ tạm. Sau đó, bạn có thể dán nội dung khay nhớ tạm vào các ứng dụng khác (CTRL-V trên nhiều hệ điều hành). Ở đây chúng tôi minh họa việc viết một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 vào khay nhớ tạm và đọc lại

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

501

Chúng tôi có thể thấy rằng chúng tôi đã lấy lại cùng một nội dung mà chúng tôi đã ghi vào khay nhớ tạm trước đó

Ghi chú

Bạn có thể cần cài đặt xclip hoặc xsel (với PyQt5, PyQt4 hoặc qtpy) trên Linux để sử dụng các phương pháp này

dưa muối#

Tất cả các đối tượng pandas đều được trang bị các phương thức

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7174 sử dụng mô-đun

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7175 của Python để lưu cấu trúc dữ liệu vào đĩa bằng định dạng dưa chua

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

502

Hàm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7176 trong không gian tên

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7177 có thể được sử dụng để tải bất kỳ đối tượng pickled pandas nào (hoặc bất kỳ đối tượng được ngâm nào khác) từ tệp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

503

Cảnh báo

Loading pickled data received from untrusted sources can be unsafe

Nhìn thấy. https. // tài liệu. con trăn. org/3/library/dưa chua. html

Cảnh báo

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7178 chỉ được đảm bảo tương thích ngược với pandas phiên bản 0. 20. 3

Tập tin dưa nén #

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7178,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7180 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7181 có thể đọc và ghi các tệp dưa nén. Các kiểu nén của ________ 11257, ________ 11258, ________ 67184, _______ 67185 được hỗ trợ để đọc và ghi. Định dạng tệp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7186 chỉ hỗ trợ đọc và chỉ được chứa một tệp dữ liệu để đọc

Loại nén có thể là một tham số rõ ràng hoặc được suy ra từ phần mở rộng tệp. Nếu 'suy ra', thì lần lượt sử dụng

pip install openpyxl

1257,

pip install openpyxl

1258,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7186,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7184,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7185 nếu tên tệp kết thúc bằng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7192,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7193,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7194,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7195 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7196

Tham số nén cũng có thể là một

pip install openpyxl

1243 để chuyển các tùy chọn cho giao thức nén. Nó phải có khóa

pip install openpyxl

1247 được đặt thành tên của giao thức nén, phải là một trong {

pip install openpyxl

1239,

pip install openpyxl

1237,

pip install openpyxl

1238,

pip install openpyxl

1240,

pip install openpyxl

1241}. Tất cả các cặp khóa-giá trị khác được chuyển đến thư viện nén cơ bản

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

504

Using an explicit compression type

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

505

Suy ra kiểu nén từ tiện ích mở rộng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

506

Mặc định là 'suy luận'

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

507

Chuyển các tùy chọn cho giao thức nén để tăng tốc độ nén

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

508

msgpack#

hỗ trợ gấu trúc cho

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7204 đã bị xóa trong phiên bản 1. 0. 0. Bạn nên sử dụng dưa chua để thay thế.

Ngoài ra, bạn cũng có thể định dạng tuần tự hóa Arrow IPC để truyền trực tuyến các đối tượng gấu trúc. Để biết tài liệu về pyarrow, xem tại đây

HDF5 (PyTables)#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7205 là một đối tượng giống như dict đọc và ghi pandas bằng định dạng HDF5 hiệu suất cao bằng thư viện PyTables xuất sắc. Xem sách dạy nấu ăn để biết một số chiến lược nâng cao

Cảnh báo

gấu trúc sử dụng PyTables để đọc và ghi các tệp HDF5, cho phép tuần tự hóa dữ liệu kiểu đối tượng bằng dưa chua. Loading pickled data received from untrusted sources can be unsafe

Nhìn thấy. https. // tài liệu. con trăn. org/3/library/dưa chua. html để biết thêm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

509

Các đối tượng có thể được ghi vào tệp giống như thêm các cặp khóa-giá trị vào một lệnh

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

510

Trong phiên Python hiện tại hoặc mới hơn, bạn có thể truy xuất các đối tượng được lưu trữ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

511

Xóa đối tượng được chỉ định bởi khóa

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

512

Đóng Cửa hàng và sử dụng trình quản lý ngữ cảnh

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

513

Đọc/ghi API#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7205 hỗ trợ API cấp cao nhất sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7207 để đọc và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7208 để viết, tương tự như cách hoạt động của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

66 và

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3334

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

514

HDFStore theo mặc định sẽ không loại bỏ các hàng bị thiếu. Hành vi này có thể được thay đổi bằng cách đặt

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7211

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

515

Định dạng cố định #

Các ví dụ trên cho thấy việc lưu trữ bằng cách sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7212, viết HDF5 thành

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7213 ở định dạng mảng cố định, được gọi là định dạng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7214. Các loại cửa hàng này không thể nối thêm sau khi được viết (mặc dù bạn có thể chỉ cần xóa chúng và viết lại). Chúng cũng không thể truy vấn được; . Họ cũng không hỗ trợ các khung dữ liệu có tên cột không phải là duy nhất. The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7214 format stores offer very fast writing and slightly faster reading than

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3504 stores. Định dạng này được chỉ định theo mặc định khi sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7212 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7208 hoặc bởi

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7219 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7220

Cảnh báo

Định dạng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7214 sẽ tăng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7222 nếu bạn cố truy xuất bằng cách sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7223

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

516

Định dạng bảng #

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7205 hỗ trợ định dạng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7213 khác trên đĩa, định dạng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3504. Về mặt khái niệm, một

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3504 có hình dạng rất giống một DataFrame, với các hàng và cột. Một

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3504 có thể được thêm vào trong cùng một phiên hoặc các phiên khác. Ngoài ra, các hoạt động loại truy vấn và xóa được hỗ trợ. Định dạng này được chỉ định bởi

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7229 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7230 đến

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7231 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7212 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7208

Định dạng này cũng có thể được đặt làm tùy chọn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7234 để cho phép

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7235 lưu trữ theo mặc định ở định dạng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3504

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

517

Ghi chú

Bạn cũng có thể tạo một

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3504 bằng cách chuyển

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7229 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7230 cho một hoạt động

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7212

Hierarchical keys#

Keys to a store can be specified as a string. Chúng có thể ở định dạng giống như tên đường dẫn phân cấp (e. g.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7241), sẽ tạo ra một hệ thống phân cấp các cửa hàng phụ (hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7242 theo cách nói của PyTables). Keys can be specified without the leading ‘/’ and are always absolute (e. g. ‘foo’ refers to ‘/foo’). Removal operations can remove everything in the sub-store and below, so be careful

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

518

You can walk through the group hierarchy using the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7243 method which will yield a tuple for each group key along with the relative keys of its contents

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

519

Cảnh báo

Hierarchical keys cannot be retrieved as dotted (attribute) access as described above for items stored under the root node

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

520

Instead, use explicit string based keys

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

521

Storing types#

Storing mixed types in a table#

Storing mixed-dtype data is supported. Các chuỗi được lưu trữ dưới dạng chiều rộng cố định bằng cách sử dụng kích thước tối đa của cột được nối thêm. Subsequent attempts at appending longer strings will raise a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3327

Chuyển

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7245 làm tham số để nối thêm sẽ đặt giá trị tối thiểu lớn hơn cho các cột chuỗi. Storing

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7246 are currently supported. For string columns, passing

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7247 to append will change the default nan representation on disk (which converts to/from

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7248), this defaults to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7249

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

522

Storing MultiIndex DataFrames#

Storing MultiIndex

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4140 as tables is very similar to storing/selecting from homogeneous index

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4140

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

523

Ghi chú

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342 keyword is reserved and cannot be use as a level name

Querying#

Truy vấn một bảng#

Các hoạt động của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7253 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7254 có một tiêu chí tùy chọn có thể được chỉ định để chỉ chọn/xóa một tập hợp con của dữ liệu. Điều này cho phép một người có một bảng trên đĩa rất lớn và chỉ truy xuất một phần dữ liệu

A query is specified using the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7255 class under the hood, as a boolean expression

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342 and

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3340 are supported indexers of

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4140

if

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7259 are specified, these can be used as additional indexers

tên cấp độ trong MultiIndex, với tên mặc định là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7260,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7261, … nếu không được cung cấp

Các toán tử so sánh hợp lệ là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7262

Biểu thức boolean hợp lệ được kết hợp với

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7263. or

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7264. và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7265 and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7266 . để nhóm

These rules are similar to how boolean expressions are used in pandas for indexing

Ghi chú

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7267 sẽ được tự động mở rộng thành toán tử so sánh

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7268

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7269 là toán tử not, nhưng chỉ có thể được sử dụng trong một số trường hợp rất hạn chế

Nếu một danh sách/bộ biểu thức được thông qua, chúng sẽ được kết hợp thông qua

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7264

The following are valid expressions

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7271

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7272

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7273

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7274

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7275

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7276

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7277

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7278

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7279

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7280

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7281 nằm ở vế trái của biểu thức con

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3340,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7283,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7284

Vế phải của biểu thức con (sau toán tử so sánh) có thể là

functions that will be evaluated, e. g.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7285

strings, e. g.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7286

giống như ngày, e. g.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7287, or

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7288

lists, e. g.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7289

các biến được xác định trong không gian tên cục bộ, e. g.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7290

Ghi chú

Passing a string to a query by interpolating it into the query expression is not recommended. Chỉ cần gán chuỗi quan tâm cho một biến và sử dụng biến đó trong một biểu thức. For example, do this

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

524

instead of this

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

525

The latter will not work and will raise a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7291. Note that there’s a single quote followed by a double quote in the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7292 variable

Nếu bạn phải nội suy, hãy sử dụng công cụ xác định định dạng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7293

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

526

sẽ trích dẫn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7292

Here are some examples

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

527

Use boolean expressions, with in-line function evaluation

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

528

Sử dụng tham chiếu cột nội tuyến

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

529

Từ khóa

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3340 có thể được cung cấp để chọn danh sách các cột sẽ được trả về, điều này tương đương với việc chuyển một số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7296

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

530

Các tham số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7297 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7298 có thể được chỉ định để giới hạn tổng không gian tìm kiếm. Đây là về tổng số hàng trong một bảng

Ghi chú

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7253 will raise a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3327 if the query expression has an unknown variable reference. Usually this means that you are trying to select on a column that is not a data_column

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7253 sẽ tăng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7291 nếu biểu thức truy vấn không hợp lệ

Query timedelta64[ns]#

Bạn có thể lưu trữ và truy vấn bằng cách sử dụng loại

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7303. Điều khoản có thể được chỉ định trong định dạng.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7304, where float may be signed (and fractional), and unit can be

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7305 for the timedelta. Đây là một ví dụ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

531

Truy vấn MultiIndex#

Selecting from a

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476 can be achieved by using the name of the level

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

532

If the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476 levels names are

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24, the levels are automatically made available via the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7309 keyword with

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7310 the level of the

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476 you want to select from

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

533

Lập chỉ mục #

You can create/modify an index for a table with

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7312 after data is already in the table (after and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7313 operation). Creating a table index is highly encouraged. This will speed your queries a great deal when you use a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7253 with the indexed dimension as the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7223

Ghi chú

Indexes are automagically created on the indexables and any data columns you specify. This behavior can be turned off by passing

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7316 to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7231

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

534

Oftentimes when appending large amounts of data to a store, it is useful to turn off index creation for each append, then recreate at the end

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

535

Then create the index when finished appending

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

536

See here for how to create a completely-sorted-index (CSI) on an existing store

Query via data columns#

You can designate (and index) certain columns that you want to be able to perform queries (other than the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7318 columns, which you can always query). For instance say you want to perform this common operation, on-disk, and return just the frame that matches this query. You can specify

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7319 to force all columns to be

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7259

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

537

There is some performance degradation by making lots of columns into

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7321, so it is up to the user to designate these. Ngoài ra, bạn không thể thay đổi các cột dữ liệu (cũng như không thể lập chỉ mục) sau thao tác thêm/đặt đầu tiên (Tất nhiên bạn có thể chỉ cần đọc dữ liệu và tạo một bảng mới. )

Iterator#

You can pass

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2495 or

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7323 to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7253 and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7325 to return an iterator on the results. The default is 50,000 rows returned in a chunk

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

538

Ghi chú

You can also use the iterator with

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7207 which will open, then automatically close the store when finished iterating

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

539

Note, that the chunksize keyword applies to the source rows. So if you are doing a query, then the chunksize will subdivide the total rows in the table and the query applied, returning an iterator on potentially unequal sized chunks

Đây là một công thức để tạo một truy vấn và sử dụng nó để tạo các khối trả về có kích thước bằng nhau

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

540

Truy vấn nâng cao#

Chọn một cột #

Để truy xuất một cột dữ liệu hoặc có thể lập chỉ mục, hãy sử dụng phương thức

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7327. This will, for example, enable you to get the index very quickly. These return a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

62 of the result, indexed by the row number. Chúng hiện không chấp nhận bộ chọn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7223

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

541

Đang chọn tọa độ#

Đôi khi bạn muốn lấy tọa độ (a. k. a the index locations) of your query. This returns an

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7330 of the resulting locations. Các tọa độ này cũng có thể được chuyển cho các hoạt động tiếp theo của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7223

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

542

Chọn bằng cách sử dụng mặt nạ #

Đôi khi truy vấn của bạn có thể liên quan đến việc tạo danh sách các hàng để chọn. Thông thường,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7332 này sẽ là kết quả của

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342 từ thao tác lập chỉ mục. Ví dụ này chọn các tháng của datetimeindex là 5

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

543

Đối tượng lưu trữ #

Nếu bạn muốn kiểm tra đối tượng được lưu trữ, hãy truy xuất qua

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7334. Bạn có thể sử dụng điều này theo lập trình để nói lấy số lượng hàng trong một đối tượng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

544

Nhiều truy vấn bảng#

The methods

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7335 and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7325 can perform appending/selecting from multiple tables at once. Ý tưởng là có một bảng (gọi nó là bảng chọn) mà bạn lập chỉ mục cho hầu hết/tất cả các cột và thực hiện các truy vấn của mình. The other table(s) are data tables with an index matching the selector table’s index. Sau đó, bạn có thể thực hiện một truy vấn rất nhanh trên bảng bộ chọn nhưng vẫn nhận được nhiều dữ liệu. This method is similar to having a very wide table, but enables more efficient queries

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7335 method splits a given single DataFrame into multiple tables according to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7338, a dictionary that maps the table names to a list of ‘columns’ you want in that table. Nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24 được sử dụng thay cho danh sách, bảng đó sẽ có các cột không xác định còn lại của Khung dữ liệu đã cho. Đối số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7340 xác định bảng nào là bảng chọn (bạn có thể thực hiện truy vấn từ đó). The argument

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7341 will drop rows from the input

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 to ensure tables are synchronized. This means that if a row for one of the tables being written to is entirely

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7343, that row will be dropped from all tables

If

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7341 is False, THE USER IS RESPONSIBLE FOR SYNCHRONIZING THE TABLES. Remember that entirely

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7345 rows are not written to the HDFStore, so if you choose to call

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7346, some tables may have more rows than others, and therefore

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7325 may not work or it may return unexpected results

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

545

Delete from a table#

Bạn có thể xóa khỏi bảng một cách có chọn lọc bằng cách chỉ định một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7223. In deleting rows, it is important to understand the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7213 deletes rows by erasing the rows, then moving the following data. Thus deleting can potentially be a very expensive operation depending on the orientation of your data. To get optimal performance, it’s worthwhile to have the dimension you are deleting be the first of the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7350

Data is ordered (on the disk) in terms of the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7350. Đây là một trường hợp sử dụng đơn giản. You store panel-type data, with dates in the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7283 and ids in the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7353. The data is then interleaved like this

date_1
- id_1
- id_2
- .
- id_n
ngày_2
- id_1
- .
- id_n

Rõ ràng là thao tác xóa trên

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7283 sẽ khá nhanh, vì một đoạn được xóa, sau đó dữ liệu sau sẽ được di chuyển. Mặt khác, thao tác xóa trên

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7353 sẽ rất tốn kém. In this case it would almost certainly be faster to rewrite the table using a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7223 that selects all but the missing data

Cảnh báo

Xin lưu ý rằng HDF5 KHÔNG TỰ ĐỘNG ĐẶT LẠI KHÔNG GIAN trong các tệp h5. Do đó, liên tục xóa (hoặc loại bỏ các nút) và thêm lại, SẼ CÓ XU HƯỚNG TĂNG KÍCH THƯỚC TẬP TIN

Để đóng gói lại và xóa tệp, hãy sử dụng ptrepack .

Notes & caveats#

Nén#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7213 allows the stored data to be compressed. Điều này áp dụng cho tất cả các loại cửa hàng, không chỉ bàn. Hai tham số được sử dụng để kiểm soát nén.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7358 and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7359

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7358 chỉ định nếu và mức độ cứng của dữ liệu được nén.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7361 and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7362 disables compression and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7363 enables compression

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7359 chỉ định sử dụng thư viện nén nào. Nếu không có gì được chỉ định, thư viện mặc định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7365 được sử dụng. Thư viện nén thường tối ưu hóa để có tốc độ hoặc tốc độ nén tốt và kết quả sẽ phụ thuộc vào loại dữ liệu. Lựa chọn kiểu nén nào tùy thuộc vào nhu cầu và dữ liệu cụ thể của bạn. Danh sách các thư viện nén được hỗ trợ

zlib. The default compression library. Cổ điển về mặt nén, đạt tốc độ nén tốt nhưng hơi chậm
lzo. Fast compression and decompression
bzip2. Tỷ lệ nén tốt
blosc. Fast compression and decompression
Support for alternative blosc compressors
- khối. blosclz This is the default compressor for
```
In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object
```
  7366
- khối. lz4. A compact, very popular and fast compressor
- khối. lz4hc. Phiên bản tinh chỉnh của LZ4, tạo ra tỷ lệ nén tốt hơn với chi phí tốc độ
- blosc. snappy. A popular compressor used in many places
- blosc. zlib. A classic; somewhat slower than the previous ones, but achieving better compression ratios
- blosc. zstd. An extremely well balanced codec; it provides the best compression ratios among the others above, and at reasonably fast speed

Nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7359 được định nghĩa là một cái gì đó khác với các thư viện được liệt kê, một ngoại lệ

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3327 sẽ được ban hành

Ghi chú

If the library specified with the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7359 option is missing on your platform, compression defaults to

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7365 without further ado

Kích hoạt tính năng nén cho tất cả các đối tượng trong tệp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

546

Hoặc tính năng nén nhanh (điều này chỉ áp dụng cho các bảng) trong các cửa hàng không bật tính năng nén

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

547

ptrepack#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7213 offers better write performance when tables are compressed after they are written, as opposed to turning on compression at the very beginning. Bạn có thể sử dụng tiện ích

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7213 được cung cấp

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7373. In addition,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7373 can change compression levels after the fact

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

548

Furthermore

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7375 will repack the file to allow you to reuse previously deleted space. Alternatively, one can simply remove the file and write again, or use the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7376 method

Hãy cẩn thận #

Cảnh báo

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7205 is not-threadsafe for writing. The underlying

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7213 only supports concurrent reads (via threading or processes). Nếu bạn cần đọc và ghi đồng thời, bạn cần tuần tự hóa các hoạt động này trong một chuỗi trong một quy trình duy nhất. Bạn sẽ làm hỏng dữ liệu của mình nếu không. See the (GH2397) for more information

Nếu bạn sử dụng khóa để quản lý quyền ghi giữa nhiều quy trình, bạn có thể muốn sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7379 trước khi giải phóng khóa ghi. For convenience you can use

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7380 to do this for you

Khi một

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3504 được tạo, các cột (DataFrame) được cố định;

Hãy nhận biết rằng múi giờ (e. g. ,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7382) không nhất thiết phải bằng nhau giữa các phiên bản múi giờ. So if data is localized to a specific timezone in the HDFStore using one version of a timezone library and that data is updated with another version, the data will be converted to UTC since these timezones are not considered equal. Sử dụng cùng một phiên bản thư viện múi giờ hoặc sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7383 với định nghĩa múi giờ được cập nhật

Cảnh báo

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7213 sẽ hiển thị

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7385 nếu không thể sử dụng tên cột làm bộ chọn thuộc tính. Định danh tự nhiên chỉ chứa các chữ cái, số và dấu gạch dưới và không được bắt đầu bằng số. Other identifiers cannot be used in a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7223 clause and are generally a bad idea

Loại dữ liệu#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7205 sẽ ánh xạ một dtype đối tượng tới dtype bên dưới của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7213. Điều này có nghĩa là các loại sau được biết là hoạt động

Loại

Represents missing values

floating .

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7389

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7248

số nguyên.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7391

boolean

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7392

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3519

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7303

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3519

categorical . see the section below

object .

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7396

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7248

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7398 columns are not supported, and WILL FAIL

Dữ liệu phân loại#

You can write data that contains

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7399 dtypes to a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7205. Queries work the same as if it was an object array. Tuy nhiên, dữ liệu dtyped

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7399 được lưu trữ theo cách hiệu quả hơn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

549

String columns#

min_itemsize

Việc triển khai cơ bản của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7205 sử dụng chiều rộng cột cố định (kích thước vật phẩm) cho các cột chuỗi. Kích thước cột chuỗi được tính bằng độ dài tối đa của dữ liệu (đối với cột đó) được chuyển đến

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7205, trong phần nối thêm đầu tiên. Các phần bổ sung tiếp theo, có thể giới thiệu một chuỗi cho một cột lớn hơn cột có thể chứa, một Ngoại lệ sẽ được đưa ra (nếu không, bạn có thể cắt ngắn các cột này, dẫn đến mất thông tin). In the future we may relax this and allow a user-specified truncation to occur

Vượt qua

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7404 trong lần tạo bảng đầu tiên để a-priori chỉ định độ dài tối thiểu của một cột chuỗi cụ thể.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7404 can be an integer, or a dict mapping a column name to an integer. Bạn có thể chuyển

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3503 làm khóa để cho phép tất cả các mục có thể lập chỉ mục hoặc cột dữ liệu có kích thước tối thiểu này

Passing a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7404 dict will cause all passed columns to be created as data_columns automatically

Ghi chú

If you are not passing any

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7259, then the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7404 will be the maximum of the length of any string passed

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

550

nan_rep

Các cột chuỗi sẽ tuần tự hóa một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7248 (một giá trị bị thiếu) với biểu diễn chuỗi

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7411. Giá trị này mặc định là giá trị chuỗi

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7249. Bạn có thể vô tình biến một giá trị thực tế của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7249 thành một giá trị bị thiếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

551

Khả năng tương thích bên ngoài#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7205 viết các đối tượng định dạng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3504 ở các định dạng cụ thể phù hợp để tạo các chuyến khứ hồi không mất dữ liệu tới các đối tượng gấu trúc. Để tương thích với bên ngoài, ________ 67205 có thể đọc các bảng định dạng gốc của ________ 67213

Có thể viết một đối tượng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7205 để có thể dễ dàng nhập vào

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7419 bằng thư viện

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7420 (Trang web gói). Tạo một cửa hàng định dạng bảng như thế này

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

552

Trong R, tệp này có thể được đọc thành đối tượng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7421 bằng thư viện

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7420. Hàm ví dụ sau đọc tên cột và giá trị dữ liệu tương ứng từ các giá trị và tập hợp chúng thành một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7421

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

553

Bây giờ bạn có thể nhập

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 vào R

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

554

Ghi chú

Hàm R liệt kê toàn bộ nội dung của tệp HDF5 và tập hợp đối tượng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7421 từ tất cả các nút phù hợp, vì vậy chỉ sử dụng hàm này làm điểm bắt đầu nếu bạn đã lưu trữ nhiều đối tượng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 vào một tệp HDF5

Performance#

Định dạng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7427 đi kèm với hình phạt về hiệu suất viết so với các cửa hàng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7214. Lợi ích là khả năng nối thêm/xóa và truy vấn (có thể là lượng dữ liệu rất lớn). Thời gian viết thường dài hơn so với các cửa hàng thông thường. Thời gian truy vấn có thể khá nhanh, đặc biệt là trên trục được lập chỉ mục

Bạn có thể chuyển

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7429 đến

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7231, chỉ định khối lượng ghi (mặc định là 50000). Điều này sẽ làm giảm đáng kể mức sử dụng bộ nhớ của bạn khi viết

Bạn có thể chuyển

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7431 cho

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7231 đầu tiên, để đặt TỔNG số hàng mà

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7213 sẽ mong đợi. Điều này sẽ tối ưu hóa hiệu suất đọc/ghi

Các hàng trùng lặp có thể được ghi vào bảng, nhưng được lọc ra trong vùng chọn (với các mục cuối cùng được chọn; do đó, một bảng là duy nhất trên các cặp chính, phụ)

Một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7434 sẽ được nâng lên nếu bạn đang cố lưu trữ các loại sẽ được PyTables chọn (chứ không phải được lưu trữ dưới dạng các loại đặc hữu). Xem tại đây để biết thêm thông tin và một số giải pháp

Lông vũ#

Feather cung cấp tuần tự hóa cột nhị phân cho các khung dữ liệu. Nó được thiết kế để làm cho việc đọc và ghi các khung dữ liệu trở nên hiệu quả và giúp việc chia sẻ dữ liệu giữa các ngôn ngữ phân tích dữ liệu trở nên dễ dàng

Feather được thiết kế để tuần tự hóa và hủy tuần tự hóa DataFrames một cách trung thực, hỗ trợ tất cả các kiểu dữ liệu gấu trúc, bao gồm cả các kiểu mở rộng như phân loại và thời gian với tz

Một số lưu ý

The format will NOT write an

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4120, or

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2476 for the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 and will raise an error if a non-default one is provided. Bạn có thể

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7438 để lưu trữ chỉ mục hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7439 để bỏ qua nó

Tên cột trùng lặp và tên cột không phải chuỗi không được hỗ trợ
Các đối tượng Python thực tế trong các cột dtype đối tượng không được hỗ trợ. Những điều này sẽ đưa ra một thông báo lỗi hữu ích khi cố gắng tuần tự hóa

Xem tài liệu đầy đủ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

555

Ghi vào một tập tin lông vũ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

556

Đọc từ tệp lông vũ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

557

Sàn gỗ #

Apache Parquet cung cấp tuần tự hóa cột nhị phân được phân vùng cho các khung dữ liệu. Nó được thiết kế để làm cho việc đọc và ghi các khung dữ liệu trở nên hiệu quả và giúp việc chia sẻ dữ liệu giữa các ngôn ngữ phân tích dữ liệu trở nên dễ dàng. Sàn gỗ có thể sử dụng nhiều kỹ thuật nén khác nhau để thu nhỏ kích thước tệp càng nhiều càng tốt trong khi vẫn duy trì hiệu suất đọc tốt

Parquet được thiết kế để tuần tự hóa và hủy tuần tự hóa một cách trung thực

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 s, hỗ trợ tất cả các kiểu dữ liệu pandas, bao gồm các kiểu mở rộng như datetime với tz

Một số lưu ý

Tên cột trùng lặp và tên cột không phải chuỗi không được hỗ trợ

The

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2497 engine always writes the index to the output, but

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7442 only writes non-default indexes. Cột bổ sung này có thể gây ra sự cố cho những người tiêu dùng không phải là pandas không mong đợi điều đó. Bạn có thể buộc bao gồm hoặc bỏ qua các chỉ mục bằng đối số

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342, bất kể công cụ cơ bản là gì

Tên cấp chỉ mục, nếu được chỉ định, phải là chuỗi
Trong công cụ
```
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
```
2497, các kiểu dữ liệu phân loại cho các loại không phải chuỗi có thể được đánh số thứ tự thành sàn gỗ, nhưng sẽ hủy đánh số thứ tự như kiểu dữ liệu nguyên thủy của chúng

Công cụ

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2497 duy trì cờ

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4134 của các kiểu dữ liệu phân loại với các loại chuỗi.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7442 không giữ cờ

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4134

Các loại không được hỗ trợ bao gồm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7449 và các loại đối tượng Python thực tế. Những điều này sẽ đưa ra một thông báo lỗi hữu ích khi cố gắng tuần tự hóa. Loại

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7450 được hỗ trợ với pyarrow >= 0. 16. 0

Công cụ
```
import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")
```
2497 bảo tồn các loại dữ liệu mở rộng, chẳng hạn như loại dữ liệu chuỗi và số nguyên có thể null (yêu cầu pyarrow >= 0. 16. 0 và yêu cầu loại tiện ích mở rộng triển khai các giao thức cần thiết, hãy xem tài liệu về loại tiện ích mở rộng ).

Bạn có thể chỉ định một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7136 để điều khiển quá trình lập số sê-ri. Đây có thể là một trong số

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2497 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7442 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7455. Nếu động cơ KHÔNG được chỉ định, thì tùy chọn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7456 sẽ được chọn;

Xem tài liệu về pyarrow và fastparquet

Ghi chú

These engines are very similar and should read/write nearly identical parquet format files.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7460 supports timedelta data,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7461 supports timezone aware datetimes. Các thư viện này khác nhau do có các phụ thuộc cơ bản khác nhau (

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7442 bằng cách sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7463, trong khi

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2497 sử dụng thư viện c)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

558

Ghi vào một tập tin sàn gỗ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

559

Đọc từ một tập tin sàn gỗ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

560

Chỉ đọc một số cột nhất định của tệp sàn gỗ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

561

Xử lý chỉ mục#

Nối tiếp một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 thành sàn gỗ có thể bao gồm chỉ mục ẩn dưới dạng một hoặc nhiều cột trong tệp đầu ra. Như vậy, mã này

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

562

tạo một tệp sàn gỗ có ba cột nếu bạn sử dụng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2497 để tuần tự hóa.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7467,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7468 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7469. Nếu bạn đang sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7442, chỉ mục có thể được ghi hoặc không vào tệp

Cột bổ sung không mong muốn này khiến một số cơ sở dữ liệu như Amazon Redshift từ chối tệp vì cột đó không tồn tại trong bảng đích

Nếu bạn muốn bỏ qua các chỉ mục của khung dữ liệu khi viết, hãy chuyển

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7316 đến

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7472

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

563

Điều này tạo ra một tệp sàn gỗ chỉ với hai cột dự kiến,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7467 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7468. Nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 của bạn có một chỉ mục tùy chỉnh, bạn sẽ không lấy lại được nó khi tải tệp này vào một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43

Vượt qua

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7477 sẽ luôn ghi chỉ mục, ngay cả khi đó không phải là hành vi mặc định của công cụ cơ bản

Phân vùng tệp Parquet#

Sàn gỗ hỗ trợ phân vùng dữ liệu dựa trên giá trị của một hoặc nhiều cột

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

564

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7478 chỉ định thư mục mẹ mà dữ liệu sẽ được lưu.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7479 là các tên cột mà tập dữ liệu sẽ được phân vùng. Các cột được phân vùng theo thứ tự chúng được cung cấp. Sự phân chia phân vùng được xác định bởi các giá trị duy nhất trong các cột phân vùng. Ví dụ trên tạo một tập dữ liệu được phân vùng có thể trông giống như

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

565

ORC#

Mới trong phiên bản 1. 0. 0

Tương tự như định dạng sàn gỗ , Định dạng ORC là tuần tự hóa cột nhị phân cho khung dữ liệu. Nó được thiết kế để làm cho việc đọc khung dữ liệu hiệu quả. gấu trúc cung cấp cả người đọc và người viết cho định dạng ORC,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7480 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7481. This requires the pyarrow library.

Cảnh báo

Rất nên cài đặt pyarrow bằng conda do một số sự cố xảy ra bởi pyarrow

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7481 yêu cầu pyarrow>=7. 0. 0

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7480 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7481 chưa được hỗ trợ trên Windows, bạn có thể tìm thấy các môi trường hợp lệ trên cài đặt các phần phụ thuộc tùy chọn .

Đối với các loại được hỗ trợ, vui lòng tham khảo các tính năng ORC được hỗ trợ trong Mũi tên
Các múi giờ hiện tại trong các cột ngày giờ không được giữ nguyên khi khung dữ liệu được chuyển đổi thành tệp ORC

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

566

Ghi vào tệp orc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

567

Đọc từ tệp orc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

568

Chỉ đọc một số cột nhất định của tệp orc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

569

truy vấn SQL#

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7485 module provides a collection of query wrappers to both facilitate data retrieval and to reduce dependency on DB-specific API. Trừu tượng hóa cơ sở dữ liệu được cung cấp bởi SQLAlchemy nếu được cài đặt. Ngoài ra, bạn sẽ cần một thư viện trình điều khiển cho cơ sở dữ liệu của mình. Ví dụ về các trình điều khiển như vậy là psycopg2 cho PostgreSQL hoặc pymysql cho MySQL. Đối với SQLite, điều này được bao gồm trong thư viện chuẩn của Python theo mặc định. Bạn có thể tìm thấy tổng quan về các trình điều khiển được hỗ trợ cho từng phương ngữ SQL trong tài liệu SQLAlchemy

Nếu SQLAlchemy chưa được cài đặt, dự phòng chỉ được cung cấp cho sqlite (và cho mysql để tương thích ngược, nhưng điều này không được dùng nữa và sẽ bị xóa trong phiên bản tương lai). Chế độ này yêu cầu bộ điều hợp cơ sở dữ liệu Python tôn trọng Python DB-API

Xem thêm một số ví dụ về sách dạy nấu ăn để biết một số chiến lược nâng cao.

Các chức năng chính là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7486(tên_bảng, con[, lược đồ,. ])

Đọc bảng cơ sở dữ liệu SQL vào DataFrame

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7487(sql, con[, index_col,. ])

Đọc truy vấn SQL vào DataFrame

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7488(sql, con[, index_col,. ])

Đọc truy vấn SQL hoặc bảng cơ sở dữ liệu vào DataFrame

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7489(tên, con[, sơ đồ,. ])

Ghi các bản ghi được lưu trữ trong DataFrame vào cơ sở dữ liệu SQL

Ghi chú

The function

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7490 is a convenience wrapper around

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7491 and

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7492 (and for backward compatibility) and will delegate to specific function depending on the provided input (database table name or sql query). Tên bảng không cần trích dẫn nếu có ký tự đặc biệt

Trong ví dụ sau, chúng tôi sử dụng công cụ cơ sở dữ liệu SQLite SQL. Bạn có thể sử dụng cơ sở dữ liệu SQLite tạm thời nơi dữ liệu được lưu trữ trong “bộ nhớ”

Để kết nối với SQLAlchemy, bạn sử dụng hàm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7493 để tạo đối tượng công cụ từ URI cơ sở dữ liệu. Bạn chỉ cần tạo công cụ một lần cho mỗi cơ sở dữ liệu mà bạn đang kết nối. Để biết thêm thông tin về

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7493 và định dạng URI, hãy xem các ví dụ bên dưới và tài liệu SQLAlchemy

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

570

Nếu bạn muốn quản lý các kết nối của riêng mình, bạn có thể chuyển một trong các kết nối đó. Ví dụ bên dưới mở một kết nối tới cơ sở dữ liệu bằng trình quản lý bối cảnh Python tự động đóng kết nối sau khi khối hoàn thành. Xem tài liệu SQLAlchemy để được giải thích về cách xử lý kết nối cơ sở dữ liệu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

571

Cảnh báo

Khi bạn mở một kết nối tới cơ sở dữ liệu, bạn cũng chịu trách nhiệm đóng nó. Tác dụng phụ của việc mở kết nối có thể bao gồm khóa cơ sở dữ liệu hoặc hành vi vi phạm khác

Viết DataFrames#

Giả sử dữ liệu sau nằm trong một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

56, chúng ta có thể chèn nó vào cơ sở dữ liệu bằng cách sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7497

Tôi

Ngày tháng

Cột_1

Cột_2

Cột_3

26

2012-10-18

X

25. 7

Thật

42

2012-10-19

Y

-12. 4

Sai

63

2012-10-20

Z

5. 73

Thật

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

572

Với một số cơ sở dữ liệu, việc ghi DataFrames lớn có thể dẫn đến lỗi do vượt quá giới hạn kích thước gói. Điều này có thể tránh được bằng cách đặt tham số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

90 khi gọi

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7499. Ví dụ: phần sau ghi

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

56 vào cơ sở dữ liệu theo lô 1000 hàng cùng một lúc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

573

Các kiểu dữ liệu SQL#

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7497 sẽ cố gắng ánh xạ dữ liệu của bạn sang loại dữ liệu SQL thích hợp dựa trên loại dữ liệu. Khi bạn có các cột dtype

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

72, gấu trúc sẽ cố gắng suy ra kiểu dữ liệu

Bạn luôn có thể ghi đè loại mặc định bằng cách chỉ định loại SQL mong muốn của bất kỳ cột nào bằng cách sử dụng đối số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

88. Đối số này cần tên cột ánh xạ từ điển tới các loại SQLAlchemy (hoặc chuỗi cho chế độ dự phòng sqlite3). Ví dụ: chỉ định sử dụng loại sqlalchemy

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7504 thay vì loại

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7505 mặc định cho các cột chuỗi

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

574

Ghi chú

Do hỗ trợ hạn chế cho timedelta trong các hương vị cơ sở dữ liệu khác nhau, các cột có loại

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7506 sẽ được ghi dưới dạng giá trị số nguyên dưới dạng nano giây vào cơ sở dữ liệu và cảnh báo sẽ được đưa ra

Ghi chú

Các cột của

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7399 dtype sẽ được chuyển thành biểu diễn dày đặc như bạn sẽ nhận được với

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7508 (e. g. đối với các danh mục chuỗi, điều này mang lại một chuỗi các chuỗi). Do đó, việc đọc lại bảng cơ sở dữ liệu không tạo ra một phân loại

Kiểu dữ liệu ngày giờ#

Sử dụng SQLAlchemy,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7497 có khả năng ghi dữ liệu ngày giờ không biết múi giờ hoặc nhận biết múi giờ. Tuy nhiên, dữ liệu kết quả được lưu trữ trong cơ sở dữ liệu cuối cùng phụ thuộc vào loại dữ liệu được hỗ trợ cho dữ liệu ngày giờ của hệ thống cơ sở dữ liệu đang được sử dụng

Bảng sau đây liệt kê các kiểu dữ liệu được hỗ trợ cho dữ liệu ngày giờ đối với một số cơ sở dữ liệu phổ biến. Các phương ngữ cơ sở dữ liệu khác có thể có các loại dữ liệu khác nhau cho dữ liệu ngày giờ

cơ sở dữ liệu

SQL Datetime Types

Hỗ trợ múi giờ

SQLite

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7510

Không

mysql

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7511 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7512

Không

PostgreSQL

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7511 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7514

Đúng

Khi ghi dữ liệu nhận biết múi giờ vào cơ sở dữ liệu không hỗ trợ múi giờ, dữ liệu sẽ được ghi dưới dạng dấu thời gian ngây thơ múi giờ theo giờ địa phương đối với múi giờ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7491 cũng có khả năng đọc dữ liệu ngày giờ nhận biết múi giờ hoặc ngây thơ. Khi đọc các loại

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7514, gấu trúc sẽ chuyển đổi dữ liệu sang UTC

Phương pháp chèn #

Tham số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7517 kiểm soát mệnh đề chèn SQL được sử dụng. giá trị có thể là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

24. Sử dụng mệnh đề SQL

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7519 tiêu chuẩn (mỗi hàng một cái)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7520. Truyền nhiều giá trị trong một mệnh đề

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7519. Nó sử dụng một cú pháp SQL đặc biệt không được hỗ trợ bởi tất cả các chương trình phụ trợ. Điều này thường mang lại hiệu suất tốt hơn cho các cơ sở dữ liệu phân tích như Presto và Redshift, nhưng lại có hiệu suất kém hơn đối với phần phụ trợ SQL truyền thống nếu bảng chứa nhiều cột. Để biết thêm thông tin, hãy kiểm tra tài liệu SQLAlchemy

có thể gọi được với chữ ký

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7522. Điều này có thể được sử dụng để triển khai phương thức chèn hiệu quả hơn dựa trên các tính năng phương ngữ phụ trợ cụ thể

Ví dụ về một mệnh đề có thể gọi được bằng PostgreSQL COPY

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

575

Bảng đọc #

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7491 sẽ đọc một bảng cơ sở dữ liệu được cung cấp tên bảng và tùy chọn một tập hợp con các cột để đọc

Ghi chú

Để sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7491, bạn phải cài đặt phần phụ thuộc tùy chọn SQLAlchemy

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

576

Ghi chú

Lưu ý rằng gấu trúc suy ra các kiểu cột từ đầu ra truy vấn chứ không phải bằng cách tra cứu các loại dữ liệu trong lược đồ cơ sở dữ liệu vật lý. Ví dụ: giả sử

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7525 là một cột số nguyên trong bảng. Then, intuitively,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7526 will return integer-valued series, while

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7527 will return object-valued (str) series. Theo đó, nếu đầu ra truy vấn trống, thì tất cả các cột kết quả sẽ được trả về dưới dạng giá trị đối tượng (vì chúng là tổng quát nhất). Nếu bạn thấy trước rằng truy vấn của mình đôi khi sẽ tạo ra một kết quả trống, thì bạn có thể muốn đánh máy rõ ràng sau đó để đảm bảo tính toàn vẹn của dtype

Bạn cũng có thể chỉ định tên của cột là chỉ mục

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 và chỉ định một tập hợp con các cột sẽ được đọc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

577

Và bạn rõ ràng có thể buộc các cột được phân tích thành ngày

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

578

Nếu cần, bạn có thể chỉ định rõ ràng một chuỗi định dạng hoặc một lệnh của các đối số để chuyển tới

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7529

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

579

Bạn có thể kiểm tra xem một bảng có tồn tại hay không bằng cách sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7530

Hỗ trợ lược đồ #

Việc đọc và ghi vào các lược đồ khác nhau được hỗ trợ thông qua từ khóa

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

4116 trong các hàm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7491 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7497. Tuy nhiên, lưu ý rằng điều này phụ thuộc vào hương vị cơ sở dữ liệu (sqlite không có lược đồ). Ví dụ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

580

Querying#

Bạn có thể truy vấn bằng SQL thô trong hàm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7492. Trong trường hợp này, bạn phải sử dụng biến thể SQL phù hợp với cơ sở dữ liệu của mình. Khi sử dụng SQLAlchemy, bạn cũng có thể chuyển các cấu trúc ngôn ngữ Biểu thức SQLAlchemy, không liên quan đến cơ sở dữ liệu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

581

Tất nhiên, bạn có thể chỉ định một truy vấn “phức tạp” hơn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

582

Hàm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7492 hỗ trợ đối số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

90. Việc chỉ định điều này sẽ trả về một trình vòng lặp thông qua các đoạn kết quả truy vấn

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

583

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

584

Bạn cũng có thể chạy một truy vấn đơn giản mà không cần tạo một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 với

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7538. Điều này hữu ích cho các truy vấn không trả về giá trị, chẳng hạn như INSERT. Điều này có chức năng tương đương với việc gọi

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7539 trên công cụ SQLAlchemy hoặc đối tượng kết nối db. Một lần nữa, bạn phải sử dụng biến thể cú pháp SQL phù hợp với cơ sở dữ liệu của mình

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

585

Ví dụ về kết nối động cơ#

Để kết nối với SQLAlchemy, bạn sử dụng hàm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7493 để tạo đối tượng công cụ từ URI cơ sở dữ liệu. Bạn chỉ cần tạo công cụ một lần cho mỗi cơ sở dữ liệu mà bạn đang kết nối

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

586

Để biết thêm thông tin, hãy xem các ví dụ về tài liệu SQLAlchemy

Advanced SQLAlchemy queries#

You can use SQLAlchemy constructs to describe your query

Sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7541 để chỉ định các tham số truy vấn theo cách trung lập với phụ trợ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

587

Nếu bạn có một mô tả SQLAlchemy về cơ sở dữ liệu của mình, bạn có thể biểu thị các điều kiện ở đâu bằng các biểu thức SQLAlchemy

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

588

Bạn có thể kết hợp các biểu thức SQLAlchemy với các tham số được truyền tới

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7490 bằng cách sử dụng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7543

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

589

Dự phòng Sqlite#

Việc sử dụng sqlite được hỗ trợ mà không cần sử dụng SQLAlchemy. Chế độ này yêu cầu bộ điều hợp cơ sở dữ liệu Python tôn trọng Python DB-API

Bạn có thể tạo các kết nối như vậy

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

590

Và sau đó đưa ra các truy vấn sau

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

591

Google BigQuery#

Cảnh báo

bắt đầu bằng 0. 20. 0, pandas đã tách hỗ trợ Google BigQuery thành gói riêng biệt

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7544. Bạn có thể

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7545 để lấy nó

The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7544 package provides functionality to read/write from Google BigQuery

gấu trúc tích hợp với gói bên ngoài này. nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7544 được cài đặt, bạn có thể sử dụng các phương thức pandas

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7548 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7549, sẽ gọi các hàm tương ứng từ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7544

Tài liệu đầy đủ có thể được tìm thấy ở đây

định dạng thống kê #

Ghi vào định dạng stata#

The method

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7551 will write a DataFrame into a . dta file. Phiên bản định dạng của tệp này luôn là 115 (Stata 12)

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

592

Các tệp dữ liệu Stata có hỗ trợ loại dữ liệu hạn chế; . Ngoài ra, Stata dự trữ các giá trị nhất định để biểu thị dữ liệu bị thiếu. Xuất một giá trị không bị thiếu nằm ngoài phạm vi cho phép trong Stata cho một loại dữ liệu cụ thể sẽ nhập lại biến có kích thước lớn hơn tiếp theo. Ví dụ: các giá trị

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7552 bị hạn chế nằm trong khoảng từ -127 đến 100 trong Stata và do đó, các biến có giá trị trên 100 sẽ kích hoạt chuyển đổi thành

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7553. Các giá trị

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7249 trong kiểu dữ liệu dấu phẩy động được lưu trữ dưới dạng kiểu dữ liệu bị thiếu cơ bản (

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7561 trong Stata)

Ghi chú

Không thể xuất giá trị dữ liệu bị thiếu cho kiểu dữ liệu số nguyên

Người viết Stata xử lý một cách duyên dáng các loại dữ liệu khác bao gồm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7562,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7563,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7564,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7565,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7566 bằng cách chuyển sang loại được hỗ trợ nhỏ nhất có thể biểu thị dữ liệu. Ví dụ: dữ liệu có loại

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7564 sẽ được chuyển thành

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7552 nếu tất cả các giá trị nhỏ hơn 100 (giới hạn trên đối với dữ liệu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7552 không bị thiếu trong Stata) hoặc, nếu các giá trị nằm ngoài phạm vi này, biến sẽ được chuyển thành

Cảnh báo

Chuyển đổi từ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7562 sang

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7556 có thể dẫn đến mất độ chính xác nếu giá trị

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7562 lớn hơn 2**53

Cảnh báo

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7574 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7551 chỉ hỗ trợ các chuỗi có chiều rộng cố định chứa tối đa 244 ký tự, giới hạn do định dạng tệp dta phiên bản 115 áp đặt. Cố gắng ghi các tệp Stata dta với các chuỗi dài hơn 244 ký tự sẽ gây ra lỗi

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3327

Đọc từ định dạng Stata#

Hàm cấp cao nhất

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7577 sẽ đọc tệp dta và trả về

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7579 có thể được sử dụng để đọc tệp tăng dần

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

593

Specifying a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

90 yields a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7579 instance that can be used to read

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

90 lines from the file at a time. The

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7579 object can be used as an iterator

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

594

Để kiểm soát chi tiết hơn, hãy sử dụng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2495 và chỉ định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

90 với mỗi lệnh gọi tới

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

18

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

595

Hiện tại,

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

3342 được truy xuất dưới dạng một cột

Tham số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7588 cho biết có nên đọc và sử dụng nhãn giá trị để tạo biến

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0524 từ chúng hay không. Các nhãn giá trị cũng có thể được truy xuất bằng hàm

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7590, hàm này yêu cầu gọi ____

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

18 trước khi sử dụng

Tham số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7592 cho biết liệu các biểu diễn giá trị bị thiếu trong Stata có nên được giữ lại hay không. Nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

61 (mặc định), các giá trị bị thiếu được biểu thị dưới dạng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7248. Nếu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32, các giá trị bị thiếu được biểu diễn bằng các đối tượng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7596 và các cột chứa các giá trị bị thiếu sẽ có kiểu dữ liệu

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

72

Ghi chú

Hỗ trợ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7598 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7579. định dạng dta 113-115 (Stata 10-12), 117 (Stata 13) và 118 (Stata 14)

Ghi chú

Cài đặt

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7600 sẽ upcast lên kiểu dữ liệu pandas tiêu chuẩn.

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7562 cho tất cả các loại số nguyên và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7556 cho dữ liệu dấu chấm động. Theo mặc định, kiểu dữ liệu Stata được giữ nguyên khi nhập

Dữ liệu phân loại#

Dữ liệu

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0524 có thể được xuất sang tệp dữ liệu Stata dưới dạng dữ liệu được gắn nhãn giá trị. Dữ liệu đã xuất bao gồm các mã danh mục cơ bản dưới dạng giá trị dữ liệu số nguyên và danh mục dưới dạng nhãn giá trị. Stata không có tương đương rõ ràng với

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0524 và thông tin về việc biến có được sắp xếp hay không bị mất khi xuất

Cảnh báo

Stata chỉ hỗ trợ các nhãn giá trị chuỗi và do đó,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

15 được gọi trên các danh mục khi xuất dữ liệu. Exporting

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0524 variables with non-string categories produces a warning, and can result a loss of information if the

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

15 representations of the categories are not unique

Tương tự, dữ liệu được gắn nhãn có thể được nhập từ các tệp dữ liệu Stata dưới dạng các biến

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0524 bằng cách sử dụng đối số từ khóa

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7588 (

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32 theo mặc định). Đối số từ khóa

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7611 (

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

32 theo mặc định) xác định xem các biến

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0524 đã nhập có được sắp xếp hay không

Ghi chú

Khi nhập dữ liệu phân loại, giá trị của các biến trong tệp dữ liệu Stata không được bảo toàn do các biến

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0524 luôn sử dụng các kiểu dữ liệu số nguyên trong khoảng từ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7615 đến

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7616 trong đó

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7310 là số lượng phân loại. Nếu các giá trị gốc trong tệp dữ liệu Stata là bắt buộc, thì có thể nhập các giá trị này bằng cách đặt ____67618, thao tác này sẽ nhập dữ liệu gốc (chứ không phải nhãn biến). Các giá trị ban đầu có thể khớp với dữ liệu phân loại đã nhập vì có một ánh xạ đơn giản giữa các giá trị dữ liệu Stata ban đầu và mã danh mục của các biến Phân loại đã nhập. các giá trị bị thiếu được gán mã

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7615 và giá trị ban đầu nhỏ nhất được gán là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

84, giá trị nhỏ thứ hai được gán là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7621, v.v. cho đến khi giá trị ban đầu lớn nhất được gán mã

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7616

Ghi chú

Stata hỗ trợ sê-ri được dán nhãn một phần. Các chuỗi này có nhãn giá trị cho một số nhưng không phải tất cả các giá trị dữ liệu. Nhập chuỗi được gắn nhãn một phần sẽ tạo ra một

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

0524 với các danh mục chuỗi cho các giá trị được gắn nhãn và danh mục số cho các giá trị không có nhãn

định dạng SAS #

Hàm cấp cao nhất

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7624 có thể đọc (nhưng không ghi) SAS XPORT (. xpt) và (kể từ v0. 18. 0) SAS7BDAT (. sas7bdat) định dạng tập tin

Tệp SAS chỉ chứa hai loại giá trị. ASCII text and floating point values (usually 8 bytes but sometimes truncated). Đối với tệp xuất, không có chuyển đổi loại tự động thành số nguyên, ngày hoặc phân loại. Đối với các tệp SAS7BDAT, mã định dạng có thể cho phép các biến ngày được tự động chuyển đổi thành ngày. Theo mặc định, toàn bộ tệp được đọc và trả về dưới dạng

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43

Chỉ định một

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

90 hoặc sử dụng

import openpyxl
worksheet = openpyxl.load_workbook("codespeedy.xlsx")
sheet = worksheet.active
sheet.column_dimensions['A'].width = 20
worksheet.save("codespeedy1.xlsx")

2495 để lấy các đối tượng người đọc (

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7628 hoặc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7629) để đọc tệp dần dần. Các đối tượng người đọc cũng có các thuộc tính chứa thông tin bổ sung về tệp và các biến của nó

Đọc tệp SAS7BDAT

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

596

Lấy một trình vòng lặp và đọc một tệp XPORT 100.000 dòng cùng một lúc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

597

Thông số kỹ thuật cho định dạng tệp xport có sẵn trên trang web của SAS

Không có tài liệu chính thức nào cho định dạng SAS7BDAT

định dạng SPSS#

Mới trong phiên bản 0. 25. 0

Hàm cấp cao nhất

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7630 có thể đọc (nhưng không ghi) SPSS SAV (. sav) và ZSAV (. tệp định dạng zsav)

Tệp SPSS chứa tên cột. By default the whole file is read, categorical columns are converted into

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7631, and a

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 with all columns is returned

Chỉ định tham số

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

47 để có được một tập hợp con các cột. Chỉ định

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7618 để tránh chuyển đổi các cột phân loại thành

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7631

Đọc một tệp SPSS

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

598

Trích xuất một tập hợp con các cột có trong

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

47 từ tệp SPSS và tránh chuyển đổi các cột phân loại thành

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7631

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

599

Thông tin thêm về các định dạng tệp SAV và ZSAV có tại đây

Các định dạng tệp khác#

bản thân gấu trúc chỉ hỗ trợ IO với một bộ định dạng tệp giới hạn ánh xạ rõ ràng tới mô hình dữ liệu dạng bảng của nó. Để đọc và ghi các định dạng tệp khác vào và từ gấu trúc, chúng tôi khuyên dùng các gói này từ cộng đồng rộng lớn hơn

netCDF#

xarray cung cấp cấu trúc dữ liệu lấy cảm hứng từ gấu trúc

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

43 để làm việc với bộ dữ liệu đa chiều, tập trung vào định dạng tệp netCDF và chuyển đổi dễ dàng sang và từ gấu trúc

Cân nhắc về hiệu suất#

Đây là một so sánh không chính thức của các phương pháp IO khác nhau, sử dụng pandas 0. 24. 2. Thời gian phụ thuộc vào máy và nên bỏ qua những khác biệt nhỏ

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

00

Các chức năng kiểm tra sau đây sẽ được sử dụng bên dưới để so sánh hiệu suất của một số phương pháp IO

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

01

Khi viết, ba chức năng hàng đầu về tốc độ là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7639,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7640 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7641

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

02

Khi đọc, ba chức năng hàng đầu về tốc độ là

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7642,

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7643 và

In [13]: import numpy as np

In [14]: data = "a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11"

In [15]: print(data)
a,b,c,d
1,2,3,4
5,6,7,8
9,10,11

In [16]: df = pd.read_csv(StringIO(data), dtype=object)

In [17]: df
Out[17]: 
   a   b   c    d
0  1   2   3    4
1  5   6   7    8
2  9  10  11  NaN

In [18]: df["a"][0]
Out[18]: '1'

In [19]: df = pd.read_csv(StringIO(data), dtype={"b": object, "c": np.float64, "d": "Int64"})

In [20]: df.dtypes
Out[20]: 
a      int64
b     object
c    float64
d      Int64
dtype: object

7644

programming excel

Làm cách nào để thay đổi độ rộng của cột trong excel bằng openpyxl?

To change or modify column width size

Tệp CSV & văn bản#

Tùy chọn phân tích cú pháp #

Căn bản#

Column and index locations and names#

Cấu hình phân tích cú pháp chung#

NA và xử lý dữ liệu bị thiếu#

Xử lý ngày giờ #

Lần lặp #

Định dạng trích dẫn, nén và tệp #

Xử lý lỗi#

Chỉ định kiểu dữ liệu cột#

Chỉ định phân loại dtype#

Đặt tên và sử dụng các cột#

Xử lý tên cột#

Phân tích cú pháp tên trùng lặp#

Nhận xét và dòng trống#

Bỏ qua chú thích dòng và dòng trống#

Bình luận#

Xử lý dữ liệu Unicode#

Cột chỉ mục và dấu phân cách #

Xử lý ngày#

Chỉ định cột ngày #

Chức năng phân tích ngày #

Phân tích cú pháp CSV với các múi giờ hỗn hợp#

Suy ra định dạng ngày giờ #

International date formats#

Writing CSVs to binary file objects#

Specifying method for floating-point conversion#

Thousand separators#

NA values#

Infinity#

Returning Series#

Boolean values#

Handling “bad” lines#

Dialect#

Quoting and Escape Characters#

Files with fixed width columns#

Indexes#

Files with an “implicit” index column#

Reading an index with a import openpyxl worksheet = openpyxl.load_workbook("codespeedy.xlsx") sheet = worksheet.active sheet.column_dimensions['A'].width = 20 worksheet.save("codespeedy1.xlsx")2476#

Reading columns with a import openpyxl worksheet = openpyxl.load_workbook("codespeedy.xlsx") sheet = worksheet.active sheet.column_dimensions['A'].width = 20 worksheet.save("codespeedy1.xlsx")2476#

Automatically “sniffing” the delimiter#

Reading multiple files to create a single DataFrame#

Iterating through files chunk by chunk#

Specifying the parser engine#

Reading/writing remote files#

Writing out data#

Writing to CSV format#

Writing a formatted string#

JSON#

Writing JSON#

Orient options#

Date handling#

Fallback behavior#

Reading JSON#

Data conversion#

The Numpy parameter#

Normalization#

Line delimited json#

Table schema#

HTML#

Reading HTML content#

Writing to HTML files#

HTML Table Parsing Gotchas#

LaTeX#

Writing to LaTeX files#

XML#

Đọc XML#

Writing XML#

XML Final Notes#

Excel files#

Reading Excel files#

Specifying sheets#

Reading a import openpyxl worksheet = openpyxl.load_workbook("codespeedy.xlsx") sheet = worksheet.active sheet.column_dimensions['A'].width = 20 worksheet.save("codespeedy1.xlsx")2476#

Parsing specific columns#

Parsing dates#

Cell converters#

Dtype specifications#

Reading an index with a
import openpyxl worksheet = openpyxl.load_workbook("codespeedy.xlsx") sheet = worksheet.active sheet.column_dimensions['A'].width = 20 worksheet.save("codespeedy1.xlsx")
2476#

Reading columns with a
import openpyxl worksheet = openpyxl.load_workbook("codespeedy.xlsx") sheet = worksheet.active sheet.column_dimensions['A'].width = 20 worksheet.save("codespeedy1.xlsx")
2476#

Reading a
import openpyxl worksheet = openpyxl.load_workbook("codespeedy.xlsx") sheet = worksheet.active sheet.column_dimensions['A'].width = 20 worksheet.save("codespeedy1.xlsx")
2476#