Why encode () is used in python?

Question

In this tutorial, we will learn about the Python String encode() method with the help of examples.

Nội dung chính Show

Syntax of String encode()
String encode() Parameters
Example 1: Encode to Default Utf-8 Encoding
Example 2: Encoding with error parameter
String Encoding
Python String encode() Method Syntax:
Python String encode() Method Example:
Example 1: Code to print encoding schemes available
Example 2: Code to encode the string
Errors when using wrong encoding scheme
Example 1: Python String encode() method will raise UnicodeEncodeError if wrong encoding scheme is used
Example 2: Using ‘errors’ parameter to ignore errors while encoding
Why encode is used in Python?
What is encode used for?
What is the purpose of encode () and decode () method in Python?
Why do we need to encode and decode?

The encode() method returns an encoded version of the given string.

Example

title = 'Python Programming'

# change encoding to utf-8
print(title.encode())

# Output: b'Python Programming'

Syntax of String encode()

The syntax of encode() method is:

string.encode(encoding='UTF-8',errors='strict')

String encode() Parameters

By default, the encode() method doesn't require any parameters.

It returns an utf-8 encoded version of the string. In case of failure, it raises a UnicodeDecodeError exception.

However, it takes two parameters:

encoding - the encoding type a string has to be encoded to
errors - response when encoding fails. There are six types of error response
- strict - default response which raises a UnicodeDecodeError exception on failure
- ignore - ignores the unencodable unicode from the result
- replace - replaces the unencodable unicode to a question mark ?
- xmlcharrefreplace - inserts XML character reference instead of unencodable unicode
- backslashreplace - inserts a \uNNNN escape sequence instead of unencodable unicode
- namereplace - inserts a \N{...} escape sequence instead of unencodable unicode

Example 1: Encode to Default Utf-8 Encoding

# unicode string
string = 'pythön!'

# print string
print('The string is:', string)

# default encoding to utf-8
string_utf = string.encode()

# print result
print('The encoded version is:', string_utf)

Output

The string is: pythön!
The encoded version is: b'pyth\xc3\xb6n!'

Example 2: Encoding with error parameter

# unicode string
string = 'pythön!'

# print string
print('The string is:', string)

# ignore error
print('The encoded version (with ignore) is:', string.encode("ascii", "ignore"))

# replace error
print('The encoded version (with replace) is:', string.encode("ascii", "replace"))

Output

The string is: pythön!
The encoded version (with ignore) is: b'pythn!'
The encoded version (with replace) is: b'pyth?n!'

Note: Try different encoding and error parameters as well.

String Encoding

Since Python 3.0, strings are stored as Unicode, i.e. each character in the string is represented by a code point. So, each string is just a sequence of Unicode code points.

For efficient storage of these strings, the sequence of code points is converted into a set of bytes. The process is known as encoding.

There are various encodings present which treat a string differently. The popular encodings being utf-8, ascii, etc.

Using the string encode() method, you can convert unicode strings into any encodings supported by Python. By default, Python uses utf-8 encoding.

Python String encode() converts a string value into a collection of bytes, using an encoding scheme specified by the user.

Python String encode() Method Syntax:

Syntax: encode(encoding, errors)
Parameters:
encoding: Specifies the encoding on the basis of which encoding has to be performed.
errors: Decides how to handle the errors if they occur, e.g ‘strict’ raises Unicode error in case of exception and ‘ignore’ ignores the errors that occurred. There are six types of error response
strict – default response which raises a UnicodeDecodeError exception on failure
ignore – ignores the unencodable unicode from the result
replace – replaces the unencodable unicode to a question mark ?
xmlcharrefreplace – inserts XML character reference instead of unencodable unicode
backslashreplace – inserts a \uNNNN escape sequence instead of unencodable unicode
namereplace – inserts a \N{…} escape sequence instead of unencodable unicode
Return: Returns the string in the encoded form

Python String encode() Method Example:

Python3

print("¶".encode('utf-8'))

Output:

b'\xc2\xb6'

Example 1: Code to print encoding schemes available

There are certain encoding schemes supported by Python String encode() method. We can get the supported encodings using the Python code below.

Python3

from encodings.aliases import aliases

print("The available encodings are : ")

print(aliases.keys())

Output:

The available encodings are : 
dict_keys(['ibm039', 'iso_ir_226', '1140', 'iso_ir_110', '1252', 'iso_8859_8', 'iso_8859_3', 'iso_ir_166', 'cp367', 'uu', 'quotedprintable', 'ibm775', 'iso_8859_16_2001', 'ebcdic_cp_ch', 'gb2312_1980', 'ibm852', 'uhc', 'macgreek', '850', 'iso2022jp_2', 'hz_gb_2312', 'elot_928', 'iso8859_1', 'eucjp', 'iso_ir_199', 'ibm865', 'cspc862latinhebrew', '863', 'iso_8859_5', 'latin4', 'windows_1253', 'csisolatingreek', 'latin5', '855', 'windows_1256', 'rot13', 'ms1361', 'windows_1254', 'ibm863', 'iso_8859_14_1998', 'utf8_ucs2', '500', 'iso8859', '775', 'l7', 'l2', 'gb18030_2000', 'l9', 'utf_32be', 'iso_ir_100', 'iso_8859_4', 'iso_ir_157', 'csibm857', 'shiftjis2004', 'iso2022jp_1', 'iso_8859_2_1987', 'cyrillic', 'ibm861', 'ms950', 'ibm437', '866', 'csibm863', '932', 'iso_8859_14', 'cskoi8r', 'csptcp154', '852', 'maclatin2', 'sjis', 'korean', '865', 'u32', 'csshiftjis', 'dbcs', 'csibm037', 'csibm1026', 'bz2', 'quopri', '860', '1255', '861', 'iso_ir_127', 'iso_celtic', 'chinese', 'l8', '1258', 'u_jis', 'cspc850multilingual', 'iso_2022_jp_2', 'greek8', 'csibm861', '646', 'unicode_1_1_utf_7', 'ibm862', 'latin2', 'ecma_118', 'csisolatinarabic', 'zlib', 'iso2022jp_3', 'ksx1001', '858', 'hkscs', 'shiftjisx0213', 'base64', 'ibm857', 'maccentraleurope', 'latin7', 'ruscii', 'cp_is', 'iso_ir_101', 'us_ascii', 'hebrew', 'ansi_x3.4_1986', 'csiso2022jp', 'iso_8859_15', 'ibm860', 'ebcdic_cp_us', 'x_mac_simp_chinese', 'csibm855', '1250', 'maciceland', 'iso_ir_148', 'iso2022jp', 'u16', 'u7', 's_jisx0213', 'iso_8859_6_1987', 'csisolatinhebrew', 'csibm424', 'quoted_printable', 'utf_16le', 'tis260', 'utf', 'x_mac_trad_chinese', '1256', 'cp866u', 'jisx0213', 'csiso58gb231280', 'windows_1250', 'cp1361', 'kz_1048', 'asmo_708', 'utf_16be', 'ecma_114', 'eucjis2004', 'x_mac_japanese', 'utf8', 'iso_ir_6', 'cp_gr', '037', 'big5_tw', 'eucgb2312_cn', 'iso_2022_jp_3', 'euc_cn', 'iso_8859_13', 'iso_8859_5_1988', 'maccyrillic', 'ks_c_5601_1987', 'greek', 'ibm869', 'roman8', 'csibm500', 'ujis', 'arabic', 'strk1048_2002', '424', 'iso_8859_11_2001', 'l5', 'iso_646.irv_1991', '869', 'ibm855', 'eucjisx0213', 'latin1', 'csibm866', 'ibm864', 'big5_hkscs', 'sjis_2004', 'us', 'iso_8859_7', 'macturkish', 'iso_2022_jp_2004', '437', 'windows_1255', 's_jis_2004', 's_jis', '1257', 'ebcdic_cp_wt', 'iso2022jp_2004', 'ms949', 'utf32', 'shiftjis', 'latin', 'windows_1251', '1125', 'ks_x_1001', 'iso_8859_10_1992', 'mskanji', 'cyrillic_asian', 'ibm273', 'tis620', '1026', 'csiso2022kr', 'cspc775baltic', 'iso_ir_58', 'latin8', 'ibm424', 'iso_ir_126', 'ansi_x3.4_1968', 'windows_1257', 'windows_1252', '949', 'base_64', 'ms936', 'csisolatin2', 'utf7', 'iso646_us', 'macroman', '1253', '862', 'iso_8859_1_1987', 'csibm860', 'gb2312_80', 'latin10', 'ksc5601', 'iso_8859_10', 'utf8_ucs4', 'csisolatin4', 'ebcdic_cp_be', 'iso_8859_1', 'hzgb', 'ansi_x3_4_1968', 'ks_c_5601', 'l3', 'cspc8codepage437', 'iso_8859_7_1987', '8859', 'ibm500', 'ibm1026', 'iso_8859_6', 'csibm865', 'ibm866', 'windows_1258', 'iso_ir_138', 'l4', 'utf_32le', 'iso_8859_11', 'thai', '864', 'euc_jis2004', 'cp936', '1251', 'zip', 'unicodebigunmarked', 'csHPRoman8', 'csibm858', 'utf16', '936', 'ibm037', 'iso_8859_8_1988', '857', 'csibm869', 'ebcdic_cp_he', 'cp819', 'euccn', 'iso_8859_2', 'ms932', 'iso_2022_jp_1', 'iso_2022_kr', 'csisolatin6', 'iso_2022_jp', 'x_mac_korean', 'latin3', 'csbig5', 'hz_gb', 'csascii', 'u8', 'csisolatin5', 'csisolatincyrillic', 'ms_kanji', 'cspcp852', 'rk1048', 'iso2022jp_ext', 'csibm273', 'iso_2022_jp_ext', 'ibm858', 'ibm850', 'sjisx0213', 'tis_620_2529_1', 'l10', 'iso_ir_109', 'ibm1125', '1254', 'euckr', 'tis_620_0', 'l1', 'ibm819', 'iso2022kr', 'ibm367', '950', 'r8', 'hex', 'cp154', 'tis_620_2529_0', 'iso_8859_16', 'pt154', 'ebcdic_cp_ca', 'ibm1140', 'l6', 'csibm864', 'csisolatin1', 'csisolatin3', 'latin6', 'iso_8859_9_1989', 'iso_8859_3_1988', 'unicodelittleunmarked', 'macintosh', '273', 'latin9', 'iso_8859_4_1988', 'iso_8859_9', 'ebcdic_cp_nl', 'iso_ir_144'])

Example 2: Code to encode the string

Python3

string = "¶"

print(string.encode('utf-8'))

Output:

b'\xc2\xb6'

Errors when using wrong encoding scheme

Example 1: Python String encode() method will raise UnicodeEncodeError if wrong encoding scheme is used

Python3

string = "¶"

print(string.encode('ascii'))

Output:

UnicodeEncodeError: 'ascii' codec can't encode character '\xb6' in position 0: ordinal not in range(128)

Example 2: Using ‘errors’ parameter to ignore errors while encoding

Python String encode() method with errors parameter set to ‘ignore’ will ignore the errors in conversion of characters into specified encoding scheme.

Python3

string = "123-¶"

print(string.encode('ascii', errors='ignore'))

Output:

b'123-'

Why encode is used in Python?

Since Python 3.0, strings are stored as Unicode, i.e. each character in the string is represented by a code point. So, each string is just a sequence of Unicode code points. For efficient storage of these strings, the sequence of code points is converted into a set of bytes. The process is known as encoding.

What is encode used for?

Encoding involves the use of a code to change original data into a form that can be used by an external process. The type of code used for converting characters is known as American Standard Code for Information Interchange (ASCII), the most commonly used encoding scheme for files that contain text.

What is the purpose of encode () and decode () method in Python?

Since encode() converts a string to bytes, decode() simply does the reverse. This shows that decode() converts bytes to a Python string. Similar to those of encode() , the decoding parameter decides the type of encoding from which the byte sequence is decoded.

Why do we need to encode and decode?

Encoding and decoding are used in many forms of communications, including computing, data communications, programming, digital electronics and human communications. These two processes involve changing the format of content for optimal transmission or storage.