How do you convert bytes to text in Python?

If you don't know the encoding, you can read binary input into a string in a way that works on both Python 2 and Python 3 by decoding with the ancient MS-DOS CP437 encoding:

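import sys

PY3K = sys.version_info >= (3, 0)

# stream is any iterable of byte strings, e.g. a file opened in 'rb' mode
lines = []
for line in stream:
    if not PY3K:
        # Python 2: str is already a byte string, keep it as is
        lines.append(line)
    else:
        # Python 3: cp437 maps every byte value, so this never raises
        lines.append(line.decode('cp437'))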

Because the encoding is unknown, expect non-English symbols to come out as cp437 characters. English characters are unaffected, because ASCII text maps to the same bytes in most single-byte encodings and in UTF-8.
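
The claim that cp437 accepts any byte can be checked with a quick round trip. This is a minimal sketch for Python 3; the variable names are illustrative and not part of the original answer.

```python
# Every possible byte value, 0x00 through 0xFF
all_bytes = bytes(range(256))

# cp437 assigns a character to each byte value, so decoding does not raise
text = all_bytes.decode('cp437')

# Encoding with the same codec restores the original bytes
restored = text.encode('cp437')

print(len(text))              # one character per input byte
print(restored == all_bytes)  # the round trip is lossless
```

The decoded text is only meaningful for the ASCII range; bytes above 0x7F show up as whatever cp437 assigns to them, which is rarely what the original author typed.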

Decoding arbitrary binary input as UTF-8 is unsafe, because you may get this:

>>> b'\x00\x01\xffsd'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid start byte

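The same applies to ASCII, the default codec in Python 2: any byte above 0x7F makes Python choke with the infamous "ordinal not in range(128)" error. latin-1 behaves differently: although the standard leaves some positions unassigned (see the missing points in Codepage Layout), Python's latin-1 codec maps all 256 byte values, so, like cp437, it never fails.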

UPDATE 20150604: Python 3 has the surrogateescape error handler, which decodes arbitrary bytes into a string and encodes them back to binary without data loss or crashes, but it needs round-trip tests, [binary] -> [str] -> [binary], to validate both performance and reliability.
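
A round-trip test of the kind mentioned above might look like the following. It is a small sketch for Python 3 that checks reliability only, not performance; the sample bytes and variable names are made up for illustration.

```python
# Bytes that are not valid UTF-8, plus every possible byte value
raw = b'\x00\x01\xffsd\x80abc' + bytes(range(256))

# [binary] -> [str]: undecodable bytes become lone surrogate code points
text = raw.decode('utf-8', 'surrogateescape')

# [str] -> [binary]: the same handler turns the surrogates back into bytes
restored = text.encode('utf-8', 'surrogateescape')
print(restored == raw)

# The intermediate string is not valid for a strict UTF-8 encode
try:
    text.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)
```

The catch is that the intermediate string contains lone surrogates, so it has to be encoded with the same handler before it is written anywhere.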

UPDATE 20170116: Thanks to a comment by Nearoo, there is also the possibility of slash-escaping all undecodable bytes with the backslashreplace error handler. For decoding, that works only on Python 3 (3.5 and later), so even with this workaround you will still get inconsistent output from different Python versions:

import sys

PY3K = sys.version_info >= (3, 0)

lines = []
for line in stream:
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('utf-8', 'backslashreplace'))

See Python’s Unicode Support for details.
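
To make the effect of backslashreplace concrete, here is a short sketch for Python 3.5 or later. The sample bytes are invented for the illustration.

```python
raw = b'\x80abc\xff'

# Invalid bytes are replaced by a literal backslash escape; valid text is kept
escaped = raw.decode('utf-8', 'backslashreplace')
print(escaped)  # \x80abc\xff

# A literal backslash sequence in the input decodes to the same text,
# so the result cannot be reversed unambiguously
literal = b'\\x80abc\\xff'.decode('utf-8', 'backslashreplace')
print(literal == escaped)
```

Unlike surrogateescape, this handler is meant for producing readable output rather than for lossless round trips.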

UPDATE 20170119: I decided to implement a slash-escaping decode that works for both Python 2 and Python 3. It should be slower than the cp437 solution, but it should produce identical results on every Python version.

# --- preparation

import codecs

def slashescape(err):
    """Codecs error handler. err is a UnicodeDecodeError instance. Return
    a tuple with a replacement for the undecodable part of the input
    and the position where decoding should continue."""
    # The failing span may cover more than one byte; bytearray yields
    # integers on both Python 2 and Python 3
    thebytes = bytearray(err.object[err.start:err.end])
    repl = u''.join(u'\\x%02x' % b for b in thebytes)
    return (repl, err.end)

codecs.register_error('slashescape', slashescape)

# --- processing

stream = [b'\x80abc']

lines = []
for line in stream:
    lines.append(line.decode('utf-8', 'slashescape'))
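
To show what a handler of this kind produces, here is a self-contained sketch. The name slashescape_demo is hypothetical, chosen so that it does not clash with the handler registered above; the extra inputs are made up to exercise plain text and a truncated multi-byte sequence.

```python
import codecs

def slashescape_demo(err):
    # Hypothetical stand-in for the handler above: escape each bad byte as \xNN
    bad = bytearray(err.object[err.start:err.end])
    return (u''.join(u'\\x%02x' % b for b in bad), err.end)

codecs.register_error('slashescape_demo', slashescape_demo)

# One invalid start byte, one clean line, one truncated 3-byte sequence
stream = [b'\x80abc', b'plain', b'\xe2\x82a']
lines = [line.decode('utf-8', 'slashescape_demo') for line in stream]

for line in lines:
    print(line)
```

Each undecodable byte is printed as a four-character escape, and valid text passes through unchanged.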

Introduction

In this article, we'll take a look at how to convert Bytes to a String in Python. By the end of this article you will have a clear idea of what these types are and how to effectively handle data using them.

Depending on the version of Python you're using, this task will differ. Although Python 2 has reached its end of life, many projects still use it, so we'll include both the Python 2 and Python 3 approaches.

Convert Bytes to String in Python 3

With Python 3, the old ASCII-centric way of doing things had to go, and strings became Unicode throughout.

This means that we lost the explicit unicode type: u"string" - every string is a u"string"!

To differentiate these strings from good old bytestrings, we're introduced to a new specifier for them - the b"string".

This was added in Python 2.6, but it served no real purpose other than to prepare for Python 3 as all strings were bytestrings in 2.6.

Bytestrings in Python 3 are officially called bytes, an immutable sequence of integers in the range 0 <= x < 256. Another bytes-like object added in 2.6 is the bytearray - similar to bytes, but mutable.
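
The difference between the two types is easy to see in a few lines. This is an illustrative Python 3 sketch; the values are arbitrary.

```python
data = bytes([72, 105])      # the two bytes of b'Hi'
first = data[0]              # indexing gives an int, not a 1-byte string

# bytes is immutable: item assignment raises TypeError
try:
    data[0] = 74
    immutable = False
except TypeError:
    immutable = True

# bytearray holds the same kind of values but can be changed in place
buf = bytearray(data)
buf[0] = 74                  # replace 'H' (72) with 'J' (74)
frozen = bytes(buf)          # convert back to an immutable bytes object

print(first, immutable, frozen)
```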

Convert Bytes to String with decode()

Let's take a look at how we can convert bytes to a String, using the built-in decode() method for the bytes class:

>>> b = b"Let's grab a \xf0\x9f\x8d\x95!"
# Let's check the type
>>> type(b)
<class 'bytes'>

# Now, let's decode/convert them into a string
>>> s = b.decode('UTF-8')
>>> s
"Let's grab a 🍕!"

Passing the encoding format, we've decoded the bytes object into a string and printed it.

Convert Bytes to String with codecs

Alternatively, we can use the built-in codecs module for this purpose as well:

>>> import codecs
>>> b = b"Let's grab a \xf0\x9f\x8d\x95!"

>>> codecs.decode(b, 'UTF-8')
"Let's grab a 🍕!"

You don't strictly need to pass the encoding argument, since codecs.decode() defaults to UTF-8, but passing it explicitly is advised:

>>> codecs.decode(b)
"Let's grab a 🍕!"

Convert Bytes to String with str()

Finally, you can use the str() function, which accepts various values and converts them into strings:

>>> b = b"Let's grab a \xf0\x9f\x8d\x95!"
>>> str(b, 'UTF-8')
"Let's grab a 🍕!"

Make sure to provide the encoding argument to str() though. Without it, you get the printable representation of the bytes object rather than the decoded text:

>>> str(b)
'b"Let\'s grab a \\xf0\\x9f\\x8d\\x95!"'

This brings us to encodings once again. If you specify the wrong encoding, the best case is your program crashing because it can't decode the data. The worse case is that decoding succeeds and silently produces garbage. For example, if we tried using the str() function with UTF-16, we'd be greeted with:

>>> str(b, 'UTF-16')
'敌❴\u2073牧扡愠\uf020趟↕'

This is even more important given that Python 3 assumes UTF-8 in many places (for example, bytes.decode() and str.encode() default to it), so if you are working with files or data sources that use an obscure encoding, make sure to pay extra attention.
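
When reading files, the safest habit is to name the encoding explicitly. The sketch below assumes a file written in Latin-1; the temporary file and its contents are invented for the example.

```python
import os
import tempfile

# Write 'café' as Latin-1 bytes to a temporary file
payload = u'caf\xe9'.encode('latin-1')   # b'caf\xe9'
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(payload)

# Reading with the right encoding recovers the text
with open(path, encoding='latin-1') as f:
    text = f.read()

# Reading the same file as UTF-8 fails, because 0xE9 starts an incomplete sequence
try:
    with open(path, encoding='utf-8') as f:
        f.read()
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

os.remove(path)
print(text, utf8_failed)
```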

Convert Bytes to String in Python 2

In Python 2, a bundle of bytes and a string are practically the same thing - strings are objects consisting of 1-byte long characters, meaning that each character can store 256 values. That's why they are sometimes called bytestrings.

This is great when working with byte data - we just load it into a variable and we are ready to print:

>>> s = "Hello world!"

>>> s
'Hello world!'

>>> len(s)
12

Using Unicode characters in bytestrings does change this behavior a bit though:

>>> s = "Let's grab a 🍕!"

>>> s
"Let's grab a \xf0\x9f\x8d\x95!"
# Where has the pizza gone to?

>>> len(s)
18
# Shouldn't that be 15?

Convert Bytes to Unicode (Python 2)

Here, we'll have to use Python 2's Unicode type, which is assumed and automatically used in Python 3. This stores strings as a series of code points, rather than bytes.

The \xf0\x9f\x8d\x95 represents bytes as two-digit hex numbers as Python doesn't know how to represent them as ASCII characters:

>>> u = u"Let's grab a 🍕!"

>>> u
u"Let's grab a \U0001f355!"

>>> print(u)
Let's grab a 🍕!
# Yum.

>>> len(u)
15

As you can see above, the Unicode string contains \U0001f355 - a Unicode escaped character which our terminal now knows how to print out as a slice of pizza! Setting this was as easy as using the u specifier before the value of the bytestring.

So, how do I switch between the two?

You can get the Unicode string by decoding your bytestring. This can be done by constructing a Unicode object, providing the bytestring and a string containing the encoding name as arguments or by calling .decode(encoding) on a bytestring.
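
If the same code has to run on both Python versions, the decoding step can be wrapped in a small helper. This is only a sketch: to_text and text_type are hypothetical names, and UTF-8 is assumed as the default encoding.

```python
import sys

# Pick the text type for the running interpreter
if sys.version_info >= (3, 0):
    text_type = str
else:
    text_type = unicode  # noqa: F821 (this name exists only on Python 2)

def to_text(value, encoding='utf-8', errors='strict'):
    """Return value as a text string, decoding it if it is a byte string."""
    if isinstance(value, text_type):
        return value          # already text, nothing to do
    return value.decode(encoding, errors)

pizza = to_text(b"Let's grab a \xf0\x9f\x8d\x95!")
print(pizza)
```

Passing an already-decoded string through to_text() returns it unchanged, so the helper is safe to call at every boundary where the input type is uncertain.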

Convert Bytes to String Using decode() (Python 2)

You can also use codecs.decode(s, encoding) from the codecs module.

>>> s = "Let's grab a \xf0\x9f\x8d\x95!"
>>> u = unicode(s, 'UTF-8')

>>> u
u"Let's grab a \U0001f355!"

>>> s.decode('UTF-8')
u"Let's grab a \U0001f355!"

Convert Bytes to String Using codecs (Python 2)

Or, using the codecs module:

>>> import codecs

>>> codecs.decode(s, 'UTF-8')
u"Let's grab a \U0001f355!"

Be Mindful of your Encoding

A word of caution here - bytes can be interpreted differently in different encodings. With around 80 different encodings available out of the box, it might not be easy to know if you've got the right one!

>>> s = '\xf8\xe7'

# This one will let us know we used the wrong encoding

>>> s.decode('UTF-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf8 in position 0: invalid start byte

# These two encodings overlap, so the bytes are a valid string in both

>>> print(s.decode('latin1'))
øç

>>> print(s.decode('iso8859_5'))
јч

The original message was either øç or јч, and both appear to be valid conversions.
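
The same ambiguity can be reproduced in Python 3 by trying several candidate encodings on the same bytes. This is an illustrative sketch; the list of candidates is arbitrary.

```python
data = b'\xf8\xe7'
candidates = ('utf-8', 'latin-1', 'iso8859_5')

results = {}
for encoding in candidates:
    try:
        results[encoding] = data.decode(encoding)
    except UnicodeDecodeError:
        # A failure at least tells us this encoding is wrong
        results[encoding] = None

for encoding in candidates:
    print(encoding, results[encoding])
```

A successful decode only proves that the bytes are valid in that encoding, not that it is the encoding the author used.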

Conclusion

As programmers, there are some things we must constantly think about and actively prepare for in order to avoid pitfalls. This holds especially true on the lower levels, where we seldom go when we use a high-level language like Python as our daily driver.

Things like charsets, encodings and binary are there to remind us that our job is to code - to encode our thoughts into working solutions. Thankfully, a lot of this thinking becomes part of our routine after a few rounds at the keyboard.

In this article, we've gone over how to convert bytes to Strings in Python.

How do you convert a byte to a string in Python?

Different ways to convert bytes to a string in Python:
Using the decode() method.
Using the str() function.
Using the codecs.decode() method.
Using map() without using the b prefix.
Using pandas to convert bytes to strings.

How would you convert bytes to string?

In Java, one method is to create a String variable and then append the byte value to it with the + operator. This converts the byte value to a string and adds it to the variable. The simplest way to do so is to use the valueOf() method of the String class.

How do you decode bytes in Python?

decode() is used to decode bytes to a string object. The result depends on the arguments you specify: the encoding, and optionally an error handling scheme to use for decoding errors.
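
The error handling scheme is passed as the second argument, errors. A brief Python 3 sketch with three of the standard schemes, using made-up input:

```python
raw = b'ab\xffcd'   # 0xFF is never valid in UTF-8

# 'strict' is the default and raises on the bad byte
try:
    raw.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'ignore' drops the bad byte silently
ignored = raw.decode('utf-8', errors='ignore')

# 'replace' substitutes U+FFFD, the Unicode replacement character
replaced = raw.decode('utf-8', errors='replace')

print(strict_failed, ignored, replaced)
```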

How do you write bytes in a text file?

Open a file for writing in binary mode using open(file, mode), with file as the file name and mode as "wb". Use file.write(text), with file as the opened file and text as the bytes to write, to write the data to the file. Once finished, close the file.
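
In practice a with block takes care of closing the file. The following is a minimal Python 3 sketch; the file name demo.bin and the sample bytes are placeholders.

```python
import os
import shutil
import tempfile

data = b'\x00\x01binary\xff'

# Write to a throwaway directory so the example leaves nothing behind
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo.bin')

# "wb" opens the file for writing in binary mode; write() takes bytes
with open(path, 'wb') as f:
    written = f.write(data)      # number of bytes written

# Read the file back in binary mode to confirm the contents
with open(path, 'rb') as f:
    read_back = f.read()

shutil.rmtree(folder)
print(written, read_back == data)
```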