Download file requests python 3

Motivation

Sometimes we want to fetch an image but don't need to save it to a real file,

i.e., download the data and keep it in memory.

For example, suppose I use machine learning to train a model that recognizes images containing numbers (bar codes).

When I crawl websites that contain such images, I can run the model on them directly,

and I don't want to save those pictures to my disk drive;

in that case, you can try the method below to keep the downloaded data in memory.

Points

import requests
from io import BytesIO
response = requests.get(url, stream=True)  # stream=True lets us read the body in chunks
with BytesIO() as io_obj:  # note: BytesIO must be instantiated
    for chunk in response.iter_content(chunk_size=4096):
        io_obj.write(chunk)

Basically, this is the same idea as @Ranvijay Kumar's answer.

An Example

import requests
from typing import NewType, TypeVar
from io import StringIO, BytesIO
import matplotlib.pyplot as plt
import imageio

URL = NewType('URL', str)
T_IO = TypeVar('T_IO', StringIO, BytesIO)


def download_and_keep_on_memory(url: URL, headers=None, timeout=None, **option) -> T_IO:
    chunk_size = option.get('chunk_size', 4096)  # default 4KB
    max_size = 1024 ** 2 * option.get('max_size', -1)  # in MB; the default of -1 disables the limit.
    response = requests.get(url, headers=headers, timeout=timeout)
    if response.status_code != 200:
        raise requests.ConnectionError(f'{response.status_code}')

    instance_io = StringIO if isinstance(next(response.iter_content(chunk_size=1)), str) else BytesIO
    io_obj = instance_io()
    cur_size = 0
    for chunk in response.iter_content(chunk_size=chunk_size):
        cur_size += len(chunk)  # the final chunk may be shorter than chunk_size
        if 0 < max_size < cur_size:
            break
        io_obj.write(chunk)
    io_obj.seek(0)
    """ save it to real file.
    with open('temp.png', mode='wb') as out_f:
        out_f.write(io_obj.read())
    """
    return io_obj


def main():
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-TW,zh;q=0.9,en-US;q=0.8,en;q=0.7',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive',
        'Host': 'statics.591.com.tw',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
    }
    io_img = download_and_keep_on_memory(URL('http://statics.591.com.tw/tools/showPhone.php?info_data=rLsGZe4U%2FbphHOimi2PT%2FhxTPqI&type=rLEFMu4XrrpgEw'),
                                         headers,  # You may need this. Otherwise, some websites will send the 404 error to you.
                                         max_size=4)  # max loading < 4MB
    with io_img:
        plt.rc('axes.spines', top=False, bottom=False, left=False, right=False)
        plt.rc(('xtick', 'ytick'), color=(1, 1, 1, 0))  # same as plt.axis('off')
        plt.imshow(imageio.imread(io_img, as_gray=False, pilmode="RGB"))
        plt.show()


if __name__ == '__main__':
    main()

Using Python to download files from the internet is super easy, and possible using only standard-library functions.


Other libraries, most notably the Python Requests library, can provide a clearer API for those more concerned with higher-level operations. This article outlines three ways to download a file using Python with a short discussion of each.

1. urllib.request.urlretrieve

Python’s urllib library offers a range of functions designed to handle common URL-related tasks. This includes parsing, requesting, and — you guessed it — downloading files. Let’s consider a basic example of downloading the robots.txt file from www.google.com:
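The code sample itself did not survive in this copy; a minimal sketch of that robots.txt download (the local filename 'google_robots.txt' is my own choice) might look like:

```python
from urllib.request import urlretrieve

# Fetch Google's robots.txt and save it to a local file.
# urlretrieve returns the local path and the response headers.
path, headers = urlretrieve('https://www.google.com/robots.txt',
                            'google_robots.txt')
print(path)
```

Note that `urlretrieve` handles the request and the file write in one call, which is what makes it the shortest of the three approaches.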

Note: urllib’s urlretrieve is considered “legacy” from Python 2 and, in the words of the Python documentation, “might become deprecated at some point in the future.” In my opinion, there’s a big divide between “might” become deprecated and “will” become deprecated. In other words, this is probably a safe approach for the foreseeable future.

2. requests.get + manual save

The Python Requests module is a super-friendly library billed as “HTTP for humans.” Offering very simplified APIs, Requests lives up to its motto for even high-throughput HTTP-related demands. However, it doesn’t feature a one-liner for downloading files. Instead, one must manually save streamed file data as follows:
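The streaming sample was lost in this copy; assuming it followed the usual Requests pattern, it would look roughly like this (the URL and filename are illustrative):

```python
import requests

url = 'https://www.google.com/robots.txt'  # illustrative file URL

# stream=True defers the download so the body can be read in chunks
# instead of being loaded into memory all at once.
response = requests.get(url, stream=True)
response.raise_for_status()

with open('robots.txt', 'wb') as out_f:  # 'wb': write the raw bytes as-is
    for chunk in response.iter_content(chunk_size=4096):
        out_f.write(chunk)
```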

There are some important aspects of this approach to keep in mind, most notably the binary format of the data transfer. When a web browser loads a page (or file), it decodes it using the encoding specified by the host.

Common encodings include UTF-8 and Latin-1. These are directives aimed at web browsers that receive and display data; they aren't immediately applicable to downloading files, which is why the file is written in binary mode.

Note: Downloaded files may require encoding in order to display properly. That’s beyond the scope of this tutorial.

3. wget.download

The wget Python library offers a method similar to urllib and attracts a lot of attention due to its name being identical to the Linux wget command. This module was last updated in 2015.

Note: The wget.download function uses a combination of urllib, tempfile, and shutil to retrieve the downloaded data, save to a temporary file, and then move that file (and rename it) to the specified location.

Final Thoughts

Downloading files with Python is super simple and can be accomplished using the standard urllib functions. I've found the Requests library to offer the easiest and most versatile APIs for common HTTP-related tasks. One notable exception is the URL parsing features of urllib; those aren't strictly HTTP-related, though, so I don't take points away from Requests!

The article How to Download Files with Python was originally published on OverCoded and has been republished here with permission. Updates to the original article may not be reflected in this posting.