Which method is used to read a single line from a file in Python?

Text files are composed of plain text content and are also known as flat files or plain files. Python makes it easy to open a text file and read its content line by line. When the lines are loaded into a list, they are indexed from 0, so the first line is at index 0. There are several ways to read specific lines from a text file in Python, and this article discusses three of them.

File in use: test.txt

Method 1: fileobject.readlines()

A file object can be created in Python, and its readlines() method can then be called to read all the lines into a list. This method is convenient when a single line or a range of lines needs to be accessed by position, since any line can be picked by index and any range by slice. It reads the entire content of the file first and keeps a copy of it in memory; the lines at the specified indices are then accessed from that list.

Example:

Python3

with open('test.txt') as file:
    content = file.readlines()

print("tenth line")
print(content[9])
print("first three lines")
print(content[0:3])

Output

tenth line
This is line 10.

first three lines
['This is line 1.\n', 'This is line 2.\n', 'This is line 3.\n']
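One detail worth knowing when picking lines out of the list returned by readlines(): a slice that runs past the end of the file just comes back shorter, while a single index past the end raises IndexError. The sketch below builds its own five-line sample file (the file name and content are assumptions made for the illustration) to show both cases.

```python
import os
import tempfile

# Build a small sample file so the sketch is self-contained.
# The "This is line N." content mirrors the outputs shown above.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w") as f:
    for n in range(1, 6):
        f.write(f"This is line {n}.\n")

with open(sample_path) as f:
    content = f.readlines()

# A slice past the end of the list is simply shorter than requested.
tail = content[3:10]

# A single index past the end raises IndexError instead.
try:
    missing = content[9]
except IndexError:
    missing = None

print(tail)
print(missing)
```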

Method 2: linecache package

The linecache module from the standard library can be imported and used to fetch specific lines from a file. It opens the file on its own and caches its lines internally, so repeated lookups in the same file do not reread it from disk. Its getline() function returns one line for a given line number.

Syntax:

linecache.getline(filename, lineno)

Example:

Python3

import linecache

# Line numbers passed to getline() start at 1, not 0.
particular_line = linecache.getline('test.txt', 4)
print(particular_line)

Output:

This is line 4.
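Since linecache.getline() counts lines from 1 while list indexes count from 0, it is easy to be off by one. The following sketch writes a twelve-line sample file to a temporary directory (the path and content are assumptions for the illustration) and fetches two lines by their 1-based numbers.

```python
import linecache
import os
import tempfile

# Create a sample file; the content is an assumption for this sketch.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w") as f:
    for n in range(1, 13):
        f.write(f"This is line {n}.\n")

# Line numbers are 1-based: 1 is the first line of the file.
first = linecache.getline(sample_path, 1)
fourth = linecache.getline(sample_path, 4)

print(first, end="")
print(fourth, end="")

# Drop the cached lines once they are no longer needed.
linecache.clearcache()
```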

Method 3: enumerate()

The built-in enumerate() function wraps any iterable, including a file object, and yields (index, item) pairs, with the index starting at 0 by default. Combined with a for loop, it lets you walk through a file while keeping track of each line's position. Lines at particular positions can then be selected by listing the required index numbers.

Example:

Python3

specified_lines = [0, 7, 11]  # 0-based line indexes

with open("test.txt") as file:
    for pos, line in enumerate(file):
        if pos in specified_lines:
            print(line, end="")

Output

This is line 1.
This is line 8.
This is line 12.
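The enumerate() approach can stop reading as soon as the last requested line has been seen, which matters for long files. Here is a minimal sketch of that idea; the helper name read_specific_lines and the sample file are hypothetical and not part of the original example.

```python
import os
import tempfile


def read_specific_lines(path, wanted):
    """Return {index: line} for the 0-based line indexes in `wanted`."""
    wanted = set(wanted)
    found = {}
    if not wanted:
        return found
    last = max(wanted)
    with open(path) as f:
        for pos, line in enumerate(f):
            if pos in wanted:
                found[pos] = line.rstrip("\n")
            if pos >= last:
                break  # nothing left to look for, stop reading early
    return found


# Sample file; the content is an assumption for this sketch.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w") as f:
    for n in range(1, 13):
        f.write(f"This is line {n}.\n")

selected = read_specific_lines(sample_path, [0, 7, 11])
print(selected)
```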

Introduction

A common task in programming is opening a file and parsing its contents. What do you do when the file you are trying to process is quite large, several GB of data or more? The answer is to read the file one chunk at a time: process a chunk, free it from memory, then move on to the next chunk until the whole file has been processed. While it's up to you to determine a suitable chunk size, for many applications it's suitable to process a file one line at a time.
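For cases where a line is not the natural unit, the chunked approach described above can be sketched with read(size) in a generator. The function name read_in_chunks and the tiny chunk size are assumptions chosen for illustration; a real chunk size would be much larger.

```python
import os
import tempfile
from functools import partial


def read_in_chunks(path, chunk_size=1024):
    """Yield successive pieces of at most `chunk_size` characters."""
    with open(path) as f:
        # iter(callable, sentinel) calls f.read(chunk_size) repeatedly
        # and stops when it returns the empty string (end of file).
        for chunk in iter(partial(f.read, chunk_size), ""):
            yield chunk


# Sample file; the content is an assumption for this sketch.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w") as f:
    f.write("abcdefghij")

chunks = list(read_in_chunks(sample_path, chunk_size=4))
print(chunks)
```

Only one chunk is held in memory at a time while iterating, since each chunk can be discarded before the next one is read.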

Throughout this article, we'll be covering a number of code examples that demonstrate how to read files line by line. In case you want to try out some of these examples by yourself, the code used in this article can be found at the following GitHub repo.

Basic File IO in Python
Read a File Line-by-Line in Python with readline()
Read a File Line-by-Line in Python with readlines()
Read a File Line-by-Line with a for Loop - Most Pythonic Approach
Applications of Reading Files Line-by-Line

Basic File IO in Python

Python is a great general-purpose programming language, and its standard library of built-in functions and modules offers a good deal of useful file IO functionality.

The built-in open() function is what you use to open a file object for either reading or writing purposes. Here's how you can use it to open a file:

fp = open('path/to/file.txt', 'r')

As demonstrated above, the open() function takes in multiple arguments. We will be focusing on two arguments, with the first being a positional string parameter representing the path to the file you want to open. The second (optional) parameter is also a string, and it specifies the mode of interaction you intend to be used on the file object being returned by the function call. The most common modes are listed in the table below, with the default being 'r' for reading:

Mode	Description
`r`	Open for reading plain text
`w`	Open for writing plain text (truncates an existing file)
`a`	Open for appending plain text (creates the file if it does not exist)
`rb`	Open for reading binary data
`wb`	Open for writing binary data

Once you have written or read all of the desired data in a file object, you need to close the file so that resources can be reallocated on the operating system that the code is running on.

fp.close()

Note: It's always good practice to close a file object resource, but it's a task that's easy to forget.

Rather than relying on remembering to call close() on a file object, there's an alternate and more elegant way to open a file and ensure that the Python interpreter cleans up after its use:

with open('path/to/file.txt') as fp:
    pass  # Do stuff with fp

Adding the with keyword (introduced in Python 2.5) to the code that opens a file makes Python do something similar to the following, which guarantees that the file object is closed after use, even if an exception is raised inside the block:

fp = open('path/to/file.txt')  # opened before try, so fp always exists in finally
try:
    pass  # Do stuff with fp
finally:
    fp.close()

Either of these two methods is suitable, with the first example being more Pythonic.
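The closing behavior can be checked through the file object's closed attribute. The sketch below uses a temporary file (an assumption for the illustration) and shows that the file is closed once the with block ends, including when the block exits through an exception.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("hello\n")

# Inside the block the file is open; after it, the file is closed.
with open(path) as fp:
    closed_inside = fp.closed
closed_after = fp.closed

# The file is also closed when the block is left via an exception.
try:
    with open(path) as fp2:
        raise ValueError("simulated failure")
except ValueError:
    pass
closed_after_error = fp2.closed

print(closed_inside, closed_after, closed_after_error)
```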

The file object returned from the open() function has three common explicit methods for reading data: read(), readline(), and readlines(). The read() method reads all the data into a single string, which is useful for smaller files where you would like to do text manipulation on the entire file. The readline() method reads one line per call and returns it as a string, so a file can be consumed incrementally. The last explicit method, readlines(), reads all the lines of a file and returns them as a list of strings.
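To make the difference between the three methods concrete, this sketch runs each of them against the same three-line temporary file (the file and its content are assumptions for the illustration).

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

# read(): the whole file as one string.
with open(path) as f:
    whole = f.read()

# readline(): one line per call, continuing where the last call stopped.
with open(path) as f:
    line_one = f.readline()
    line_two = f.readline()

# readlines(): every line, as a list of strings.
with open(path) as f:
    all_lines = f.readlines()

print(repr(whole))
print(repr(line_one), repr(line_two))
print(all_lines)
```

Note that readline() and readlines() keep the trailing newline of each line, which is why the examples in this article call strip() before printing.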

Note: For the remainder of this article we will be working with the text of the book "The Iliad of Homer", which can be found at gutenberg.org, as well as in the GitHub repo that holds the code for this article.

Let's start off with the readline() method, which reads a single line per call. To number the lines, we keep a counter and increment it ourselves:

filepath = 'Iliad.txt'
with open(filepath) as fp:
    line = fp.readline()
    cnt = 1
    while line:
        print("Line {}: {}".format(cnt, line.strip()))
        line = fp.readline()
        cnt += 1

This code snippet opens a file object whose reference is stored in fp, then reads one line at a time by calling readline() on that file object in a while loop, printing each line to the console. The loop ends when readline() returns an empty string, which signals the end of the file.

Running this code, you should see something like the following:

...
Line 567: exceedingly trifling. We have no remaining inscription earlier than the
Line 568: fortieth Olympiad, and the early inscriptions are rude and unskilfully
Line 569: executed; nor can we even assure ourselves whether Archilochus, Simonides
Line 570: of Amorgus, Kallinus, Tyrtaeus, Xanthus, and the other early elegiac and
Line 571: lyric poets, committed their compositions to writing, or at what time the
Line 572: practice of doing so became familiar. The first positive ground which
Line 573: authorizes us to presume the existence of a manuscript of Homer, is in the
Line 574: famous ordinance of Solon, with regard to the rhapsodies at the
Line 575: Panathenaea: but for what length of time previously manuscripts had
Line 576: existed, we are unable to say.
...
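The while loop above depends on how readline() reports the end of the file: a blank line in the file comes back as "\n", which is truthy, while the end of the file comes back as the empty string, which is falsy. A small sketch with a temporary file (an assumption for the illustration) shows the difference.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("one\n\nthree\n")  # the second line is blank

results = []
with open(path) as fp:
    # Call readline() once more than there are lines in the file.
    for _ in range(4):
        results.append(fp.readline())

print(results)

# A blank line still keeps a `while line:` loop going; only EOF stops it.
blank_is_truthy = bool(results[1])
eof_is_truthy = bool(results[3])
print(blank_is_truthy, eof_is_truthy)
```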

This approach works, but it is crude and overly explicit, and not very Pythonic. We can use the readlines() method to make the code more succinct.

Read a File Line-by-Line with readlines()

The readlines() method reads all the lines and returns them as a list. We can then iterate over that list and use enumerate() to attach an index to each line for our convenience:

with open('Iliad.txt', 'r') as file:
    lines = file.readlines()  # reads the whole file into a list

for index, line in enumerate(lines):  # index starts at 0
    print("Line {}: {}".format(index, line.strip()))

This results in:

...
Line 160: INTRODUCTION.
Line 161:
Line 162:
Line 163: Scepticism is as much the result of knowledge, as knowledge is of
Line 164: scepticism. To be content with what we at present know, is, for the most
Line 165: part, to shut our ears against conviction; since, from the very gradual
Line 166: character of our education, we must continually forget, and emancipate
Line 167: ourselves from, knowledge previously acquired; we must set aside old
Line 168: notions and embrace fresh ones; and, as we learn, we must be daily
Line 169: unlearning something which it has cost us no small labour and anxiety to
Line 170: acquire.
...

Although this is much better, readlines() loads the entire file into memory, and we don't even need to call it to achieve the same result. This is the traditional way of reading a file line by line, but there's a more modern, shorter one.

Read a File Line-by-Line with a for Loop - Most Pythonic Approach

The file object returned by open() is itself an iterable that yields one line at a time. We don't need to extract the lines via readlines() at all; we can iterate over the returned object directly. This also makes it easy to wrap it in enumerate() so we can write the line number in each print() statement.

This is the shortest, most Pythonic approach to solving the problem, and the approach favored by most:

with open('Iliad.txt') as f:
    for index, line in enumerate(f):
        print("Line {}: {}".format(index, line.strip()))

This results in:

...
Line 277: Mentes, from Leucadia, the modern Santa Maura, who evinced a knowledge and
Line 278: intelligence rarely found in those times, persuaded Melesigenes to close
Line 279: his school, and accompany him on his travels. He promised not only to pay
Line 280: his expenses, but to furnish him with a further stipend, urging, that,
Line 281: "While he was yet young, it was fitting that he should see with his own
Line 282: eyes the countries and cities which might hereafter be the subjects of his
Line 283: discourses." Melesigenes consented, and set out with his patron,
Line 284: "examining all the curiosities of the countries they visited, and
...

Here, we're taking advantage of Python's built-in iteration support, which lets us loop over any iterable object, including a file, with a plain for loop.
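Because a file object is a lazy iterator, it also works with itertools.islice() from the standard library, which gives a way to pull out a range of lines without loading the whole file. The helper name read_line_range and the sample file below are hypothetical, introduced only for this sketch.

```python
import os
import tempfile
from itertools import islice


def read_line_range(path, start, stop):
    """Return lines with 0-based index in [start, stop), newline removed."""
    with open(path) as f:
        # islice() skips the first `start` lines and stops reading
        # as soon as line `stop` is reached.
        return [line.rstrip("\n") for line in islice(f, start, stop)]


# Sample file; the content is an assumption for this sketch.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w") as f:
    for n in range(1, 13):
        f.write(f"This is line {n}.\n")

middle = read_line_range(sample_path, 2, 5)
print(middle)
```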

Applications of Reading Files Line-by-Line

How can you use this practically? Most NLP applications deal with large corpora of data, and most of the time it isn't wise to read an entire corpus into memory. As a rudimentary example, you can write a from-scratch solution that counts the frequency of words without using any external libraries. Let's write a simple script that loads a file, reads it line by line, counts the frequency of each word, and prints the 10 most frequent words with their number of occurrences:

import sys
import os

def main():
    filepath = sys.argv[1]
    if not os.path.isfile(filepath):
        print("File path {} does not exist. Exiting...".format(filepath))
        sys.exit()

    bag_of_words = {}
    with open(filepath) as fp:
        # Read one line at a time so the whole file is never in memory
        for line in fp:
            record_word_cnt(line.strip().split(' '), bag_of_words)
    sorted_words = order_bag_of_words(bag_of_words, desc=True)
    print("Most frequent 10 words {}".format(sorted_words[:10]))

def order_bag_of_words(bag_of_words, desc=False):
    # Turn the dictionary into (word, count) tuples sorted by count
    words = [(word, cnt) for word, cnt in bag_of_words.items()]
    return sorted(words, key=lambda x: x[1], reverse=desc)

def record_word_cnt(words, bag_of_words):
    # Count each non-empty word case-insensitively
    for word in words:
        if word != '':
            if word.lower() in bag_of_words:
                bag_of_words[word.lower()] += 1
            else:
                bag_of_words[word.lower()] = 1

if __name__ == '__main__':
    main()

The script uses the os module to make sure that the file we're attempting to read actually exists. If it does, the file is read line by line; each line is stripped, split on spaces, and the resulting list of words is passed to record_word_cnt(), which lowercases each word and increments its count in the bag_of_words dictionary. Once all the lines have been counted, order_bag_of_words() returns a list of (word, word_count) tuples sorted by word count.

Finally, we print the top ten most common words.

Typically, you'd build a Bag of Words model for this using a library like NLTK, but this implementation will suffice. Let's run the script and pass our Iliad.txt to it:

$ python app.py Iliad.txt

This results in:

Most frequent 10 words [('the', 15633), ('and', 6959), ('of', 5237), ('to', 4449), ('his', 3440), ('in', 3158), ('with', 2445), ('a', 2297), ('he', 1635), ('from', 1418)]
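As a point of comparison, the same line-by-line counting can be written more compactly with collections.Counter from the standard library. This is an alternative sketch, not the script above: the function name top_words is hypothetical, and it splits on any whitespace rather than only on spaces, so results may differ slightly for text containing tabs.

```python
import os
import tempfile
from collections import Counter


def top_words(path, n=10):
    """Return the n most frequent lowercase words as (word, count) tuples."""
    counts = Counter()
    with open(path) as fp:
        for line in fp:
            # split() with no argument splits on any run of whitespace
            counts.update(word.lower() for word in line.split())
    return counts.most_common(n)


# Tiny sample text; an assumption standing in for Iliad.txt.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w") as f:
    f.write("the cat and the dog\nThe end\n")

most_frequent = top_words(sample_path, n=1)
print(most_frequent)
```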

Conclusion

In this article, we've explored multiple ways to read a file line-by-line in Python, as well as created a rudimentary Bag of Words model to calculate the frequency of words in a given file.

How do you read a single line in Python?

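To read just the next line from an open file object, call its readline() method. To read a specific line by number, use the linecache.getline() function: linecache.getline(filename, lineno) returns line number lineno (counting from 1) from the file named filename. It never raises an error if the line is not present in the file; instead, it returns an empty string.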
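The empty-string behavior is easy to verify. The sketch below asks linecache.getline() for a line past the end of a three-line temporary file and for a line in a file that does not exist; the file names are assumptions for the illustration.

```python
import linecache
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "sample.txt")
with open(path, "w") as f:
    f.write("line 1\nline 2\nline 3\n")

existing = linecache.getline(path, 2)

# A line number past the end of the file gives '' rather than an error.
beyond_end = linecache.getline(path, 99)

# The same is true when the file itself does not exist.
no_such_file = linecache.getline(os.path.join(folder, "does_not_exist.txt"), 1)

print(repr(existing), repr(beyond_end), repr(no_such_file))
linecache.clearcache()
```

Since an empty string is returned in both failure cases, a caller that needs to tell a missing line from a missing file has to check for the file separately, for example with os.path.isfile().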