Simple Python script without the use of heavy text processing libraries to extract most common words from a corpus.
What is the most used word in all of Shakespeare plays? Was ‘king’ more often used than ‘Lord’ or vice versa?
To answer these types of fun questions, one often needs to quickly examine and plot the most frequent words in a text file (often downloaded from open-source portals such as Project Gutenberg). However, if you search the web or Stack Overflow, you will most probably see examples using NLTK or scikit-learn's CountVectorizer. While those are incredibly powerful and fun to use, the fact of the matter is, you don't need them if the only thing you want is to extract the most common words appearing in a single text corpus.
Below, I am showing a very simple Python 3 code snippet to do just that — using only a dictionary and simple string manipulation methods.
Feel free to copy the code and use your own stopwords to make it better!
import collections
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Read the input file; note the encoding is specified here.
# It may be different in your text file.
file = open('PrideandPrejudice.txt', encoding="utf8")
a = file.read()

# Stopwords
stopwords = set(line.strip() for line in open('stopwords.txt'))
stopwords = stopwords.union(set(['mr', 'mrs', 'one', 'two', 'said']))

# Instantiate a dictionary, and for every word in the file,
# add it to the dictionary if it doesn't exist. If it does, increase the count.
wordcount = {}

# To eliminate duplicates, remember to strip punctuation and normalize case.
for word in a.lower().split():
    word = word.replace(".", "")
    word = word.replace(",", "")
    word = word.replace(":", "")
    word = word.replace("\"", "")
    word = word.replace("!", "")
    word = word.replace("“", "")
    word = word.replace("‘", "")
    word = word.replace("*", "")
    if word not in stopwords:
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1

# Print the most common words
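As a side note, the chain of replace() calls above can be condensed into a single str.translate call. Here is a minimal sketch (the sample sentence is made up; the translation table covers the same punctuation marks as the loop):

```python
# Build a table mapping each punctuation mark to None, then strip them all at once
table = str.maketrans('', '', '.,:"!“‘*')
words = [w.translate(table) for w in "A word, another word!".lower().split()]
print(words)  # ['a', 'word', 'another', 'word']
```

This avoids one string copy per punctuation mark, which matters a little on large corpora.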
n_print = int(input("How many most common words to print: "))
print("\nOK. The {} most common words are as follows\n".format(n_print))
word_counter = collections.Counter(wordcount)
for word, count in word_counter.most_common(n_print):
    print(word, ": ", count)

# Close the file
file.close()

# Create a data frame of the most common words
# Draw a bar chart
lst = word_counter.most_common(n_print)
df = pd.DataFrame(lst, columns=['Word', 'Count'])
df.plot.bar(x='Word', y='Count')
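For comparison, the whole counting step can also be collapsed into a few lines by feeding a filtered generator straight to collections.Counter. A minimal sketch, assuming the text and stopword set are already in hand (the sample text below is made up):

```python
import collections

text = "the king and the lord met the king"  # stand-in for the file contents
stopwords = {'the', 'and'}

# Count every lowercased word that is not a stopword
counts = collections.Counter(
    w for w in text.lower().split() if w not in stopwords
)
print(counts.most_common(2))  # [('king', 2), ('lord', 1)]
```

The manual dictionary loop above is still worth seeing once, but Counter does the same bookkeeping for you.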
An example of the code output and the plot of the 10 most frequently used words in the corpus. The text is ‘Pride and Prejudice’, and you can see the familiar names of Elizabeth and Mr. Darcy! :)
Given a data set, we can find the k most frequent words.
A solution to this problem is already presented in Find the k most frequent words from a file, but we can solve it very efficiently in Python with the help of some high-performance modules.
In order to do this, we'll use the collections module, which provides high-performance, specialized container datatypes. We will use the Counter class from this module.
Examples :
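A minimal illustration of what Counter does (the input string here is a made-up example, not from the data set below):

```python
from collections import Counter

# Counter tallies each distinct element of an iterable
words = "a b a c a b".split()
counts = Counter(words)
print(counts.most_common(2))  # [('a', 3), ('b', 2)]
```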
Approach:
- Import the Counter class from the collections module.
- Split the string into a list of words using split().
- Pass the list to the Counter class constructor.
- The most_common() method of Counter returns a list of the most frequent words and their counts.
Below is the Python implementation of the above approach:
from collections import Counter

data_set = "Welcome to the world of Geeks " \
           "This portal has been created to provide well written well " \
           "thought and well explained solutions for selected questions " \
           "If you like Geeks for Geeks and would like to contribute " \
           "here is your chance You can write article and mail your article " \
           "to contribute at geeksforgeeks org See your article appearing on " \
           "the Geeks for Geeks main page and help thousands of other Geeks. "

# Split the text into a list of words
split_it = data_set.split()

# Pass the list to Counter; note we avoid shadowing the Counter class itself
counter = Counter(split_it)

# Fetch the 4 most frequent words and their counts
most_occur = counter.most_common(4)
print(most_occur)
Output :
[('Geeks', 5), ('to', 4), ('and', 4), ('article', 3)]
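One caveat worth noting: Counter tallies raw tokens, so in the output above 'Geeks.' (with its trailing period) is counted separately from 'Geeks'. A sketch of normalizing tokens before counting, using a made-up input string:

```python
from collections import Counter
import string

data_set = "Geeks for Geeks. Other Geeks!"

# Lowercase and strip surrounding punctuation so 'Geeks.' and 'Geeks' merge
tokens = [w.strip(string.punctuation).lower() for w in data_set.split()]
print(Counter(tokens).most_common(2))  # [('geeks', 3), ('for', 1)]
```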