Data preprocessing is an important task in text classification, and counting the words in a sentence is a common step in it. With Python so widely used in data science, it helps to know a few shorthands for this. This article discusses ways to count the words in a sentence: it starts with space-separated words and then covers sentences that contain special characters.
Quick Ninja Methods: One-line code to count the words in a sentence, with static and dynamic inputs.
Python3
countOfWords = len("Geeksforgeeks is best Computer Science Portal".split())
print("Count of Words in the given Sentence:", countOfWords)
print(len("Geeksforgeeks is best Computer Science Portal".split()))
print(len(input("Enter Input:").split()))
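Output:
Count of Words in the given Sentence: 6
6
The last line of output depends on the text entered at the prompt.
Method #1: Using split() The split() function is a useful, general-purpose way to get the words out of a string: called with no arguments, it splits on runs of whitespace. This approach miscounts once the string contains special characters that stand alone between spaces, because each such token is counted as a word.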
Python3
test_string = "Geeksforgeeks is best Computer Science Portal"
print("The original string is : " + test_string)
# split() with no arguments splits on runs of whitespace
res = len(test_string.split())
print("The number of words in string are : " + str(res))
Output:
The original string is : Geeksforgeeks is best Computer Science Portal
The number of words in string are : 6
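To make the limitation of split() concrete, the sketch below wraps the same call in a small helper and runs it on a clean and a noisy sentence. The function name count_words_split is hypothetical and used only for illustration.

```python
# Hypothetical helper (not from the original article): counts the
# whitespace-separated tokens in a sentence.
def count_words_split(sentence):
    return len(sentence.split())


clean = "Geeksforgeeks is best Computer Science Portal"
noisy = "Geeksforgeeks, is best @# Computer Science Portal.!!!"

print(count_words_split(clean))  # 6
# The stray "@#" is separated by spaces, so it is counted as a word.
print(count_words_split(noisy))  # 7
# Runs of spaces and leading/trailing spaces do not affect the count.
print(count_words_split("  extra   spaces  here "))  # 3
```

Punctuation attached to a word (such as the comma after "Geeksforgeeks") does not change the count; only standalone symbol tokens do.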
Method #2 : Using regex(findall()) Regular expressions are the usual choice when the string contains punctuation marks or special characters. The pattern \w+ matches each run of word characters (letters, digits, and underscores), so punctuation and symbols are skipped without any extra cleanup.
Example
Python3
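import re
# The sample string mixes words with punctuation and special characters
test_string = "Geeksforgeeks, is best @# Computer Science Portal.!!!"
print("The original string is : " + test_string)
# \w+ matches each run of word characters, so punctuation is skipped
res = len(re.findall(r'\w+', test_string))
print("The number of words in string are : " + str(res))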
Output:
The original string is : Geeksforgeeks, is best @# Computer Science Portal.!!!
The number of words in string are : 6
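One thing to keep in mind with \w+ is that it splits on apostrophes and hyphens, so "Don't" is counted as two words. The sketch below compares it with an alternative pattern; WORD_RE and count_words_regex are assumptions made for this example, not part of the original article, and the pattern only covers ASCII letters.

```python
import re

# Assumed pattern for illustration: runs of ASCII letters, optionally
# joined by single apostrophes or hyphens (e.g. "Don't", "state-of-the-art").
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def count_words_regex(sentence, pattern=WORD_RE):
    # findall returns one full match per word (the group is non-capturing)
    return len(pattern.findall(sentence))


noisy = "Geeksforgeeks, is best @# Computer Science Portal.!!!"
tricky = "Don't use state-of-the-art jargon"

print(count_words_regex(noisy))            # 6
print(len(re.findall(r"\w+", tricky)))     # 8
print(count_words_regex(tricky))           # 4
```

Which count is "right" depends on how the application defines a word, so the pattern is best chosen with the downstream task in mind.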
Method #3 : Using sum() + strip() + split() This method performs the task without using regex. It splits the string on whitespace, strips leading and trailing punctuation from each token, and checks whether what remains consists only of alphabetic characters. sum() then counts the tokens that pass the check.
Python3
import string
test_string = "Geeksforgeeks, is best @# Computer Science Portal.!!!"
print("The original string is : " + test_string)
# Strip punctuation from both ends of each token, then count the alphabetic ones
res = sum([i.strip(string.punctuation).isalpha() for i in test_string.split()])
print("The number of words in string are : " + str(res))
Output:
The original string is : Geeksforgeeks, is best @# Computer Science Portal.!!!
The number of words in string are : 6
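Because strip() only removes punctuation from the ends of a token and isalpha() rejects anything that is not a letter, this approach skips tokens containing digits or internal punctuation. The sketch below shows that behaviour; count_alpha_words is a hypothetical name used for illustration.

```python
import string


# Hypothetical helper (not from the original article): counts tokens that
# are purely alphabetic after stripping punctuation from both ends.
def count_alpha_words(sentence):
    return sum(
        token.strip(string.punctuation).isalpha()
        for token in sentence.split()
    )


noisy = "Geeksforgeeks, is best @# Computer Science Portal.!!!"
mixed = "Python3 is fun, isn't it?"

print(count_alpha_words(noisy))  # 6
# "Python3" (digit) and "isn't" (internal apostrophe) are not counted.
print(count_alpha_words(mixed))  # 3
print(len(mixed.split()))        # 5
```

If tokens such as "Python3" should count as words, isalnum() or a regular expression may be a better fit than isalpha().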
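Method #4: Using count() method This method counts the spaces in the string and adds one. It gives the right count only when the words are separated by exactly one space and the string has no leading or trailing spaces.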
Python3
test_string = "Geeksforgeeks is best Computer Science Portal"
print("The original string is : " + test_string)
# One more word than there are spaces
res = test_string.count(" ") + 1
print("The number of words in string are : " + str(res))
Output:
The original string is : Geeksforgeeks is best Computer Science Portal
The number of words in string are : 6
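Counting spaces is sensitive to how the string is spaced. The sketch below compares it with split() on a few inputs; the helper name count_by_spaces and the sample strings are made up for this illustration.

```python
# Hypothetical helper (not from the original article): the count() approach.
def count_by_spaces(sentence):
    return sentence.count(" ") + 1


samples = [
    "Geeksforgeeks is best Computer Science Portal",  # single spaces
    "Geeks  for  Geeks",                              # double spaces
    " padded ",                                       # leading/trailing space
    "",                                               # empty string
]

for text in samples:
    # Print the input, the count() result, and the split() result
    print(repr(text), count_by_spaces(text), len(text.split()))
```

The two counts agree only for the first sample. For the others, count() reports 5, 3, and 1 while split() reports 3, 1, and 0, so split() is the safer default when the spacing of the input is not guaranteed.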