How do you find duplicate words in a list in Python?

I can see where you are going with sorting: once the words are sorted, you can tell reliably when you have hit a new word and keep a running count for each unique word. However, what you really want is a hash (a dictionary) to keep track of the counts, since dictionary keys are unique. For example:

words = sentence.split()
counts = {}
for word in words:
    if word not in counts:
        counts[word] = 0
    counts[word] += 1

That gives you a dictionary where the key is the word and the value is the number of times it appears. You can simplify this with collections.defaultdict(int), which lets you just add to the value:

counts = collections.defaultdict(int)
for word in words:
    counts[word] += 1
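As a minimal sketch of why the membership check is no longer needed: with defaultdict(int), a key that is missing is created with the value int(), which is 0, the first time it is accessed. The sample sentence here is an assumption chosen for illustration and is not from the original answer.

```python
from collections import defaultdict

# Illustrative input (not from the original answer).
words = "to be or not to be".split()

counts = defaultdict(int)
for word in words:
    # A missing key is created with int() == 0 before the increment.
    counts[word] += 1

print(dict(counts))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```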

But there is something even better than that: collections.Counter takes your list of words and turns it into a dictionary (actually an extension of a dictionary) containing the counts.

counts = collections.Counter(words)
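If the goal is only the words that are actually duplicated, the counts can be filtered. The following is a sketch under that assumption; the helper name find_duplicates and the sample input are hypothetical and not part of the original answer.

```python
from collections import Counter

def find_duplicates(words):
    """Return the words that occur more than once, in first-seen order.

    Hypothetical helper for illustration; relies on Counter keeping
    insertion order (Python 3.7+).
    """
    counts = Counter(words)
    return [word for word, count in counts.items() if count > 1]

# Illustrative input (not from the original answer).
words = "as far as they are certain they do not refer".split()
print(find_duplicates(words))  # ['as', 'they']
```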

From there, you want the list of words in sorted order with their counts so that you can print them. items() gives you a list of tuples, and sorted() sorts (by default) on the first item of each tuple (the word, in this case), which is exactly what you want.

import collections

sentence = """As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"""
words = sentence.split()
word_counts = collections.Counter(words)
for word, count in sorted(word_counts.items()):
    print('"%s" is repeated %d time%s.' % (word, count, "s" if count > 1 else ""))

Output

"As" is repeated 1 time.
"are" is repeated 2 times.
"as" is repeated 3 times.
"certain" is repeated 2 times.
"do" is repeated 1 time.
"far" is repeated 2 times.
"laws" is repeated 1 time.
"mathematics" is repeated 1 time.
"not" is repeated 2 times.
"of" is repeated 1 time.
"reality" is repeated 2 times.
"refer" is repeated 2 times.
"the" is repeated 1 time.
"they" is repeated 3 times.
"to" is repeated 2 times.


Sometimes, while working with Python lists, we need to remove duplicate words from each string in a list of strings. This comes up when cleaning data. Let's discuss some ways in which this task can be performed.

Method #1: Using set() + split() + loop

A combination of the above functions can be used to perform this task. We first split each string in the list into its words and then apply set() to remove the duplicates.

Python3

# Sketch matching the method description and the output below;
# variable names are illustrative.

# Initialize the list of strings
test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']

# Print the original list
print("The original list is : " + str(test_list))

# Split each string into words and keep only the unique ones
res = []
for strs in test_list:
    res.append(set(strs.split(", ")))

# Print the result
print("The list after duplicate words removal is : " + str(res))

Output:

The original list is : ['gfg, best, gfg', 'I, am, I', 'two, two, three']
The list after duplicate words removal is : [{'best', 'gfg'}, {'I', 'am'}, {'three', 'two'}]
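Note that the sets in the output above do not keep the words in their original order ('best' is printed before 'gfg'). If order matters, one possible alternative, which is a different technique from the set()-based method described here, is dict.fromkeys(), since dictionaries keep insertion order in Python 3.7+. The variable names below are illustrative.

```python
# Same sample data as in the output above.
test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']

# dict.fromkeys() keeps the first occurrence of each word, in order.
res = [list(dict.fromkeys(strs.split(", "))) for strs in test_list]

print(res)  # [['gfg', 'best'], ['I', 'am'], ['two', 'three']]
```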

Method #2: Using list comprehension + set() + split()

This method is similar to the one above. The difference is that we use a list comprehension instead of a loop to perform the iteration.

Python3

# Sketch matching the method description and the output below;
# variable names are illustrative.

# Initialize the list of strings
test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']

# Print the original list
print("The original list is : " + str(test_list))

# Split each string into words and keep only the unique ones
res = [set(strs.split(", ")) for strs in test_list]

# Print the result
print("The list after duplicate words removal is : " + str(res))

Output:

The original list is : ['gfg, best, gfg', 'I, am, I', 'two, two, three']
The list after duplicate words removal is : [{'best', 'gfg'}, {'I', 'am'}, {'three', 'two'}]

Method #3: Using sorted() + index() + split()

Python3

# Sketch matching the method name and the output below;
# variable names are illustrative.

# Initialize the list of strings
test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']

for strs in test_list:
    words = strs.split(", ")
    # set() drops duplicates; sorting by first index restores the original order
    print(" ".join(sorted(set(words), key=words.index)), end=" ")

Output

gfg best I am two three
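To collect the deduplicated strings instead of printing them, the same sorted() + index() + split() idea can be wrapped in a small function. This is only a sketch: the helper name dedupe_words and its sep parameter are hypothetical and not from the original article. Because index() scans the list for every word, this approach is best suited to short strings.

```python
def dedupe_words(text, sep=", "):
    """Remove repeated words from text, keeping first-occurrence order.

    Hypothetical helper for illustration.
    """
    words = text.split(sep)
    # set() removes duplicates; key=words.index sorts by first position.
    return sep.join(sorted(set(words), key=words.index))

test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
print([dedupe_words(strs) for strs in test_list])
# ['gfg, best', 'I, am', 'two, three']
```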

