How do you find duplicate words in a list in Python?

I can see where you're going with sorting: once the words are sorted, you can reliably tell when you've hit a new word and keep a running count for each unique word. However, what you really want is a hash (a dictionary) to keep track of the counts, since dictionary keys are unique. For example:

    words = sentence.split()
    counts = {}
    for word in words:
        if word not in counts:
            counts[word] = 0
        counts[word] += 1
    

That gives you a dictionary where each key is a word and each value is the number of times it appears. You can simplify this with collections.defaultdict(int), which starts missing keys at 0 so you can just increment:

    import collections

    counts = collections.defaultdict(int)  # missing keys start at 0
    for word in words:
        counts[word] += 1
    

But there's something even better than that: collections.Counter takes your list of words and turns it into a dictionary (a subclass of dict, actually) containing the counts.

    counts = collections.Counter(words)
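
As a quick illustration of what Counter gives you beyond a plain dict, here is a minimal sketch. The sample word list and the variable names are made up for the example; they are not part of the answer above.

```python
from collections import Counter

# Hypothetical sample data, chosen only for illustration.
words = ["as", "far", "as", "they", "as"]
counts = Counter(words)

# Counter is a dict subclass, so normal dict lookups work.
as_count = counts["as"]

# Unlike a plain dict, a missing key yields 0 instead of raising KeyError.
missing_count = counts["reality"]

# most_common(n) returns the n highest counts as (word, count) tuples.
top_one = counts.most_common(1)

print(as_count, missing_count, top_one)
```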
    

From there you want the list of words in sorted order with their counts so you can print them. items() gives you the (word, count) pairs as tuples, and sorted() sorts them (by default) by the first item of each tuple (the word, in this case), which is exactly what you want.

    import collections
    sentence = """As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"""
    words = sentence.split()
    word_counts = collections.Counter(words)
    for word, count in sorted(word_counts.items()):
        print('"%s" is repeated %d time%s.' % (word, count, "s" if count > 1 else ""))
    

Output

    "As" is repeated 1 time.
    "are" is repeated 2 times.
    "as" is repeated 3 times.
    "certain" is repeated 2 times.
    "do" is repeated 1 time.
    "far" is repeated 2 times.
    "laws" is repeated 1 time.
    "mathematics" is repeated 1 time.
    "not" is repeated 2 times.
    "of" is repeated 1 time.
    "reality" is repeated 2 times.
    "refer" is repeated 2 times.
    "the" is repeated 1 time.
    "they" is repeated 3 times.
    "to" is repeated 2 times.
    
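The output above lists every word, including those that appear only once. If you only want the words that are actually duplicated, you can filter the counts. The following is a minimal sketch under that assumption; find_duplicates is a hypothetical helper name, not something from the standard library.

```python
import collections


def find_duplicates(words):
    """Return (word, count) pairs for words that occur more than once,
    most frequent first."""
    counts = collections.Counter(words)
    # most_common() orders by descending count; keep only the repeats.
    return [(word, count) for word, count in counts.most_common() if count > 1]


sentence = ("As far as the laws of mathematics refer to reality they are not "
            "certain as far as they are certain they do not refer to reality")

for word, count in find_duplicates(sentence.split()):
    print(f'"{word}" appears {count} times')
```

Note that "As" and "as" are still counted separately, exactly as in the output above. Lowercasing each word before counting would merge them, if that is what you need.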


Sometimes, while working with Python lists, we need to remove duplicate words from each string in a list of strings. This comes up in data-processing work. Let's discuss some ways in which this task can be performed.

Method #1: Using set() + split() + loop. These functions can be combined to perform this task. We first split each string in the list into its words and then use set() to remove the duplicates.

    # Duplicate word removal from a list of strings
    # using loop + set() + split()

    # initializing list
    test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']

    # printing original list
    print("The original list is : " + str(test_list))

    # split each string into words, then let set() drop the duplicates
    res = []
    for strs in test_list:
        res.append(set(strs.split(", ")))

    # printing result
    print("The list after duplicate words removal is : " + str(res))

Output:

    The original list is : ['gfg, best, gfg', 'I, am, I', 'two, two, three']
    The list after duplicate words removal is : [{'best', 'gfg'}, {'I', 'am'}, {'three', 'two'}]
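
Sets are unordered, so the order of the words inside each set in the output above is not guaranteed and may differ between runs. When a stable result matters, one option is to sort each set. This is a minimal sketch of that idea; dedupe_sorted and its sep parameter are hypothetical names introduced for the example.

```python
def dedupe_sorted(strings, sep=", "):
    """Remove duplicate words from each string and return the
    remaining words of each string as a sorted list."""
    return [sorted(set(s.split(sep))) for s in strings]


test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
print(dedupe_sorted(test_list))
```

Sorting gives a repeatable order, but it is alphabetical (uppercase before lowercase), not the order in which the words first appeared.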

Method #2: Using list comprehension + set() + split(). This method is similar to Method #1. The difference is that we use a list comprehension instead of an explicit loop to perform the iteration.

    # Duplicate word removal from a list of strings
    # using list comprehension + set() + split()

    # initializing list
    test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']

    # printing original list
    print("The original list is : " + str(test_list))

    # the split-then-set() step, written as a list comprehension
    res = [set(strs.split(", ")) for strs in test_list]

    # printing result
    print("The list after duplicate words removal is : " + str(res))

Output:

    The original list is : ['gfg, best, gfg', 'I, am, I', 'two, two, three']
    The list after duplicate words removal is : [{'best', 'gfg'}, {'I', 'am'}, {'three', 'two'}]
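
The two methods above remove the duplicates without telling us which words were repeated. If we also want to find the duplicated words in each string, collections.Counter can be applied to each string in turn. This is a minimal sketch under that assumption; duplicated_words and its sep parameter are hypothetical names introduced for the example.

```python
from collections import Counter


def duplicated_words(strings, sep=", "):
    """For each string, list the words that occur more than once."""
    result = []
    for s in strings:
        counts = Counter(s.split(sep))
        # keep only the words whose count shows a repeat
        result.append([word for word, n in counts.items() if n > 1])
    return result


test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
print(duplicated_words(test_list))
```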

Method #3: Using sorted() + index() + split()

    # Duplicate word removal from a list of strings
    # using sorted() + index() + split()

    # initializing list
    test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']

    # collect the words of every string into one list
    words = []
    for strs in test_list:
        words.extend(strs.split(", "))

    # set() drops the duplicates; sorting by first index restores the original order
    res = sorted(set(words), key=words.index)

    # printing result
    print(" ".join(res))

Output

    gfg best I am two three
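
Sorting with index() as the key calls index() once per unique word, and each call scans the list from the start, so this approach slows down on long lists. A possible alternative is dict.fromkeys(), which keeps the first occurrence of each key in a single pass. This is a minimal sketch, assuming Python 3.7 or later, where dicts preserve insertion order; unique_in_order is a hypothetical helper name.

```python
def unique_in_order(words):
    """Drop repeated words, keeping the first occurrence of each."""
    # dict keys are unique and (Python 3.7+) keep insertion order
    return list(dict.fromkeys(words))


test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']

# collect the words of every string into one list
words = []
for strs in test_list:
    words.extend(strs.split(", "))

print(" ".join(unique_in_order(words)))
```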