Cosine similarity between two words in Python

Question

You can define these two functions:

from collections import Counter
from math import sqrt

def word2vec(word):
    # count the characters in word
    cw = Counter(word)
    # precompute the set of distinct characters
    sw = set(cw)
    # precompute the "length" (Euclidean norm) of the word vector
    lw = sqrt(sum(c*c for c in cw.values()))

    # return a tuple: (character counts, character set, norm)
    return cw, sw, lw

def cosdis(v1, v2):
    # which characters are common to the two words?
    common = v1[1].intersection(v2[1])
    # cosine similarity: dot product over the shared characters,
    # divided by the product of the two norms
    return sum(v1[0][ch]*v2[0][ch] for ch in common)/v1[2]/v2[2]

and use them as in this example:

>>> a = 'safasfeqefscwaeeafweeaeawaw'
>>> b = 'tsafdstrdfadsdfdswdfafdwaed'
>>> c = 'optykop;lvhopijresokpghwji7'
>>> 
>>> va = word2vec(a)
>>> vb = word2vec(b)
>>> vc = word2vec(c)
>>> 
>>> print(round(cosdis(va, vb), 12))
0.551843662321
>>> print(round(cosdis(vb, vc), 12))
0.113746579656
>>> print(round(cosdis(vc, va), 12))
0.153494378078

BTW, the word2vec that you mention in a tag is quite a different business, one that requires one of us to invest a great deal of time and commitment in studying it and, guess what, I'm not that one...

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them.
Similarity = (A.B) / (||A||.||B||), where A and B are vectors.
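
As a quick check of the formula, here is a minimal sketch in plain Python. The function name cosine_similarity and the sample vectors are illustrative assumptions, not something taken from the original program.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Return (A.B) / (||A|| * ||B||) for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Hypothetical sample vectors, chosen only for illustration.
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal vectors: 0.0
print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # parallel vectors: 1.0 up to rounding
```

Vectors pointing in the same direction score close to 1 regardless of their lengths, which is why the measure works for comparing texts of different sizes.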

Cosine similarity and the NLTK toolkit module are used in this program. To run it, NLTK must be installed on your system. To install the NLTK module, follow the steps below:

1. Open terminal(Linux).
2. sudo pip3 install nltk
3. python3
4. import nltk
5. nltk.download('all')

Functions used:

nltk.tokenize: used for tokenization, the process by which a large quantity of text is divided into smaller parts called tokens. word_tokenize(X) splits the given sentence X into words and returns them as a list.

nltk.corpus: in this program, it is used to get a list of stop words. A stop word is a commonly used word (such as "the", "a", "an", "in").

Below is the Python implementation:

from nltk.corpus import stopwords

Output:

similarity:  0.2886751345948129
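
A program of the kind described above can be sketched roughly as follows. This is a sketch under stated assumptions, not the original listing: to stay self-contained it uses str.split() in place of word_tokenize and a small hand-written STOP_WORDS set in place of NLTK's stop-word list, and the two sentences and the names keywords and sentence_similarity are made up for illustration.

```python
from math import sqrt

# Hypothetical stand-in for NLTK's English stop-word list (assumption).
STOP_WORDS = {"a", "an", "the", "in", "is", "i", "out"}

def keywords(sentence):
    """Lowercase the sentence, split on whitespace, and drop stop words."""
    return {w for w in sentence.lower().split() if w not in STOP_WORDS}

def sentence_similarity(x, y):
    """Cosine similarity of the binary word-presence vectors of x and y."""
    x_set, y_set = keywords(x), keywords(y)
    if not x_set or not y_set:
        return 0.0  # no keywords left: treated as "no similarity" (assumption)
    vocab = sorted(x_set | y_set)
    # One component per vocabulary word: 1 if the sentence contains it, else 0.
    v1 = [1 if w in x_set else 0 for w in vocab]
    v2 = [1 if w in y_set else 0 for w in vocab]
    dot = sum(p * q for p, q in zip(v1, v2))
    # For 0/1 vectors the squared norm equals the number of ones.
    return dot / sqrt(sum(v1) * sum(v2))

print("similarity: ", sentence_similarity("I love horror movies",
                                          "Lights out is a horror movie"))
```

Because the tokenizer, stop-word list, and inputs here are stand-ins, the sketch is not expected to reproduce the output shown above. Swapping in word_tokenize and stopwords.words('english') from NLTK should be a small change once the NLTK data is installed.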

How do you find the cosine similarity between words?

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them. Similarity = (A.B) / (||A||.||B||), where A and B are vectors.

How do you find the similarity between two words in Python?

Word similarity is a number, typically between 0 and 1, that tells us how close two words are semantically. It is computed by finding the similarity between the word vectors in the vector space. spaCy, one of the fastest NLP libraries widely used today, provides a simple method for this task.

How do I find similar text in Python?

The similarity of two strings is checked by comparing the frequency of each character: the difference in frequency must not exceed a threshold, represented here by K. Explanation: 'a' occurs 4 times in str1 and 2 times in str2, and 4 - 2 = 2 is in range; similarly, all other characters are in range, hence the result is true.
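
The check described above might look like the following minimal sketch. The function name similar_within_k and the sample strings are assumptions chosen so that 'a' occurs 4 times in the first string and 2 times in the second.

```python
from collections import Counter

def similar_within_k(str1, str2, k):
    """True if every character's frequency differs by at most k between the strings."""
    c1, c2 = Counter(str1), Counter(str2)
    # A Counter returns 0 for missing characters, so characters that
    # appear in only one of the strings are handled as well.
    return all(abs(c1[ch] - c2[ch]) <= k for ch in c1.keys() | c2.keys())

# Hypothetical inputs: 'a' occurs 4 times in the first string, 2 in the second.
print(similar_within_k("aabcdaa", "abbaccd", 2))  # True
print(similar_within_k("aabcdaa", "abbaccd", 1))  # False, since 4 - 2 = 2 > 1
```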