Cosine similarity in Python: a guide

In this article, we calculate the cosine similarity between two non-zero vectors. Here a vector is a one-dimensional NumPy array. Cosine similarity is a measure of similarity between two vectors, often used to measure document similarity in text analysis. It equals the cosine of the angle between the vectors, so it ranges from -1 to 1. We use the formula below to compute it.

Similarity = (A.B) / (||A||.||B||)

where A and B are vectors:

A.B is the dot product of A and B, computed as the sum of the element-wise products of A and B.
||A|| is the L2 norm of A, computed as the square root of the sum of the squares of the elements of A (and likewise ||B|| for B).
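
To make the formula concrete, here is a minimal sketch that spells out each term in plain Python, without NumPy. The function name cosine_similarity_plain and the choice to raise ValueError for a zero vector are illustrative assumptions, not part of the original article.

```python
import math

def cosine_similarity_plain(a, b):
    """Cosine similarity of two equal-length sequences (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # A.B: sum of the element-wise products
    dot = sum(x * y for x, y in zip(a, b))
    # ||A|| and ||B||: square root of the sum of squares
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The formula divides by the norms, so a zero vector is undefined
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity_plain([1, 0], [0, 1]))  # orthogonal vectors -> 0.0
print(cosine_similarity_plain([1, 2], [2, 4]))  # parallel vectors, close to 1.0
```

The explicit zero-norm check reflects why the article restricts itself to non-zero vectors: the denominator would otherwise be zero.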

Example 1:

In the example below we compute the cosine similarity between two vectors (1-D NumPy arrays). Python lists can also be used to define the vectors.

Python

import numpy as np
from numpy.linalg import norm

# Two 1-D vectors of equal length
A = np.array([2, 1, 2, 3, 2, 9])
B = np.array([3, 4, 2, 4, 5, 5])
print("A:", A)
print("B:", B)

# Dot product divided by the product of the L2 norms
cosine = np.dot(A, B) / (norm(A) * norm(B))
print("Cosine Similarity:", cosine)

Output (cosine similarity value only, rounded to four decimal places):

0.8189

Example 2:

In the example below we compute the cosine similarity between a batch of three vectors (a 2-D NumPy array) and a single vector (a 1-D NumPy array).

Python

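import numpy as np
from numpy.linalg import norm

# A holds three vectors as rows; B is a single vector
A = np.array([[2, 1, 2], [3, 2, 9], [-1, 2, -3]])
B = np.array([3, 4, 2])
print("A:\n", A)
print("B:\n", B)

# np.dot(A, B) gives one dot product per row of A;
# norm(A, axis=1) gives one L2 norm per row of A
cosine = np.dot(A, B) / (norm(A, axis=1) * norm(B))
print("Cosine Similarity:\n", cosine)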

Output (cosine similarity values only, rounded to four decimal places):

0.8666, 0.6704, -0.0496

Notice that A holds three vectors (its rows) and B is a single vector, so the cosine similarity array in the output has three elements. Passing axis=1 to norm computes one L2 norm per row of A. The first element is the cosine similarity between the first row of A and B, the second element is the cosine similarity between the second row of A and B, and likewise for the third.
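
As a quick check of that row-by-row reading, the sketch below compares the batched expression from Example 2 with an explicit loop that applies the 1-D formula to each row of A. The variable names batched and looped are introduced here for illustration only.

```python
import numpy as np
from numpy.linalg import norm

A = np.array([[2, 1, 2], [3, 2, 9], [-1, 2, -3]])
B = np.array([3, 4, 2])

# Batched form from Example 2: one similarity per row of A
batched = np.dot(A, B) / (norm(A, axis=1) * norm(B))

# Equivalent explicit loop: apply the 1-D formula to each row in turn
looped = np.array([np.dot(row, B) / (norm(row) * norm(B)) for row in A])

print(batched.shape)                 # (3,)
print(np.allclose(batched, looped))  # True
```

The batched form is usually preferable for larger arrays because the work is done inside NumPy rather than in a Python loop.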

Example 3:

In the example below we compute the cosine similarity between two 2-D arrays, each holding three vectors as rows. Here the dot product of each pair of rows is computed as the sum of their element-wise products.

Python

import numpy as np
from numpy.linalg import norm

# Each array holds three vectors as rows
A = np.array([[1, 2, 2],
              [3, 2, 2],
              [-2, 1, -3]])
B = np.array([[4, 2, 4],
              [2, -2, 5],
              [3, 4, -4]])
print("A:\n", A)
print("B:\n", B)

# Row-wise dot products (sum of element-wise products) over row-wise norms
cosine = np.sum(A * B, axis=1) / (norm(A, axis=1) * norm(B, axis=1))
print("Cosine Similarity:\n", cosine)

Output (cosine similarity values only, rounded to four decimal places):

0.8889, 0.5066, 0.4174

The first element of the cosine similarity array is the similarity between the first rows of A and B. Similarly, the second element is the cosine similarity between the second rows of A and B, and the third element is that between the third rows.
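
The three examples differ only in the shapes of their inputs, so they can be folded into one function. The following is a minimal sketch, assuming the vectors always lie along the last axis; the helper name rowwise_cosine is hypothetical and it does not guard against zero vectors.

```python
import numpy as np
from numpy.linalg import norm

def rowwise_cosine(A, B):
    """Cosine similarity along the last axis (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Broadcasting lets a 1-D vector pair with every row of a 2-D array
    return np.sum(A * B, axis=-1) / (norm(A, axis=-1) * norm(B, axis=-1))

A = np.array([[1, 2, 2], [3, 2, 2], [-2, 1, -3]])
B = np.array([[4, 2, 4], [2, -2, 5], [3, 4, -4]])

pairs = rowwise_cosine(A, B)           # row i of A against row i of B
against_one = rowwise_cosine(A, B[0])  # every row of A against B's first row
single = rowwise_cosine(A[0], B[0])    # two 1-D vectors, scalar result

print(pairs.shape, against_one.shape, np.ndim(single))  # (3,) (3,) 0
```

Using axis=-1 instead of axis=1 is what allows the same expression to accept both 1-D and 2-D inputs.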