HarmonyHu 多思不如养志,多言不如守静,多才不如蓄德

统计距离

2021-02-21
AI

概述

以二维空间举例如图:

求A点与B点的欧氏距离和余弦相似度

欧式距离

在二维空间欧式距离就是两点的直线距离:

\[E(A,B) = c = \sqrt{(b_1-a_1)^2+(b_2-a_2)^2}\]

多维空间同理:

\[E(p,q)= \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + ... + (p_n - q_n)^2} = \sqrt{\sum_{i=1}^{n}(p_i - q_i)^2}\]

欧式相似度:

\[euclidean\_similarity = \frac{1}{1 + E(p,q)}\]

意义:用于对数值差异敏感的场景。

余弦相似度

余弦定理:

\(c^2 = a^2 + b^2 - 2ab \:cos(\theta) \\\) 进一步推导:

\[cos(\theta) = \frac{a^2+b^2 - c^2}{2ab} = \frac{a_1b_1 + a_2b_2}{\sqrt{a_1^2 + a_2^2} \times \sqrt{b_1^2+b_2^2}}\]

多维空间同理:

\[cosine\_similary(A,B) = cos(\theta) = \frac{A\times B}{||A|| \times ||B||} = \frac{\sum_i^n{(A_i \times B_i)}}{\sqrt{\sum_i^n(A_i^2)} \times \sqrt{\sum_i^n{(B_i^2)}}}\]

意义:值越趋于1越相似。一般用于数值不敏感,方向差异敏感的场景。

python实现

#!/usr/bin/env python
 
from math import*
 
def euclidean_distance(x,y):
  return sqrt(sum(pow(a-b,2) for a, b in zip(x, y)))
 
print euclidean_distance([0,3,4,5],[7,6,3,-1]) #9.74679434481

def square_rooted(x):
   return round(sqrt(sum([a*a for a in x])),3)
 
def cosine_similarity(x,y):
  numerator = sum(a*b for a,b in zip(x,y))
  denominator = square_rooted(x)*square_rooted(y)
  return round(numerator/float(denominator),3)
 
print cosine_similarity([3, 45, 7, 2], [2, 54, 13, 15]) # 0.972

参考:

IMPLEMENTING THE FIVE MOST POPULAR SIMILARITY MEASURES IN PYTHON


Similar Posts

上一篇 Deformable Conv

Content