Hello Friends
Today, while going through what we have done so far, I noticed one thing we use in the Hierarchical Cluster Analysis method: the Jaccard distance, which is used when the values of the variables are in binary form. So I thought I should learn more about Jaccard distance.
Jaccard Distance
Jaccard distance measures the dissimilarity between sample sets. It is obtained by subtracting the Jaccard coefficient from 1. The Jaccard coefficient measures the similarity between sample sets, so the two together are used for comparing the similarity and diversity of sample sets.
Example
Suppose there are two objects, A and B, each with n binary attributes. Each attribute of A and B can be either 0 or 1. The number of attributes falling into each combination of values for A and B is counted as follows:
· M11 represents the total number of attributes where A and B both have a value of 1.
· M01 represents the total number of attributes where the attribute of A is 0 and the attribute of B is 1.
· M10 represents the total number of attributes where the attribute of A is 1 and the attribute of B is 0.
· M00 represents the total number of attributes where A and B both have a value of 0.
Each attribute must fall into one of
these four categories, meaning that
M11 + M01 + M10 + M00 = n.
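To make the four counts concrete, here is a small Python sketch. The function name count_combinations and the sample vectors A and B are made up for illustration only; they are not from any particular package.

```python
from collections import Counter

def count_combinations(a, b):
    """Count M11, M01, M10 and M00 for two equal-length binary sequences."""
    if len(a) != len(b):
        raise ValueError("A and B must have the same number of attributes")
    # Pair up the attributes of A and B, then count each (A, B) combination.
    counts = Counter(zip(a, b))
    return {
        "M11": counts[(1, 1)],  # A = 1, B = 1
        "M01": counts[(0, 1)],  # A = 0, B = 1
        "M10": counts[(1, 0)],  # A = 1, B = 0
        "M00": counts[(0, 0)],  # A = 0, B = 0
    }

# Hypothetical example data with n = 6 binary attributes.
A = [1, 0, 1, 1, 0, 0]
B = [1, 1, 0, 1, 0, 0]
m = count_combinations(A, B)
print(m)
print(sum(m.values()) == len(A))  # the four counts add up to n
```

For these sample vectors the counts are M11 = 2, M01 = 1, M10 = 1 and M00 = 2, which add up to n = 6.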
The Jaccard similarity coefficient, J, is given as
J = (M11) / (M11 + M01 + M10)
The Jaccard distance, J', is given as
J' = (M01 + M10) / (M11 + M01 + M10)
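The two formulas can be sketched in Python as below. This is only a minimal illustration: the function name jaccard and the sample vectors are my own assumptions, and raising an error when M11 + M01 + M10 = 0 is just one possible choice, since the formula is undefined in that case. Note that M00 does not appear in either formula.

```python
def jaccard(a, b):
    """Return (J, J') for two equal-length binary sequences."""
    if len(a) != len(b):
        raise ValueError("A and B must have the same number of attributes")
    m11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    m01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    m10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    denominator = m11 + m01 + m10
    if denominator == 0:
        # Assumption: treat the all-zero case as an error, because the
        # formula would divide by zero here.
        raise ValueError("Jaccard coefficient is undefined when M11 + M01 + M10 = 0")
    similarity = m11 / denominator         # J
    distance = (m01 + m10) / denominator   # J'
    return similarity, distance

# Hypothetical example data.
A = [1, 0, 1, 1, 0, 0]
B = [1, 1, 0, 1, 0, 0]
similarity, distance = jaccard(A, B)
print(similarity, distance)
```

Here M11 = 2, M01 = 1 and M10 = 1, so J = 2 / 4 and J' = 2 / 4. The two values always add up to 1, which matches the statement that the distance is obtained by subtracting the coefficient from 1.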
So basically, the Jaccard distance expresses the dissimilarity between sample sets, and the formulas above show how it is actually calculated. In software we just run the program and get the answer, but the underlying formula is the one given above.
I hope this helps you understand how the Jaccard distance is calculated.
Thank you…