Friday, 13 January 2012

Jaccard Distance Method



Hello Friends

Today, while going through what we have done so far, I came across something we use when we perform Hierarchical Cluster Analysis: the Jaccard distance method, which is used when the values of the variables are in binary form.  So I thought I should learn more about Jaccard distance.

Jaccard Distance
Jaccard distance measures the dissimilarity between sample sets.  It is obtained by subtracting the Jaccard coefficient from 1.  The Jaccard coefficient, in turn, measures the similarity between sample sets, so the two together are used for comparing the similarity and diversity of sample sets.

Example
Suppose there are two objects, A and B, each with n binary attributes.  Each attribute of A and B can be either 0 or 1.  The attributes can be counted by combination of values as follows:
·         M11 represents the total number of attributes where A and B both have a value of 1.
·         M01 represents the total number of attributes where the attribute of A is 0 and the attribute of B is 1.
·         M10 represents the total number of attributes where the attribute of A is 1 and the attribute of B is 0.
·         M00 represents the total number of attributes where A and B both have a value of 0.
Each attribute must fall into one of these four categories, meaning that
M11 + M01 + M10 + M00 = n.
The Jaccard similarity coefficient, J, is given as
J = (M11) / (M11 + M01 + M10)
The Jaccard distance, J', is obtained by subtracting J from 1:
J' = 1 - J = (M01 + M10) / (M11 + M01 + M10)
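
To make the formulas concrete, here is a small Python sketch that counts M11, M01, M10 and M00 for two binary vectors and then applies the two formulas above. The function names and the sample vectors A and B are my own, chosen only for illustration, and returning a distance of 0 when both objects are all zeros is an assumed convention (the formula itself gives 0/0 in that case).

```python
def jaccard_counts(a, b):
    """Count M11, M01, M10 and M00 for two equal-length binary vectors."""
    if len(a) != len(b):
        raise ValueError("A and B must have the same number of attributes")
    pairs = list(zip(a, b))
    m11 = sum(1 for x, y in pairs if x == 1 and y == 1)
    m01 = sum(1 for x, y in pairs if x == 0 and y == 1)
    m10 = sum(1 for x, y in pairs if x == 1 and y == 0)
    m00 = sum(1 for x, y in pairs if x == 0 and y == 0)
    return m11, m01, m10, m00


def jaccard_similarity(a, b):
    """J = M11 / (M11 + M01 + M10). M00 is not used."""
    m11, m01, m10, _ = jaccard_counts(a, b)
    denom = m11 + m01 + m10
    if denom == 0:
        # Both vectors are all zeros: the formula is undefined (0/0).
        # Treating them as identical is an assumption made for this sketch.
        return 1.0
    return m11 / denom


def jaccard_distance(a, b):
    """J' = (M01 + M10) / (M11 + M01 + M10) = 1 - J."""
    return 1.0 - jaccard_similarity(a, b)


# Hypothetical example data: two objects with n = 6 binary attributes.
A = [1, 0, 1, 1, 0, 0]
B = [1, 1, 0, 1, 0, 0]

print(jaccard_counts(A, B))      # (M11, M01, M10, M00)
print(jaccard_similarity(A, B))  # J
print(jaccard_distance(A, B))    # J'
```

For these two vectors M11 = 2, M01 = 1, M10 = 1 and M00 = 2, so J = 2 / 4 = 0.5 and J' = 2 / 4 = 0.5. Notice that M00 does not appear in either formula: attributes that are 0 in both objects do not count towards similarity.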

So, basically, Jaccard distance expresses the dissimilarity between sample sets, and the formula above is how it is actually calculated.  In software we just run the program and get the answer, but this is the formula behind it.

I hope this helps you understand how Jaccard distance is calculated.

Thank you…
