Wednesday 11 January 2012

An Insight into Cluster Analysis

Hi friends,

Yesterday in our BA session we learnt a new concept called cluster analysis. So what is cluster analysis? It is primarily about identifying groups of individuals or objects that are similar to each other but different from those in other groups. Identifying such clusters can be intellectually satisfying, profitable, or sometimes both. For example, from your customer base you may be able to form clusters of customers who have similar buying habits or demographics, and then take advantage of these similarities to target offers to the subgroups most likely to be receptive to them. We did a similar exercise in class yesterday, wherein we studied, classified and clustered various variables that affect customers' satisfaction with retail stores.

There is one more analysis used to classify objects, known as discriminant analysis. Its main objective is either to assess the adequacy of a classification, given the group membership of the objects under study, or to assign objects to known groups. Discriminant analysis may thus have a descriptive or a predictive objective. But here is a very important tip: although both cluster analysis and discriminant analysis classify objects (or cases) into categories, discriminant analysis requires you to know the group membership of the cases used to derive the classification rule, whereas the goal of cluster analysis is to discover the groups themselves. For example, if you are interested in distinguishing between several disease groups using discriminant analysis, cases with known diagnoses must be available; based on these cases, you derive a rule for classifying undiagnosed patients. In cluster analysis, you don't know who or what belongs in which group. You often don't even know the number of groups.
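To make that contrast concrete, here is a minimal sketch of the predictive side in Python, using scikit-learn's LinearDiscriminantAnalysis. This disease-group example and every number in it are made up by me for illustration, not taken from the class or the source.

# Minimal sketch: discriminant analysis needs cases with KNOWN groups.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical: two measurements per patient, with known diagnoses (0 or 1)
X = np.array([[5.1, 120], [4.9, 115], [6.0, 180],
              [6.2, 175], [5.0, 118], [6.1, 182]])
y = np.array([0, 0, 1, 1, 0, 1])  # known group membership

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)  # derive the classification rule from the diagnosed cases

# Classify a new, undiagnosed patient using that rule
print(lda.predict([[5.8, 170]]))  # -> predicted group label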

Further, we also learnt that two broad methods of performing cluster analysis are used: hierarchical clustering and k-means. The k-means algorithm gives us what's sometimes called a simple or flat partition, because it just produces a single set of clusters, with no particular organization or structure within them. But it could easily be the case that some clusters are closely related to certain other clusters and more distantly related to the rest. (If we are clustering images, we might want not just a cluster of flowers, but clusters of roses and marigolds within it.) So sometimes we need hierarchical clustering, which is depicted by a tree or dendrogram.
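Here is a tiny sketch of what such a flat k-means partition looks like in practice, using scikit-learn; the customer features and values are invented purely for illustration.

# Minimal sketch: k-means returns one flat set of clusters, no hierarchy.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers described by (age, annual spend in $1000s)
X = np.array([[25, 4], [27, 5], [52, 20], [55, 22], [30, 6], [50, 19]])

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)  # one cluster label per customer

print(labels)               # e.g. [0 0 1 1 0 1]
print(km.cluster_centers_)  # the two cluster centroids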

We also learnt that there are two approaches to hierarchical clustering: 'from the bottom up', known as agglomerative clustering, in which small clusters are merged into larger ones, and 'from the top down', known as divisive clustering, in which big clusters are split into smaller ones.
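And here is a small sketch of the bottom-up (agglomerative) approach with SciPy, which also draws the dendrogram mentioned above. Again, the data points are made up; Ward linkage is just one common choice of merge criterion, not something specified in class.

# Minimal sketch: agglomerative clustering, drawn as a dendrogram.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

X = np.array([[1, 2], [1, 3], [8, 8], [9, 8], [2, 2], [8, 9]])

Z = linkage(X, method='ward')  # repeatedly merge the two closest clusters
dendrogram(Z)                  # the tree shows the order of the merges
plt.show()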

One question that popped into my mind during the session was: how many clusters make an appropriate solution for a given problem? I searched and found a tip that there is no right or wrong answer to how many clusters you need; it depends on what you're going to do with them. To find a good cluster solution, you must look at the characteristics of the clusters at successive steps and decide when you have an interpretable solution, or one with a reasonable number of fairly homogeneous clusters.
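One common rule of thumb I came across while searching (it wasn't covered in class) is the 'elbow' heuristic: run k-means for several values of k, look at the within-cluster sum of squares, and pick the point where adding more clusters stops helping much. A quick sketch, again on invented data:

# Minimal sketch of the 'elbow' heuristic for choosing k.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 3], [8, 8], [9, 8], [2, 2], [8, 9], [5, 5]])

for k in range(1, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 2))  # inertia_ = within-cluster sum of squares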

Anyway, I hope the above information offered you a bit of insight and added value to what we learnt in class. Hope to find new and interesting things out here. I will be posting soon.

Bye.

Source - http://www.norusis.com/pdf/SPC_v13.pdf
