Wednesday, 11 January 2012

Dendrogram


In yesterday’s class we learned about the dendrogram. The dendrogram is a visual representation of the spot correlation data. The individual spots are arranged along the bottom of the dendrogram and are referred to as leaf nodes. Spot clusters are formed by joining individual spots or existing spot clusters, and each join point is referred to as a node.
At each dendrogram node we have a right and a left sub-branch of clustered spots. A spot cluster can be a single spot or a group of spots. The vertical axis is labeled distance and shows a distance measure between spots or spot clusters.
The height of the node can be thought of as the distance value between the right and left sub-branch clusters. The distance measure between two clusters is calculated as follows:
D = 1 - C
where D is the distance and C is the correlation between spot clusters.
If two spots are highly correlated, their correlation value is close to 1, so D = 1 - C is close to zero. Highly correlated clusters therefore sit nearer the bottom of the dendrogram.
Spot clusters that are uncorrelated have a correlation value of zero and a corresponding distance value of 1. Spots that are perfectly negatively correlated, i.e. showing opposite expression behavior, have a correlation value of -1 and D = 1 - (-1) = 2.
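To make the D = 1 - C rule concrete, here is a minimal Python sketch. It assumes C is the Pearson correlation between two expression profiles (the class notes do not say which correlation is used), and the three spot profiles are made-up numbers, not real data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_distance(x, y):
    """Distance as defined in the post: D = 1 - C."""
    return 1.0 - pearson(x, y)

# Hypothetical expression profiles for three spots across four samples.
spot_a = [1.0, 2.0, 3.0, 4.0]
spot_b = [2.0, 4.0, 6.0, 8.0]  # rises and falls together with spot_a
spot_c = [4.0, 3.0, 2.0, 1.0]  # moves opposite to spot_a

print(correlation_distance(spot_a, spot_b))  # perfectly correlated pair
print(correlation_distance(spot_a, spot_c))  # perfectly anti-correlated pair
```

The first pair lands at the bottom of the distance scale (D of 0) and the second at the top (D of 2), which matches the two extremes described above.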
As we move up the dendrogram, the spot clusters get bigger and the distance between spot clusters increases in value. It becomes difficult to interpret the distance between spot clusters as the clusters increase in size. One way to compare the expression profile behavior of two spots is to see how far up the dendrogram we have to climb before there is a path from one spot to the other.
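The "how far up do we have to climb" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the notes do not say how the distance between two multi-spot clusters is computed, so the sketch uses average linkage (the mean of all pairwise D values), and the distances between the four spots are invented. The function name merge_heights is hypothetical.

```python
def merge_heights(dist, n):
    """Build a dendrogram bottom-up and record, for every pair of leaves,
    the height of the node at which they first end up in the same cluster.

    dist maps (i, j) with i < j to the distance D between spots i and j.
    Assumption: clusters are compared by average linkage.
    """
    clusters = [[i] for i in range(n)]  # start with one cluster per spot
    heights = {}
    while len(clusters) > 1:
        best = None
        # Find the two clusters with the smallest average pairwise distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                d = sum(dist[min(i, j), max(i, j)] for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Every leaf pair split across the two branches meets at this node.
        for i in clusters[a]:
            for j in clusters[b]:
                heights[min(i, j), max(i, j)] = d
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return heights

# Hypothetical D = 1 - C values for four spots (0..3).
dist = {(0, 1): 0.1, (0, 2): 0.9, (0, 3): 1.8,
        (1, 2): 1.0, (1, 3): 1.9, (2, 3): 0.3}

heights = merge_heights(dist, 4)
for pair in sorted(heights):
    print(pair, round(heights[pair], 3))
```

With these made-up numbers, spots 0 and 1 meet at a very low node (0.1), while spots 0 and 3 only meet at the top node (about 1.4), so we have to climb much further to move between them.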

1 comment:

  1. Good value add. Novel way of using correlations as distances (D=1-C). However, it is a verbatim reproduction from the site and does not show a business application.
    Marks=3
