Monday, 23 January 2012
Basic Difference Between Cluster Analysis and Discriminant Analysis
Thursday, 19 January 2012
Scree Plot
A scree plot is a simple graph that shows the fraction of total variance in the data explained by each principal component.
The components are plotted in order of decreasing contribution, largest first.
Read left to right across the abscissa, such a plot can often show a clear separation in fraction of total variance where the 'most important' components cease and the 'least important' components begin.
This point of separation is often called the 'elbow'.
Here is something I found interesting:
In the PCA (Principal Component Analysis) literature, the plot is called a 'Scree' Plot because it often looks like a 'scree' slope, where rocks have fallen down and accumulated on the side of a mountain.
Some tips regarding when to use Scree Plot graph:
1) If there are fewer than 30 variables and the communalities after extraction are all greater than 0.7, OR if the sample size exceeds 250 and the average communality is greater than 0.6, then retain all factors with eigenvalues greater than 1.
2) If neither of the above applies, the scree plot can be used when the sample size is considerably large, around 300 or more cases.
Links:
http://www.statisticshell.com/docs/factor.pdf
http://www.improvedoutcomes.com/docs/WebSiteDocs/PCA/Creating_a_Scree_Plot.htm
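To see what a scree plot looks like in practice, here is a minimal sketch in Python using NumPy and Matplotlib on simulated data (the two-factor structure, sample size, and output file name are illustrative assumptions, not taken from the sources above):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Illustrative data: 300 cases on 10 variables driven by 2 latent factors
latent = rng.normal(size=(300, 2))
data = latent @ rng.normal(size=(2, 10)) + 0.5 * rng.normal(size=(300, 10))

# Eigenvalues of the correlation matrix, sorted largest first
eigvals = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]

plt.plot(range(1, 11), eigvals, "o-")
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.savefig("scree.png")
```

With two built-in factors, the curve drops sharply after the second component, giving the characteristic 'elbow'.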
Monday, 16 January 2012
Factor Analysis with help of other statistical techniques
Factor Analysis - Components, PCA, Comparison of Factor Analysis and PCA
HELLO FRIENDS !!!
Hope you are all enjoying the blog and finding the material posted here valuable. Today, let us get more familiar with a new concept, factor analysis, and its close relative, PCA.
Factor Analysis
Factor analysis is a collection of methods used to examine how underlying constructs influence the responses on a number of measured variables.
There are basically two types of factor analysis: exploratory and confirmatory.
1. Exploratory factor analysis (EFA) attempts to discover the nature of the constructs influencing a set of responses.
2. Confirmatory factor analysis (CFA) tests whether a specified set of constructs is influencing responses in a predicted way.
Both types of factor analyses are based on the Common Factor Model, illustrated in figure 1.1. This model proposes that each observed response (measure 1 through measure 5) is influenced partially by underlying common factors (factor 1 and factor 2) and partially by underlying unique factors (E1 through E5). The strength of the link between each factor and each measure varies, such that a given factor influences some measures more than others. This is the same basic model as is used for LISREL analyses.
Factor analyses are performed by examining the pattern of correlations (or covariances) between the observed measures. Measures that are highly correlated (either positively or negatively) are likely influenced by the same factors, while those that are relatively uncorrelated are likely influenced by different factors.
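The link between correlation patterns and common factors can be seen in a small simulation (a sketch with made-up factors and loadings, not data from any study): two pairs of measures are each driven by their own factor, and the resulting correlation matrix shows high correlations within each pair and near-zero correlations between pairs.

```python
import numpy as np

rng = np.random.default_rng(1)
f1, f2 = rng.normal(size=(2, 1000))  # two common factors

# Measures 1-2 load on factor 1, measures 3-4 on factor 2, plus unique error
measures = np.column_stack([
    f1 + 0.5 * rng.normal(size=1000),
    f1 + 0.5 * rng.normal(size=1000),
    f2 + 0.5 * rng.normal(size=1000),
    f2 + 0.5 * rng.normal(size=1000),
])
r = np.corrcoef(measures, rowvar=False)
print(r.round(2))  # high within each pair, near zero across pairs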
Exploratory Factor Analysis
Objectives:
The primary objectives of an EFA are to determine:
· The number of common factors influencing a set of measures.
· The strength of the relationship between each factor and each observed measure.
Some common uses of EFA are to:
· Identify the nature of the constructs underlying responses in a specific content area.
· Determine what sets of items “hang together” in a questionnaire.
· Demonstrate the dimensionality of a measurement scale. Researchers often wish to develop scales that respond to a single characteristic.
· Determine what features are most important when classifying a group of items.
· Generate “factor scores" representing values of the underlying constructs for use in other analyses.
Confirmatory Factor Analysis
Objectives
The primary objective of a CFA is to determine the ability of a predefined factor model to fit an observed set of data.
Some common uses of CFA are to:
· Establish the validity of a single factor model.
· Compare the ability of two different models to account for the same set of data.
· Test the significance of a specific factor loading.
· Test the relationship between two or more factor loadings.
· Test whether a set of factors are correlated or uncorrelated.
· Assess the convergent and discriminant validity of a set of measures.
Factor Analysis vs. Principal Component Analysis
· Exploratory factor analysis is often confused with principal component analysis (PCA), a similar statistical procedure. However, there are significant differences between the two: EFA and PCA will provide somewhat different results when applied to the same data.
· The purpose of PCA is to derive a relatively small number of components that can account for the variability found in a relatively large number of measures. This procedure, called data reduction, is typically performed when a researcher does not want to include all of the original measures in analyses but still wants to work with the information that they contain.
· Differences between EFA and PCA arise from the fact that the two are based on different models. An illustration of the PCA model is provided in figure 2.1. The first difference is that the direction of influence is reversed: EFA assumes that the measured responses are based on the underlying factors while in PCA the principal components are based on the measured responses. The second difference is that EFA assumes that the variance in the measured variables can be decomposed into that accounted for by common factors and that accounted for by unique factors. The principal components are defined simply as linear combinations of the measurements, and so will contain both common and unique variance.
In summary, you should use EFA when you are interested in making statements about the factors that are responsible for a set of observed responses, and you should use PCA when you are simply interested in performing data reduction.
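The data-reduction use of PCA described above can be sketched in a few lines (a minimal illustration on simulated data; the function name `pca_reduce` and the array sizes are my own, not from the source):

```python
import numpy as np

def pca_reduce(x, k):
    """Project centered data onto its first k principal components."""
    xc = x - x.mean(axis=0)
    # Rows of vt are the principal axes; singular values give variance shares
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    return xc @ vt[:k].T, (s ** 2)[:k] / (s ** 2).sum()

rng = np.random.default_rng(2)
# 8 measures generated from only 3 independent sources
data = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 8))
scores, var_share = pca_reduce(data, 3)
print(scores.shape)  # (200, 3): the reduced data
```

Because the 8 measures were built from 3 sources, the first 3 components capture essentially all of the variability, which is exactly the data-reduction scenario described above.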
Factor Analysis - Example and Mathematical Model
- Seven methods of factor extraction are available.
- Five methods of rotation are available, including direct oblimin and promax for non-orthogonal rotations.
- Three methods of computing factor scores are available, and scores can be saved as variables for further analysis.
Multiple orthogonal factors: After we have found the line on which the variance is maximal, there remains some variability around this line. In principal components analysis, after the first factor has been extracted, that is, after the first line has been drawn through the data, we continue and define another line that maximizes the remaining variability, and so on. In this manner, consecutive factors are extracted. Because each consecutive factor is defined to maximize the variability that is not captured by the preceding factor, consecutive factors are independent of each other. Put another way, consecutive factors are uncorrelated or orthogonal to each other.
STATISTICA Factor Analysis: Eigenvalues (factor.sta), Extraction: Principal components

| Value | Eigenval | % Total Variance | Cumul. Eigenval | Cumul. % |
|---|---|---|---|---|
| 1 | 6.118369 | 61.18369 | 6.11837 | 61.1837 |
| 2 | 1.800682 | 18.00682 | 7.91905 | 79.1905 |
| 3 | .472888 | 4.72888 | 8.39194 | 83.9194 |
| 4 | .407996 | 4.07996 | 8.79993 | 87.9993 |
| 5 | .317222 | 3.17222 | 9.11716 | 91.1716 |
| 6 | .293300 | 2.93300 | 9.41046 | 94.1046 |
| 7 | .195808 | 1.95808 | 9.60626 | 96.0626 |
| 8 | .170431 | 1.70431 | 9.77670 | 97.7670 |
| 9 | .137970 | 1.37970 | 9.91467 | 99.1467 |
| 10 | .085334 | .85334 | 10.00000 | 100.0000 |
Eigenvalues: The second column of the table gives the variance on each of the successively extracted factors; these values are called the eigenvalues, a name that derives from the computational method used to obtain them. In the third column, these values are expressed as a percentage of the total variance (in this example, 10, since there are 10 standardized variables). As we can see, factor 1 accounts for about 61 percent of the variance and factor 2 for about 18 percent, and so on. As expected, the sum of the eigenvalues equals the number of variables. The fourth and fifth columns contain the cumulative eigenvalues and the cumulative percentage of variance extracted.
Which criterion to use: Both criteria have been studied in detail (Browne, 1968; Cattell & Jaspers, 1967; Hakstian, Rogers, & Cattell, 1982; Linn, 1968; Tucker, Koopman, & Linn, 1969). Theoretically, you can evaluate these criteria by generating random data based on a particular number of factors and then checking whether that number is accurately detected. Using this general technique, the first method (the Kaiser criterion) sometimes retains too many factors, while the second (the scree test) sometimes retains too few; however, both do quite well under normal conditions, that is, when there are relatively few factors and many cases. In practice, an additional important consideration is the extent to which a solution is interpretable. Therefore, one usually examines several solutions with more or fewer factors and chooses the one that makes the best "sense." We will discuss this issue in the context of factor rotations below.
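Applied to the eigenvalues in the table above, the Kaiser criterion can be checked directly (a small plain-Python sketch; the values are copied from the table):

```python
eigenvalues = [6.118369, 1.800682, 0.472888, 0.407996, 0.317222,
               0.293300, 0.195808, 0.170431, 0.137970, 0.085334]

# Kaiser criterion: retain factors whose eigenvalue exceeds 1
kaiser = sum(e > 1 for e in eigenvalues)
print(kaiser)  # 2 factors retained

total = sum(eigenvalues)  # 10.0: equals the number of variables
# Cumulative percentage of variance captured by the retained factors
print(round(100 * sum(eigenvalues[:kaiser]) / total, 1))  # 79.2
```

The two retained factors reproduce the 79.19 percent cumulative figure shown in the table.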
Histogram & Box Plot
Histogram
A histogram is a graphical representation of the distribution of data. Tabular frequencies are drawn as adjacent rectangles over discrete intervals known as bins. The total area of the histogram equals the number of data points (or 1, if the frequencies are normalized).
Box Plot
The box plot is a chart that graphically represents the five most important descriptive values for a data set. It summarizes the following statistical measures:
· Median
· Upper & lower Quartiles
· Minimum & maximum data values
Comparing histogram & box plots
· In a histogram the data is represented as bars whose peaks make the shape of the distribution, and its fluctuations, easy to read. In a box plot, by contrast, the individual values are condensed into a few summary statistics, so the same distribution can simply look roughly normal.
· A histogram is preferable over a box plot when there is very little variance among the observed frequencies: the histogram makes that flatness visible, whereas a box plot of the same data points can look roughly normal.
· When there is moderate variation among the observed frequencies, the histogram looks ragged and non-symmetrical because of the way the data is grouped, while a box plot of the same data points can indicate a nearly perfect normal distribution.
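The side-by-side comparison above is easy to reproduce with Matplotlib (a minimal sketch on simulated normal data; the sample size, bin count, and file name are arbitrary choices):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
data = rng.normal(loc=5, scale=1.5, size=500)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(data, bins=20)  # frequencies per bin: shape and fluctuations
ax1.set_title("Histogram")
ax2.boxplot(data)        # median, quartiles, whiskers, outliers
ax2.set_title("Box plot")
fig.savefig("histogram_boxplot.png")
```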
Sources:
http://en.wikipedia.org/wiki/Box_plot
http://en.wikipedia.org/wiki/Histogram
http://www.netmba.com/statistics/plot/box/
http://www.brighthub.com/office/project-management/articles/58254.aspx#
Factor Analysis - Few examples
Hi Friends!
It’ll be interesting to know that factor analysis was invented nearly 100 years ago by psychologist Charles Spearman, who hypothesized that the enormous variety of tests of mental ability--measures of mathematical skill, vocabulary, other verbal skills, artistic skills, logical reasoning ability, etc.--could all be explained by one underlying "factor" of general intelligence that he called g. He hypothesized that if g could be measured and you could select a subpopulation of people with the same score on g, in that subpopulation you would find no correlations among any tests of mental ability. In other words, he hypothesized that g was the only factor common to all those measures.
It was an interesting idea, but it turned out to be wrong. Today the College Board testing service operates a system based on the idea that there are at least three important factors of mental ability--verbal, mathematical, and logical abilities--and most psychologists agree that many other factors could be identified as well.
I came across different examples on factor-analysis problems. I would like to share them with you. It shows in how many different fields factor analysis can be used.
Some Examples of Factor-Analysis Problems
1. Suppose many species of animal (rats, mice, birds, frogs, etc.) are trained that food will appear at a certain spot whenever a noise--any kind of noise--comes from that spot. You could then tell whether they could detect a particular sound by seeing whether they turn in that direction when the sound appears. Then if you studied many sounds and many species, you might want to know on how many different dimensions of hearing acuity the species vary. One hypothesis would be that they vary on just three dimensions--the ability to detect high-frequency sounds, ability to detect low-frequency sounds, and ability to detect intermediate sounds. On the other hand, species might differ in their auditory capabilities on more than just these three dimensions. For instance, some species might be better at detecting sharp click-like sounds while others are better at detecting continuous hiss-like sounds.
2. Suppose each of 500 people, who are all familiar with different kinds of automobiles, rates each of 20 automobile models on the question, "How much would you like to own that kind of automobile?" We could usefully ask about the number of dimensions on which the ratings differ. A one-factor theory would posit that people simply give the highest ratings to the most expensive models. A two-factor theory would posit that some people are most attracted to sporty models while others are most attracted to luxurious models. Three-factor and four-factor theories might add safety and reliability. Or instead of automobiles you might choose to study attitudes concerning foods, political policies, political candidates, or many other kinds of objects.
3. Rubenstein (1986) studied the nature of curiosity by analyzing the agreements of junior-high-school students with a large battery of statements such as "I like to figure out how machinery works" or "I like to try new kinds of food." A factor analysis identified seven factors: three measuring enjoyment of problem-solving, learning, and reading; three measuring interest in natural sciences, art and music, and new experiences in general; and one indicating a relatively low interest in money.
Hope you all found this piece of information useful and interesting!!
Thanks
Shreya Khamar
Factor Analysis- An Insight
What is Factor Analysis?
Factor analysis is a statistical technique, the aim of which is to simplify a complex data set by representing the set of variables in terms of a smaller number of underlying (hypothetical or unobservable) variables, known as factors.
Types of Factor Analysis – There are basically two types of factor analysis: Exploratory Factor Analysis & Confirmatory Factor Analysis
Exploratory factor analysis (EFA) attempts to discover the nature of the constructs influencing a set of responses. It is exploratory when you do not have a pre-defined idea of the structure or how many dimensions are in a set of variables.
Uses of Exploratory factor analysis-
- To determine what sets of items hang together in a questionnaire.
- To determine what features are most important when classifying a group of items.
- For Psychometric instrument development
Confirmatory factor analysis (CFA) tests whether a specified set of constructs is influencing responses in a predicted way. It is confirmatory when you want to test specific hypotheses about the structure or the number of dimensions underlying a set of variables (e.g., you may think your data contain two dimensions and want to verify that).
Uses of Confirmatory Factor Analysis-
- To test whether a set of factors are correlated or uncorrelated
- For social research in developing tests such as intelligence test and personality test
Similarities between Exploratory and Confirmatory Factor Analysis
- Exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) are two statistical approaches used to examine the internal reliability of a measure.
- Both are used to assess the quality of individual items.
- Both techniques assume a normal distribution.
Differences between Exploratory and Confirmatory Factor Analysis
- CFA requires that a particular factor structure be specified, in which the researcher indicates which items load on which factor. EFA allows all items to load on all factors.
- CFA requires specifications of the number of factors whereas EFA determines the factor structure.
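For readers who want to try an exploratory analysis themselves, scikit-learn provides a maximum-likelihood factor-analysis estimator (a sketch on simulated items; the two-construct loading matrix is invented for illustration, and sklearn's FactorAnalysis is an EFA-style model, not a CFA):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(4)
# Six items: three load on each of two uncorrelated constructs
f = rng.normal(size=(500, 2))
loadings = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.0],
                     [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]])
items = f @ loadings.T + 0.4 * rng.normal(size=(500, 6))

fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(items)       # factor scores, one row per case
print(np.round(fa.components_, 2))     # estimated loadings: two item groups
```

As EFA allows, every item is free to load on every factor; the estimated loadings should nonetheless recover the two groups of items.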
Sources: http://www.stat-help.com/factor.pdf