Analysis of variance (ANOVA) is a statistical procedure that attempts to determine the amount of variation in a variable y that is explained by a variable, or set of variables, X. It does so by decomposing a group of measurements into its components of systematic variation and random variation (or measurement error).
A simple one-way ANOVA compares the variation in a variable y within groups to the variation between groups, where the groups are defined by a single variable. It is equivalent to a linear regression with an indicator variable for each group. An ANOVA procedure estimates the following model: y = βX + e, where y is the endogenous, or dependent, variable; X is the exogenous, or independent, set of variables (traditionally indicator variables that describe categorical variables, such as race, gender, or type of treatment); β is the set of coefficients, one for each variable in X; and e is the random error term. X and y are measured; e and β are estimated.
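The within-group versus between-group comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three groups and their scores are made up (they are not taken from any data set), and the names `groups`, `ss_between`, `ss_within`, `f_stat`, and `r_squared` are hypothetical. With one indicator variable per group, the least-squares fitted value for each observation is its group mean, so the systematic part of the variation is the between-group sum of squares and the random part is the within-group sum of squares.

```python
# Made-up scores for three groups (illustrative only, not real data).
groups = {
    "a": [4.0, 5.0, 6.0],
    "b": [7.0, 8.0, 9.0],
    "c": [10.0, 11.0, 12.0],
}

scores = [y for ys in groups.values() for y in ys]
grand_mean = sum(scores) / len(scores)
group_means = {g: sum(ys) / len(ys) for g, ys in groups.items()}

# Total variation: squared deviations of every score from the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in scores)

# Systematic variation: group means around the grand mean, weighted by group size.
ss_between = sum(
    len(ys) * (group_means[g] - grand_mean) ** 2 for g, ys in groups.items()
)

# Random variation: scores around their own group mean.
ss_within = sum(
    (y - group_means[g]) ** 2 for g, ys in groups.items() for y in ys
)

df_between = len(groups) - 1            # k - 1
df_within = len(scores) - len(groups)   # N - k

# F is the ratio of the between-group to the within-group mean square.
f_stat = (ss_between / df_between) / (ss_within / df_within)

# Proportion of the variation in y explained by group membership.
r_squared = ss_between / ss_total

print(ss_total, ss_between, ss_within, f_stat, r_squared)
```

For these made-up scores the total sum of squares (60) splits into 54 between groups and 6 within groups, so F = 27 and group membership accounts for 90% of the variation in y.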
Section I
Data: ADD dataset from Howell's textbook, available on the course web page.
In 1965, second-grade teachers in a number of schools in Vermont were asked to complete a questionnaire indicating the extent to which each student exhibited behaviors associated with attention deficit disorder (ADD). Based on the questionnaire, an ADD “score” was computed for each student (with higher scores indicating more ADD-like behaviors). The questionnaires for the same children were again completed when the children were in fourth and fifth grades. The children were followed through high school, and in 1985 Howell and Huessy reported some data from this study. The variables in the data set are as follows:
Variable   Description
ID         ID number
ADDSC      Average of the three ADD scores
SEX        1 = male; 2 = female
REPEAT     1 = repeated at least one grade; 0 = did not repeat a grade
IQ         IQ obtained from a group-administered test
ENGL       Level of English in ninth grade: 1 = college prep; 2 = general; 3 = remedial
ENGG       Grade in English in ninth grade: 4 = A; 3 = B; etc.
GPA        Grade point average in ninth grade
SOCPROB    Social problems in ninth grade: 1 = yes; 0 = no
DROPOUT    1 = dropped out before completing high school; 0 = did not drop out
Section II
Assumptions, Data Screening, and Verification of Assumptions
Analysis of variance assumes normal distributions and homogeneity of variance. Therefore, in a one-factor ANOVA, it is assumed that each of the populations is normally distributed with the same variance (σ²). In between-subjects analyses, it is also assumed that each score is sampled randomly and independently. Research has shown that ANOVA is "robust" to violations of its assumptions.
This means that the probability values computed in an ANOVA are satisfactorily accurate even if the assumptions are violated. Moreover, ANOVA tends to be conservative when its assumptions are violated: although power is decreased, the probability of a Type I error is as low as or lower than it would be if the assumptions were met. There are exceptions to this rule. For example, a combination of unequal sample sizes and a violation of the assumption of homogeneity of variance can lead to an inflated Type I error rate.
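The exception described above can be illustrated with a small simulation. The following is a rough sketch rather than a definitive demonstration, and everything specific in it is an assumption made for illustration: two groups of sizes 5 and 25, standard deviations of 4 and 1, 2000 simulated experiments, and a hard-coded critical value of about 4.196 for F(1, 28) at the .05 level. Both populations have the same mean, so every rejection is a Type I error. The function names `f_statistic` and `type_i_rate` are hypothetical.

```python
import random


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of lists of scores."""
    scores = [y for g in groups for y in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def type_i_rate(sizes, sds, f_crit, n_sims=2000, seed=0):
    """Share of simulated experiments that reject a true null hypothesis."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Every group is drawn from a normal population with mean 0,
        # so the null hypothesis of equal means is true by construction.
        groups = [[rng.gauss(0.0, sd) for _ in range(n)] for n, sd in zip(sizes, sds)]
        if f_statistic(groups) > f_crit:
            rejections += 1
    return rejections / n_sims


# Assumed critical value: upper 5% point of F(1, 28), approximately 4.196.
F_CRIT = 4.196

# Unequal sample sizes, equal variances: assumptions met.
rate_equal = type_i_rate([5, 25], [1.0, 1.0], F_CRIT)

# Unequal sample sizes, and the smaller group has the larger variance.
rate_unequal = type_i_rate([5, 25], [4.0, 1.0], F_CRIT)

print("Type I rate, equal variances:  ", rate_equal)
print("Type I rate, unequal variances:", rate_unequal)
```

In this setup the first rate should land near the nominal .05, while the second should be noticeably higher, because the pooled variance is dominated by the large, low-variance group and understates the variability of the small group's mean.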
An important first step in the analysis of variance is establishing the ...