I will examine the degree of relationship between two variables (education and income) through what we call correlation analysis, and represent this relationship using a graphic called a scatter plot. I will then study a mathematical model for estimating the value of one variable from the value of another, in what we call regression analysis, and finally work through an exercise that applies these techniques to real data from the GSS data set (http://publicdata.norc.org:41000/gssbeta/aboutNDPSS.html). From the GSS data set, two variables were selected to study the correlation between them.
Variable description
Education (variable 27 GSS): the education variable denotes the respondent's level of education in number of years (for example, an undergraduate (bachelor's) degree corresponds to 16 years of education and a master's degree to 18 years)
Income (variable 63 GSS): the income variable denotes the annual income of the respondent (family) in dollars
Correlation analysis
Correlation analysis is the set of statistical techniques used to measure the strength of association between two variables. Its main objective is to determine how strong the relationship between two variables is. Usually the first step is to display the data in a scatter plot (Aldrich, 1995).
Assumptions of Correlation
The degree of relationship between two continuous variables is summarized by a correlation coefficient known as "Pearson's r", in honor of the great mathematician Karl Pearson, who devised this method. The technique is valid only as long as some very strict assumptions hold (David, 2005). These assumptions are:
Both x and y are continuous random variables. That is, unlike in regression analysis, it is not acceptable to select certain values of x and then measure y; both x and y should vary freely.
The joint distribution of x and y is normal. This is what is known as the bivariate normal distribution.
Hypothesis
H0: there is no significant relationship between years of education and income level of Americans
H1: there is a significant relationship between years of education and income level of Americans
Analysis
Correlations
                                                              Education in years   Annual income in dollars
Education in years         Pearson Correlation                1                    .021
                           Sig. (2-tailed)                                         .838
                           Sum of Squares and Cross-products  910.182              138580.182
                           Covariance                         9.288                1414.083
                           N                                  99                   99
Annual income in dollars   Pearson Correlation                .021                 1
                           Sig. (2-tailed)                    .838
                           Sum of Squares and Cross-products  138580.182           4.846E10
                           Covariance                         1414.083             4.945E8
                           N                                  99                   99
In the table above we can see that there is no significant correlation (relationship) between education and income: the correlation coefficient is only .021, and the p value (.838) is greater than 0.05, the significance level that corresponds to 95% confidence. Based on this analysis we fail to reject the null hypothesis, which says that there is no significant relationship between the two variables.
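The figures in the Correlations table can be checked by hand. The sketch below is only an illustration, not the procedure the statistics package uses internally: it recomputes Pearson's r from the sums of squares and cross-products in the table, converts it to a t statistic with N - 2 degrees of freedom, and approximates the two-tailed p value by numerically integrating the t density. The function names (pearson_r_from_sums, t_pdf, two_tailed_p) and the choice of Simpson's rule are assumptions made for this example.

```python
import math

# Values copied from the Correlations table (income sum of squares is rounded there).
N = 99
SS_EDUCATION = 910.182        # sum of squares, education in years
SS_INCOME = 4.846e10          # sum of squares, annual income in dollars
CROSS_PRODUCTS = 138580.182   # sum of cross-products, education x income


def pearson_r_from_sums(sxy, sxx, syy):
    """Pearson's r = Sxy / sqrt(Sxx * Syy)."""
    return sxy / math.sqrt(sxx * syy)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3              # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)


r = pearson_r_from_sums(CROSS_PRODUCTS, SS_EDUCATION, SS_INCOME)
df = N - 2
t_stat = r * math.sqrt(df / (1 - r * r))
p_value = two_tailed_p(t_stat, df)

print(f"r = {r:.3f}, t = {t_stat:.3f}, df = {df}, p = {p_value:.3f}")
print("significant at the 5% level" if p_value < 0.05
      else "not significant at the 5% level")
```

Because the table values are rounded, the recomputed r and p should agree with the reported .021 and .838 only to about three decimal places.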
The correlation coefficient can take values from minus one to plus one. The closer the coefficient is to one in either direction (that is, to +1 or to -1), the stronger the linear association between the two variables. The closer the coefficient is to zero, the weaker the association between the two variables. If it is zero, we conclude that there is no linear relationship between the two variables (Francis and Gibson, 1999).
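The range of the coefficient can be illustrated with a small sketch. The data below are made up purely for illustration (they are not GSS values), and pearson_r is a hypothetical helper written for this example: one series rises exactly in line with x, one falls exactly in line with x, and one has no linear trend at all.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed directly from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative data, not taken from the GSS.
x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]      # perfect positive linear relationship
falling = [10 - 3 * v for v in x]    # perfect negative linear relationship
no_trend = [2, 1, 3, 1, 2]           # cross-products cancel out exactly

for label, y in [("rising", rising), ("falling", falling), ("no trend", no_trend)]:
    print(f"{label:>8}: r = {pearson_r(x, y):+.3f}")
```

The three series should give coefficients at the two ends and the middle of the range, up to floating-point rounding.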
Scatter Plot
A scatter plot is a graphic that represents the relationship between two variables.
The scatter diagram above shows no relationship between education and income. It is a null scatter plot showing no trend, and if we fit a straight line to the points we get a line that is nearly horizontal, that is, almost parallel to the x-axis ...
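How flat the fitted line is can be estimated from the Correlations table without the raw data. As a rough sketch, assuming an ordinary least-squares line of income on education, the slope is the sum of cross-products divided by the sum of squares of education, and the share of income variation explained by education is r squared. The variable names are chosen for this example; the intercept is not computed because the table does not report the two means.

```python
# Values copied from the Correlations table (income sum of squares is rounded there).
SS_EDUCATION = 910.182        # sum of squares, education in years
SS_INCOME = 4.846e10          # sum of squares, annual income in dollars
CROSS_PRODUCTS = 138580.182   # sum of cross-products, education x income

# Least-squares slope of income on education: dollars per extra year of education.
slope = CROSS_PRODUCTS / SS_EDUCATION

# Coefficient of determination: share of income variance explained by education.
r_squared = CROSS_PRODUCTS ** 2 / (SS_EDUCATION * SS_INCOME)

print(f"slope = {slope:.2f} dollars per year of education")
print(f"r squared = {r_squared:.5f} ({r_squared * 100:.2f}% of variance explained)")
```

A slope this small compared with the spread of incomes in the sample, together with an r squared this close to zero, is what a nearly horizontal fitted line looks like in numbers.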