A scatter graph is a type of mathematical diagram that uses Cartesian coordinates to display the values of two variables for a set of data. The data are shown as a set of points, with the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position on the vertical axis. A scatter graph is also known as a scatter plot.
A scatter plot can suggest various kinds of correlation between variables, with a stated level of confidence. The correlation can be positive (rising), negative (falling), or null (uncorrelated variables). A line of best fit (also called a "trend line") can be drawn to study the correlation between the variables. The scatter graph shown in the figure above gives a clear description of the data set obtained from the Eurostat website; there is a linear trend between the 2007 and 2011 values, which suggests a significant relationship between the number of broadband lines in January 2007 and in January 2011.
The coefficient of correlation shown in the table below suggests that there is strong positive correlation between the two variables; the correlation coefficient is not sensitive to the units of each variable.
Correlation Coefficient
                2007 January    2011 January
2007 January    1
2011 January    0.989689402     1
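The claim that the correlation coefficient does not depend on the units of each variable can be checked with a short sketch. The figures below are hypothetical line counts chosen for illustration, not the Eurostat data, and `pearson_r` is a helper name introduced here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical broadband line counts (NOT the Eurostat figures)
lines_2007 = [1.2e6, 3.5e6, 6.7e6, 13.1e6, 14.9e6]
lines_2011 = [2.0e6, 5.9e6, 10.9e6, 21.3e6, 24.1e6]

# Same data, but 2007 expressed in millions and 2011 in thousands
r_raw = pearson_r(lines_2007, lines_2011)
r_rescaled = pearson_r([x / 1e6 for x in lines_2007],
                       [y / 1e3 for y in lines_2011])

print(r_raw, r_rescaled)  # the two values agree up to rounding error
```

Rescaling either variable by a positive constant multiplies the numerator and the denominator by the same factor, so the ratio is unchanged.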
The coefficient of determination, or R², measures the goodness of fit of the estimated regression equation. It is used in both simple and multiple regression and gives a general idea of the model fit. It is interpreted as the proportion of the variance of the variable Y explained by the regression; it varies between 0 and 1 and is often expressed as a percentage.
Regression Statistics
Multiple R           0.99009274
R Square             0.980283634
Adjusted R Square    0.97949498
Standard Error       623366.5611
Observations         27
The results in the table above show that 98% of the total variability is explained by the model. We can therefore say that the data are well fitted, since in simple regression an R² close to 1 is sufficient to say that the fit is good.
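The figures in the regression statistics table are linked by standard formulas, which the following sketch checks. It assumes a single predictor (January2007) and 27 observations, as reported in the table; the function name `adjusted_r_squared` is introduced here for illustration.

```python
import math

def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

r_squared = 0.980283634   # "R Square" from the table
n_obs = 27                # "Observations" from the table
n_predictors = 1          # assumption: one predictor, January2007

multiple_r = math.sqrt(r_squared)
adj_r2 = adjusted_r_squared(r_squared, n_obs, n_predictors)

print(round(multiple_r, 8))  # 0.99009274, matching "Multiple R"
print(round(adj_r2, 8))      # 0.97949498, matching "Adjusted R Square"
```

The adjusted value is slightly lower than R² because it penalises the number of predictors relative to the sample size.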
Coefficients (a)
Model             Unstandardized B    Std. Error    Standardized Beta    t         Sig.
1  (Constant)     155738.240          237421.901                         .656      .518
   January2007    1.610               .046          .990                 35.256    .000
a. Dependent Variable: January2011
The coefficients of the regression equation are shown in the table above; the equation can be written as follows:
January 2011 = 155738.240 + 1.610 (January 2007)
It is important to note that, for making predictions, a high coefficient of determination is desirable: the higher the value of R², the smaller the unexplained variation.
The predicted values for the countries of Germany, Spain and the UK based on their number of broadband lines in January 2007 are shown below.
                  Prediction      Difference
Germany           24148280.24     9246080
Spain             10866528.89     4213864
United Kingdom    21250438.02     8148140
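The predictions above follow from applying the fitted equation to each country's January 2007 value. A minimal sketch is given below. The input of 14,902,200 lines is not stated in the text; it is back-computed by inverting the equation for the Germany row. Reading "Difference" as the prediction minus the January 2007 value is likewise an assumption, although it is consistent with the tabulated numbers.

```python
# Coefficients taken from the regression output
INTERCEPT = 155738.240
SLOPE = 1.610

def predict_lines_2011(lines_2007):
    """Predicted January 2011 broadband lines from the January 2007 count."""
    return INTERCEPT + SLOPE * lines_2007

# Assumption: input back-computed from the Germany row, not quoted in the text
germany_2007 = 14_902_200

prediction = predict_lines_2011(germany_2007)
# Assumption: "Difference" = prediction minus the January 2007 value
difference = prediction - germany_2007

print(round(prediction, 2))  # 24148280.24
print(round(difference, 2))  # 9246080.24
```

The same function can be applied to any other January 2007 count, bearing in mind that predictions far outside the range of the 27 observations are less reliable.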
Part 2
Internet Retail Sales: Quarterly Data
A time series is a collection of observations taken over time; the main objectives of time series analysis are to describe, explain, predict and control some process. The observations are ordered in time, and successive observations are usually dependent. This dependence between observations plays an important role in the analysis of the series.
Seasonal Decomposition
Series Name: InternetRetailSales
DATE_    Original Series    Moving Average Series    Difference of Original Series from Moving Average Series