To analyse the distribution of total cholesterol, we used a histogram with a distribution curve overlaid.
Statistics: Tchol

N, Valid                   450
N, Missing                 0
Mean                       5.6976
Std. Error of Mean         .17508
Median                     5.3350
Mode                       3.36a
Std. Deviation             3.71406
Variance                   13.794
Skewness                   10.338
Std. Error of Skewness     .115
Kurtosis                   122.634
Std. Error of Kurtosis     .230
Range                      53.54
Minimum                    2.55
Maximum                    56.09
Percentiles, 25            4.4428
Percentiles, 50            5.3350
Percentiles, 75            6.2940

a. Multiple modes exist. The smallest value is shown.
In a distribution that is not normal and is skewed to the right, the median is a better measure of central tendency than the mean. The typical (median) total cholesterol in our sample is 5.335 mmol/l. Total cholesterol varies among individuals by 3.714 mmol/l according to the standard deviation and by 1.8512 mmol/l according to the interquartile range (6.2940 - 4.4428).
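The spread figures quoted above can be re-derived from the Statistics table. The short Python sketch below is only an illustration (the analysis itself was run in SPSS, and the variable names are ours); it recomputes the interquartile range from the reported quartiles and the gap between mean and median.

```python
# Values copied from the Statistics table for Tchol (mmol/l).
mean, median = 5.6976, 5.3350
q1, q3 = 4.4428, 6.2940

# Interquartile range: spread of the middle 50% of the sample.
iqr = q3 - q1

# A mean above the median is what a right-skewed distribution produces.
mean_minus_median = mean - median

print(f"IQR = {iqr:.4f} mmol/l")
print(f"mean - median = {mean_minus_median:.4f} mmol/l")
```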
Outliers
Outliers, the extreme values of a random variable, reduce the efficiency of inferences made from the data set. Here the outliers are identified using a box plot.
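A box plot flags points that lie beyond "fences" placed a multiple of the interquartile range outside the quartiles. The sketch below assumes the conventional multipliers of 1.5 (outlier) and 3 (extreme value) and uses the percentiles reported in the Statistics table. SPSS builds its box plots from hinges that may differ slightly from these percentiles, so the numbers are indicative only. The function name `fences` is ours.

```python
def fences(q1: float, q3: float, k: float) -> tuple[float, float]:
    """Return (lower, upper) cut-offs placed k * IQR outside the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles of Tchol (mmol/l) from the Statistics table.
Q1, Q3 = 4.4428, 6.2940

inner = fences(Q1, Q3, 1.5)  # beyond these: usually drawn as outliers
outer = fences(Q1, Q3, 3.0)  # beyond these: usually drawn as extreme values

print("inner fences:", [round(x, 4) for x in inner])
print("outer fences:", [round(x, 4) for x in outer])
```

Under these assumptions, any value above roughly 11.85 mmol/l would be treated as an extreme value.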
Extreme Values: Tchol

            Rank   Case Number   Value
Highest     1      443           56.091
            2      99            43.251
            3      86            43.997
Observations 86, 99 and 443 have very high values of total cholesterol. Such values can bias summary measures, especially measures of central tendency. The problem can be addressed by excluding these outliers from the analysis. In SPSS we can filter out the three cases by setting the criterion that only cases with total cholesterol below 20 mmol/l are selected for analysis.
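The effect of this selection rule can be checked from the summary figures alone. The sketch below is written in Python rather than SPSS and is an illustration only. It assumes the three listed cases are the only ones at or above the cut-off of 20, which is consistent with the valid N of 447 reported after filtering.

```python
# Cases flagged in the Extreme Values table: case number -> Tchol (mmol/l).
extremes = {443: 56.091, 99: 43.251, 86: 43.997}
CUTOFF = 20.0  # selection rule from the text: keep cases with Tchol < 20

# Cases that fail the selection rule and are therefore dropped.
excluded = {case: value for case, value in extremes.items() if not value < CUTOFF}

# Summary figures for the unfiltered sample (Statistics table).
n_all, mean_all = 450, 5.6976

# Remove the excluded values from the total and re-average.
n_kept = n_all - len(excluded)
mean_kept = (n_all * mean_all - sum(excluded.values())) / n_kept

print(f"cases kept: {n_kept}")
print(f"mean after filtering: {mean_kept:.4f} mmol/l")
```

The recomputed mean can be compared with the mean reported for the filtered data.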
Frequency Table for Tchol_Group variable
The values of total cholesterol are divided into five groups; the grouped frequency distribution follows.
Tchol_Group

Class (mmol/l)     Frequency   Percent   Valid Percent   Cumulative Percent
2.545 to 3.963     60          13.42     13.42           13.42
3.963 to 5.381     168         37.58     37.58           51.00
5.381 to 6.529     133         29.75     29.75           80.76
6.529 to 7.947     72          16.10     16.10           96.86
7.947 to 9.365     14          3.13      3.13            100
Total              447         100.0     100.0
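The Percent and Cumulative Percent columns follow directly from the frequencies. As a minimal sketch (Python is used here for illustration; the table itself came from SPSS), the columns can be recomputed like this:

```python
from itertools import accumulate

# Class limits and frequencies copied from the Tchol_Group table.
classes = ["2.545 to 3.963", "3.963 to 5.381", "5.381 to 6.529",
           "6.529 to 7.947", "7.947 to 9.365"]
freq = [60, 168, 133, 72, 14]

total = sum(freq)
cum_freq = list(accumulate(freq))               # running count of cases
pct = [100 * f / total for f in freq]           # share of each class
cum_pct = [100 * c / total for c in cum_freq]   # share up to and including the class

for label, f, p, cp in zip(classes, freq, pct, cum_pct):
    print(f"{label:<16}{f:>5}{p:>8.2f}{cp:>8.2f}")
print(f"{'Total':<16}{total:>5}")
```

The second decimal place may differ from the SPSS table depending on whether figures are rounded or truncated.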
Filtered data
In Q1a(ii) the outliers were identified and the data were filtered to exclude them. The descriptive statistics of total cholesterol after filtering follow.
Statistics: Tchol (filtered)

N, Valid                   447
N, Missing                 0
Mean                       5.4152
Std. Error of Mean         .06098
Median                     5.3230
Mode                       3.36a
Std. Deviation             1.28930
Variance                   1.662
Skewness                   .302
Std. Error of Skewness     .115
Kurtosis                   -.251
Std. Error of Kurtosis     .230
Range                      7.08
Minimum                    2.55
Maximum                    9.62
Percentiles, 25            4.4330
Percentiles, 50            5.3230
Percentiles, 75            6.2840

a. Multiple modes exist. The smallest value is shown.
In the statistics above, after filtering, the mean and median are very close, suggesting that the distribution is now approximately normal with only slight skewness. The skewness value and the small difference between the mean and the median show that the distribution is positively skewed, but only to a very small extent. The kurtosis value indicates a small divergence from a normal spread. The data are now less dispersed than they were with the outliers: the standard deviation has fallen from 3.714 to 1.289 mmol/l after filtering.
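One common way to judge whether skewness and kurtosis are small is to compare each with its standard error. The sketch below uses the usual formulas for the two standard errors, which reproduce the .115 and .230 in the table for n = 447. How large a ratio counts as a meaningful departure from normality is a judgment call that this report does not specify, so the ratios are shown without a verdict.

```python
from math import sqrt

def se_skewness(n: int) -> float:
    """Standard error of sample skewness for a sample of size n."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n: int) -> float:
    """Standard error of sample excess kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * sqrt((n * n - 1) / ((n - 3) * (n + 5)))

n = 447                              # valid cases after filtering
skewness, kurtosis = 0.302, -0.251   # from the filtered Statistics table

ses, sek = se_skewness(n), se_kurtosis(n)
print(f"SE skewness = {ses:.3f}, skewness / SE = {skewness / ses:.2f}")
print(f"SE kurtosis = {sek:.3f}, kurtosis / SE = {kurtosis / sek:.2f}")
```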
If we compare the histogram of the filtered data with that of the unfiltered data, the skewness has reduced, making the plot more symmetric. Beyond a value of about 8 mmol/l a slight skew in the shape can still be observed.
The results for the distribution of total cholesterol found in the statistics and histogram are further confirmed by the box plot. The two higher values of total cholesterol in the 4th and 17th observations cause the slight skewness to the right. We examined the distribution of total cholesterol after removing outliers using descriptive statistics, a histogram, and a box plot. Removing the outliers increased the symmetry of the distribution.
Question 2: Age Groups
Extreme Values: Age

            Rank   Case Number   Value
Highest     1      345           89.16
Lowest      1      247           18.29
There appear to be 2 outliers. Observation 247 is the lowest with values of ...