Student Identity

Application of Statistics in Epidemiology

[Name of the Institute]

Section 1

Question 1: Examination of Total Cholesterol

Distribution and Outliers

Distribution

To analyse the distribution of total cholesterol, we used a histogram with a distribution curve overlaid.

Statistics: Tchol

Statistic                   Value
N (Valid)                   450
N (Missing)                 0
Mean                        5.6976
Std. Error of Mean          .17508
Median                      5.3350
Mode                        3.36a
Std. Deviation              3.71406
Variance                    13.794
Skewness                    10.338
Std. Error of Skewness      .115
Kurtosis                    122.634
Std. Error of Kurtosis      .230
Range                       53.54
Minimum                     2.55
Maximum                     56.09
Percentile 25               4.4428
Percentile 50               5.3350
Percentile 75               6.2940

a. Multiple modes exist. The smallest value is shown.

In a distribution that is not perfectly normal and is skewed to the right, the median is a better measure of central tendency than the mean. The total cholesterol in our sample has a median value of 5.335 mmol/l. Total cholesterol varies among individuals by 3.714 mmol/l according to the standard deviation and by 1.8512 mmol/l according to the inter-quartile range.
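The spread figures quoted above can be rechecked from the Statistics table. The following is a minimal sketch that only re-does the arithmetic on the reported values; the variable names are illustrative and are not part of the SPSS output.

```python
import math

# Values copied from the Statistics table for Tchol (unfiltered data, N = 450).
n = 450
mean, median = 5.6976, 5.3350
std_dev = 3.71406
q1, q3 = 4.4428, 6.2940

# Inter-quartile range: the spread of the middle 50% of observations.
iqr = q3 - q1

# Standard error of the mean: SD / sqrt(N).
se_mean = std_dev / math.sqrt(n)

# A mean above the median is consistent with right (positive) skew.
right_skew_hint = mean > median

print(f"IQR = {iqr:.4f} mmol/l")
print(f"SE of mean = {se_mean:.5f}")
print(f"mean > median: {right_skew_hint}")
```

The inter-quartile range depends only on the quartiles, so it is far less affected by a few extreme readings than the standard deviation.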

Outliers

Outliers, the extreme values of a random variable, affect the efficiency of the inferences made from the data set. Here the outliers are identified using a box plot.

Extreme Values: Tchol (Highest)

Rank    Case Number    Value
1       443            56.091
2       99             43.251
3       86             43.997

The 86th, 99th and 443rd observations have very high values of total cholesterol. Such values can bias summary measures, especially measures of central tendency. This problem can be solved by excluding these outliers from the analysis. In SPSS, the three cases can be filtered out by setting the selection criterion so that only cases with total cholesterol below 20 are selected for analysis.
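The same selection rule can be illustrated outside SPSS. The sketch below is an assumption-laden example: only the three extreme values and the cutoff of 20 come from the text, while the remaining readings are invented to keep the example short.

```python
from statistics import mean, median

# Hypothetical sample: only the three extreme values (56.091, 43.997, 43.251)
# come from the Extreme Values table; the other readings are invented.
tchol = [4.1, 4.9, 5.3, 5.8, 6.4, 7.2, 56.091, 43.997, 43.251]

CUTOFF = 20.0  # selection criterion from the text: keep cases with Tchol < 20

# Keep only the cases that satisfy the selection criterion.
filtered = [x for x in tchol if x < CUTOFF]

print(f"cases kept: {len(filtered)} of {len(tchol)}")
print(f"mean   before/after: {mean(tchol):.3f} / {mean(filtered):.3f}")
print(f"median before/after: {median(tchol):.3f} / {median(filtered):.3f}")
```

In this toy sample the mean drops sharply once the three extreme cases are excluded, which is the kind of bias the filter is meant to remove.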

Frequency Table for Tchol_Group variable

The values of total cholesterol are divided into five groups. The grouped frequency distribution is as follows.

Tchol_Group

Group             Frequency    Percent    Valid Percent    Cumulative Percent
2.545 to 3.963    60           13.42      13.42            13.42
3.963 to 5.381    168          37.58      37.58            51.00
5.381 to 6.529    133          29.75      29.75            80.76
6.529 to 7.947    72           16.10      16.10            96.86
7.947 to 9.365    14           3.13       3.13             100
Total             447          100.0      100.0
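The percent and cumulative percent columns follow directly from the group frequencies. A minimal sketch of that calculation, using only the frequencies reported in the table, is given below; recomputed values may differ from the table in the last decimal place because of rounding.

```python
from itertools import accumulate

# Frequencies copied from the Tchol_Group table (filtered data, N = 447).
frequencies = [60, 168, 133, 72, 14]
total = sum(frequencies)

# Percent of cases in each group.
percent = [100 * f / total for f in frequencies]

# Cumulative percent: running total of the frequencies as a share of N.
cumulative = [100 * c / total for c in accumulate(frequencies)]

for f, p, c in zip(frequencies, percent, cumulative):
    print(f"{f:>4} {p:6.2f} {c:7.2f}")
```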

Filtered data

In Q1a(ii) the outliers were identified and the data were filtered to exclude them. The descriptive statistics of total cholesterol after filtering are as follows.

Statistics: Tchol

Statistic                   Value
N (Valid)                   447
N (Missing)                 0
Mean                        5.4152
Std. Error of Mean          .06098
Median                      5.3230
Mode                        3.36a
Std. Deviation              1.28930
Variance                    1.662
Skewness                    .302
Std. Error of Skewness      .115
Kurtosis                    -.251
Std. Error of Kurtosis      .230
Range                       7.08
Minimum                     2.55
Maximum                     9.62
Percentile 25               4.4330
Percentile 50               5.3230
Percentile 75               6.2840

a. Multiple modes exist. The smallest value is shown.

In the statistics above, after filtering, the mean and median are very close, indicating that the distribution is now approximately normal with a little skewness. The value of skewness and the slight difference between the median and the mean show that the distribution is positively skewed, but only to a very small extent. The value of kurtosis represents a small divergence from a normal spread. The data are now less dispersed than they were with the outliers, as the lower standard deviation after filtering shows.
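One way to judge how far the skewness and kurtosis are from zero is to compare each with its standard error. The sketch below uses the usual sample-size formulas for those standard errors, which reproduce the .115 and .230 shown in the table. Comparing the resulting ratios with a threshold of about 2 is a common rule of thumb and is an assumption here, not part of the original analysis.

```python
import math

def se_skewness(n: int) -> float:
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n: int) -> float:
    """Standard error of sample excess kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

# Values copied from the Statistics table for the filtered data.
n = 447
skewness, kurtosis = 0.302, -0.251

ses, sek = se_skewness(n), se_kurtosis(n)

# Ratio of each statistic to its standard error.
z_skew = skewness / ses
z_kurt = kurtosis / sek

print(f"SE skewness = {ses:.3f}, ratio = {z_skew:.2f}")
print(f"SE kurtosis = {sek:.3f}, ratio = {z_kurt:.2f}")
```

With several hundred cases these ratios become sensitive to even mild departures from normality, so they are best read alongside the histogram and box plot rather than on their own.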

Compared with the histogram of the unfiltered data, the histogram of the filtered data shows less skewness and is more symmetric. A slight skew in shape can still be observed beyond 8 mmol/l.

The results for the distribution of total cholesterol found in the statistics and histogram are further confirmed by the box plot. The two higher values of total cholesterol in the 4th and 17th observations cause the slight skewness to the right in the distribution. We tested the distribution of total cholesterol after removing outliers using descriptive statistics, a histogram, and a box plot. Removing the outliers successfully increased the symmetry of the distribution.
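A box plot conventionally flags points lying more than 1.5 inter-quartile ranges beyond the quartiles. The sketch below applies that convention to the filtered quartiles reported earlier. It assumes the 1.5 x IQR rule and uses the tabulated percentiles; SPSS draws its box plots from Tukey's hinges, which can differ slightly, so the fences are approximate.

```python
# Quartiles, minimum and maximum copied from the filtered Statistics table.
q1, q3 = 4.4330, 6.2840
minimum, maximum = 2.55, 9.62

iqr = q3 - q1

# Tukey fences: points outside [lower_fence, upper_fence] are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

has_high_outliers = maximum > upper_fence
has_low_outliers = minimum < lower_fence

print(f"fences: {lower_fence:.4f} to {upper_fence:.4f} mmol/l")
print(f"high outliers present: {has_high_outliers}")
print(f"low outliers present: {has_low_outliers}")
```

Under these assumptions only the upper tail still contains flagged points, which agrees with the slight right skew described above.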

Question 2: Age Groups

Extreme Values: Age

            Rank    Case Number    Value
Highest     1       345            89.16
Lowest      1       247            18.29

There appear to be 2 outliers. Observation 247 is the lowest with values of ...