The collected hotel data contains a mix of variables: some are categorical, some are nominal and some are purely numeric. To address the requirements of the hotel's directors, we start by exploring the data using descriptive statistics. We then assess the associations and causal relationships between variables using appropriate statistics in Excel.
Findings (Analysis)
Task 1
Many techniques are available for estimating missing values and completing a data set. In our data, however, only records 43 and 81 have missing values. At the data-entry stage, one could contact the respondents to obtain the missing values. Since only two records are affected, we remove them instead.
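The removal of incomplete records described above can be sketched in Python as follows. This is only an illustration: the analysis itself was done in Excel, and the field names and values below are hypothetical, not the hotel data.

```python
# Hypothetical records: field names and values are illustrative only.
records = [
    {"hotel": 1, "distance": 1.5, "employees": 20},
    {"hotel": 2, "distance": None, "employees": 18},
    {"hotel": 3, "distance": 2.0, "employees": None},
    {"hotel": 4, "distance": 3.5, "employees": 25},
]


def is_complete(record):
    """Return True when no field in the record is missing (None)."""
    return all(value is not None for value in record.values())


# Keep only complete records and note which hotels were dropped.
complete = [r for r in records if is_complete(r)]
dropped_ids = [r["hotel"] for r in records if not is_complete(r)]

print(len(complete), dropped_ids)
```

Dropping whole records is reasonable here only because so few are affected; with more missing data, an estimation technique would likely be preferable.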
Outliers distort the distribution of the data, so it is important to deal with them before the analysis. We observed that the distance, advertising and employee variables have one outlier each, while Type A and Type B have two outliers each. Because these observations skew the distributions and affect the averages, and would therefore affect our conclusions, we remove them. The final data set has 91 observations.
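The report does not state which rule was used to identify outliers. As one common possibility, the sketch below flags values lying more than 1.5 interquartile ranges outside the quartiles. The rule, the helper name `iqr_outliers` and the sample values are all assumptions made for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumes the 1.5*IQR rule; the report does not say which rule it used.
    """
    # The "inclusive" method interpolates between the sample minimum and maximum.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Illustrative distances only, with one extreme value.
distances = [1.0, 1.4, 1.5, 2.0, 2.5, 3.0, 4.0, 51.0]
print(iqr_outliers(distances))
```

A rule of this kind is only a screening device; whether a flagged record should actually be removed remains a judgment call.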
Besides the Hotel variable, our data has four categorical variables (Region, Size, Gender and Year), which we represent using histograms.
Figure 1: Histogram of Region
Figure 2: Histogram of Size
Figure 3: Histogram of Gender
Figure 4: Histogram of Year
The following histograms show the distribution of hotel sizes in each region:
Figure 5: Distribution of Sizes in All regions
Figure 6: Histogram of Frequency Distribution of Sizes of Hotels in Denmark
Figure 7: Histogram of Frequency Distribution of Sizes of Hotels in Holland
Figure 8: Histogram of Frequency Distribution of Sizes of Hotels in Germany
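The frequencies behind charts like those above can be tabulated as in the following sketch. The region names come from the figures, but the size labels and the records themselves are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (region, size) pairs; the size labels are assumed for illustration.
hotels = [
    ("Denmark", "small"), ("Denmark", "large"), ("Holland", "medium"),
    ("Germany", "small"), ("Germany", "small"), ("Holland", "large"),
]

# Frequency of each region (one bar per category).
region_counts = Counter(region for region, _ in hotels)

# Frequency of each size within each region (one chart per region).
size_by_region = defaultdict(Counter)
for region, size in hotels:
    size_by_region[region][size] += 1

for region, sizes in sorted(size_by_region.items()):
    print(region, dict(sizes))
```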
Task 2
The descriptive statistics of the numerical variables are as follows:
Table 1: Descriptive Statistics of Distance Variable
Statistic                   Distance
Mean                        2.7425
Standard Error              0.49622
Median                      2.125
Mode                        1.4
Standard Deviation          4.962202
Sample Variance             24.62345
Kurtosis                    92.90575
Skewness                    9.471457
Range                       50
Minimum                     1
Maximum                     51
Sum                         274.25
Count                       100
Confidence Level(95.0%)     0.984609
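As a check on how the entries of Table 1 relate to one another, the worked equations below recompute the variance, standard error and confidence level from the standard deviation and the count. They assume the table is the output of Excel's Descriptive Statistics tool, in which the "Confidence Level(95.0%)" row is the half-width of a t-based interval for the mean. The critical value 1.9842 for 99 degrees of freedom is supplied here and does not appear in the report.

```latex
\begin{align*}
s^2 &= 4.962202^2 \approx 24.6234 \\
SE &= \frac{s}{\sqrt{n}} = \frac{4.962202}{\sqrt{100}} \approx 0.49622 \\
h &= t_{0.975,\,n-1} \cdot SE \approx 1.9842 \times 0.49622 \approx 0.9846 \\
\bar{x} \pm h &= 2.7425 \pm 0.9846 \approx [1.758,\ 3.727]
\end{align*}
```

The same relations apply to the other descriptive statistics tables, with each table's own standard deviation and count.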
Table 2: Descriptive Statistics of Advertising
Statistic                   Advertising
Mean                        1511.71
Standard Error              143.8146
Median                      1410.5
Mode                        2010
Standard Deviation          1438.146
Sample Variance             2068264
Kurtosis                    42.85336
Skewness                    5.296972
Range                       13024
Minimum                     106
Maximum                     13130
Sum                         151171
Count                       100
Confidence Level(95.0%)     285.3594
Table 3: Descriptive Statistics of Employee Variable
Statistic                   Employees
Mean                        20.15
Standard Error              0.715679
Median                      20
Mode                        20
Standard Deviation          7.156794
Sample Variance             51.2197
Kurtosis                    17.39431
Skewness                    2.954261
Range                       56
Minimum                     11
Maximum                     67
Sum                         2015
Count                       100
Confidence Level(95.0%)     1.420063
Table 4: Descriptive Statistics of Type A
Statistic                   Type A
Mean                        385620
Standard Error              10677.96
Median                      369000
Mode                        476000
Standard Deviation          106779.6
Sample Variance             1.14E+10
Kurtosis                    1.741373
Skewness                    0.328692
Range                       716000
Minimum                     0
Maximum                     716000
Sum                         38562000
Count                       100
Confidence Level(95.0%)     21187.39
Table 5: Descriptive Statistics of Type B
Statistic                   Type B
Mean                        210363
Standard Error              2604.55
Median                      213000
Mode                        223200
Standard Deviation          26045.5
Sample Variance             6.78E+08
Kurtosis                    45.18853
Skewness                    -6.13151
Range                       228500
Minimum                     0
Maximum                     228500
Sum                         21036300
Count                       100
Confidence Level(95.0%)     5167.993
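For readers who want to reproduce statistics of this kind outside Excel, the following sketch computes several of the rows shown in the tables above. It assumes the sample (n - 1) standard deviation and the bias-corrected skewness and excess kurtosis formulas used by Excel's SKEW and KURT functions. The helper name `describe` and the sample data are hypothetical.

```python
import math
import statistics


def describe(values):
    """Summary statistics in the style of a descriptive statistics table.

    Assumes sample (n - 1) formulas and needs at least four observations.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    z3 = sum(((x - mean) / sd) ** 3 for x in values)
    z4 = sum(((x - mean) / sd) ** 4 for x in values)
    # Bias-corrected sample skewness.
    skewness = n / ((n - 1) * (n - 2)) * z3
    # Bias-corrected sample excess kurtosis.
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * z4 \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return {
        "mean": mean,
        "standard_error": sd / math.sqrt(n),
        "median": statistics.median(values),
        "standard_deviation": sd,
        "sample_variance": sd ** 2,
        "kurtosis": kurtosis,
        "skewness": skewness,
        "range": max(values) - min(values),
        "count": n,
    }


# Illustrative data only, not the hotel data.
summary = describe([1, 2, 3, 4, 5])
for name, value in summary.items():
    print(name, value)
```

Large positive skewness and kurtosis values, as in Tables 1 to 3 and 5, are what a few extreme observations typically produce under these formulas.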
We removed 9 observations from the full data set. Records 43 and 81 were removed because of missing values. Record 10 was removed due to an outlier in the distance variable. Record 29 was removed due to an outlier in the advertising variable. The record of hotel 29 was removed due to an outlier in the employees variable. Records 8 and 22 were removed due to outliers in Type A, and records 2 and 5 due to outliers in Type B. Tables 6 to 10 give the descriptive statistics of the remaining 91 observations.
Table 6: Descriptive Statistics of Distance Variable
Statistic                   Distance
Mean                        2.278571
Standard Error              0.09967
Median                      2
Mode                        1.5
Standard Deviation          0.950793
Sample Variance             0.904008
Kurtosis                    -1.12109
Skewness                    0.430463
Range                       3
Minimum                     1
Maximum                     4
Sum                         207.35
Count                       91
Confidence Level(95.0%)     0.198012
Table 7: Descriptive Statistics of Advertising
Statistic                   Advertising
Mean                        1390.626
Standard Error              87.37846
Median                      1410
Mode                        2010
Standard Deviation          833.5374
Sample Variance             694784.5
Kurtosis                    -1.32121
Skewness                    -0.2885
Range                       2515
Minimum                     106
Maximum                     2621
Sum                         126547
Count                       91
Confidence Level(95.0%)     173.5926
Table 8: Descriptive Statistics of Employee Variable
Statistic                   Employees
Mean                        19.6044
Standard Error              0.570517
Median                      20
Mode                        20
Standard Deviation          5.442383
Sample Variance             29.61954
Kurtosis                    -0.59447
Skewness                    0.436909
Range                       23
Minimum                     11
Maximum                     34
Sum                         1784
Count                       91
Confidence Level(95.0%)     1.133431
Table 9: Descriptive Statistics of Type A
Statistic                   Type A
Mean                        384237.4
Standard Error              9462.925
Median                      367000
Mode                        476000
Standard Deviation          90270.56
Sample Variance             8.15E+09
Kurtosis                    -0.6473
Skewness                    0.518563
Range                       407500
Minimum                     225500
Maximum                     633000
Sum                         34965600
Count                       91
Confidence Level(95.0%)     18799.75
Table 10: Descriptive Statistics of Type B
Statistic                   Type B
Mean                        213849.5
Standard Error              865.5422
Median                      212700
Mode                        223200
Standard Deviation          8256.746
Sample Variance             68173861
Kurtosis                    -1.39057
Skewness                    0.033973
Range                       28100
Minimum                     200400
Maximum                     228500
Sum                         19460300
Count                       91
Confidence Level(95.0%)     1719.551
Table 11 in the appendix shows the grouped frequency distribution of the Type A data: