The coursework consists of a statistical analysis of the differences in house prices in a US city on the basis of the luxuries provided in the houses. Structural and location information on the houses has been gathered through a survey, and the features that distinguish one house from another are the focus of the study. For this purpose, different variables that can cause house prices to change have been taken. The numerical variables are PRICES (price of the house), SQ_FT (area of the house in square feet), BEDS (number of bedrooms in the house), BATHS (number of bathrooms in the house) and GARAGE (number of cars that can be fitted in the garage). Various categorical variables have also been taken. HEAT takes the value 0 for gas forced air heating and 1 for electric heat. STYLE is the architectural style of the home: 0 indicates a trilevel, 1 indicates a two-story house, and 2 indicates a ranch-styled home. FIRE and BASEMENT indicate the presence (1) or absence (0) of at least one fireplace or basement. SCHOOL is the school district (0 = East school district; 1 = ABC Valley school district). All else the same, ABC Valley is viewed as the preferred school district.
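The variable coding described above can be sketched as a small record type. This is only an illustrative sketch: the class name House, the label dictionaries, the helper style_dummies and the example values are hypothetical and are not taken from the survey data. Because STYLE has three levels, a regression would normally need two 0/1 indicators for it; treating the trilevel (0) as the baseline is an assumption made here.

```python
from dataclasses import dataclass

# Codes as described in the text; the dictionary names are illustrative.
HEAT_LABELS = {0: "gas forced air", 1: "electric"}
STYLE_LABELS = {0: "trilevel", 1: "two-story", 2: "ranch"}
SCHOOL_LABELS = {0: "East", 1: "ABC Valley"}


@dataclass
class House:
    price: float    # PRICES
    sq_ft: float    # SQ_FT, area in square feet
    beds: int       # BEDS
    baths: float    # BATHS
    garage: int     # GARAGE, number of cars the garage holds
    heat: int       # HEAT: 0 or 1
    style: int      # STYLE: 0, 1 or 2
    fire: int       # FIRE: 1 = at least one fireplace
    basement: int   # BASEMENT: 1 = has a basement
    school: int     # SCHOOL: 0 or 1

    def __post_init__(self):
        # Reject any categorical code that the text does not define.
        checks = {
            "HEAT": (self.heat, HEAT_LABELS),
            "STYLE": (self.style, STYLE_LABELS),
            "SCHOOL": (self.school, SCHOOL_LABELS),
            "FIRE": (self.fire, (0, 1)),
            "BASEMENT": (self.basement, (0, 1)),
        }
        for name, (value, allowed) in checks.items():
            if value not in allowed:
                raise ValueError(f"{name} has unexpected code {value!r}")

    def style_dummies(self):
        """Two 0/1 indicators for STYLE, with trilevel (0) as the assumed baseline."""
        return {
            "STYLE_TWO_STORY": int(self.style == 1),
            "STYLE_RANCH": int(self.style == 2),
        }


# Made-up example house, for illustration only.
example = House(price=90000, sq_ft=1600, beds=3, baths=2, garage=2,
                heat=0, style=2, fire=1, basement=1, school=1)
print(STYLE_LABELS[example.style], example.style_dummies())
```

The two-level variables (HEAT, FIRE, BASEMENT, SCHOOL) are already 0/1 and can enter a regression directly, which is why only STYLE gets a helper here.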
Methodology
The data described above is first used to find correlations among the variables, in order to find out whether the variables are correlated or whether we have just taken irrelevant variables into the study. Higher correlation values will show that the variables are strongly related and can give reliable results. A t-test statistic will be used to analyze the significance of the data, whether scaled or ordinal. This is an important part of the analysis, where the variability and significance are analyzed in order to decide whether to go on to further calculations or not: if there is high variation in the data, the results become less reliable, which can lead to biased conclusions.
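As a rough illustration of this step, the sketch below computes a Pearson correlation and the usual t statistic for testing whether that correlation differs from zero, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. This is one reading of how the t-test could be applied here, not necessarily the exact procedure of the study. The function names and the five data points are made up for illustration and are not survey values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_stat(r, n):
    """t statistic for H0: no linear correlation (n - 2 degrees of freedom)."""
    if abs(r) >= 1.0:
        # A perfect correlation gives an unbounded statistic.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical values, for illustration only.
sq_ft = [1100, 1300, 1500, 1700, 2000]
prices = [62000, 75000, 84000, 95000, 110000]

r = pearson_r(sq_ft, prices)
t = correlation_t_stat(r, len(sq_ft))
print(f"r = {r:.3f}, t = {t:.2f} with {len(sq_ft) - 2} degrees of freedom")
```

The resulting t value would then be compared with the critical value of the t distribution for n - 2 degrees of freedom at the chosen significance level.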
ANOVA is another statistical tool, which provides an overall analysis of variance and shows whether the F-statistic gives significant results or not. Finally, after analyzing all the data, a regression analysis will be used to model an equation that can be used to predict house prices when the above variables are known. The results will depend on whether the coefficients are significant and whether the overall model is significant, which can be seen through the F-statistic.
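The link between the regression equation and the ANOVA F-statistic can be sketched as follows. To stay short, this sketch fits a regression with a single predictor (area only), whereas the study's model uses several variables, so it is a simplification rather than the study's actual model. The function name and the data values are hypothetical.

```python
import math


def fit_simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, with its ANOVA F statistic."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    fitted = [intercept + slope * a for a in x]
    # ANOVA decomposition: total variation = explained + residual.
    ss_regression = sum((f - mean_y) ** 2 for f in fitted)
    ss_residual = sum((b - f) ** 2 for b, f in zip(y, fitted))
    df_regression, df_residual = 1, n - 2
    if ss_residual > 0:
        f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
    else:
        f_stat = math.inf  # perfect fit, no residual variation
    return {
        "intercept": intercept,
        "slope": slope,
        "f_stat": f_stat,
        "df": (df_regression, df_residual),
    }


# Hypothetical values, for illustration only.
sq_ft = [1100, 1300, 1500, 1700, 2000]
prices = [62000, 75000, 84000, 95000, 110000]

model = fit_simple_regression(sq_ft, prices)
predicted = model["intercept"] + model["slope"] * 1600  # predicted price for 1600 sq. feet
print(model, round(predicted))
```

With one predictor, the F-statistic equals the square of the slope's t statistic, so the overall test and the coefficient test agree. With several predictors they separate, which is why both the individual coefficients and the overall model are checked.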
Data Analysis
Descriptive Statistics
The mean area of the houses in the city is almost 1555 sq. feet with a standard deviation of almost 396 sq. feet, showing that the houses, which have three to four rooms on average, typically measure between 1159 sq. feet and 1951 sq. feet (one standard deviation either side of the mean). The average price of houses in the city was found to be almost $86300 with a standard deviation of $20750, which shows that housing prices lie in the range from $65550 ...