RE1702: Real Estate Data Analytics

Questions

Q1. How many observations are there in the raw dataset? How many variables are there?

Q2. Produce a table of summary statistics for all the variables in the raw dataset. For two of the variables, floor area in square meters and distance to the CBD in km, what are the means and standard deviations?

Q3. Produce a table of correlation coefficients for all the variables in the raw dataset. What is the correlation between floor area (in square meters) and the transaction price? Produce a scatter plot for these two variables. What is the correlation between floor area (in square meters) and the number of units? Also produce a scatter plot for these two variables. Provide some brief reasoning to rationalize the correlations; there are no definite answers, and you earn points as long as your arguments make logical sense.
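
As a reminder of what the correlation coefficient in Q3 measures, here is a minimal sketch of the Pearson formula in plain Python. The function name `pearson` and the toy values are assumptions made for illustration; they are not taken from the assignment dataset, and any statistics package will compute the same quantity.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Made-up toy values for illustration only (NOT the assignment data).
area_sqm = [80, 95, 110, 125, 150]
price = [640_000, 750_000, 900_000, 980_000, 1_250_000]
r = pearson(area_sqm, price)
print(round(r, 3))
```

A value near +1 or -1 indicates a strong linear relationship; the sign gives its direction.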

Q4. Take 1995 as the base year. Create 6 dummy variables for properties transacted in the years 1996, 1997, 1998, 1999, 2000, and 2001, respectively. Provide summary statistics for these 6 variables. What percentage of transactions occurred in 1998?
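
The dummy coding in Q4 can be sketched as follows. This is an illustration under stated assumptions: the column labels (`y1996`, ...), the helper `year_dummies`, and the list of transaction years are all hypothetical. The point it shows is that the mean of a 0/1 dummy equals the share of observations in that year, and that base-year (1995) rows are zero on all six dummies.

```python
BASE_YEAR = 1995  # omitted category: no dummy is created for it
DUMMY_YEARS = [1996, 1997, 1998, 1999, 2000, 2001]

def year_dummies(years):
    """Return one 0/1 list per non-base year, keyed by a hypothetical label."""
    return {f"y{y}": [1 if t == y else 0 for t in years] for y in DUMMY_YEARS}

# Made-up transaction years for illustration only (NOT the assignment data).
txn_years = [1995, 1996, 1998, 1998, 2001, 1997, 1998, 1995]
dummies = year_dummies(txn_years)

# The mean of a dummy is the fraction of transactions in that year.
share_1998 = 100 * sum(dummies["y1998"]) / len(txn_years)
print(share_1998)  # 3 of the 8 toy transactions are in 1998 -> 37.5
```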

Q5. Run a standard level-level regression. The regression model is P_it = X_it β + ε_it, where the subscripts t and i indicate the transaction year and the individual property, respectively. Interpret the coefficient estimates of floor area and freehold.

Q6. Run a standard semi-log regression. The regression model is ln(P_it) = X_it β + ε_it, where the subscripts t and i indicate the transaction year and the individual property, respectively. Interpret the coefficient estimates of floor level and distance to MRT. What is the R² value? What is the adjusted R² value?
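
For interpreting semi-log coefficients, recall that in ln(P) = Xβ + ε a one-unit increase in a regressor changes the price by 100 × (exp(β) − 1) percent, which is approximately 100 × β percent when β is small. The sketch below illustrates this; the function name `pct_effect` and the coefficient values are hypothetical, not estimates from the assignment data.

```python
import math

def pct_effect(beta, delta=1.0):
    """Exact percentage change in price for a `delta`-unit change in a
    regressor whose semi-log coefficient is `beta`."""
    return 100.0 * (math.exp(beta * delta) - 1.0)

# Hypothetical coefficients, chosen only to show the calculation.
small_beta = 0.01   # exact effect is close to the 100*beta approximation
large_beta = 0.20   # the approximation (20%) understates the exact effect

print(round(pct_effect(small_beta), 4))
print(round(pct_effect(large_beta), 4))
```

For small coefficients, reading 100 × β directly as a percentage is usually adequate; the exact formula matters more for large coefficients, such as those on dummy variables.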

Q7. Repeat the regression in Q6, but this time include the 6 year dummy variables in the regression function. What is the R² value? What is the adjusted R² value? Is it worthwhile to include these year dummy variables in the regression function for the purpose of improving goodness of fit? Interpret the coefficient estimate of the year 1998 dummy variable. Did the Asian Financial Crisis strike Singapore’s private housing market?
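
When comparing the Q6 and Q7 fits, it helps to recall that R² never falls when regressors are added, whereas adjusted R² penalizes the extra parameters: adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k is the number of regressors excluding the intercept. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R-squared; n_regressors excludes the intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

# Made-up numbers for illustration only (NOT results from the assignment).
print(round(adjusted_r2(0.50, 101, 10), 4))  # 1 - 0.5 * 100 / 90
```

If adjusted R² rises after the year dummies are added, the gain in fit outweighs the penalty for the six additional parameters.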

Q8. Create a period dummy variable “Period 2” whose value is one if the property was sold during 1998-2001 and zero if it was sold during 1995-1997. Create the interaction term between Period 2 and distance to the CBD. Repeat the regression in Q6, but this time include the Period 2 dummy variable and the interaction term as additional explanatory variables. Interpret how property value varies with distance to the CBD during 1998-2001. Was the price gradient with respect to distance to the CBD flatter in 1998-2001 than in 1995-1997?
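
The logic of the interaction term in Q8 can be sketched as follows. With both distance to the CBD and Period 2 × distance in the regression, the distance coefficient is the price gradient for 1995-1997, and the sum of the distance and interaction coefficients is the gradient for 1998-2001. The function names and coefficient values below are hypothetical and only illustrate the arithmetic.

```python
def period2_dummy(year):
    """1 if the sale year falls in 1998-2001, else 0."""
    return 1 if 1998 <= year <= 2001 else 0

def interaction(year, dist_cbd_km):
    """Interaction term: Period 2 dummy times distance to the CBD."""
    return period2_dummy(year) * dist_cbd_km

def cbd_gradients(b_dist, b_interaction):
    """Distance gradient in each period implied by the two coefficients."""
    return {"1995-1997": b_dist, "1998-2001": b_dist + b_interaction}

def is_flatter(gradients):
    """True if the 1998-2001 gradient is smaller in absolute value."""
    return abs(gradients["1998-2001"]) < abs(gradients["1995-1997"])

# Hypothetical coefficients for illustration only (NOT estimates from the data).
g = cbd_gradients(b_dist=-0.03, b_interaction=0.01)
print(g, is_flatter(g))
```

In this made-up case the interaction coefficient has the opposite sign to the distance coefficient, so the 1998-2001 gradient is closer to zero, that is, flatter.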