As a leading player in the real estate services industry, CBRE Group provides valuation and advisory services to support customers in their decision-making. The real estate market is affected by various factors that directly influence property prices. Hence, it is fundamental to develop an accurate model to enhance the CBRE property appraisal service. A literature review was conducted to examine the approaches and business analytics methods applied in previous investigations of housing prices. The data was collected from zenodo.org. Descriptive and predictive analysis methods were applied to obtain a statistically significant model in which 71.2% of the variation in housing price is explained by the explanatory variables. Finally, recommendations for CBRE Group are given.
CBRE Group is one of the leading global organisations in real estate services and investment, ranked 146 in the Fortune 500 in 2019 (CBRE n.d. a), with a market share in Australia of 18.8% (Thomson 2020). The company is based in the United States of America and has more than 450 offices worldwide and around 90,000 employees (Thomson 2020). CBRE has been operating in the Australian market for 20 years, with offices in Sydney, Melbourne, Brisbane, Perth, and the Gold Coast, employing over 130 people (CBRE Residential Projects n.d.). The Asia-Pacific segment, which accounts for approximately 10% of overall revenue, includes operations in Australia, New Zealand, China, India, and Japan, among others (Thomson 2020). CBRE is also recognised as one of the Australian leaders in providing expertise, insights, and resources to support customers in their decision-making. The organisation provides advisory and transaction services, capital markets, global workplace solutions, investment management, project management, property management, research, and valuation services (CBRE n.d. b).
The Australian residential housing market has been changing dramatically, especially in the largest cities such as Melbourne, where property prices have grown significantly. Despite the COVID-19 pandemic, house prices in Melbourne rose by 2.9% in 2020 (Australian Bureau of Statistics 2021). Because market prices fluctuate and several variables affect the value of a property, it is fundamental to develop a model that predicts housing prices in greater Melbourne using business analytics methods. Such a model would enhance CBRE Group services, especially the appraisal of residential properties, by keeping forecast prices close to actual prices and thereby supporting customers' decision-making.
An exploratory data analysis showed that prices may vary depending on the region where the property is located. Table 1 shows that most of the houses are concentrated in the northern, southern, and western metropolitan areas. Additionally, on average the most expensive properties are in the southern metropolitan region, while the cheapest are in the northern and western metropolitan areas. As a result, it was decided to focus on developing a model for the metropolitan area.
Other property features that may affect the housing price were also analysed. Figure 1 illustrates the relationship between price and the number of bathrooms in a property in the southern metropolitan region.
Innovations and technological advancements have produced massive amounts of data that businesses must handle in order to make accurate decisions and enhance their competitive advantage. Organisations are therefore demanding efficient systems that can process and analyse larger amounts of data. Accordingly, real estate services require data-driven systems to support their decision-making during project development, planning, and feasibility studies (Kim, Seo & Chung 2020). The term big data implies high volume, velocity, and variety. Such data can be produced in real time and takes diverse formats, such as numerical or categorical data. Moreover, it can be used to analyse and extract valuable information, find patterns, and forecast changes based on the knowledge generated (Kim, Seo & Chung 2020). One of the techniques used to analyse data is linear regression, which models the linear relationship between dependent and independent variables. It is applied to predict the value of a response variable Y as a function of known predictor (explanatory) variables X (Singh, Sharma & Dubey 2020).
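As an illustration of the idea, the following minimal sketch fits a linear regression with a single predictor using the closed-form least-squares solution. The function name and the small data set are hypothetical and chosen only for illustration; they are not taken from the studies cited.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least two distinct values")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: building area (square metres) and price (AUD thousands).
areas = [80, 100, 120, 150, 200]
prices = [500, 600, 700, 850, 1100]

b0, b1 = fit_simple_linear_regression(areas, prices)
predicted = b0 + b1 * 130  # predicted price for a 130 square metre property
```

With several explanatory variables, the same principle applies, but the coefficients are obtained by solving a system of equations rather than a single ratio.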
There have been several attempts to model housing prices based on different variables. Singh, Sharma, and Dubey (2020) aimed to identify statistically the factors that affect the final sale price of a house, using a data set from Ames, Iowa. Different techniques were used for this predictive analysis; for example, LASSO (Least Absolute Shrinkage and Selection Operator) was used for data cleaning and variable selection. The most significant features were property layout, property size, location, parking, and garden (Singh, Sharma & Dubey 2020). Additionally, techniques such as random forest, gradient boosting, and linear regression modelling were used to determine the predictive variables for estimating the sale price. These models were validated with the Root Mean Squared Error to measure their accuracy (Singh, Sharma & Dubey 2020).
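For reference, the Root Mean Squared Error is the square root of the average squared difference between actual and predicted values. The sketch below shows the calculation; the function name and the sample prices are hypothetical and are not results from the cited study.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical prices in AUD thousands, for illustration only.
actual_prices = [500, 600, 700]
predicted_prices = [510, 590, 720]
error = rmse(actual_prices, predicted_prices)
```

A lower value indicates predictions that are closer to the observed prices, and the result is expressed in the same unit as the price.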
Another investigation was conducted by Li et al. (2017) to evaluate fluctuations in housing prices using geospatial data analytics and modelling techniques. A total of 66,688 geotagged records from Wuhan, China, were collected. The concepts of natural cities and head/tail breaks were applied to identify geographical events, in this case the residential districts with higher housing prices (Li et al. 2017). Other techniques used were nearest neighbour analysis, to calculate spatial patterns, and geographically weighted regression, to validate how the relationship between dependent and independent variables changes spatially (Li et al. 2017).
Another study, which aimed to predict housing prices efficiently according to customer budget and priorities, was developed by Bhagat, Mohokar, and Mane (2016) with a data set from Navi Mumbai, India, containing values from 2009 to 2015. The model was created using a linear regression algorithm to forecast future prices. The years in the data set were divided into quarters, and properties were categorised as upper, average, or lower, with upper denoting the most renovated houses with the most facilities (Bhagat, Mohokar & Mane 2016). Linear regression made efficient use of the data, and measuring the accuracy of the model helped to meet customer needs by supporting their decision-making.
In this study, data on Melbourne houses is analysed. The data was collected mainly from a public-access website, zenodo.org. The original dataset contained 34,857 property records from 2016 to 2018 and 20 variables, such as the suburb where the property is located, postcode, council area, region, distance to the CBD, price, type of house, number of bedrooms, bathrooms, and car parking spaces, land size, building area, and building year. The next step was cleaning the data. Any observation with missing or inconsistent values was removed from the dataset, including observations with a land size lower than 50. This process was conducted by filtering columns such as price, bedroom, building year, and land size. After cleaning, the dataset contained 7,587 observations. Moreover, some categorical variables, such as type of house and region, were converted into numerical data.
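The cleaning steps described above can be sketched as follows. This is a minimal illustration in plain Python rather than the procedure actually used in the study; the field names (`Price`, `Bedroom`, `YearBuilt`, `Landsize`, `Type`), the helper functions, and the sample rows are all assumptions made for the example.

```python
def clean_records(records, required, min_landsize=50):
    """Drop rows with missing required fields or a land size below the minimum."""
    cleaned = []
    for row in records:
        if any(row.get(field) is None for field in required):
            continue  # a required value is missing
        if row["Landsize"] < min_landsize:
            continue  # land size below the threshold
        cleaned.append(row)
    return cleaned


def add_indicators(records, field, categories):
    """Return copies of the rows with one 0/1 indicator column per category."""
    encoded = []
    for row in records:
        new_row = dict(row)
        for category in categories:
            new_row[category] = 1 if row.get(field) == category else 0
        encoded.append(new_row)
    return encoded


# Assumed field names; "Landsize" must be among the required fields.
REQUIRED = ("Price", "Bedroom", "YearBuilt", "Landsize")

# Hypothetical rows, for illustration only.
raw = [
    {"Price": 850000, "Bedroom": 3, "YearBuilt": 1970, "Landsize": 450, "Type": "h"},
    {"Price": None, "Bedroom": 2, "YearBuilt": 2005, "Landsize": 120, "Type": "u"},
    {"Price": 620000, "Bedroom": 2, "YearBuilt": 2010, "Landsize": 30, "Type": "t"},
    {"Price": 540000, "Bedroom": 2, "YearBuilt": 1998, "Landsize": 95, "Type": "u"},
]

cleaned = clean_records(raw, REQUIRED)
encoded = add_indicators(cleaned, "Type", ("h", "t", "u"))
```

Here the second row is dropped for its missing price and the third for its small land size, and each remaining row gains a 0/1 column per property type.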
To address the identified problem, predictive analysis was conducted using multiple linear regression, and descriptive analysis was carried out with a histogram, a line chart, and a pie chart to identify the independent variables and define the model. Several regressions were run to obtain the most accurate model. The response variable used to create the model was the price.
The histogram shows that most housing prices fall between 0.1 and 1.1 million Australian dollars and that the frequency declines as the price rises: the higher the price, the fewer the properties.
The line chart in Figure 3 illustrates how average housing prices varied from 2016 to 2018 by type of property: h (house, cottage, villa, semi, terrace), u (unit, duplex), and t (townhouse). On average, h has higher prices than the other two types. For each type, prices fluctuate within a similar range over the period, with a noticeable upward trend in 2018. Moreover, Figure 4 shows that most of the properties are of type h; hence, this variable could be more relevant to consider in the model.
For the initial multiple linear regression analysis, the cleaned dataset with 7,587 observations was used, considering four independent variables: postcode, bedroom, bathroom, and car. After each regression, the variables with a P-value > 0.05 were excluded from the next model, as shown in Table 2. However, in all the regressions the Significance F was zero and the R square was close to 0, so it was concluded that those models were not accurate.
After several attempts at developing the model, it was decided to work with the most recent sales (i.e., January, February, and March 2018), because more up-to-date data could be more relevant for the model. The final dataset used to build the model therefore contained 977 observations.
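Restricting the data to the first quarter of 2018 amounts to a simple date filter, sketched below. The day/month/year text format and the sample dates are assumptions made for illustration; the actual format of the date column should be checked before applying such a filter.

```python
from datetime import datetime


def in_first_quarter_2018(date_text, fmt="%d/%m/%Y"):
    """Return True if the date falls in January, February, or March 2018."""
    sold = datetime.strptime(date_text, fmt)
    return sold.year == 2018 and sold.month in (1, 2, 3)


# Hypothetical sale dates in an assumed day/month/year format.
sale_dates = ["3/09/2016", "6/01/2018", "24/02/2018", "17/03/2018", "7/04/2018"]
recent = [d for d in sale_dates if in_first_quarter_2018(d)]
```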
First, seven explanatory variables were considered. Even though the model was statistically significant (Significance F < 0.05), there was no significant relationship between price and some of the explanatory variables (h, t, and Car, with P-value > 0.05), and only 45.8% of the variation in housing price was explained by the independent variables (see Table 3).
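The screening rule applied after each regression can be sketched as a split of the variables at the 0.05 significance level. The function name and the P-values below are hypothetical placeholders chosen for illustration; they are not the values reported in Table 3.

```python
def split_by_significance(p_values, alpha=0.05):
    """Split variable names into those to keep and those to drop at level alpha."""
    keep = [name for name, p in p_values.items() if p <= alpha]
    drop = [name for name, p in p_values.items() if p > alpha]
    return keep, drop


# Hypothetical P-values, NOT the values from the regression output.
example_p_values = {
    "Bedroom": 0.001,
    "Bathroom": 0.0004,
    "h": 0.31,
    "t": 0.62,
    "Car": 0.18,
}

keep, drop = split_by_significance(example_p_values)
```

The variables in `drop` would be candidates for exclusion before the regression is run again; repeating this step leads to a model in which every remaining coefficient is significant.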
Subsequently, all the variables were tested through different models until the final version, shown in Table 4, was obtained.
Table 4 shows the results of the multiple linear regression to predict housing prices in greater Melbourne. The Significance F (2.7e-251) and the P-values of the coefficients of the explanatory variables are all less than 0.05, which suggests that there is a statistically significant relationship between housing prices and the explanatory variables.
The following model is proposed to solve the problem:
Where the explanatory variables analysed are as follows:
YearBuilt: Year the property was built.
BuildingArea: Building size.
Landsize: Land size.
h: Whether the property is of type h (house, cottage, villa, semi, terrace).
Bedroom: Number of bedrooms.
Bathroom: Number of bathrooms.
Distance: Distance to the CBD.
Western: Whether the property is in the western metropolitan region.
Southern: Whether the property is in the southern metropolitan region.
Northern: Whether the property is in the northern metropolitan region.
Eastern: Whether the property is in the eastern metropolitan region.
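For reference, a multiple linear regression on the eleven variables listed above has the general form below. The coefficient symbols are notation assumed here for illustration; the fitted values belong to the regression results in Table 4 and are not reproduced.

```latex
\begin{aligned}
\widehat{\mathrm{Price}} ={}& \beta_0 + \beta_1\,\mathrm{YearBuilt} + \beta_2\,\mathrm{BuildingArea} + \beta_3\,\mathrm{Landsize} + \beta_4\,\mathrm{h} \\
&+ \beta_5\,\mathrm{Bedroom} + \beta_6\,\mathrm{Bathroom} + \beta_7\,\mathrm{Distance} \\
&+ \beta_8\,\mathrm{Western} + \beta_9\,\mathrm{Southern} + \beta_{10}\,\mathrm{Northern} + \beta_{11}\,\mathrm{Eastern}
\end{aligned}
```

Each coefficient is read as the expected change in price for a one-unit change in that variable while the others are held constant; for the type and region variables it is the price difference associated with belonging to that category.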
The accuracy of this model, according to the R square, is 0.712, which means that 71.2% of the variation in house prices is explained by the independent variables. The R square also implies that other factors, not included in the model, account for the remaining 28.8%. Moreover, this predictive model obtained the highest adjusted R square, 0.71, of all the models tested. The adjusted R square penalises additional explanatory variables and so balances the model between accuracy and complexity. The explanatory variables with the greatest impact on the price are the region where the property is located, the number of bathrooms, and h. Table 5 illustrates how the price varies by region when the other variables are held constant.
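The relationship between the R square and the adjusted R square can be checked with the standard formula, sketched below. The inputs assume the 977 observations of the final dataset and the eleven explanatory variables listed for the final model; the function name is hypothetical.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R square: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    degrees_of_freedom = n_observations - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_observations - 1) / degrees_of_freedom


# Assumed inputs: R square 0.712, 977 observations, 11 explanatory variables.
adj = adjusted_r_squared(0.712, 977, 11)  # about 0.709, i.e. 0.71 when rounded
```

Because the sample is large relative to the number of variables, the adjustment is small, which is consistent with the two statistics both rounding to 0.71.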
Based on the analyses and discussion, the recommendations for CBRE Group are as follows:
Adopting this model will help CBRE provide more accurate price predictions for its customers when market trends are being examined and the general information about a property is known.
This model highlights the difference in prices according to the region and the facilities. Therefore, if a customer works in the CBD, is looking for a nearby house for one person, and has a limited budget, the model suggests looking in the northern and western metropolitan regions.
Based on this model, a quote showing the difference in price per location could be generated for a customer who wants to invest in a property with certain characteristics, to support their decision-making. Moreover, an appraisal price can be provided for customers who wish to sell a house.
If the company wishes to add more variables, this model could be extended to analyse other scenarios.
Australian Bureau of Statistics 2021, ‘Residential Property Price Indexes: Eight Capital Cities’, Australian Bureau of Statistics, 3 March, viewed 7 May 2021, <https://www.abs.gov.au/statistics/economy/price-indexes-and-inflation/residential-property-price-indexes-eight-capital-cities/latest-release#key-statistics>.
Bhagat, N, Mohokar, A & Mane, S 2016, ‘House price forecasting using data mining’, International Journal of Computer Applications, vol. 152, no. 2, pp. 23-26.
CBRE n.d. a, Corporate Information, CBRE, viewed 3 May 2021, <https://www.cbre.com.au/about/corporate-information>.
CBRE n.d. b, Business Lines, CBRE, viewed 3 May 2021, <https://www.cbre.com.au/real-estate-services/directory>.
CBRE n.d. c, Advisory & Transaction Services, CBRE, viewed 3 May 2021, <https://www.cbre.com.au/real-estate-services/directory/advisory-and-transaction-services>.
CBRE Residential Projects n.d., About Us, CBRE Residential Projects, viewed 3 May 2021, <https://www.cbresi.com.au/about>.
Kim, J, Seo, D & Chung, Y 2020, ‘An Integrated Methodological Analysis for the Highest Best Use of Big Data-Based Real Estate Development’, Sustainability, vol. 12, no. 3, pp. 1-17.
Li, S, Ye, X, Lee, J, Gong, J & Qin, C 2017, ‘Spatiotemporal Analysis of Housing Prices in China: A Big Data Perspective’, Applied Spatial Analysis and Policy, vol. 10, pp. 421–433.
Singh, A, Sharma, A & Dubey, G 2020, ‘Big data analytics predicting real estate prices’, International Journal of System Assurance Engineering and Management, vol. 11, pp. 208–219.
Thomson, J 2020, Real Estate Valuation Services in Australia, report No. OD5453, industry report, IBISWorld, viewed 3 May 2021, IBISWorld database.
Tierney, N 2019, ‘Melbourne housing data’, Zenodo, 22 February, viewed 3 May 2021, <https://zenodo.org/record/2575545#.YIpB-rUzbIW>.