Introduction to Linear Regression Using Spreadsheets with Real Estate Data
Editor’s note: Roberto Reif is a speaker for ODSC East this April 23–25. Be sure to check out his talk, “Introduction to Linear Regression using Spreadsheets with Real Estate Data,” there!
In the real estate industry, predicting property sales prices presents a significant challenge due to several influencing factors. These factors include square footage, number of bathrooms, year the property was built, year of sale, quality of the schools in the area, crime rate, the recentness of the kitchen and bathroom renovations, and more.
Get your ODSC East 2024 pass today!
In-Person and Virtual Conference
April 23rd to 25th, 2024
Join us for a deep dive into the latest data science and AI trends, tools, and techniques, from LLMs to data analytics and from machine learning to responsible AI.
A method for estimating the sales price for a property is using linear regression models. These models predict a continuous variable (e.g. price of sale) based on one or more independent variables (e.g. square feet). The equation for a linear regression model that uses only one variable, known as a simple linear regression model, is expressed by:
Price=m×SquareFeet+b
Here, the m denotes the slope for the independent variable (square feet) and b represents the intercept. Building such a model involves collecting historical data of sold properties, determining optimal values for m and b that minimize the price prediction errors, and then applying the model to predict the sales price of new properties.
Below is a table that shows a sample dataset displaying various property variables alongside their respective sale prices. Each row represents a property that has been sold.
TABLE 1: Example of a dataset containing several properties with the year built, the number of square feet, the number of baths, the property type, and the sales price.
In this blog the focus is on a simple linear regression model, using only the square footage, as depicted in Figure 1. Each blue dot represents a property from our dataset (a row in Table 1), displaying a general trend where higher square footage correlates with higher sale prices. The reason the dots do not follow a perfect straight line is because we have not considered other variables in the model such as the number of bathrooms or year built.
FIGURE 1: Price of sale as a function of the square footage
In Figure 1 we have added a straight line to indicate the line of best fit to the data. The equation is presented in the graph and should be read as the price of a property is approximated by $458 times the square footage of the property plus an offset of $143,427. In simple terms, each additional square foot adds $458 to the price of the property. Also, a property with zero square feet will cost $143,427. Although this last statement does not make practical sense, it helps the model make better predictions. For instance, using the model, a 2,000 square feet property would be predicted to sell for:
Price=$458×2,000+$143,427=$1,059,427
To determine the value for the slope of $458 and the intercept of $143,427 various methods such as the least-squares method or gradient descent can be employed. Spreadsheet tools such as Excel and Google Sheets have implemented functions such as LINEST for this purpose. The LINEST function receives two parameters as input, the known values of y (sales price) and the known values of x (square feet). It is important to note that there are other parameters in x that can be included such as number of bathrooms, year built, etc. For this example, the expression in the spreadsheet tool would be written like this:
=LINEST(PRICE,SQFT)
The output of this expression is shown below.
FIGURE 2: Results from the LINEST Function displaying the slope and intercept
To learn more about linear regression and its application in real estate data, come check out this Workshop at ODSC East 2024. In this hands-on workshop, you will work with spreadsheet tools (Excel or Google Sheets), to build simple and multiple linear regression models using real estate data and make predictions that minimize the determined price error. For more courses in data science and data analytics, check out ScholarU.
2024 Data Engineering Summit tickets available now!
In-Person Data Engineering Conference
April 23rd to 24th, 2024 — Boston, MA
At our second annual Data Engineering Summit, Ai+ and ODSC are partnering to bring together the leading experts in data engineering and thousands of practitioners to explore different strategies for making data actionable.
About the Author:
Roberto is the founder of ScholarU (www.scholar-u.com), a company focused on data and business analytics training and was the founder of Analysis Consulting LLC, a company focused on data science and data visualization. He was the Head and Executive Director of Data Science and Strategy at Metis, a company focused on data science and analytics training. Previously, he led the signal processing team at Sensoria Inc., did research at the University of Washington, and was a Program Manager at Microsoft. He has worked in applications in healthcare, the Internet of Things, and business intelligence markets. He received a PhD in Biomedical Engineering from Boston University, and has co-authored several scientific publications, book chapters, and patents.
Originally posted on OpenDataScience.com
Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday. You can also get data science training on-demand wherever you are with our Ai+ Training platform. Interested in attending an ODSC event? Learn more about our upcoming events here.