What is Predictive Analytics?
How we leverage machine learning algorithms to determine who is most likely to sell in the next 12 months.
Predictive analytics uses data, statistical algorithms, and machine learning techniques to identify the likelihood of future outcomes based on historical data. (Yes, we realize we are nerds :)
In simpler terms, predictive analytics involves taking a large amount of data about past events and using it to make educated guesses about what might happen in the future.
By analyzing patterns in the data, predictive analytics can identify factors that are most likely to lead to certain outcomes. This information can then be used to make decisions and take actions to improve the likelihood of a desired outcome.
For example, in the context of real estate, we use predictive analytics to predict which homeowners in a particular area are most likely to sell their homes in the near future. We and our agents use this information to target marketing efforts and increase the chances of making a sale.
Without leveraging smart data, you are wasting up to 80% of your advertising dollars marketing to people who have no intention of selling any time soon.
We use our Predictive Sellers to drive all of our marketing campaigns. They allow us to reduce cost and better target homeowners who may be thinking of selling in the next 12 months. With the rising cost of Google Ads, Facebook Ads, and other platforms, it is important for us to have a way to dramatically reduce the size of the audience we are targeting.
Watch our video, "offrs.com - We predict future real estate listings using Predictive Analytics," to see how we leverage Predictive Analytics to generate Predictive Sellers.
Our approach is unique in that we combine multiple modeling techniques. Our Predictive Analytics models are trained on past data, along with unique indicators and variables that drive future sellers. We have refined this approach over 10 years (Smartzip and offrs).
Our starting point is leveraging multiple data sources (property records, MLS, etc.) that provide us with text variables (e.g., Number of Beds, Square Footage, Year Built, Last Date Sold). For text variables, the training samples are broken down into the basic building blocks of sentences using stop-word removal, tokenization, and tagging. In this process, the input to the algorithm is a word pattern and the output is an ordered set of fundamental entities with values. All of the variables are cleaned, and some are transformed to numeric values.
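To make this concrete, here is a minimal sketch of what this kind of preprocessing can look like, assuming NLTK for tokenization, stop-word removal, and tagging. The field names, parsing rules, and sample record are hypothetical illustrations, not our production pipeline.

```python
# Illustrative sketch only -- field names and parsing rules are hypothetical,
# not the actual offrs.com production pipeline.
import re

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# One-time downloads for the NLTK resources used below.
nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

STOP_WORDS = set(stopwords.words("english"))


def extract_entities(raw_text):
    """Tokenize, drop stop words, and POS-tag a raw property description."""
    tokens = word_tokenize(raw_text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return nltk.pos_tag(tokens)


def to_numeric(record):
    """Clean a few example text fields and transform them to numeric values."""
    return {
        "beds": int(record["beds"]),
        "sqft": float(re.sub(r"[^\d.]", "", record["sqft"])),  # "1,850 sqft" -> 1850.0
        "year_built": int(record["year_built"]),
    }


# Hypothetical raw record pulled from a property-records feed.
raw = {"beds": "3", "sqft": "1,850 sqft", "year_built": "1994",
       "description": "Charming 3 bed home with a renovated kitchen"}

print(extract_entities(raw["description"]))
print(to_numeric(raw))
```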
We train classification models using machine-learning algorithms. In this step, the algorithm fits the target (the y-variable) using the assessor and MLS data variables. For example, the algorithm would fit whether or not a house will be sold within the next 12 months. We use a variety of machine-learning algorithms, including support vector machines, random forests, naïve Bayes, and partial least squares regression. To train the ultimate predictive algorithm, we use ensemble methods to combine results from the multiple machine-learning algorithms.
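The sketch below illustrates this training step with scikit-learn, assuming a prepared feature matrix and a binary "sold within 12 months" label. The synthetic data and the specific estimators shown are stand-ins for illustration, not our exact production setup.

```python
# Minimal sketch, assuming scikit-learn and a prepared feature matrix.
# The synthetic data below stands in for the real assessor/MLS variables.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))            # placeholder property features
y = (X[:, 0] + X[:, 3] > 0).astype(int)   # placeholder label: sold within 12 months

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Individual classifiers, each fit to predict the 12-month sale label.
svm = SVC(probability=True, random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)
nb = GaussianNB()

# Ensemble that combines the three base learners by soft (probability) voting.
ensemble = VotingClassifier(
    estimators=[("svm", svm), ("rf", forest), ("nb", nb)],
    voting="soft",
)
ensemble.fit(X_train, y_train)
print("held-out accuracy:", ensemble.score(X_test, y_test))
```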
We employ ensemble learning because different algorithms perform differently based on underlying data characteristics, or have varying precision or recall in different regions of the feature vector space. Combining them can therefore produce better classification output, either by reducing variance or by reducing bias. This step involves combining the predictions from individual classifiers by weighted-majority voting, unweighted-majority voting, or a more elaborate isotonic regression, and choosing the best-performing method in terms of accuracy, precision, and recall for each content profile. In our case, we found that a support vector machine algorithm delivered high precision but low recall, while a random forest algorithm delivered high recall but low precision. By combining these, we developed an improved algorithm that delivers higher precision, recall, and accuracy. Finally, we assess the performance of the overall algorithm against three criteria: accuracy, precision, and recall.
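As a rough sketch of this combination and evaluation step, the code below compares an unweighted vote, a weighted vote, and isotonic-calibrated probabilities, then scores each on accuracy, precision, and recall. The weights, thresholds, and synthetic data are illustrative assumptions rather than our production configuration.

```python
# Illustrative sketch of combining classifier outputs; weights, thresholds,
# and data are assumptions, not the production configuration.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + X[:, 3] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# One learner tends toward high precision, the other toward high recall.
svm = SVC(probability=True, random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

p_svm = svm.predict_proba(X_test)[:, 1]
p_rf = forest.predict_proba(X_test)[:, 1]

# Unweighted vote on hard predictions (with two voters, ties break toward "sell").
unweighted = ((p_svm >= 0.5).astype(int) + (p_rf >= 0.5).astype(int) >= 1).astype(int)

# Weighted vote on probabilities (illustrative weights).
weighted = (0.6 * p_svm + 0.4 * p_rf >= 0.5).astype(int)

# Isotonic calibration of one learner's scores before thresholding.
iso = CalibratedClassifierCV(SVC(random_state=0), method="isotonic", cv=3).fit(X_train, y_train)
calibrated = (iso.predict_proba(X_test)[:, 1] >= 0.5).astype(int)

for name, pred in [("unweighted vote", unweighted),
                   ("weighted vote", weighted),
                   ("isotonic calibration", calibrated)]:
    print(name,
          "accuracy=%.3f" % accuracy_score(y_test, pred),
          "precision=%.3f" % precision_score(y_test, pred),
          "recall=%.3f" % recall_score(y_test, pred))
```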