What is predictive analytics?
Why is predictive analytics important?
Predictive analytics is important because it lets businesses and organizations make critical decisions based on actual data — predicting probable outcomes — at a scale that was previously impossible. All enterprises survive or fail based on their ability to forecast, plan and operate efficiently while meeting the needs of their customers. Key decisions based on intuition, guesswork and historical information have led companies to lose billions — or fail — by launching new products they thought the public would love.
What are the three types of data analytics?
The three types of data analytics are descriptive, predictive and prescriptive.
- Descriptive analytics uses data aggregation and data mining of historical data to answer the question, “What happened?” Descriptive analytics closely resembles descriptive statistics: it summarizes what the data shows without forecasting or recommending anything.
- Predictive analytics identifies patterns in previous data to answer the question, “What might happen next?”
- Prescriptive analytics, a relatively new term, describes a type of analytics designed to answer the question, “What do we do now?” In prescriptive analytics, the outcome is not just a prediction or forecast, but recommendations for the best course of action.
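To make the three questions concrete, here is a minimal Python sketch. The sales figures, the inventory level and the naive forecasting rule (extending the average month-over-month change) are all assumptions chosen for illustration, not a recommended method.

```python
from statistics import mean

# Hypothetical monthly unit sales, oldest first (illustrative numbers only).
monthly_sales = [100, 110, 120, 130, 140, 150]

# Descriptive: what happened? Summarize the history.
average_sales = mean(monthly_sales)

# Predictive: what might happen next? Extend the average month-over-month change.
changes = [later - earlier for earlier, later in zip(monthly_sales, monthly_sales[1:])]
forecast_next_month = monthly_sales[-1] + mean(changes)

# Prescriptive: what do we do now? Turn the forecast into a recommended action.
units_in_stock = 90  # assumed current inventory
units_to_order = max(0, forecast_next_month - units_in_stock)

print(f"Descriptive: average monthly sales = {average_sales}")
print(f"Predictive: forecast for next month = {forecast_next_month}")
print(f"Prescriptive: order {units_to_order} more units")
```

The same data feeds all three steps; what changes is whether the output is a summary, a forecast or a recommended action.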
What are the outcomes of predictive analytics?
Nearly every human endeavor in the 21st century generates data, so nearly every business, organization or industry can get value from predictive analytics. Here are a few of the hundreds of potential predictive analytics use cases.
Predictive analytics in banking and financial services:
Predictive analytics is valuable across the spectrum of banking and financial service activities, from assessing risk to maximizing customer relationships. Predictive analytics is used to:
- Prevent credit card fraud by flagging unusual transactions.
- Score credit and decide whether to approve loan or credit applications.
- Predict customer churn, allowing banks to reach out right before a customer is likely to switch institutions.
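As one simple illustration of flagging unusual transactions, the sketch below marks any new amount that lies more than three standard deviations from a customer's historical average (a z-score rule). The function name `flag_unusual`, the threshold and the transaction amounts are hypothetical; production fraud systems typically combine many more signals than the amount alone.

```python
from statistics import mean, stdev

def flag_unusual(history, new_amounts, threshold=3.0):
    """Return the new amounts that lie more than `threshold` standard
    deviations away from the mean of the historical amounts."""
    average = mean(history)
    spread = stdev(history)
    return [amount for amount in new_amounts
            if abs(amount - average) > threshold * spread]

# Hypothetical past card transactions for one customer, in dollars.
history = [42.0, 38.5, 51.0, 45.25, 40.0, 47.5, 39.0, 44.0]

# Hypothetical incoming transactions to check.
new_amounts = [43.0, 950.0, 49.0]

flagged = flag_unusual(history, new_amounts)
print(flagged)
```

Only the 950.00 charge falls far enough outside this customer's usual range to be flagged.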
Predictive analytics in retail:
Retailers, whether online or brick-and-mortar, need to manage inventory and logistics. Predictive analytics tools let retailers correlate huge amounts of information — historical sales data, buying habits, geographical preferences, even weather data — to optimize performance.
- Sales and logistical data can ensure that retailers have enough product in warehouses, and the right merchandise in stores at the right time.
- Customer data fuels customized recommendations and promotions to individual buyers. Better targeting built on real data can help retailers create ads and promotions that shoppers are more likely to respond to.
- Timing of sales and promotion becomes a science, with predictive analytics bringing together customer, inventory, competitor and historical sales data to pick the perfect time to lower (or raise) prices.
Predictive analytics in healthcare:
Drawing from global disease statistics, drug interactions, individual patient histories and more, predictive analytics can help medical professionals provide better care and run more efficient and effective practices and hospitals.
- A 2018 study conducted by the Mental Health Research Network and Kaiser Permanente researchers used predictive analytics to correlate patients’ electronic health records (EHR) with their answers on a depression questionnaire and was able to identify those with an elevated risk of suicide.
- The University of Pennsylvania Health System developed a predictive tool that, during its trial period, identified patients headed for severe sepsis or shock a full 12 hours before the onset of the illness.
- Researchers from Duke University found that applying predictive analytics to a clinic’s historical appointment data allowed them to identify potential no-shows and late cancellations 67 percent more accurately than existing models, thereby saving time and resources.
Predictive analytics in manufacturing:
In a modern, highly automated factory, predictive analytics tools can be used to monitor and optimize each step in the manufacturing process, including design, purchasing, production, quality control, inventory management, delivery and more.
- Supply chain data and sales predictions, for instance, can help make more accurate purchasing decisions, ensuring that expensive raw materials aren’t purchased before they’re necessary. The same data can also ensure that manufacturing schedules are adjusted to meet consumer demand.
- Predictive analytics can reduce shipping and transportation costs by considering all the factors involved in getting manufactured goods from one place to another in the most efficient manner.
- Using predictive analytics with machine data can help track and compare the maintenance status of a factory’s machines and equipment, predicting when a particular machine is likely to fail.
Predictive analytics in marketing:
Consumers are bombarded with advertising and marketing everywhere they look, making it harder than ever to attract and retain their attention.
- Predictive analytics tools can help segment marketing prospects more effectively, presenting ads on websites and social media that relate to their interests. More sophisticated predictive marketing tools can identify “intent to buy” by analyzing publicly available data and information from proprietary databases to find people whose data matches that of an ideal consumer.
- Marketers also use predictive analytics for lead scoring, which uses historical data, intent data and other data about prospective customers to determine how likely they are to buy, and therefore how they should be contacted and with what information.
Predictive analytics and big data
You’ve no doubt heard plenty of statistics about the growth of data. According to a 2018 study by market intelligence firm IDC, worldwide data creation will grow to 163 zettabytes (ZB) by 2025, which is 10 times the amount of data produced in 2017. The Internet of Things (IoT) is a key driver. According to a report from Intel, there were around 2 billion connected devices in the world in 2006, and the company projected 200 billion by 2020. Each of those devices creates data that can be used to provide better customer service, optimize networks, target marketing messages more effectively, increase data security and support many other uses.
The value of predictive analytics continues to increase alongside the growth of data. The sheer volume of information generated every day by billions of people, devices and networks creates both challenges and opportunities that cannot possibly be addressed by the human brain alone. Predictive analytics is a huge step toward realizing the promise of big data, offering an unprecedented ability to analyze data and make predictions about future outcomes.
Predictive analytics and other emerging technologies
Predictive analytics is often conflated with other developing data and analytics technologies. Three technologies often confused with predictive analytics are machine learning, predictive modeling, and data mining.
- Is predictive analytics the same as machine learning? Predictive analytics is not the same as machine learning. Machine learning, which allows computers to learn patterns from data rather than follow explicitly programmed rules, is one of the techniques that can be applied as part of the predictive analytics process.
- Is predictive analytics the same as predictive modeling? Predictive analytics is not the same as predictive modeling. Predictive modeling is a technique used in predictive analytics in which data is applied to a particular algorithmic mathematical process (the model) to determine an outcome.
- Is predictive analytics the same as data mining? Predictive analytics is not the same as data mining. Data mining is the process of examining and analyzing large amounts of data to identify patterns and relationships. Making predictions or forecasts based on those data patterns is the job of predictive analytics.
Predictive analytics and modeling
What’s the difference between an algorithm and a predictive model?
Algorithms are the mathematical basis of predictive analytics. They are the series of steps, like a recipe, executed to achieve a result or solution. Models define the way the algorithms are applied to solve a particular problem. The model is the framework that defines the questions, and the variables considered in answering them. The algorithms are the steps used to weigh variables and arrive at answers.
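The distinction can be sketched in a few lines of Python. Here the algorithm is ordinary least squares for a single input variable, and the model is the framing around it: the question (how many leads will a campaign generate?) and the variable considered (promotional spend). The function names and campaign figures are hypothetical and exist only for illustration.

```python
def fit_line(xs, ys):
    """Algorithm: ordinary least squares for one input variable.
    Returns the slope and intercept of the best-fitting straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Model: the question being asked and the variables considered.
# Hypothetical past campaigns: spend (in thousands of dollars) and leads generated.
spend_thousands = [10, 20, 30, 40, 50]
leads = [120, 210, 290, 405, 480]

slope, intercept = fit_line(spend_thousands, leads)

def predict_leads(spend):
    """Apply the fitted line to a new spend figure (in thousands of dollars)."""
    return intercept + slope * spend

print(f"Estimated leads at $60,000 spend: {predict_leads(60):.1f}")
print(f"Estimated extra leads per additional $10,000: {slope * 10:.1f}")
```

Swapping in a different algorithm would change `fit_line`, while the question and the chosen variables would stay the same.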
A quick web search will reveal that many people use the terms “algorithm” and “predictive model” interchangeably. The word “classifier” is also used in the same context. Again, while the terminology is fluid, “classifier” is generally used to indicate an algorithm specifically designed for classification.
What types of models are used in predictive analytics?
The most common models used in predictive analytics are classification algorithms and regression algorithms.
- Classification algorithms sort (or classify) data by category. Is this person female or male? Is this email spam or not spam?
- Regression algorithms are used to predict a numerical outcome. How much will the price rise or fall? How many customers could a new business expect?
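The difference shows up in the type of answer returned. The sketch below uses a simple nearest-neighbor approach for both tasks, chosen only for brevity: the classifier returns a category by majority vote, while the regressor returns a number by averaging. All data, feature choices and function names are hypothetical.

```python
from collections import Counter

def nearest(train, point, k):
    """Return the k training rows closest to `point` (squared Euclidean distance)."""
    def distance(row):
        features = row[0]
        return sum((a - b) ** 2 for a, b in zip(features, point))
    return sorted(train, key=distance)[:k]

def classify(train, point, k=3):
    """Classification: return the most common label among the k nearest rows."""
    labels = [label for _, label in nearest(train, point, k)]
    return Counter(labels).most_common(1)[0][0]

def regress(train, point, k=3):
    """Regression: return the average target value of the k nearest rows."""
    values = [value for _, value in nearest(train, point, k)]
    return sum(values) / len(values)

# Hypothetical emails: (number of links, number of exclamation marks) -> label.
emails = [((0, 0), "not spam"), ((1, 0), "not spam"), ((0, 1), "not spam"),
          ((5, 4), "spam"), ((6, 3), "spam"), ((4, 5), "spam")]

# Hypothetical stores: (local population in thousands, competitors) -> weekly customers.
stores = [((10, 2), 200), ((12, 3), 220), ((11, 2), 240),
          ((50, 5), 900), ((55, 4), 1000)]

print(classify(emails, (5, 5)))   # a category
print(regress(stores, (11, 3)))   # a number
```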
What are the most common models used in predictive analytics?
The most common models used in predictive analytics include linear regression, logistic regression, linear discriminant analysis, decision trees, Naive Bayes, K-nearest neighbors, support vector machines, random forest and boosting. A more complete description of each is included below.
Data scientists use a variety of predictive models based on the type of outcome they are hoping to achieve. The math behind each algorithm is complex and beyond the scope of this article, but here are a few of the most popular predictive analytics algorithms and a brief description of how they can be used.
- Linear regression. This models the relationship between a dependent variable and one or more independent variables. It is one of the most common algorithms, often used for predicting an outcome, forecasting an effect and determining which variables have the most impact. For instance, a linear regression model could be used to answer these questions:
  - What is the relationship between the number of sales leads a marketing campaign generates and the amount of money spent promoting the campaign?
  - How many more leads could be captured if the promotional budget were increased by, say, $10,000?
  - How much will the cost of raw materials used in manufacturing increase in a year?
- Logistic regression. This compares a dependent variable with one or more independent variables to determine the probability of a particular outcome. Logistic regression could be used to predict how likely a person is to develop diabetes based on their age, sex, body mass, blood test results and family history, or which candidate in an election will most appeal to people with a particular combination of demographic information, such as age, race, income and location.
- Linear discriminant analysis is used for classification. A typical example might be, “Based on answers to a survey, which group of customers is more likely to buy a particular product?”
- Decision trees are binary, relying on yes/no questions to arrive at the outcome. A decision tree could be used to sort applicants for a job, for instance. Does the applicant have a college degree? If no, does the applicant have alternative qualifications? If yes, does the applicant have more than three years of experience? If yes, does the applicant have a defined set of skills and experience?
- Random forest is a widely used algorithm for both classification and regression. It is an ensemble technique (a combination of multiple models) that combines multiple decision trees to get more accurate results than a single decision tree.
- Naive Bayes is a simple but powerful algorithm often used for text categorization, including spam filters. A Naive Bayes spam filter correlates the words in an email with spam and non-spam emails to determine the probability of the email in question being spam.
- K-nearest neighbors (KNN) is used to predict the characteristics of a given data point based on its proximity to other data points. KNN could be used in credit scoring, for example. A loan or credit card applicant with a particular set of financial details would likely have a similar credit rating to other people with the same financial details.
- Support vector machines (SVM) can be used for classification or regression problems. An SVM algorithm uses training examples (known data grouped into categories by similarity) to assign new examples to the appropriate category. SVMs have proven effective for image classification (“Is this a tree or a person?”), providing more accurate results than previous methods.
- Boosting is an ensemble technique designed to increase accuracy. A model is created using training data, then a second model is created to correct the errors of the first model, then a third to correct the errors of the second, and so on until the desired outcome is achieved.
  - AdaBoost is considered the first successful boosting algorithm, and the basis on which subsequent models have been built.
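As an illustration of one algorithm from the list, here is a minimal sketch of a Naive Bayes spam filter. It counts how often each word appears in spam and non-spam ("ham") messages, then combines those word probabilities to estimate the chance that a new message is spam. The training messages, the function names and the choice of add-one smoothing are assumptions made for this example; real filters train on far larger collections of email.

```python
import math
from collections import Counter

def train(messages):
    """Count words per label and messages per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def spam_probability(text, word_counts, label_counts):
    """Estimate P(spam | text) with Naive Bayes and add-one smoothing."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Start from the prior: how common this label is overall.
        score = math.log(label_counts[label] / total_messages)
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        log_scores[label] = score
    # Convert log scores back into a normalized probability.
    highest = max(log_scores.values())
    weights = {label: math.exp(s - highest) for label, s in log_scores.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])

# Hypothetical training messages.
messages = [
    ("win free prize now", "spam"),
    ("free money win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
word_counts, label_counts = train(messages)

print(round(spam_probability("free prize", word_counts, label_counts), 2))
print(round(spam_probability("monday meeting", word_counts, label_counts), 2))
```

Working in logarithms avoids multiplying many small probabilities together, which could otherwise underflow to zero on long messages.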
What are neural networks?
Neural networks are mathematical models loosely inspired by the way the human brain works. They are effective for complex pattern recognition problems and for finding nonlinear relationships in data, including cases where the relationships between variables are not known in advance. Self-driving cars rely on neural networks because of the enormous amount of data that must be analyzed in near real time to make driving decisions.
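A tiny example shows what "nonlinear" means here. Exclusive or (XOR) is a classic pattern that no single straight line can separate, yet a network with one hidden layer of two neurons handles it. In the sketch below the weights are set by hand purely for illustration; in practice a network learns its weights from training data, and the function names are hypothetical.

```python
def step(x):
    """Threshold activation: the neuron fires (1) when its weighted input is positive."""
    return 1 if x > 0 else 0

def tiny_network(a, b):
    """Two-layer network that computes XOR of two binary inputs."""
    # Hidden layer: two neurons with hand-picked weights and biases.
    hidden_or = step(1.0 * a + 1.0 * b - 0.5)    # fires if a OR b is 1
    hidden_and = step(1.0 * a + 1.0 * b - 1.5)   # fires only if a AND b are 1
    # Output neuron: fires when OR is on but AND is off, which is XOR.
    return step(1.0 * hidden_or - 1.0 * hidden_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", tiny_network(a, b))
```

Each hidden neuron draws one straight boundary; combining them in the output neuron produces a boundary that no single line could.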
What is the difference between data analytics and data analysis?
Data analysis describes the process of analyzing data and drawing conclusions from it. It could also be described as the job performed by a data analyst. Data analytics is an umbrella term for the various techniques used to identify, categorize and organize data to make it ready for analysis.
How do you find the best predictive analytics software?
The best predictive analytics software is the one that most successfully meets your specific needs and budget. There are many different types of predictive analytics tools, including:
- business intelligence software
- advanced statistical analysis software (both open source and proprietary)
- predictive customer analytics
- predictive marketing software
- predictive lead scoring
- predictive IT monitoring software
- industry-specific tools for supply chain management, healthcare, manufacturing, logistics and many more
As the discipline has become better-known and more widespread, more software vendors are incorporating predictive analytics, or versions of it, into their tools. The challenge for the buyer is to determine which tools provide actual predictive analytics, which ones use only basic algorithmic functions, and which have just appropriated the term.
Moreover, many software platforms (including Splunk) incorporate predictive analytics into various elements of their solution. A vendor’s portfolio may include some products that use predictive analytics and others that perform functions where predictive analytics isn’t required. In other words, a vendor that claims to offer predictive analytics might not, or might incorporate it into only certain products.
How do you get started with predictive analytics?
The best way to get started with predictive analytics is to create a plan to understand what problems you can and can’t solve, define the most critical problems to tackle, identify the gaps in your skills and technology, then run a pilot project.
- Understand what you can and can’t solve. Predictive analytics has multiple benefits, but it has its limits. It can’t replace the skills, judgment and experience of trained professionals. Predictive analytics works only when there is enough data to provide useful output.
- Define the most critical problems to solve. You won’t get a usable outcome unless you know exactly what problems you are trying to solve. While it may be possible to apply predictive analytics indiscriminately to large datasets and hope to identify problems in the output, it’s far more effective to define the problem in the most precise way possible.
- Identify gaps in skills and technology. Software solutions make the practice of predictive analytics easier, but they still require expertise to use. It’s critically important to have the people, infrastructure and tools necessary to identify and prepare the data you’ll need in your analysis.
- Conduct a pilot project. Now that you’ve answered all these questions, put the information to use by conducting a small pilot project. Pick a problem that you know other people agree to be important. Determine the outcome you would like to achieve, and what metrics you will use to prove it. Do you want to reduce process time? By how much? Will you measure it in seconds, or as a percentage? Do you have the baseline data you need? Your pilot will be much more effective in proving your case for predictive analytics if you can state the outcome quickly and memorably and quantify the value. “We reduced process time by 32%, which resulted in an average savings of 18 hours per week per employee” sounds a lot better than “We optimized our process time significantly.”
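Quantifying a pilot outcome is simple arithmetic once baseline data exists. The sketch below computes a percentage reduction in process time; the function name and the before-and-after timings are hypothetical values chosen for illustration.

```python
def percent_reduction(baseline, after):
    """Return how much `after` improved on `baseline`, as a percentage of baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - after) / baseline * 100

# Hypothetical average process times, in minutes.
baseline_minutes = 50.0   # measured before the pilot
pilot_minutes = 34.0      # measured during the pilot

reduction = percent_reduction(baseline_minutes, pilot_minutes)
print(f"We reduced process time by {reduction:.0f}%")
```

Without the baseline measurement the calculation cannot be made at all, which is why collecting it belongs at the start of the pilot.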
Predictive analytics is the future — and the present
Predictive analytics is no longer a new science; it’s a practical tool that businesses and organizations of all sizes are using to solve their biggest business problems. No matter where you are in your predictive analytics journey, from exploring your options to fine-tuning an existing implementation, it’s vital for you to keep on top of the changes in this fast-moving discipline.
Organizations need an approach that transforms previously complex and chaotic data into an opportunity instead of a risk or an impediment — and that’s where process mining comes in. Above all else, it represents a better way to analyze and correlate disparate and seemingly unrelated information, identify weaknesses and quickly take action. Rather than wasting hours, days or weeks of your time tackling process dysfunction on spreadsheets, adopting the right process mining tool will enable you to use the data you have more effectively and drive more business value. And while tackling the data chaos in your organization might seem like a daunting task, putting the wheels into motion now will reap a multitude of rewards down the road.
To find out more about predictive analytics and the ways that it can be applied to your IT infrastructure, download The Power of Predictive IT, from Harvard Business Review and Splunk.