Big Data and Artificial Intelligence for Retailing

July 2018 | By: Ronald L. Hess Jr. PhD

Estimated reading time: 8 minutes

Key Takeaways

  1. The retail sector is in a unique position to improve and enhance marketing, product development, and customer experience through big data and AI.
  2. Retailers that collect data at the customer, product, location, time, and channel levels have an enormous data source, enabling them to extract valuable insights.
  3. The use of artificial intelligence techniques to improve and optimize existing processes is expected to increase overall revenues for the retail sector by 3.2 to 5.7 percent.

Retailers that harness artificial intelligence stand to gain considerable benefits from the astonishing velocity of big data collected on a daily basis. The exponential improvement in computational power and storage, along with significant advancements in artificial intelligence, has led to an unwavering focus on what can be achieved.

Retailers such as the Gap, H&M and Zara have begun to tap into big data and artificial intelligence to uncover fashion trends. “Today, given the extent of data available, retailers in fashion can be a few steps ahead of the game—for example, they have access to millions of conversations in social media and can be more predictive of what is going to catch on before it does.” “Access to shopper behavior based on weather, mobile data on store visitation . . . can be capitalized on further to change the game.”

Retailers also have begun to utilize artificial intelligence to develop more detailed and effective personalized marketing campaigns. It is not enough to communicate with customers based on a retailer’s ability to estimate intent. Today, retailers need to use artificial intelligence and data about a customer’s previous purchase history, loyalty program information, mobile search data and other data sources to develop effective and highly personalized marketing communication.

Artificial intelligence and deep learning also are being used to drastically improve the linguistic ability and emotional intelligence of chatbots (i.e., computers designed to simulate conversation with customers using voice or text). Using these intelligent analytic techniques, chatbots can learn to comprehend and respond appropriately to intricate customer responses including frustration, confusion, and dissatisfaction. When given access to more data about human responses, chatbots can improve over time and become quite “lifelike” when interacting with customers.

The retail sector is in a unique position to utilize the capabilities of artificial intelligence to take advantage of a wide variety and volume of big data sources that are unavailable to many other companies in the marketplace. This article is intended to provide the reader with a better understanding of the sources and dimensions of big data in the retailing environment and provide a basic explanation of artificial intelligence, machine learning, and deep learning.

Big Data: Everywhere Everyplace

The collection of big data has evolved significantly during the past 25 years. Beginning with the adoption of the World Wide Web (1994–2004), a considerable volume of data was collected about web usage (hyperlinks) and web content (content mining). Adding to this data, the widespread interest in social media (2005–2014) led to data collected on sentiment analysis (text analysis, natural language processing, computational linguistics) and social network analysis (social network structure, connections, and nodes).

Most recently (2015–present), with the introduction of the Internet of Things (IoT), data have been collected from sensors and from the interrelated communication of a multitude of electronic devices. These devices share a constant stream of data in the form of images, audio, and video. Distribution of data from devices is unique because it occurs without any human intervention. The volume of these data has surpassed the data collected from both e-commerce and social media in just a few years. It is estimated that 2.5 quintillion bytes of data per day are shared by internet-connected devices. At this volume, 90 percent of the world’s data was collected in just the past two years.

The data that are collected from the World Wide Web, social media, and internet-connected devices come in a wide variety of formats including structured, unstructured, and semi-structured. Structured data are well defined, mostly quantitative and can easily be stored and retrieved from traditional databases. In contrast, unstructured data, which include text, photos, videos, audio, clickstream and sensor data, are less standardized and more difficult for traditional statistical programs to incorporate and analyze. Finally, semi-structured data conform to the structure required for analyses by traditional statistical programs but do not correspond to the structural requirements of relational databases.
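To make the three formats concrete, here is a minimal Python sketch. The sample records, field names, and values are invented for illustration and do not come from any retailer's data; the point is only how differently each format has to be handled before analysis.

```python
import csv
import io
import json

# Structured: fixed columns, ready for a relational table (hypothetical sample).
structured_csv = "customer_id,sku,quantity,price\n1001,A-17,2,19.99\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: self-describing fields, but nesting and optional keys
# (such as "coupon") do not map directly onto one fixed-column table.
semi_structured = json.loads(
    '{"customer_id": 1001, "events": ['
    '{"type": "view", "sku": "A-17"}, '
    '{"type": "add_to_cart", "sku": "A-17", "coupon": "SPRING"}]}'
)

# Unstructured: free text with no predefined fields at all.
unstructured = "Loved the jacket, but the checkout line took forever."

# Each format needs a different amount of work before it can be analyzed.
order_total = sum(int(r["quantity"]) * float(r["price"]) for r in rows)
event_types = [e["type"] for e in semi_structured["events"]]
word_count = len(unstructured.split())
```

The structured row can be totaled immediately, the semi-structured record has to be walked to pull out its events, and the free-text comment yields nothing beyond a word count until a text-analysis technique is applied.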

Retailers Hold the Data

The retail sector is in a privileged position in the marketplace to collect and access a wide variety of structured, unstructured, and semi-structured data. As shown in Exhibit 1, the retail sector has access to data from many important sources:

[Exhibit 1: Flow chart of retail data sources]

  1. Enterprise Sales and Inventory Data: UPC scanner data collected from point-of-sale terminals and data held in enterprise resource planning systems (ERP)
  2. Customer/Household Data: data collected from loyalty programs, bonus card programs, web-presence data from retailer-specific websites, syndicated data sources, social graph and profile information and customer experience data (voice of customer survey data, interview transcripts, sensor data and biofeedback information)
  3. Location-based Data: data from mobile behavior, retail-specific mobile apps, RFID, eye-tracking, in-store layout locations, and environmental data sources

Taken collectively, retailers are in the unique position of collecting data at the customer, product, location, time, and channel levels. As a result, the volume and variety of data amassed at the retail level grow significantly. How the retail sector manages and extracts valuable insights from this enormous data source represents a significant challenge for the foreseeable future.

Artificial Intelligence: The Next Data Analysis Frontier


Significant advancements in computing power and intelligent analytics have greatly assisted companies in gaining the most from this valuable source of big data. Artificial intelligence is the ability of a machine to perform cognitive functions similar to those performed by humans. These abilities include reasoning, learning, adapting to environmental or other factors, problem solving, and creativity. As presented in Exhibit 2, artificial intelligence is a broad term that refers to all techniques with the ability to learn and adapt. Some types of artificial intelligence (machine learning and deep learning) possess a greater capacity to learn and improve.

Machine Learning: Supervised, Unsupervised, and Reinforcement Learning

Machine-learning algorithms have the ability to expose patterns in big data. They can learn to make educated predictions and provide important recommendations to decision-makers. Machine-learning algorithms learn by processing data rather than through specific programming instructions from human operators. When well trained, these algorithms grow increasingly intelligent as they are exposed to additional big data.

Three frequently used types of machine learning are supervised learning, unsupervised learning, and reinforcement learning. With supervised learning, the algorithm is supplied with training data (a subset of the full source of big data), definitions of the input and output variables, and limited feedback from human operators. The objective of supervised learning is to uncover the relationships of input variables to an outcome. Once the algorithm becomes reasonably accurate using the training data, it is provided with the full source of big data. The accuracy of the algorithm increases as it is exposed to updates in the big data. For example, supervised learning algorithms can learn how inputs such as promotion type, promotion timing, product placement, and time of year can forecast retailer revenues during a specific time frame. Supervised learning algorithms include simple neural networks, support vector machines, random forests, AdaBoost, gradient-boosting trees, decision trees, linear and quadratic discriminant analysis, logistic regression, and linear regression, among others.
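As a rough illustration of the supervised workflow, the sketch below fits a one-variable linear regression (one of the algorithms listed above) on a small training subset and then predicts an unseen case. The promotion-spend and revenue figures are made up and deliberately follow an exact straight line so the result is easy to check by hand; real retail data would be noisier and involve many more inputs.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one input variable: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return slope, y_bar - slope * x_bar

# Hypothetical labeled data: weekly promotion spend (input) and revenue (output).
promo_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [12.0, 14.0, 16.0, 18.0, 20.0]

# Train on the first four weeks, then predict the held-out fifth week.
slope, intercept = fit_line(promo_spend[:4], revenue[:4])
predicted = intercept + slope * promo_spend[4]
```

Here the labeled outcome (revenue) is what makes the learning "supervised": the algorithm is told the right answer for each training week and learns the relationship that produces it.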

Unsupervised algorithms are designed to uncover patterns in data without being told what to look for. They are given an unlabeled source of big data and asked to identify structures within it, most often clusters of records that exhibit similar behaviors. For example, unsupervised algorithms can form homogeneous clusters of customers, with each cluster responding differently to the digital marketing and social media communications retailers often use. Unsupervised learning algorithms include k-means clustering, Gaussian mixture models, hierarchical clustering, and recommender systems.
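The clustering idea can be sketched with a bare-bones k-means routine. The customer figures below (store visits per month, average basket size) are invented, and the starting centroids are picked by hand to keep the run deterministic; production implementations normally choose starting points more carefully and check for convergence.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-center."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled customers: (visits per month, average basket in dollars).
customers = [(1, 20), (2, 22), (1, 18), (8, 90), (9, 95), (10, 85)]
centroids, clusters = k_means(customers, centroids=[customers[0], customers[3]])
```

No labels are supplied at any point. The two groups (occasional small-basket shoppers and frequent large-basket shoppers) emerge only from how close the records sit to one another.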

The final type of machine learning is reinforcement learning. A reinforcement learning algorithm becomes increasingly intelligent by altering its actions based on environmental inputs in order to maximize the rewards it receives. For example, the algorithm takes an action on the environment (e.g., a move in a game). If this action leads to higher point accumulation (i.e., the objective of the game), the algorithm is rewarded for the action. The algorithm continues to try actions until the rewards are maximized. Thus, it learns the optimal set of actions through self-correction over time.
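One of the simplest settings in which to see this reward-driven loop is a "multi-armed bandit," sketched below with an epsilon-greedy strategy. This is a stand-in chosen for brevity, not a technique named in the discussion above: the three actions could be thought of as three promotions, and the reward probabilities are hypothetical numbers hidden from the algorithm.

```python
import random

def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner: mostly repeat the best-known action, sometimes explore."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)    # times each action was tried
    values = [0.0] * len(reward_probs)  # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore
        else:
            action = max(range(len(values)), key=values.__getitem__)  # exploit
        # The environment pays a reward of 1 with the action's hidden probability.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Self-correction: nudge the estimate toward the observed reward.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

# Hypothetical success rates for three actions; the learner never sees these.
values, counts = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=values.__getitem__)
```

The learner starts with no knowledge, tries actions, and gradually concentrates on the one that pays off most often, which mirrors the act, get rewarded, adjust cycle described above.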

Deep Learning

Deep learning possesses many important advantages over the traditional machine-learning algorithms described above. It can integrate a wider variety of data sources and requires less human operator intervention in data pre-processing. When provided a significant volume of big data, deep learning algorithms can develop more accurate models and results than traditional machine learning.

Deep learning is based on the development of neural networks that include multiple layers of connected software-based calculators called neurons. The neural networks have the ability to analyze significant sources of input data and learn across each successive layer of data. Thus, a neural network formulates a conclusion about an initial layer of data, learns whether that conclusion was correct, and applies any updated learning when exposed to new data. This process continues with further data exposure.
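To show what "layers of connected neurons" means in practice, the following sketch passes inputs through a tiny two-layer network. The weights here are set by hand purely for illustration; in a real deep learning system they would be learned from data, and the training step is omitted. The network computes an "exactly one of two signals is on" rule, something no single-layer linear model can represent.

```python
def relu(values):
    """Activation: a neuron passes on positive signals and blocks negative ones."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of its inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def forward(inputs):
    # Hidden layer: two neurons, hand-set weights (illustrative, not learned).
    hidden = relu(dense(inputs, [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]))
    # Output layer: combines what the hidden neurons detected.
    return dense(hidden, [[1.0, -2.0]], [0.0])[0]

predictions = {
    pair: forward(list(pair)) for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]
}
```

The hidden layer turns the raw inputs into intermediate signals, and the output layer builds its answer from those signals rather than from the raw data, which is the layer-by-layer behavior described above.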

Deep learning can outperform other techniques for activities that involve facial recognition, image classification, and voice recognition. Two primary deep learning techniques are the convolutional neural network (CNN) and the recurrent neural network (RNN).

Value of Artificial Intelligence

According to a recent report, artificial intelligence is expected to provide $3.5 trillion to $5.8 trillion of annual savings and top-line value across all business sectors. This estimate represents approximately 40 percent of the combined value provided by all analytics techniques (traditional techniques and artificial intelligence techniques).

Most interesting, as shown in Exhibit 3, the value provided specifically to the retail sector from artificial intelligence will outpace that from all other sectors in total value, with 45 percent of its analytics usage derived from these techniques. Only the travel sector is expected to experience a higher rate of usage at 55 percent.

[Exhibit 3: Chart]

The use of artificial intelligence techniques to improve and optimize existing processes is expected to increase overall revenues for the retail sector by 3.2 to 5.7 percent. The function that will experience the largest impact will be marketing and sales. Specifically, marketing activities such as pricing and promotion, customer service management, customer acquisition and market budget allocation will be improved notably by artificial intelligence.
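For a sense of scale, the short calculation below applies the 3.2 to 5.7 percent range to a retailer with $500 million in annual revenue. The baseline figure is a hypothetical chosen for round numbers, not a value from the report.

```python
def uplift_range(baseline_revenue, low_pct=3.2, high_pct=5.7):
    """Return the (low, high) revenue increase implied by a percentage range."""
    return baseline_revenue * low_pct / 100, baseline_revenue * high_pct / 100

# Hypothetical retailer with $500 million in annual revenue.
low, high = uplift_range(500_000_000)
```

For this hypothetical retailer, the range works out to roughly $16 million to $28.5 million in additional annual revenue.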

Challenges with Artificial Intelligence

There are many challenges for retailers that intend to extract the most value from artificial intelligence. First, artificial intelligence requires a significant source of big data. The minimum requirement is 5,000 labeled data points per category, with 10 million required if artificial intelligence is expected to surpass the performance of humans using traditional analytics techniques. Next, artificial intelligence excels when it is exposed to data sources that include a diverse range of data types including images, video, and audio. Without such data, artificial intelligence may not add value beyond traditional analytics techniques.
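A retailer could run a quick check against the 5,000-per-category threshold before investing in model development. The sketch below is one hypothetical way to do so; the category names and counts are invented.

```python
from collections import Counter

MIN_PER_CATEGORY = 5_000  # threshold cited above

def under_labeled(labels, minimum=MIN_PER_CATEGORY):
    """Return the categories (with counts) that fall short of the minimum."""
    counts = Counter(labels)
    return {category: n for category, n in counts.items() if n < minimum}

# Hypothetical labeled product images: plenty of shirts, too few shoes.
labels = ["shirt"] * 6_000 + ["shoe"] * 1_200
shortfalls = under_labeled(labels)
```

Any category that appears in the result would need more labeled examples before a model could be expected to perform reliably on it.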

In addition, artificial intelligence techniques require frequent model updates (called refresh). Due to the velocity with which big data is collected, algorithms must be refreshed frequently to remain accurate given the changing circumstances in the data. Approximately 30 percent of algorithms must be refreshed monthly, while 25 percent need to be refreshed daily. More frequent refresh activity is needed for marketing and sales contexts, as data from these activities are updated quite frequently.

Next, many companies experience difficulty labeling the training data used for supervised learning algorithms. Such labeling is done manually by human operators. Inaccurate or misleading labels can cause algorithms to produce less than optimal results. As one expert stated, “machines are taught; they do not teach themselves.” Finally, bias in the data can significantly affect the outcomes produced by artificial intelligence, so human operators must clearly understand how training data were collected.

Final Word on A.I.

Artificial intelligence has received considerable attention in the popular press and by a multitude of companies during the past few years. The availability of significant (and growing) sources of big data, advancements in computational power and storage, and further improvements in artificial intelligence and deep learning have made this an ideal time for retailers to implement these important capabilities.

The retail sector possesses unique access to customer and location-based big data, which gives retailers the opportunity to significantly improve and update many aspects of their business processes. Developing more personalized marketing communication, executing more effective marketing campaigns, reducing and managing customer churn, providing more helpful and knowledgeable service, identifying future trends and styles to improve product assortment, and improving the customer experience are but a few areas that can be addressed with artificial intelligence.

Although some challenges remain in gaining the most from big data and artificial intelligence, retailers that implement artificial intelligence effectively can realize substantial benefits and gain a competitive advantage in the marketplace.