
such as customers, transactions, accounts, or citizens. The columns are typically referred to as (explanatory or predictor) variables, characteristics, attributes, predictors, inputs, dimensions, effects, or features. The columns contain information on a particular entity as represented by a row in the table. In Table 1.4, the second column represents the age of a customer, the third column the postal code, and so on. In this book we consistently use the terms observation and variable (and sometimes more specifically, explanatory, predictor, or target variable).

Table 1.4 Structured Dataset

      Because of the structure present in the dataset in Table 1.4 and the well-defined meaning of its rows and columns, such a structured dataset is much easier to analyze than unstructured data such as text, video, or networks, to name a few. Specialized techniques exist that facilitate the analysis of unstructured data: for instance, text analytics, with applications such as sentiment analysis; video analytics, which can be applied for face recognition and incident detection; and network analytics, with applications such as community mining and relational learning (see Chapter 2). Given the rough estimate that over 90% of all data are unstructured, there is clearly a large potential for these types of analytics to be applied in business.

      However, analyzing unstructured data is inherently complex, and the often significant development costs only appear to pay off in settings where these techniques add substantially to the easier-to-apply structured analytics. As a result, relatively few such applications are currently being developed and implemented in business. In this book, we therefore focus on analytics for structured data, and more specifically on the subset listed in Table 1.1. For unstructured analytics, one may refer to the specialized literature (Elder IV and Thomas 2012; Chakraborty, Murali, and Satish 2013; Coussement 2014; Verbeke, Martens, and Baesens 2014; Baesens, Van Vlasselaer, and Verbeke 2015).

      PROFIT-DRIVEN BUSINESS ANALYTICS

      The premise of this book is that analytics should be adopted in business for better decision making, where “better” means optimal in terms of maximizing the net profits, returns, payoff, or value resulting from decisions based on insights obtained from data through analytics. These returns may stem from, among other things, gains in efficiency, lower costs or losses, and additional sales. Analytics is typically adopted at the operational level, where many customized decisions need to be made that are similar and granular in nature. High-level, ad hoc decision making at the strategic and tactical levels of an organization may also benefit from analytics, but presumably to a much lesser extent.

The decisions involved in developing a business strategy are highly complex and do not match the elementary tasks listed in Table 1.1. Such decisions would require a higher-level AI, which is not yet at our disposal. At the operational level, however, many simple decisions need to be made, and these match the tasks listed in Table 1.1 exactly. This is not surprising, since these approaches have often been developed with a specific application in mind. Table 1.5 provides a selection of example applications, most of which are elaborated on in detail in Chapter 3.

Table 1.5 Examples of Business Decisions Matching Analytics

      Analytics facilitates the optimization of the fine-grained decision-making activities listed in Table 1.5, leading to lower costs or losses and higher revenues and profits. The degree of optimization achieved depends on the accuracy and validity of the predictions, estimates, or patterns derived from the data. Additionally, as we stress throughout this book, the quality of data-driven decision making depends on the extent to which the actual use of these predictions, estimates, or patterns is accounted for when developing and applying analytical approaches. We argue that the actual goal, which in a business setting is to generate profits, should be central when applying analytics, in order to further increase the return on analytics. To achieve this, we need what we call profit-driven analytics: techniques that are adapted and specifically configured for use in a business context.

      Example

      The following example highlights the tangible difference between a statistical approach to analytics and a profit-driven approach. Table 1.5 already listed the use of analytics, and more specifically of classification techniques, for predicting which customers are about to churn. Such knowledge allows us to decide which customers to target in a retention campaign, which increases the efficiency and returns of that campaign compared to selecting customers randomly or intuitively. Customers who are likely to churn may be retained by offering them a financial incentive, for instance, a temporary reduction of the monthly fee. Various studies have shown that actively retaining customers is much cheaper than acquiring new customers to replace those who defect (Athanassopoulos 2000; Bhattacharya 1998).

      It should be noted, however, that not every customer generates the same revenue, and therefore not every customer represents the same value to a company. Hence, detecting churn is much more important for the most valuable customers. In a basic customer churn prediction setup, which adopts what we call a statistical perspective, no distinction is made between high-value and low-value customers when learning a classification model to detect future churn. However, when analyzing data and learning a classification model, it should be taken into account that missing a high-value churner is much costlier than missing a low-value churner. The aim is to steer or tune the resulting predictive model so that it accounts for customer value and, consequently, for its actual end use in a business context.
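      To make the contrast in this churn example concrete, the following Python sketch ranks customers for a retention campaign in two ways: by churn probability alone (the statistical perspective) and by expected lost value, that is, churn probability multiplied by customer value. It is a deliberately simplified illustration, not a full profit-driven method: the `Customer` fields, the four customers and their values are hypothetical, and incentive costs and campaign response rates are ignored.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    churn_probability: float  # assumed to come from a churn classification model
    monthly_value: float      # hypothetical value measure, e.g. monthly revenue


def rank_by_probability(customers):
    """Statistical perspective: most likely churners first, value ignored."""
    return sorted(customers, key=lambda c: c.churn_probability, reverse=True)


def rank_by_expected_loss(customers):
    """Value-aware perspective: largest expected lost value first."""
    return sorted(customers,
                  key=lambda c: c.churn_probability * c.monthly_value,
                  reverse=True)


def select_targets(ranked, budget):
    """Return the IDs of the first `budget` customers in a ranking."""
    return [c.customer_id for c in ranked[:budget]]


# Hypothetical customer base (illustrative numbers only).
customers = [
    Customer("A", churn_probability=0.9, monthly_value=10.0),
    Customer("B", churn_probability=0.6, monthly_value=100.0),
    Customer("C", churn_probability=0.8, monthly_value=20.0),
    Customer("D", churn_probability=0.3, monthly_value=300.0),
]

print(select_targets(rank_by_probability(customers), budget=2))    # → ['A', 'C']
print(select_targets(rank_by_expected_loss(customers), budget=2))  # → ['D', 'B']
```

With a budget for two contacts, the probability ranking selects customers A and C, whereas the value-aware ranking selects D and B, the two customers whose departure would cost the most in expectation.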

      An additional difference between the statistical and business perspectives on classification and regression modeling concerns the distinction between explaining and predicting, respectively (Breiman 2001; Shmueli and Koppius 2011). A model may be estimated with either of two goals:

      1. To establish the relation or detect dependencies between characteristics, or independent variables, and one or more observed dependent (target) variables or outcome values.

      2. To estimate or predict the unobserved or future value of the target variable as a function of the independent variables.

      For instance, in a medical setting, the purpose of analyzing data may be to establish the impact of smoking behavior on an individual's life expectancy. A regression model may be estimated that explains the observed age at death of a number of subjects in terms of characteristics such as gender and the number of years the subject smoked. Such a model quantifies the relation between each characteristic and the observed outcome, and allows for testing the statistical significance of each effect and measuring the uncertainty of the result (Cao 2016; Peto, Whitlock, and Jha 2010).

      This contrasts clearly with estimating a regression model for, as an example, software effort prediction, as introduced in Table 1.5. In such applications, where the aim is mainly to predict, we are essentially not interested in which drivers explain how much effort it will take to develop new software, although this may be a useful side result. Instead, we mainly wish to predict as accurately as possible the effort required to complete a project. Since the model's main use is to produce an estimate that allows cost projection and planning, what matters is the accuracy of the prediction and the size of the errors, rather than the exact relation between the effort and the characteristics of the project.
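      The contrast between explaining and predicting can be sketched in Python with a one-variable least-squares regression on synthetic data. The same fitted model is read in two ways: the slope estimate, its standard error, and the t-statistic address the explanatory question (how strongly is the outcome related to the characteristic, and how certain is that estimate?), whereas the mean absolute error on held-out observations addresses the predictive question (how large are the errors on new cases?). The data-generating rule `effort = 50 + 4 * size + noise`, the project size variable, and the 80/20 split are assumptions made for illustration only; a real effort prediction model would use several project characteristics and a more careful validation design.

```python
import random
import statistics


def fit_simple_ols(xs, ys):
    """Ordinary least squares for y = a + b * x.

    Returns the intercept a, the slope b, and the standard error of b.
    """
    n = len(xs)
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = (rss / (n - 2) / sxx) ** 0.5
    return a, b, se_b


def mean_absolute_error(a, b, xs, ys):
    """Average absolute prediction error of y = a + b * x on (xs, ys)."""
    return statistics.fmean([abs(y - (a + b * x)) for x, y in zip(xs, ys)])


# Synthetic "projects" (assumption): effort = 50 + 4 * size + Gaussian noise.
rng = random.Random(42)
sizes = [rng.uniform(1, 50) for _ in range(100)]
efforts = [50 + 4 * s + rng.gauss(0, 5) for s in sizes]
train_x, test_x = sizes[:80], sizes[80:]
train_y, test_y = efforts[:80], efforts[80:]

a, b, se_b = fit_simple_ols(train_x, train_y)

# Explaining: estimated effect of size on effort and the uncertainty around it.
print(f"slope = {b:.2f}, standard error = {se_b:.3f}, t = {b / se_b:.1f}")

# Predicting: size of the errors on projects the model has not seen.
print(f"hold-out MAE = {mean_absolute_error(a, b, test_x, test_y):.2f}")
```

In an explanatory study, attention would go to whether the slope differs significantly from zero; in a predictive application such as effort estimation, the hold-out error is what determines whether the model is useful for cost projection and planning.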

      Typically, in a business setting, the aim is to predict in order to facilitate improved or automated decision making. Explaining, as indicated for the case of
