
effort prediction, may have use as well since useful insights may be derived. For instance, from the predictive model, it may be found what the exact impact is of including more or less senior and junior programmers in a project team on the required effort to complete the project, allowing the team composition to be optimized as a function of project characteristics.

In this book, several versatile and powerful profit-driven approaches are discussed. These approaches facilitate the adoption of a value-centric business perspective toward analytics in order to boost the returns. Table 1.6 provides an overview of the structure of the book. First, we lay the foundation by providing a general introduction to analytics in Chapter 2, and by discussing the most important and popular business applications in detail in Chapter 3.

Table 1.6 Outline of the Book

      Chapter 4 discusses approaches toward uplift modeling, which in essence is about distilling or estimating the net effect of a decision and then contrasting the expected result for alternative scenarios. This allows, for instance, the optimization of marketing efforts by customizing the contact channel and the format of the incentive for the response to the campaign to be maximal in terms of returns being generated. Standard analytical approaches may be adopted to develop uplift models. However, specialized approaches tuned toward the particular problem characteristics of uplift modeling have also been developed, and they are discussed in Chapter 4.
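The notion of a net effect can be illustrated with a minimal sketch. It assumes a hypothetical campaign in which, for each contact channel, one group of customers was contacted (treated) and a comparable group was not (control). The channel names, the outcome data, and the function names are illustrative assumptions, not material from the book, and this simple difference in response rates is only one basic way to estimate uplift.

```python
def response_rate(outcomes):
    """Share of customers who responded (1) among the given outcomes."""
    return sum(outcomes) / len(outcomes)


def uplift(treated, control):
    """Net effect of contacting: response rate with contact minus without."""
    return response_rate(treated) - response_rate(control)


# Hypothetical outcomes (1 = responded, 0 = did not) per contact channel.
campaign = {
    "email": {
        "treated": [1, 0, 1, 1, 0, 1, 0, 1],
        "control": [1, 0, 1, 0, 0, 1, 0, 1],
    },
    "phone": {
        "treated": [1, 1, 1, 0, 1, 1, 0, 1],
        "control": [0, 0, 1, 0, 0, 1, 0, 0],
    },
}

# Estimate the net effect per channel and pick the channel with the largest one.
uplift_by_channel = {
    channel: uplift(groups["treated"], groups["control"])
    for channel, groups in campaign.items()
}
best_channel = max(uplift_by_channel, key=uplift_by_channel.get)

print(uplift_by_channel)
print(best_channel)
```

In this toy data, many email responders would have responded anyway, so the channel with the highest raw response rate is not automatically the one with the highest net effect.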

      As such, Chapter 4 forms a bridge to Chapter 5 of the book, which concentrates on various advanced analytical approaches that can be adopted for developing profit-driven models by allowing us to account for profit when learning or applying a predictive or descriptive model. Profit-driven predictive analytics for classification and regression are discussed in the first part of Chapter 5, whereas the second part focuses on descriptive analytics and introduces profit-oriented segmentation and association analysis.

      Chapter 6 subsequently focuses on approaches that are tuned toward a business-oriented evaluation of predictive models, for example, in terms of profits. Note that traditional statistical measures, when applied to customer churn prediction models, for instance, do not differentiate among incorrectly predicted or classified customers, whereas it definitely makes sense from a business point of view to account for the value of the customers when evaluating a model. For instance, failing to detect a high-value customer who is about to churn represents a higher loss or cost than failing to detect a low-value customer who is about to churn. Both errors, however, are counted equally by nonbusiness and, more specifically, non-profit-oriented evaluation measures. Both Chapters 4 and 6 allow using standard analytical approaches as discussed in Chapter 2, with the aim of maximizing profitability by adopting, respectively, a profit-centric setup or a profit-driven evaluation. The particular business application of the model turns out to be an important factor to account for in maximizing profitability.
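The difference between a statistical and a value-based evaluation can be made concrete with a small sketch. The customer values, the two sets of predictions, and the measure `missed_churn_value` are hypothetical and chosen only for illustration; they are not the evaluation measures developed in Chapter 6.

```python
# Hypothetical customers: whether each actually churned, and their value.
actual = [True, True, False, False]
value = [900.0, 50.0, 300.0, 200.0]

# Two hypothetical models that each make exactly one mistake.
model_a = [False, True, False, False]  # misses the high-value churner
model_b = [True, False, False, False]  # misses the low-value churner


def accuracy(actual, predicted):
    """Fraction of correct predictions; every error counts equally."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def missed_churn_value(actual, predicted, value):
    """Total value of churners that the model failed to detect."""
    return sum(v for a, p, v in zip(actual, predicted, value) if a and not p)


for name, predicted in (("A", model_a), ("B", model_b)):
    print(name, accuracy(actual, predicted),
          missed_churn_value(actual, predicted, value))
```

Both models obtain the same accuracy, yet model A lets a far more valuable customer slip away, which only the value-weighted measure reveals.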

      Finally, Chapter 7 concludes the book by adopting a broader perspective toward the use of analytics in an organization by looking into the economic impact, as well as by zooming into some practical concerns related to the development, implementation, and operation of analytics within an organization.

      ANALYTICS PROCESS MODEL

Figure 1.1 provides a high-level overview of the analytics process model (Hand, Mannila, and Smyth 2001; Tan, Steinbach, and Kumar 2005; Han and Kamber 2011; Baesens 2014). This model defines the subsequent steps in the development, implementation, and operation of analytics within an organization.

Figure 1.1 The analytics process model.

      (Baesens 2014)

      As a first step, a thorough definition of the business problem to be addressed is needed. The objective of applying analytics needs to be unambiguously defined. Some examples are: customer segmentation of a mortgage portfolio, retention modeling for a postpaid telco subscription, or fraud detection for credit cards. Defining the perimeter of the analytical modeling exercise requires close collaboration between the data scientists and business experts. Both parties need to agree on a set of key concepts; these may include how we define a customer, transaction, churn, or fraud. Although this may seem self-evident, ensuring that all involved stakeholders share a common understanding of the goal and the key concepts is a crucial success factor.

      Next, all source data that could be of potential interest need to be identified. This is a very important step, as data are the key ingredient of any analytical exercise and the selection of data will have a decisive impact on the analytical models that will be built in a subsequent step. The golden rule here is: the more data, the better! The analytical model itself will later decide which data are relevant and which are not for the task at hand. All data will then be gathered and consolidated in a staging area, which could be, for example, a data warehouse, a data mart, or even a simple spreadsheet file. Some basic exploratory data analysis can then be considered using, for instance, OLAP facilities for multidimensional analysis (e.g., roll-up, drill-down, slicing and dicing). This will be followed by a data-cleaning step to get rid of all inconsistencies such as missing values, outliers, and duplicate data. Additional transformations may also be considered, such as binning, alphanumeric to numeric coding, and geographical aggregation, to name a few, as well as deriving additional characteristics, typically called features, from the raw data. A simple example concerns the derivation of the age from the birth date; more complex examples are provided in Chapter 3.
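The cleaning and transformation steps described above can be sketched in a few lines. The records, the reference date, the bin boundaries, and the choice to drop (rather than impute) records with a missing birth date are all assumptions made for this illustration.

```python
from datetime import date


def age_on(birth_date, reference_date):
    """Derive the age in whole years on the reference date."""
    years = reference_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_bin(age):
    """Bin the age into coarse categories (hypothetical boundaries)."""
    if age < 30:
        return "<30"
    if age < 50:
        return "30-49"
    return "50+"


# Hypothetical raw records, including a duplicate and a missing value.
raw = [
    {"id": 1, "birth_date": date(1980, 6, 15)},
    {"id": 2, "birth_date": date(1995, 12, 1)},
    {"id": 2, "birth_date": date(1995, 12, 1)},  # duplicate record
    {"id": 3, "birth_date": None},               # missing birth date
]
reference = date(2020, 1, 1)

seen_ids = set()
cleaned = []
for row in raw:
    # Cleaning: skip duplicates and records with a missing birth date.
    if row["id"] in seen_ids or row["birth_date"] is None:
        continue
    seen_ids.add(row["id"])
    # Feature derivation and binning.
    age = age_on(row["birth_date"], reference)
    cleaned.append({"id": row["id"], "age": age, "age_bin": age_bin(age)})

print(cleaned)
```

Dropping incomplete records is the simplest treatment of missing values; whether it is appropriate depends on how much data is lost and why the values are missing.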

      In the analytics step, an analytical model will be estimated on the preprocessed and transformed data. Depending on the business objective and the exact task at hand, a particular analytical technique will be selected and implemented by the data scientist. In Table 1.1, an overview was provided of various tasks and types of analytics. Alternatively, one may consider the various types of analytics listed in Table 1.1 to be the basic building blocks or solution components that a data scientist employs to solve the problem at hand. In other words, the business problem needs to be reformulated in terms of the available tools enumerated in Table 1.1.

      Finally, once the results are obtained, they will be interpreted and evaluated by the business experts. Results may be clusters, rules, patterns, or relations, among others, all of which will be called analytical models resulting from applying analytics. Trivial patterns that may be detected by the analytical model (e.g., an association rule stating that spaghetti and spaghetti sauce are often purchased together) are interesting, as they help to validate the model. But of course, the key issue is to find the unknown yet interesting and actionable patterns (sometimes also referred to as knowledge diamonds) that provide new insights into your data and can then be translated into new profit opportunities. Before putting the resulting model or patterns into operation, an important evaluation step is to consider the actual returns or profits that will be generated, and to compare these to a relevant base scenario such as a do-nothing decision or a change-nothing decision. The next section provides an overview of various evaluation criteria that are used to validate analytical models.
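The comparison against a base scenario can be sketched as follows for a hypothetical retention campaign. The portfolio, the contact cost, the acceptance rate, and both function names are assumptions made for illustration; a real evaluation would rely on estimated rather than known churn behavior.

```python
# Hypothetical portfolio rows:
# (customer value, churns if not contacted, targeted by the model)
portfolio = [
    (1000.0, True, True),
    (200.0, True, False),
    (500.0, False, True),
    (300.0, False, False),
]
CONTACT_COST = 10.0  # assumed cost of contacting one customer
ACCEPT_RATE = 0.5    # assumed share of contacted churners who stay


def do_nothing_return(rows):
    """Base scenario: only customers who would not churn anyway are retained."""
    return sum(value for value, churns, _ in rows if not churns)


def campaign_return(rows, contact_cost, accept_rate):
    """Expected retained value when the model's targets are contacted."""
    total = 0.0
    for value, churns, targeted in rows:
        if not churns:
            total += value                # stays regardless of the campaign
        elif targeted:
            total += accept_rate * value  # expected value of a retained churner
        if targeted:
            total -= contact_cost         # every contact costs money
    return total


baseline = do_nothing_return(portfolio)
with_model = campaign_return(portfolio, CONTACT_COST, ACCEPT_RATE)
incremental_profit = with_model - baseline

print(baseline, with_model, incremental_profit)
```

Only the incremental profit relative to the base scenario indicates whether operating the model is worthwhile; note that contacting a customer who would have stayed anyway only adds cost.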

      Once the analytical model has been appropriately validated and approved, it can be put into production as an analytics application (e.g., decision support system, scoring engine). Important considerations here are how to represent
