
      Naive Bayes algorithm: This is a classification technique based on Bayes’ theorem with an assumption of independence among predictors. In simple terms, a naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

      Example 2.1

      An animal may be considered a tiger if it has four legs, weighs about 250 pounds, and has yellow fur with black stripes. Even if these features depend on one another or on the existence of the other features, each of these properties independently contributes to the probability that the animal is a tiger, and that is why the method is known as “naive.”

      Along with being simple, Naive Bayes is known to outperform even highly sophisticated classification methods. Bayes’ theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x), and P(x|c). Here we start with

      P(c|x) = P(x|c) P(c) / P(x)      (2.6)

      where

       P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).

       P(c) is the prior probability of the class.

       P(x|c) is the likelihood, which is the probability of a predictor given the class.

       P(x) is the prior probability of the predictor.
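Equation (2.6) is just one multiplication and one division once the three quantities on the right-hand side are known. The sketch below checks it numerically; the values of the prior and likelihood are hypothetical numbers chosen for illustration, not taken from the text:

```python
# Hypothetical numbers for illustration (not from the text):
p_c = 0.3          # prior probability of the class, P(c)
p_x = 0.4          # prior probability of the predictor, P(x)
p_x_given_c = 0.8  # likelihood, P(x|c)

# Bayes' theorem, Eq. (2.6): P(c|x) = P(x|c) * P(c) / P(x)
p_c_given_x = p_x_given_c * p_c / p_x
print(p_c_given_x)  # 0.6
```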

      Design Example 2.1

      Suppose we observe a street guitar player who plays different types of music, say jazz, rock, or country. Passersby leave a tip in a box in front of him depending on whether or not they like what he is playing. The player chooses which songs to play independently of the tips he receives. Below is a training dataset of songs and the corresponding target variable “tip” (which indicates the possibility of getting a tip for a given song). We need to classify whether the player will get a tip or not based on the song he is playing. Let us follow the steps involved in this task.

      1 Convert the dataset into a frequency table.

      Data Table

      song      tip
      jazz      no
      rock      yes
      country   yes
      jazz      yes
      jazz      yes
      rock      yes
      country   no
      country   no
      jazz      yes
      country   yes
      jazz      no
      rock      yes
      rock      yes
      country   no

      Frequency Table

      song      no   yes
      rock           4
      country   3    2
      jazz      2    3
      sum       5    9

      2 Create a likelihood table by finding the probabilities; for example, the probability of rock is 4/14 = 0.29 and the probability of getting a tip is 9/14 = 0.64.

      Likelihood Table

      song      no     yes
      rock             4      4/14 = 0.29
      country   3      2      5/14 = 0.36
      jazz      2      3      5/14 = 0.36
      sum       5      9
                5/14   9/14
                0.36   0.64

      3 Now, use the Naive Bayesian equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of prediction.

      In our case, suppose we claim that the player will get a tip if he plays jazz. Is this statement correct? We can check it by computing the posterior probability as above.

P(yes|jazz) = P(jazz|yes) P(yes) / P(jazz)
P(jazz|yes) = 3/9 = 0.33, P(jazz) = 5/14 = 0.36, P(yes) = 9/14 = 0.64
P(yes|jazz) = 0.33 × 0.64 / 0.36 = 0.60,

      which has a relatively high probability. On the other hand, if he plays rock we have

P(yes|rock) = P(rock|yes) P(yes) / P(rock)
P(rock|yes) = 4/9 = 0.44, P(rock) = 4/14 = 0.29, P(yes) = 9/14 = 0.64
P(yes|rock) = 0.44 × 0.64 / 0.29 = 0.97.
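The frequency counts and posterior calculations in this example can be reproduced in a few lines. Below is a minimal sketch in Python (rather than the chapter's R) over the 14 song/tip observations from the training data:

```python
from collections import Counter

# The 14 (song, tip) observations from the training data
data = [("jazz", "no"), ("rock", "yes"), ("country", "yes"), ("jazz", "yes"),
        ("jazz", "yes"), ("rock", "yes"), ("country", "no"), ("country", "no"),
        ("jazz", "yes"), ("country", "yes"), ("jazz", "no"), ("rock", "yes"),
        ("rock", "yes"), ("country", "no")]

n = len(data)                       # 14 observations
tips = Counter(t for _, t in data)  # {'yes': 9, 'no': 5}

def posterior_yes(song):
    """P(yes|song) = P(song|yes) * P(yes) / P(song), per Eq. (2.6)."""
    p_song_given_yes = sum(1 for s, t in data if s == song and t == "yes") / tips["yes"]
    p_yes = tips["yes"] / n
    p_song = sum(1 for s, _ in data if s == song) / n
    return p_song_given_yes * p_yes / p_song

print(round(posterior_yes("jazz"), 2))  # 0.6
print(round(posterior_yes("rock"), 2))  # 1.0
```

Note that with exact fractions, P(yes|rock) = (4/9)(9/14)/(4/14) = 1.0, since every rock song in the training data received a tip; the value 0.97 in the text comes from rounding the intermediate probabilities to two decimals before multiplying.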

       R Code for Naive Bayes

       require(e1071)  # provides the naiveBayes() classifier
       Train <- read.csv(file.choose())
       Test <- read.csv(file.choose())
       # Make sure the target variable is a factor for a two-class
       # classification problem only
       levels(Train$Item_Fat_Content)
       model <- naiveBayes(Item_Fat_Content ~ ., data = Train)
       class(model)
       pred <- predict(model, Test)
       table(pred)

      Nearest neighbor algorithms: These are among the “simplest” supervised ML algorithms and have been well studied in the field of pattern recognition over the last century. They might not be as popular as they once were, but they are still widely used in practice, and we recommend that the reader at least consider the k‐nearest neighbor algorithm in classification projects as a predictive performance benchmark when trying to develop more sophisticated models. In this section, we will primarily talk about two different algorithms, the nearest neighbor (NN) algorithm and the k‐nearest neighbor (kNN) algorithm. NN is just a special case of kNN, where k = 1. To avoid making this text unnecessarily convoluted, we will only use the abbreviation NN if we talk about concepts that do not apply to kNN in general. Otherwise, we will use kNN to refer to NN algorithms in general, regardless of the value of k.

      kNN is an algorithm for supervised learning that simply stores the labeled training examples, ⟨x^[i], y^[i]⟩ ∈ D (|D| = n), during the training phase and postpones the processing of the training examples until the phase of making predictions. Again, the training consists literally of just storing the training data.

      Then, to make a prediction (class label or continuous target), the kNN algorithms find the k nearest neighbors of a query point and compute the class label (classification) or continuous target (regression) based on the k nearest (most “similar”) points. The overall idea is that instead of approximating the target function f(x) = y globally, during each prediction, kNN approximates the target function locally. In practice, it is easier to learn to approximate a function locally than globally.
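This “store everything, then compare at prediction time” idea fits in a few lines. The sketch below is a minimal kNN classifier in Python; the function name and the toy 2-D dataset are illustrative, not from the text:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors.

    `train` is just the stored list of (features, label) pairs --
    "training" is literally keeping this list around; all the work
    happens here, at prediction time.
    """
    # Sort stored examples by Euclidean distance to the query point
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D dataset (illustrative only)
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((4.0, 4.0), "B"), ((4.2, 3.9), "B"), ((3.8, 4.1), "B")]

print(knn_predict(train, (1.1, 0.9), k=3))  # A (two of the 3 nearest are "A")
print(knn_predict(train, (4.0, 4.0), k=1))  # B (k = 1 is the NN special case)
```

Setting k = 1 recovers the NN algorithm mentioned above; for regression, the majority vote would be replaced by an average of the k neighbors' continuous targets.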

      Example 2.2
