Скачать книгу

can be done by starting from the root node. On the basis of the result, a branch that leads to a child must be followed. The process would be repeated recursively for the time until the child is not a leaf. To examine a class and its corresponding leaf, test cases must be applied.

      b) Genetic Algorithms (GA)

      1 Fitness function

      2 Individuals representation

      3 Genetic algorithms parameters

      For designing an artificial immune system, genetic algorithm-based method can be used. By using this method, a method for smartphone malware detection has been proposed by Bin et al. (Wu et al., 2015). In this approach, static and dynamic signatures of malwares were extracted to obtain the malicious scores of tested samples.

      c) Random Forest

      1 A sample of N cases is arbitrarily selected from the original dataset which represents the training set required for growing the tree.

      2 Out of the M input variables, m variables can be selected arbitrarily. Value of m will be constant at the time of growing the forest.

      3 Maximum possible value can be given to each tree in the forest. There is no requirement of trimming or Pruning of the tree.

      4 To form the random forest, all classification trees can be combined. The problem of overfitting on large dataset can be fixed with the help of random forest. It can also train/ test quickly on complex data set. It can also be referred as Operational Data mining technique.

      Each and every classification tree can be used to cast vote for a class because of its special feature. On the basis of maximum votes assigned to a class, a solution class is built.

      d) Association-rule mining

      Breadth First Search (BFS) with Counting Occurrences: An eminent algorithm in this group is Apriori algorithm. By clipping the candidates with rare subsets and with the help of this algorithm, the downward closure property of an itemset can be utilized. It should be done before counting their support. Two important parameters to be measured at the time of association rule evaluation which is: support and confidence. In BFS, it is possible to do desired optimization by knowing the support values of all subsets of the candidates in advance. The main drawback of the above mentioned is the increment in computational complexity in a rule that has been extracted from a large database. An improved, dispersed and unsecured form of the Apriori algorithm is Fast Distributed Mining (FDM) algorithm (Lee et al., 1999). Organizations are able to use data more competently with the help of advancements in data mining techniques.

      It is possible in Apriori to count the candidates of a cardinality k with the help of a single scan of a large database. Most important limitation of apriori algorithm is to look up the candidates in each transaction. To do the same, a hash tree structure is used (Jacobsan et al., 2014). An extension of Apriori, i.e., Apriori-TID, signifies the current candidate on which each transaction is based, while a raw database is sufficient for a normal Apriori. Apriori and Apriori-TID when combined form Apriori-Hybrid. A prefix-tree is used to fix up the parting that occurs between the processes, counting and candidate generation in Apriori-DIC.

      1 a) Distribution Based

      2 b) Density Based

      3 c) Centroid Based

      4 d) Connection Based or Hierarchical Clustering

      5 e) Recent Clustering Techniques

      a) Distribution-Based Clustering

      A model of clustering in which the date is grouped/fitted in the model on the basis of probability, i.e., in what way it may fit into the same distribution. Thus, the groups formed will be on the basis of either normal distribution or gaussian distribution

      b) Density-Based Clustering

      In

Скачать книгу