
ML: Machine learning is an AI discipline that allows machines to learn from previous data or experience without having to be explicitly programmed.
ANN: An artificial neural network relies on algorithms resembling the human brain.
DL: Deep learning algorithms automatically build a hierarchy of data representations using both low- and high-level features.

      1.3.1 Machine Learning

Figure: Schematic illustration of the types of machine learning.

       1.3.1.1 Data Pre-processing

      Data pre-processing is the process of converting raw data into a clean, usable, and efficient format.
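As a minimal sketch of this step (the `preprocess` helper below is illustrative, not from the text), a pipeline might fill in missing values and rescale each feature to a common range:

```python
def preprocess(values):
    """Fill missing entries with the mean of the present values,
    then min-max scale the column to [0, 1].

    A minimal pre-processing sketch: real pipelines also handle
    categorical encoding, outliers, and type conversion.
    """
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    filled = [mean if v is None else v for v in values]
    lo, hi = min(filled), max(filled)
    if hi == lo:  # constant column: scaling is undefined, return zeros
        return [0.0 for _ in filled]
    return [(v - lo) / (hi - lo) for v in filled]

# preprocess([2, None, 4, 10]) fills the gap with the mean (16/3)
# and scales the column so the minimum is 0.0 and the maximum is 1.0.
```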

       1.3.1.2 Feature Extraction

      Before training a model, most applications first require transforming the data into a new representation. Applying pre-processing modifications to input data before presenting it to a network is almost always helpful, and the choice of pre-processing is one of the most important factors in determining the final system’s performance. Reducing the dimensionality of the input data is another key way in which network performance can be enhanced, sometimes dramatically. Dimensionality reduction entails creating linear or nonlinear combinations of the original variables to produce inputs for the network. Feature extraction is the process of creating such input combinations, which are frequently referred to as features. The main motivation for dimensionality reduction is to mitigate the worst effects of high dimensionality.
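For instance, a single extracted feature can be a linear combination of the original variables, as described above. The `project` function below is a hypothetical illustration of one such combination (one axis of a linear reduction such as PCA), not a method from the text:

```python
import math

def project(features, weights):
    """Collapse a feature vector to one scalar feature: a unit-normalized
    linear combination of the original variables."""
    norm = math.sqrt(sum(w * w for w in weights))
    return sum(f * w for f, w in zip(features, weights)) / norm

# A 3-D input reduced to a single feature along the direction (1, 1, 0):
# project([2.0, 4.0, 9.0], [1.0, 1.0, 0.0]) == (2 + 4) / sqrt(2)
```

In a real pipeline, the weight vectors themselves would be learned from the data (e.g., as principal components) rather than chosen by hand.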

       1.3.1.3 Working With Data Sets

      The most popular method is to split the original data into two or more data sets, either at random or using statistical approaches. A portion of the data is used to train the model, whereas a second subset is used to assess the model’s accuracy. It is vital to remember that, while in training mode, the model never sees the test data; that is, it never uses the test data to learn or to alter its weights. The training data are a set of examples representative of the data the ML model will consume to solve the problem it was created to tackle. In certain circumstances, the training data have been labeled, that is, “tagged” with the features and classification labels that the model will need to recognize. If the data are unlabeled, the model must extract such features itself and group the examples by similarity. To improve the generalization capability of the model, the data set can be divided into three subsets: a training set, a validation set, and a test set. The validation set is used to verify the network’s performance during the training phase, which in turn helps determine the best network configuration and related parameters. Furthermore, the validation error is useful for avoiding overfitting by indicating the ideal point at which to stop the learning process.
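The three-way split described above can be sketched as follows. The function name and the 70/15/15 fractions are assumptions for illustration; they are a common but by no means universal choice:

```python
import random

def split_dataset(data, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the data and split it into training, validation,
    and test sets. The test set is whatever remains after the
    training and validation fractions are taken."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = data[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (shuffled[:n_train],                    # seen during learning
            shuffled[n_train:n_train + n_val],     # guides early stopping
            shuffled[n_train + n_val:])            # held out until the end
```

The test slice is never passed to the training loop, which is exactly the "model never sees the test data" rule stated above.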

       1.3.1.4 Model Development

      The ultimate goal of this stage is to create, train, and test the ML model. The learning process continues until the model achieves an acceptable degree of accuracy on the training data. An algorithm, in this context, is a set of statistical processing steps. The type of algorithm used is determined by the kind (labeled or unlabeled) and quantity of data in the training set, as well as by the problem to be solved. Different ML algorithms are used for labeled data. During training, the ML algorithm adjusts its weights and biases to produce accurate results.
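The "adjust weights and biases" step can be made concrete with a classic perceptron update, shown below as a minimal sketch for labeled data (the function and its hyperparameters are illustrative, not from the text):

```python
def perceptron_train(samples, labels, epochs=10, lr=0.1):
    """Repeatedly nudge the weights and bias toward correct answers.
    Labels are expected in {-1, +1}; this only converges for
    linearly separable data."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
            if pred != y:  # misclassified: move the boundary toward y's side
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b
```

Each pass over the labeled training data is one epoch; training stops after a fixed number of epochs here, whereas real systems typically stop when the validation error stops improving.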

       i. Support Vector Machine

      A support vector machine finds an optimal decision boundary to divide linearly separable data into different classes. It is also useful for classifying nonlinear data: kernels transform the input data into a higher-dimensional space, where the nonlinear data can be categorized into different classes by finding an optimal decision surface.
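A toy illustration of the kernel idea (the explicit map below mimics what a polynomial kernel computes implicitly, and the example points are made up): data that no single threshold on x can separate become linearly separable after mapping each point to (x, x²):

```python
def poly_map(x):
    """Explicitly map a 1-D input to 2-D: phi(x) = (x, x**2).
    A polynomial kernel lets an SVM work in such a space without
    ever computing the mapped coordinates."""
    return (x, x * x)

# Two classes on a line: "outer" points -2 and 2 versus "inner"
# points -0.5 and 0.5. No threshold on x separates them, but in
# (x, x**2) space the horizontal line x**2 = 1 does.
outer = [poly_map(x) for x in (-2.0, 2.0)]
inner = [poly_map(x) for x in (-0.5, 0.5)]
```

The optimal decision surface in the higher-dimensional space then corresponds to a nonlinear boundary in the original space.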

       ii. Regression Algorithm

       iii. Decision Tree

       iv. K-means Clustering
