Скачать книгу

States and Canada [28]. Sentiment grabber model was developed using Ontology, probabilistic LDA and text annotations [13]. Hotel reviews were automatically classified by using SVM and fuzzy domain ontology [29].

      Double propagation method was proposed for the retrieval of new sentiments from sentences and positive or negative polarity was assigned for them [33]. Product’s features were extracted using unsupervised learning techniques [11] from the review documents, and words belong to the same concept are grouped using Latent Semantic Association (LaSA) model. Text analysis and statistical techniques were used to rank the product quality from their websites [31]. NLP techniques were used to identify the most frequently used positive and negative sentiment words for the classification of movie documents [37]. Non-negative matrix factorization and clustering techniques were used for retrieving suitable answers for the given query as a text summarization technique [38]. Lexicon-based NLP techniques were used to extract conjunctions, connectives, modals and conditionals for sentiment polarity detection of tweets [34]. Basiri et al. [39] used Dempster–Shafer theory for sentiment aggregation at document level using the mass function. The probabilistic based Latent Dirichlet Allocation (LDA) was used for annotation of semantics in text documents [13].

      Ontology learning includes extraction of domain terms from the sources, modelling of data through Ontology development and easy retrieval while querying. Manual building of Ontology takes greater effort and it is complex and challenging. This motivates the researchers to automatically generate Ontology for the domain specific terms present in the social media reviews written for a product/service. The Ontology-based Semantic Indexing (OnSI) method tags concepts and attributes, into the Ontology using the contextually related words. It enables query processing and further information retrieval processing easier in subsequent steps. This semantic-based approach of indexing improves higher accuracy while identifying the concepts or attributes (or features) from the contents of text documents [26, 27]. Ontology-based approach for mobile product review classification was resulted in precision 75% and in recall 40% [52], and recall more than 82% [27].

      Feature extraction from product or service review documents often includes different steps like data pre-processing, document indexing, dimension reduction, model training, testing, and evaluation. Labeled data set of document collection is used to train or learn the model. Further, the learned model is used for identifying unlabeled concept instances from the new set of documents. Document indexing is the most critical and complex task in text analysis. It decides the set of key features to represent the document. It also enhances the relevancy between the word (or feature) and the document. It needs to be very effective as it decides the storage space required and query processing time of documents.

      The Ontology-based or semantic-based approach is used to retrieve the concepts from the documents by establishing the contextual relationships. In content-based approach, BagOfWords model is used for representing the text, where synonymy and polysemy cannot be resolved as they use terns as indexes. However, in context-based semantic approaches like topic modeling techniques, concepts are used to extract information and their categorization. It projects the contextual relationship among the terms present in the documents.

      The World Wide Web has large amount of text documents and the necessity of annotating them has become vital. Customers would express their views by writing product reviews, user feedback in the descriptive format, which is mostly in unstructured. This format makes difficult for the machines to process these text documents. Hence, it becomes necessary to annotate large volume of text in order to develop business intelligence or automated solutions. These data have to be analyzed and modelled for enabling the decision-making process. The challenging task of extracting the information is made easier by adding, annotating documents, which in turn paves the way for automated solutions [53].

      Feature extraction is the process of building dataset with informative and non-redundant features from the initial set of data. The subsequent methods like feature selection reduces the amount of resources required for its representation. Many machine learning algorithms like classification and clustering are used for extracting the features such as entities and attributes from the text documents using their properties which are similar. The challenges like absence of semantic relations between entities while feature selection and lack of prior knowledge in domain may be overcome by applying suitable NLP and IE techniques [54, 55].

      Feature extraction from product or service review documents often includes different steps like data pre-processing, document indexing, dimension reduction, model training, testing, and evaluation. Labeled data set of document collection is used to train or learn the model. Further, the learned model is used for identifying unlabeled concept instances from the new set of documents. Document indexing is the most critical and complex task in text analysis. It decides the set of key features to represent the document. It also enhances the relevancy between the word (or feature) and the document. It needs to be very effective as it decides the storage space required and query processing time of documents.

Скачать книгу