Аннотация

Ontologies have become increasingly important as the use of knowledge graphs, machine learning, natural language processing (NLP), and the amount of data generated on a daily basis has exploded. As of 2014, 90% of the data in the digital universe was generated in the two years prior, and the volume of data was projected to grow from 3.2 zettabytes to 40 zettabytes in the next six years. The very real issues that government, research, and commercial organizations are facing in order to sift through this amount of information to support decision-making alone mandate increasing automation. Yet, the data profiling, NLP, and learning algorithms that are ground-zero for data integration, manipulation, and search provide less than satisfactory results unless they utilize terms with unambiguous semantics, such as those found in ontologies and well-formed rule sets. Ontologies can provide a rich «schema» for the knowledge graphs underlying these technologies as well as the terminological and semantic basis for dramatic improvements in results. Many ontology projects fail, however, due at least in part to a lack of discipline in the development process. This book, motivated by the Ontology 101 tutorial given for many years at what was originally the Semantic Technology Conference (SemTech) and then later from a semester-long university class, is designed to provide the foundations for ontology engineering. The book can serve as a course textbook or a primer for all those interested in ontologies.

Аннотация

The World Wide Web is now deeply intertwined with our lives, and has become a catalyst for a data deluge, making vast amounts of data available online, at a click of a button. With Web 2.0, users are no longer passive consumers, but active publishers and curators of data. Hence, from science to food manufacturing, from data journalism to personal well-being, from social media to art, there is a strong interest in provenance, a description of what influenced an artifact, a data set, a document, a blog, or any resource on the Web and beyond. Provenance is a crucial piece of information that can help a consumer make a judgment as to whether something can be trusted. Provenance is no longer seen as a curiosity in art circles, but it is regarded as pragmatically, ethically, and methodologically crucial for our day-to-day data manipulation and curation activities on the Web.
Following the recent publication of the PROV standard for provenance on the Web, which the two authors actively help shape in the Provenance Working Group at the World Wide Web Consortium, this Synthesis lecture is a hands-on introduction to PROV aimed at Web and linked data professionals. By means of recipes, illustrations, a website at www.provbook.org, and tools, it guides practitioners through a variety of issues related to provenance: how to generate provenance, publish it on the Web, make it discoverable, and how to utilize it. Equipped with this knowledge, practictioners will be in a position to develop novel applications that can bring open-ness, trust, and accountability.
Table of Contents: Preface / Acknowledgments / Introduction / A Data Journalism Scenario / The PROV Ontology / Provenance Recipes / Validation, Compliance, Quality, Replay / Provenance Management / Conclusion / Bibliography / Authors' Biographies / Index

Аннотация

This book describes OCLC’s contributions to the transformation of the Internet from a web of documents to a Web of Data. The new Web is a growing ‘cloud’ of interconnected resources that identify the things people want to know about when they approach the Internet with an information need.
The linked data architecture has achieved critical mass just as it has become clear that library standards for resource description are nearing obsolescence. Working for the world’s largest library cooperative, OCLC researchers have been active participants in the development of next-generation standards for library resource description. By engaging with an international community of library and Web standards experts, they have published some of the most widely used RDF datasets representing library collections and librarianship.
This book focuses on the conceptual and technical challenges involved in publishing linked data derived from traditional library metadata. This transformation is a high priority because most searches for information start not in the library, nor even in a Web-accessible library catalog, but elsewhere on the Internet. Modeling data in a form that the broader Web understands will project the value of libraries into the Digital Information Age.
The exposition is aimed at librarians, archivists, computer scientists, and other professionals interested in modeling bibliographic descriptions as linked data. It aims to achieve a balanced treatment of theory, technical detail, and practical application.

Аннотация

Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. This has led to an increasing demand for powerful software tools to help people analyze and manage vast amounts of text data effectively and efficiently. Unlike data generated by a computer system or sensors, text data are usually generated directly by humans, and are accompanied by semantically rich content. As such, text data are especially valuable for discovering knowledge about human opinions and preferences, in addition to many other kinds of knowledge that we encode in text. In contrast to structured data, which conform to well-defined schemas (thus are relatively easy for computers to handle), text has less explicit structure, requiring computer processing toward understanding of the content encoded in text. The current technology of natural language processing has not yet reached a point to enable a computer to precisely understand natural language text, but a wide range of statistical and heuristic approaches to analysis and management of text data have been developed over the past few decades. They are usually very robust and can be applied to analyze and manage text data in any natural language, and about any topic. This book provides a systematic introduction to all these approaches, with an emphasis on covering the most useful knowledge and skills required to build a variety of practically useful text information systems. The focus is on text mining applications that can help users analyze patterns in text data to extract and reveal useful knowledge. Information retrieval systems, including search engines and recommender systems, are also covered as supporting technology for text mining applications. The book covers the major concepts, techniques, and ideas in text data mining and information retrieval from a practical viewpoint, and includes many hands-on exercises designed with a companion software toolkit (i.e., MeTA) to help readers learn how to apply techniques of text mining and information retrieval to real-world text data and how to experiment with and improve some of the algorithms for interesting application tasks. The book can be used as a textbook for a computer science undergraduate course or a reference book for practitioners working on relevant problems in analyzing and managing text data.

Аннотация

The surge of interest in the REpresentational State Transfer (REST) architectural style, the Semantic Web, and Linked Data has resulted in the development of innovative, flexible, and powerful systems that embrace one or more of these compatible technologies. However, most developers, architects, Information Technology managers, and platform owners have only been exposed to the basics of resource-oriented architectures. This book is an attempt to catalog and elucidate several reusable solutions that have been seen in the wild in the now increasingly familiar «patterns book» style. These are not turn key implementations, but rather, useful strategies for solving certain problems in the development of modern, resource-oriented systems, both on the public Web and within an organization's firewalls.
Table of Contents: List of Figures / Informational Patterns / Applicative Patterns / Procedural Patterns

Аннотация

Linked Data (LD) is a well-established standard for publishing and managing structured information on the Web, gathering and bridging together knowledge from different scientific and commercial domains. The development of Linked Data Visualization techniques and tools has been followed as the primary means for the analysis of this vast amount of information by data scientists, domain experts, business users, and citizens. This book covers a wide spectrum of visualization issues, providing an overview of the recent advances in this area, focusing on techniques, tools, and use cases of visualization and visual analysis of LD. It presents the basic concepts related to data visualization and the LD technologies, the techniques employed for data visualization based on the characteristics of data techniques for Big Data visualization, use tools and use cases in the LD context, and finally a thorough assessment of the usability of these tools under different scenarios. The purpose of this book is to offer a complete guide to the evolution of LD visualization for interested readers from any background and to empower them to get started with the visual analysis of such data. This book can serve as a course textbook or a primer for all those interested in LD and data visualization.

Аннотация

This book introduces core natural language processing (NLP) technologies to non-experts in an easily accessible way, as a series of building blocks that lead the user to understand key technologies, why they are required, and how to integrate them into Semantic Web applications. Natural language processing and Semantic Web technologies have different, but complementary roles in data management. Combining these two technologies enables structured and unstructured data to merge seamlessly. Semantic Web technologies aim to convert unstructured data to meaningful representations, which benefit enormously from the use of NLP technologies, thereby enabling applications such as connecting text to Linked Open Data, connecting texts to each other, semantic searching, information visualization, and modeling of user behavior in online networks. The first half of this book describes the basic NLP processing tools: tokenization, part-of-speech tagging, and morphological analysis, in addition to the main tools required for an information extraction system (named entity recognition and relation extraction) which build on these components. The second half of the book explains how Semantic Web and NLP technologies can enhance each other, for example via semantic annotation, ontology linking, and population. These chapters also discuss sentiment analysis, a key component in making sense of textual data, and the difficulties of performing NLP on social media, as well as some proposed solutions. The book finishes by investigating some applications of these tools, focusing on semantic search and visualization, modeling user behavior, and an outlook on the future.