
tools have been developed to support the work of data scientists. One of the leading organizations in this sector is the Apache Software Foundation, which has developed a large number of applications well suited to handling big data, such as Apache Hadoop,16 Apache Spark,17 Apache Cassandra,18 Apache Commons RDF,19 Apache Jena,20 and many others.

      The impact of Open Data at the economic, political, and social levels has become clear in recent years. Every year, the European Data Portal21 publishes several studies and reports on the state of Open Data in Europe, distinguishing between the direct and indirect benefits of Open Data. In their study [Carrara et al., 2015], the authors define direct benefits as “monetised benefits that are realized in market transactions in the form of revenues and Gross Value Added (GVA), the number of jobs involved in producing a service or product, and cost savings” and indirect benefits as “new goods and services, time savings for users of applications using Open Data, knowledge economy growth, increased efficiency in public services and growth of related markets.” In the same document, they estimate that the direct value of the Open Data market in the European Union is 55.3 billion Euros, with a potential growth of 36.9% between 2016 and 2020 to a value of 75.7 billion Euros. They estimate the overall Open Data market at between 193 and 209 billion Euros, with a projection of 265–286 billion Euros for 2020. They also quantify the economic benefits through three other indicators: the number of jobs created, cost savings, and efficiency gains. The number of direct Open Data jobs is forecast to rise from 75,000 in 2016 to nearly 100,000 by 2020. Moreover, Open Data has a positive effect on innovation and drives the development of numerous efficiency tools. As a result, the public sector, and not only the private sector, is expected to achieve cost savings through Open Data totalling 1.7 billion Euros by 2020. The study also estimates 7,000 additional lives saved thanks to a quicker response, a 5.5% decrease in road fatalities, a 16% decrease in energy usage, and so on.
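      As a quick consistency check on the market figures above, the sketch below applies the 36.9% growth once to the 55.3 billion Euro starting value. This reproduces the 75.7 billion Euro projection, which shows that 36.9% is the cumulative growth over 2016–2020 and not a yearly rate. The sketch also derives the implied compound annual growth rate. That rate is our own arithmetic, not a figure reported in the study, and it assumes that 55.3 billion is the 2016 value.

```python
# Figures quoted from Carrara et al. [2015], in billions of euros.
value_2016 = 55.3
cumulative_growth = 0.369  # growth over the whole 2016-2020 period, not per year

# Applying the cumulative growth once should reproduce the 2020 projection.
value_2020 = value_2016 * (1 + cumulative_growth)

# Implied compound annual growth rate over four years (our derivation, not the study's).
annual_growth = (value_2020 / value_2016) ** (1 / 4) - 1

print(f"2020 projection: {value_2020:.1f} billion EUR")
print(f"implied annual growth: {annual_growth:.1%}")
```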

      Another important document that assesses the value of Open Data is Manyika et al. [2013]. This 2013 report estimates that the worldwide Open Data market is worth about 3 trillion dollars annually (1.1 trillion for the U.S. market, 0.7 trillion for the European market, and 1.7 trillion for the rest of the world). The value is calculated over seven domains (Education, Transportation, Consumer Products, Electricity, Oil and Gas, Health Care, and Consumer Finance). The striking difference between these estimates and those reported above shows that calculating the value of Open Data is not an easy task and that the result depends heavily on the domain under study. To the best of our knowledge, there is no up-to-date estimate of the value of the U.S. Open Data market.

      In order to unlock the full potential of Linked Data and to understand how to extract the maximum value from it, it is important to examine the technologies that enabled the birth of Linked Data [Bikakis et al., 2013]. The Semantic Web is built on a series of technologies layered on top of one another, which together form the Semantic Web Stack. Figure 1.6 depicts the stack and highlights both its logical structure (Concept and Abstraction) and the technologies adopted (Specification and Solutions) to build the Semantic Web.


      Figure 1.6: Semantic Web Stack.

      The first layer of the stack is the medium for transferring information: the Web platform. The idea behind the Semantic Web was to create a globally distributed database. This requires a way to identify resources unambiguously. It also requires a universally accepted encoding system, so that things can be identified even across countries that use different writing systems. This first step was accomplished by adopting URIs (Uniform Resource Identifiers). With the advent of RDF 1.1 in 2014, the current naming standard became the IRI (Internationalized Resource Identifier). IRIs are sequences of Unicode characters and can therefore contain characters from virtually any writing system. This is significant progress in the multicultural context of the Internet.
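      To make the IRI/URI relationship concrete, here is a minimal sketch of how an IRI containing non-ASCII characters maps to an ASCII-only URI: each such character is encoded as UTF-8 and percent-escaped. It is a simplification. It only handles the path, query, and fragment, and it leaves host names alone (internationalized domain names are converted with Punycode instead). The function name `iri_to_uri` and the example IRI are hypothetical.

```python
from urllib.parse import quote

# RFC 3986 reserved characters keep their meaning in a URI, so they are left
# untouched; '%' is kept too so that existing escapes are not double-encoded.
URI_SAFE = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Percent-encode every non-ASCII character of an IRI as UTF-8 bytes.

    Simplified sketch: it does not convert internationalized host names and
    assumes the input is an otherwise well-formed IRI.
    """
    return quote(iri, safe=URI_SAFE)


print(iri_to_uri("http://example.org/città/Torino"))
```

The reverse direction (URI to IRI) decodes the UTF-8 escapes back into characters. This is why IRIs can be shown to users in their own script while still travelling over infrastructure that only understands ASCII URIs.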

      Once it is established how resources are identified and accessed, they must be described by providing additional information about them. The Resource Description Framework (RDF) is the model adopted for this task: a general-purpose language for representing information about resources. RDF has a very simple and flexible data model, based on the central concept of the RDF statement. RDF statements describe simple facts as triples of the form Subject – Predicate – Object. Each triple consists of the resource being described (the subject), a property (the predicate), and a property value (the object). In particular, the subject can be either an IRI or a Blank node, the predicate must be an IRI, and the object can be an IRI, a Blank node, or an RDF Literal. A Blank node is a placeholder that stands for a resource to which neither an IRI nor a literal is given. A collection of RDF statements (also called RDF triples) can be intuitively understood as a directed labeled graph. In this graph, resources are nodes and statements are arcs connecting two nodes (from the subject node to the object node). Finally, a set of RDF triples is called an RDF Graph, while a collection of RDF graphs (one default graph plus zero or more named graphs) is called an RDF Dataset.
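      The positional rules of an RDF statement can be captured in a few lines of code. The following is a minimal sketch, not a real RDF library. The classes `IRI`, `BlankNode`, `Literal`, and `Triple` and the helper `out_arcs` are hypothetical names introduced for illustration, and language-tagged literals are left out. A graph is simply a Python set of triples, and `out_arcs` reads it as a directed labeled graph.

```python
from dataclasses import dataclass
from typing import Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str  # local identifier, meaningful only within its graph


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = XSD_STRING  # language tags are omitted in this sketch


Term = Union[IRI, BlankNode, Literal]


@dataclass(frozen=True)
class Triple:
    subject: Term
    predicate: Term
    obj: Term

    def __post_init__(self):
        # Enforce the positional constraints of the RDF data model.
        if not isinstance(self.subject, (IRI, BlankNode)):
            raise TypeError("the subject must be an IRI or a blank node")
        if not isinstance(self.predicate, IRI):
            raise TypeError("the predicate must be an IRI")
        if not isinstance(self.obj, (IRI, BlankNode, Literal)):
            raise TypeError("the object must be an IRI, a blank node or a literal")


def out_arcs(graph, node):
    """Labeled arcs (predicate, object) leaving `node` in the directed graph."""
    return {(t.predicate, t.obj) for t in graph if t.subject == node}


EX = "http://example.org/"
FOAF_NAME = IRI("http://xmlns.com/foaf/0.1/name")
FOAF_KNOWS = IRI("http://xmlns.com/foaf/0.1/knows")

# An RDF graph is just a set of triples; b0 is a resource with no IRI.
graph = {
    Triple(IRI(EX + "alice"), FOAF_NAME, Literal("Alice")),
    Triple(IRI(EX + "alice"), FOAF_KNOWS, BlankNode("b0")),
    Triple(BlankNode("b0"), FOAF_NAME, Literal("Bob")),
}

for predicate, obj in out_arcs(graph, IRI(EX + "alice")):
    print(f"alice --{predicate.value}--> {obj}")
```

Trying to build `Triple(Literal("x"), FOAF_NAME, Literal("y"))` raises a `TypeError`, because a literal cannot appear in subject position.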

      RDF data can be written down in a number of different formats, known as serializations. The first standard serialization format, RDF/XML, is based on XML syntax. Although RDF/XML is still in use, other RDF serializations are now preferred because they are more human-friendly. The other serialization formats include:

      • RDFa: notation for embedding RDF metadata in XHTML web pages;

      • N-Triples: an intuitive, line-based format that expresses each triple of an RDF graph on a separate line;

      • N3 (Notation 3): a serialization format developed by Tim Berners-Lee and designed to be compact and human-readable;

      • Turtle (Terse RDF Triple Language): a compact and human-friendly format. It is a subset of N3;

      • TriG: an extension of Turtle notation for serializing multiple (named) RDF graphs;

      • N-Quads: a superset of N-Triples for serializing multiple RDF graphs. The optional fourth element of the “triple” contains the name of the graph to which the statement belongs; and

      • JSON-LD: the standard JSON-based serialization format, which superseded the RDF/JSON format. It can be used to write RDF triples in a JSON style.
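      The difference between the line-based formats is easiest to see in their output. The following is a minimal, hand-written sketch of an N-Triples and N-Quads writer, assuming that terms are already formatted strings, that IRIs need no escaping, and that typed literals are out of scope. The helper names `iri`, `bnode`, `literal`, `to_ntriples`, and `to_nquads` are hypothetical. A real project would use an RDF library rather than this code.

```python
from typing import Optional


def iri(value: str) -> str:
    """Format an IRI as an N-Triples term: <...>."""
    return f"<{value}>"


def bnode(label: str) -> str:
    """Format a blank node as an N-Triples term: _:label."""
    return f"_:{label}"


def literal(text: str, lang: Optional[str] = None) -> str:
    """Format a plain or language-tagged literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def to_ntriples(triples) -> str:
    """N-Triples: one 'subject predicate object .' statement per line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in triples)


def to_nquads(quads) -> str:
    """N-Quads: like N-Triples, plus an optional fourth term naming the graph."""
    lines = []
    for s, p, o, g in quads:
        terms = [s, p, o] + ([g] if g is not None else [])
        lines.append(" ".join(terms) + " .\n")
    return "".join(lines)


EX = "http://example.org/"
triples = [
    (iri(EX + "alice"), iri("http://xmlns.com/foaf/0.1/name"), literal("Alice", lang="en")),
    (iri(EX + "alice"), iri("http://xmlns.com/foaf/0.1/knows"), bnode("b0")),
]
print(to_ntriples(triples), end="")

# The same statements placed in a (hypothetical) named graph.
quads = [t + (iri(EX + "graph1"),) for t in triples]
print(to_nquads(quads), end="")
```

Because every statement stands alone on its own line, files in either format can be split, concatenated, or streamed line by line. This is the main practical advantage of these formats over Turtle or RDF/XML.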

      The third layer of the stack aims at structuring the data. The RDF model and its extension, RDF Schema (RDFS), were designed to describe resources and/or relationships between resources using a set of reserved terms called the RDFS vocabulary. They provide constructs for describing types of objects (classes), type hierarchies (subclasses), properties that represent object features (properties), and property hierarchies (subproperties). In particular, a Class in RDFS corresponds to the generic concept of a type or category, somewhat like the notion of a class in object-oriented languages, and is defined using the construct rdfs:Class. The resources that belong to a class are called its instances. An instance of a class is a resource having an rdf:type property whose value is that class, and a resource may be an instance of more than one class. Classes can be organized hierarchically using the construct rdfs:subClassOf. A property in RDFS is used to characterize a class or a set of classes and is defined using the construct rdf:Property.

      The Web Ontology Language (OWL) was released in 2004 and is the standard language for defining and instantiating Web ontologies. OWL and RDFS have several similarities: like RDFS, OWL is defined as a vocabulary on top of RDF, but OWL has richer semantics. An OWL Class is defined using the construct owl:Class and represents a set of individuals with common properties. Moreover, OWL provides additional constructors for class definition, including the basic set operations union, intersection, and complement, implemented respectively by the constructs owl:unionOf, owl:intersectionOf, and owl:complementOf. Regarding individuals, OWL makes it possible to state that two individuals are identical or different through the owl:sameAs and owl:differentFrom constructs.
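      The meaning of rdfs:subClassOf and of the OWL set constructors can be illustrated with ordinary sets. The following is a minimal sketch, not a reasoner. Prefixed names such as `ex:Student` stand in for full IRIs, and the class hierarchy, the individuals, and the helper names `superclasses` and `inferred_types` are all hypothetical. The first part shows the RDFS entailment that an instance of a class is also an instance of all its superclasses. The second part reads union, intersection, and complement as set operations on class extensions. Note that the complement is computed against an explicitly closed universe, a simplification, since OWL itself follows open-world semantics.

```python
from collections import defaultdict

# Prefixed names stand in for full IRIs to keep the example short.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# A hypothetical RDFS class hierarchy plus one asserted instance.
triples = {
    ("ex:Student", SUBCLASS_OF, "ex:Person"),
    ("ex:Person", SUBCLASS_OF, "ex:Agent"),
    ("ex:alice", RDF_TYPE, "ex:Student"),
}


def superclasses(triples, cls):
    """Every class reachable from `cls` through rdfs:subClassOf (transitive closure)."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            parents[s].add(o)
    seen, stack = set(), [cls]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:  # the `seen` set also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(triples, resource):
    """Asserted rdf:type values of `resource` plus all of their superclasses."""
    asserted = {o for s, p, o in triples if s == resource and p == RDF_TYPE}
    types = set(asserted)
    for cls in asserted:
        types |= superclasses(triples, cls)
    return types


print(sorted(inferred_types(triples, "ex:alice")))

# OWL class constructors read as set operations on class extensions,
# over a small, explicitly closed universe of hypothetical individuals.
universe = {"ex:alice", "ex:bob", "ex:carol", "ex:dave"}
students = {"ex:alice", "ex:bob"}
employees = {"ex:bob", "ex:carol"}

union = students | employees          # owl:unionOf
intersection = students & employees   # owl:intersectionOf
complement = universe - students      # owl:complementOf, closed-world reading only
print(sorted(union), sorted(intersection), sorted(complement))
```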
Unlike RDF Schema, OWL distinguishes a property whose range is a datatype value (owl:DatatypeProperty) from a property whose range is a set
