
that Big Data analysis techniques must be implemented for mining data. Big Data consists of large, diverse, complex, longitudinal, and distributed data sets generated from various instruments, sensors, Internet transactions, email, video, click streams, and other sources, whereas open-linked data focuses on opening and combining data. The data can be released by public organizations as well as by private organizations or individuals. Big Data analytics can be used to promote better utilization of resources and improved personalization. Naturally, there are no barriers between Big Data, Linked Data, and Open Data: when a dataset is at the same time open, structured in node-edge fashion, and extremely large, it can be referred to as a BOLD (Big, Open, and Linked Data) source.


      Figure 1.4: Transition from the Web of Documents to the Web of Data.

      As a consequence, the rise of the Web of Data gave birth to new specialized roles that can boost the value of those data. Data analysts, who analyze data and discover patterns in them, data scientists, who try to predict the future based on past data, and the Chief Data Officer (CDO), who is responsible for defining and governing the data improvement strategy that supports the achievement of corporate objectives, are only a few of the roles born to deal with the Web of Data.

      Now, in 2019, we are already entering the fourth-generation internet: the Internet of Things, or the Web of intelligence connections. It is said to be the web of augmented reality, for interacting with the real world and the online world at the same time. Smart homes, smart domestic appliances, and voice assistants are only a few of the applications that will emerge in the coming years. Although interesting, the innovations of Web 4.0 are outside the scope of this book and will not be addressed.

      The term Linked Data was coined in 2006 by one of the creators of the Web, Sir Tim Berners-Lee. At the same time, he published a note3 listing four rules for publishing LD.

      1. Use URIs as names for things. This first rule is the foundation of a system in which every resource can be unambiguously identified. The term resource refers both to real-world objects and to web pages.

      2. Use HTTP URIs so that people can look up those names. The second rule adopts the HTTP protocol as the means of reaching resources and their information. Thanks to it, users can look up a specific object and get the information they need in response. Moreover, since resources should also be machine-readable, content negotiation can be used to obtain different representations of the requested resource.

      3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL). This means the resource’s information should be returned to the requester in an RDF-compliant format.

      4. Include links to other URIs so that they can discover more things. The last rule emphasizes that resources should be connected to other resources in order to create what can be considered the successor of the WWW, the Giant Global Graph. This rule is what enables the great connectivity of Linked Data: starting from one resource, users of the Web can jump to any other resource it links to.
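
      The content negotiation mentioned in the second rule can be illustrated with a minimal Python sketch. It only prepares the HTTP requests and does not send them; the URI http://example.org/resource/Alice and the helper build_lookup are hypothetical names introduced for illustration, and the media types shown are common RDF serializations rather than a list prescribed by the rules.

```python
from urllib.request import Request

# Hypothetical resource URI, used only for illustration.
RESOURCE_URI = "http://example.org/resource/Alice"


def build_lookup(uri: str, want_rdf: bool) -> Request:
    """Prepare (but do not send) an HTTP GET for a Linked Data URI.

    The Accept header tells the server which representation the client
    prefers: an RDF serialization for machines, HTML for people.
    """
    accept = "text/turtle, application/rdf+xml;q=0.9" if want_rdf else "text/html"
    return Request(uri, headers={"Accept": accept}, method="GET")


# Same URI, two different requested representations.
machine_request = build_lookup(RESOURCE_URI, want_rdf=True)
human_request = build_lookup(RESOURCE_URI, want_rdf=False)

print(machine_request.get_header("Accept"))
print(human_request.get_header("Accept"))
```

      Both requests target the same name; only the Accept header differs, which is how a single URI can serve a web page to a browser and RDF to a program.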

      Some time later, at the TED4 conference in 2009, Tim Berners-Lee himself restated the principles defined in 2006 as three “extremely simple” rules.5

      1. All kinds of conceptual things, they have names now that start with HTTP.

      2. If I take one of those HTTP names and I look it up … I will get back some data in a standard format which is kind of useful data that somebody might like to know about that thing, about the event, …

      3. When I get back that information it’s not just got somebody’s height and weight and when they were born, it’s got relationships. And when it has relationships, whenever it expresses a relationship, then the other thing that it’s related to is given one of those names that start with HTTP, so that I can go ahead and look that thing up.
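
      The third rule, relationships whose targets are themselves names that can be looked up, can be sketched with a tiny in-memory set of triples. This is an illustrative assumption, not an RDF library: every URI under http://example.org/ is hypothetical, and the TRIPLES list stands in for the data that real HTTP lookups would return.

```python
# Each triple is (subject, predicate, object). All URIs are hypothetical.
EX = "http://example.org/"
TRIPLES = [
    (EX + "alice", EX + "height", "170"),          # plain value, not a link
    (EX + "alice", EX + "knows", EX + "bob"),      # relationship to another name
    (EX + "bob", EX + "bornIn", EX + "london"),
    (EX + "london", EX + "name", "London"),
]


def describe(uri):
    """Return the (predicate, object) pairs known about a URI."""
    return [(p, o) for s, p, o in TRIPLES if s == uri]


def linked_resources(uri):
    """Return the objects that are themselves HTTP names and can be looked up."""
    return [o for _, o in describe(uri) if o.startswith("http://")]


def reachable(start):
    """Follow links breadth-first and list every resource reachable from start."""
    seen, queue = [start], [start]
    while queue:
        current = queue.pop(0)
        for target in linked_resources(current):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


print(reachable(EX + "alice"))
```

      Starting from the name for Alice, the walk reaches Bob and then London purely by following relationships, while literal values such as the height are returned by describe but never followed.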

      Shortly before the birth of the Linked Data principles, Open Data arose and its own principles were defined. The term “Open Data” first appeared in 1995, in a document from an American scientific agency. That document stated that geophysical and environmental data transcend political borders, and it therefore promoted a complete and open exchange of scientific information between countries. However, a formal definition of the term had to wait until 2005, with the Open Definition 2.1.6 This document lists several characteristics that data must have to be considered open, and it can be summarized as: “Knowledge is open if anyone is free to access, use, modify, and share it—subject, at most, to measures that preserve provenance and openness.” Moreover, a more specific definition of the term Open Government Data7 had to wait until 2007, when 30 advocates gathered in Sebastopol, California. The meeting was meant to design a set of principles for open government data, but the same logic applies to all kinds of Open Data. At the end of the meeting, it was stated that government data is considered open if it complies with the following principles.

      • Complete. All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.

      • Primary. Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.

      • Timely. Data is made available as quickly as necessary to preserve the value of the data.

      • Accessible. Data is available to the widest range of users for the widest range of purposes.

      • Machine processable. Data is reasonably structured to allow automated processing.

      • Non-discriminatory. Data is available to anyone, with no requirement of registration.

      • Non-proprietary. Data is available in a format over which no entity has exclusive control.

      • License-free. Data is not subject to any copyright, patent, trademark, or trade secret regulation. Reasonable privacy, security, and privilege restrictions may be allowed.

      Given the advantages that both Linked Data and Open Data offered, it did not take long before people started encouraging their fusion. In fact, in 2010, Tim Berners-Lee himself published an extension of his note containing a star rating system for publishing Linked Open Data (LOD). Each level of this rating system is a specialization of the previous one, which means that a five-star dataset satisfies all the criteria.
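
      Because the levels are cumulative, the rating can be sketched as a walk up an ordered list of criteria that stops at the first one a dataset fails. The Dataset class, its field names, and star_rating below are hypothetical names introduced for illustration; the fifth criterion (linking to other data) is taken from the published five-star scheme.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of how a dataset is published."""
    on_web_open_license: bool      # level 1
    machine_readable: bool         # level 2
    non_proprietary_format: bool   # level 3
    uses_w3c_standards: bool       # level 4 (RDF, SPARQL, URIs)
    links_to_other_data: bool      # level 5


# Criteria in ascending order; each level presupposes all earlier ones.
CRITERIA = [
    "on_web_open_license",
    "machine_readable",
    "non_proprietary_format",
    "uses_w3c_standards",
    "links_to_other_data",
]


def star_rating(dataset: Dataset) -> int:
    """Count consecutive satisfied criteria, stopping at the first failure."""
    stars = 0
    for name in CRITERIA:
        if not getattr(dataset, name):
            break
        stars += 1
    return stars


# An openly licensed CSV file that does not use RDF stops at three stars,
# even if it happens to contain links to other data.
csv_file = Dataset(True, True, True, False, True)
print(star_rating(csv_file))
```

      Stopping at the first unmet criterion is what makes each level a specialization of the previous one: a later property never compensates for a missing earlier one.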

      ★ Available on the Web (whatever format) but with an open license, to be Open Data. Documents are now publicly available online. Everyone can read, edit, save, share, and print them but, short of building a custom parser, it is hard to extract data from them.

      ★★ Available as machine-readable structured data (e.g., Excel instead of image scan of a table, …). Data are now accessible to machines, but they remain bound to a proprietary file format. Extracting data means depending on proprietary software.

      ★★★ Available in a non-proprietary format (e.g., CSV instead of Excel, …). Data are now fully accessible to everyone (both humans and machines), but they are still locked inside documents and not freely accessible from the Web.

      ★★★★ Use open standards from W3C (RDF and SPARQL) to identify things so that people can point at your stuff. Every resource has its own URI that identifies it uniquely. Users can look resources up through HTTP requests and read, edit, and share those data freely. Generally, the data are represented in RDF format; however, they can be converted
