Скачать книгу

algebra. Datalog was a simple alternative that was similar to domain relational calculus, with which the database theory community was familiar. Thus, it was readily understood, and provided a natural setting in which to study topics such as deductive databases, recursion, its interaction with negation, and evaluation techniques. Much of the early discussion and presentations on Datalog took place at the informal “XP” workshops6 (particularly XP 4.5 in 1983 and XP 7.52 in 1986) and the early symposia on Principals of Database Systems (PODS), which were a follow-up to the XP workshops to some extent. Work on Datalog also highlighted the difference between query answering as evaluation in a model vs. deduction in a theory.

      Background. It was recognized early on that there were recursive queries expressible neither in relational algebra nor relational calculus. For example, Aho and Ullman [1979] prove the inexpressibility of transitive closure by finite relational expressions, and consider various extensions to handle it, such as a least-fixed-point operator and embedding in a host programming language. Paredaens [1978] and Bancilhon [1978] also explore this issue. The PROBE system supported traversal recursion over directed graphs [Rosenthal et al. 1986].

      The connection of logic and databases predates the relational model. For example, Green and Raphael [1968] uses theorem proving as the basis for the question-answering system QA1 that can “deduce facts that are not explicitly available in its data base.” The 1970s was an active time for investigating the connections between logic and databases, such as model-theoretic vs. proof-theoretic views of a database [Nicolas and Gallaire 1977] and “closed-world” vs. “open-world” assumptions about the information in a database [Reiter 1977b]. It was also a time when prototype deductive databases based on logic began appearing, such as MRRPS 3.0 [Minker 1977], DADM [Kellogg et al. 1977], and DEDUCE 2 [Chang 1977]. While many researchers of deductive databases focused on function-free logic, Reiter [1977a] was explicit in his opinion that function-free logic “approximates [his] own intuitive concept of what should be a database,” for otherwise “any first-order theory is a database,” such as point-set topology. For a history of deductive databases, see Minker et al. [2014].

       Artificial Intelligence

      The use of logic and deduction as a basis for question-answering and reasoning in expert systems dates to at least the late 1960s. Rule-based systems were also a common approach to AI problem solving. Logic languages such as Prolog were attractive to this community because they encompassed both basic information and rule-based “intelligence” to work with that knowledge in a uniform model, while providing a formal foundation for rule-based reasoning. There were other attractions, such as the natural use of meta-programming features to manipulate programs and implement alternative evaluation systems. Prolog rules seemed an accessible means for domain experts (who were assumed not to be sophisticated programmers) to directly capture their reasoning strategies. Furthermore, the resolution-based deduction methods used with Prolog were at once a close analog of human reasoning and an efficient evaluation mechanism. By the early 1980s, there was interest in working with large fact bases in expert systems, and imbuing database systems with intelligence, manifested in the closely related areas of Knowledge-Based Systems (KBS) and Expert Database Systems (EDS). Some saw the function-free subset of Prolog as a happy medium between databases and general rule-based reasoning. Datalog became a common basis for work in the KBS and especially the EDS communities.

      Background. As mentioned, one of the earliest proposals to use theorem proving as a mechanism for query answering was that of Green [1969]. The 1970s saw a proliferation of expert systems that tried to replicate human expertise in computational form [Puppe 1993]. These early systems tended to be ad hoc, with much of their knowledge encoded procedurally. Toward 1980, KBS emerged as an architectural approach to make expert systems (and other reasoning applications) easier to construct and maintain, especially when involving large collections of information [Davis 1986]. In a KBS, there is a separation of knowledge structures and the computational mechanism to apply that knowledge. These two parts are often called the knowledge base and the inference engine. (The inference engine did not necessarily use logical inference. It could, for example, make use of probabilistic reasoning.) The knowledge base contains both concrete information (facts) and more abstract forms of knowledge, such as rules, templates, or classification hierarchies. Logic was the representation for the knowledge base in some KBS, although there were competing approaches, such as frame-based systems [Minsky 1975] and semantic networks [Findler 1979]. (It is interesting to note that there was some debate at the time as to the suitability of logic for this role [Hayes 1977, Hayes 1980, Winograd 1975].) The logic-based KBS often structured the knowledge base in the form of facts and rules, similar to a logic program. For example, the DLOG system [Goebel 1985] was a KBS using logic, and its implemented subset consisted of facts and Horn clauses that were passed to a Prolog-based interpreter. Another example of a logic-based framework for KBS was the Syllog system [Fellenstein et al. 1985], which had a database and Prolog-style rules, but presented a structured natural-language interface to them. In a similar vein to KBS, Expert Database Systems (EDS) were an effort to imbue database systems with richer representation and reasoning capabilities. Logic (usually in the guise of logic programming) was often the choice for providing capabilities such as classification hierarchies [Dahl 1982] and incorporating constraints into query answering [Dahl 1986, Kifer and Li 1988]. Kerschberg [1990] provides an overview of EDS.

       1.2.1 Uptake

      While Datalog had precursors from Logic Programming, Database Systems, and Artificial Intelligence, the bulk of the early work on it took place in the database theory and query-language communities. Deductive-database researchers showed some interest, but their focus was more on removing non-declarative features—such as cut—from Prolog while retaining top-down, SLD-resolution approaches to evaluation [Minker et al. 2014]. On the database-query side, however, there was strong interest in adding rules and recursion to existing query frameworks, which mainly employed bottom-up techniques based on relational algebra for evaluation. Since Datalog did not have function symbols, a safe program implied a finite number IDB facts from a finite EDB, meaning that bottom-up techniques would converge. Much early work on implementation techniques for Datalog and similar languages was based on bottom-up approaches, and researchers found ways to adapt these approaches to support additional features, such as complex objects, aggregation, and negation. These extensions motivated new semantic constructs, such as stratification and stable models for negation and other non-monotone extensions. Therefore, it should be no surprise that the majority of sources cited in this chapter appeared in database venues, through there is a significant body of references from logic programming and AI publications.

       1.3 Coining “Datalog”

      To the best of our knowledge, the name “Datalog” as applied to function-free Prolog was coined while two of the authors of this chapter, David Maier and David S. Warren, were working on the book Computing with Logic [Maier and Warren 1988]. The book used simplified logics and languages to introduce concepts such as substitution and resolution. The simplest logic was propositional Horn clauses, and the obvious name for the corresponding language was “Proplog.” We wanted to introduce unification and specialization of clauses using predicate Horn logic without function symbols, to avoid some of more complicated bits of Prolog initially, such as the occurs check and structure sharing. Maier and Warren kicked around ideas for a name for the corresponding language, but came up with nothing exciting at first. The next day, Maier came back with “Datalog,” presumably because predicates without function symbols looked a lot like database relations.

      At the time, several researchers had already proposed function-free, definiteclause logic as a basis for deductive databases. While Maier and Warren were not thinking about “Datalog” as the name for a database language, it quickly spread to that use, and the first appearances in print of “Datalog” depict it as a database language. Computing with Logic only came out in January 1988, but Maier introduced the name to students at Oregon Graduate Institute (then Oregon Graduate Center) and to collaborators at Stanford University while he was working on the book. The first uses of the name Datalog in the general literature, as far was we can tell, were in the proceedings of the PODS conference in March 1986, where it appears in Afrati et al. [1986] and Bancilhon et al. [1986]. The first

Скачать книгу