Скачать книгу

(up to variable renaming) is found in the table, so it is not called again but instead the answers in the table are returned to it, which leads to new answers: dsw is returned from the table in line 8, which results in a new answer of dm on line 11; and of tjg is returned on line 13. Then the new answer of dm is returned from the table but that does not lead to new answers. Finally, tjg is returned but that again does not lead to any answers.

      Looking at these two programs and the single-source query, it is interesting to compare the different algorithms. We see that the second derivation is shorter, and this aspect is not an accident! Say we have a database with N people in which everyone cited everyone else. Note that with the right-recursive definition we will get N tabled subgoals, each of the form influenced(person, XX) for the N persons. And each tabled subgoal will have N answers, one for each person cited. So this table has size O(N2). For the left-recursive definition, we have only one tabled subgoal, influenced(dm, COLL)—for the initial query—and that subgoal will have only N answers. So the size of the table is O(N). It can be shown that O(N2d) and O(Nd), where d is the maximum node out-degree, are the respective worst-case time complexities of the two algorithms.

      Tekle and Liu [2010, 2011] present precise time and space complexity analysis for top-down evaluation with variant tabling and subsumptive tabling using finegrained complexity factors.

       1.5.2.2 Other Top-Down Approaches

      We briefly mention some variations on the top-down strategy aimed at evaluation of Datalog (or more general) programs. Tamaki and Sato [1986] introduced OLDT resolution, which is an iterative-deepening approach with tabling of answers from previous iterations. This paper was the first formal specification of a top-down tabling algorithm for logic programming, but it did not include an implementation.

      The Extension Table (ET) evaluation mechanism [Dietrich 1987, Dietrich and Warren 1986] is a form of tabling that can be implemented easily in Prolog. It uses two additional structures for each predicate p to which it is applied. One we call et_p, for the extension table for p. It has the same columns as p, and is used to record previously derived facts for p. The other is call_p, which contains the patterns of variables and constants used in subgoals for p encountered so far. These predicates record the subgoals and their answers, and are updated, or used, when subgoals are invoked and answers are returned. However, this simple Prolog-based algorithm is not complete. Consider for example the left-recursive influenced/2 definition, with the order of the two rules reversed. Then the initial goal, influenced(dm, COLL), will immediately call itself. But there are no answers for it yet, so it must fail back to try other paths to generate answers. Unless computation somehow comes back to restart that “failed” subgoal call, some answers may be lost. One might think that by re-ordering clauses, one can avoid this problem, but that strategy does not work in general. For completeness, subsequent subgoals must either suspend, to be resumed later if necessary, or must somehow be regenerated. To be complete, the ET algorithm repeatedly iterates its computation until the table predicates do not change.

      Variations and extensions of the ET algorithm have been developed and implemented in some Prolog systems including B-Prolog by Zhou et al. [2001] and ALS Prolog by Guo and Gupta [2001]. These techniques are known as linear tabling.

      The SLG algorithm of XSB [Chen and Warren 1996, Swift and Warren 2011, Warren et al. 2007] is an extension of OLDT, described above. It differs from OLDT mainly in that it suspends subgoal computations, which may later be resumed, as necessary, similarly to the discussion in Section 1.5.2.1. The fact that it suspends and resumes computation means that it does not do the recomputation that is necessary with iterative methods, such as ET, and thus has better, and simpler, complexity properties. XSB also incorporates an “incremental completion” algorithm that, by controlling the order of task scheduling and keeping track of subgoal dependencies, determines when subgoals are completely evaluated (that is, have all their answers in the table) well before the entire computation is completed. This approach enables efficient handling of default negation.

      Vielle developed the Query-SubQuery (QSQ) approach [Vieille 1986] specifically for Datalog, and it differs from tabling and ET in having “multigoals” where a given argument position in a predicate can contain a set of bindings. (We note that the original version of QSQ was incomplete on some programs—it did not always find all answers. Later work by Nejdl [1987] and Vieille [1987] provided complete versions of QSQ.) Such generalized goals can arise both from combining goals sharing the same pattern of variables and constants, and by solving goals against an EDB in a set-at-a-time manner. For example, consider solving the query

Image

      using the second rule for related/2:

Image

      The first subgoal will be advised(B, dsw). If we ask for all matching bindings from the EDB at once, we will get back {advised(jbf, dsw), advised(wcr, dsw)}. From that result, QSQ will generate the generalized subgoal related({jbf, wcr}, X). That subgoal can be matched with the same related rule, generating the generalized subgoal advised(B1, {jsf,wcr}). This subgoal illustrates one of the advantages of QSQ, that it can merge several calls to the EDB into one call (which is useful if the EDB is managed externally, say, by a database system). Solving this generalized subgoal will give the set of bindings B1 = {hw, dss}. However, rather than creating a new generalized subgoal related({hw,dss}, P2), these bindings will be merged with the existing generalized subgoal to get

Image

      If any of the new bindings had matched old bindings, it would indicate that there might be some results already available.

      QSQ comes in two flavors: iterative (QSQI) and recursive (QSQR). Both involve the steps of generating new answers and creating new subgoals [Gottlob et al. 1989]. QSQI gives priority to new answers, and suspends working on any new subgoals until all answers have been generated that do not require those subgoals. In QSQR, on the other hand, when a new subgoal is encountered, it becomes the focus and processing of the current subgoal is suspended. In practice, QSQI tends to be worse because of duplicate rule firings [Bancilhon and Ramakrishnan 1986]. QSQR seems closely related to SLG resolution applied to programs without negation, but to our knowledge no detailed comparison has been attempted.

       1.5.3 Bottom-Up vs. Top-Down

      The question naturally arises as to whether bottom-up or top-down is generally better for Datalog evaluation, and which particular methods are best. Obviously, one can concoct programs and datasets that favor a particular method, and some methods will not work at all on a given program. (For example, some counting variants of Magic Sets work for only some patterns of recursion.) Bancilhon and Ramakrishnan [1986] compare a range of bottom-up and top-down methods analytically on a variety of Datalog programs, EDB structures and query-binding patterns. While there are different orderings of the methods on the different cases, some patterns emerge, such as Magic Sets and QSQR generally being comparable in predicted performance, and usually beating QSQI and semi-naïve. Ullman [1989] claims that there are bottom-up methods that will always perform as well or better as top-down methods on Datalog. His argument is that for a given Datalog program P and query Q, there is a modified version of P that will generate all answers for Q that, when evaluated by the semi-naïve method, will beat a depth-first top-down method. He further claims that the Magic Sets approach bounds the performance of memoizing versions of top-down such as OLDT and QSQ. Bry [1990], however, disputes Ullman’s conclusion. He defines a top-down method called the Backward Fixpoint Procedure (BFP) and claims that bottom-up rewrite methods and top-down memoizing methods are actually BFP in different forms.

      In

Скачать книгу