Скачать книгу

to such assignment as a partial assignment to an instance; more specifically, for any VAΒ, a V-partial assignment (or partial assignment on V) is a function ϕ : V → Σ. For notational convenience, we sometimes write ΣV to denote the set of all functions from V to Σ.

       Theorem 2.2

      Moshkovitz-Raz PCP [Moshkovitz and Raz 2010 (Theorem 11)]. For every n and every ϵ > 0 (in particular, ϵ may be a function of n), solving 3SAT on inputs of size n can be reduced to distinguishing between the case that a (dA, dB)-bi-regular instance of LABEL COVER, with parameters |A| + |B| = n1+o(1) · poly(1/∈), |ΣA| = 2poly(1/∈), and dA, dB, |ΣB| = poly(1/∈), is completely satisfiable, versus the case that it has value at most ϵ.

      Counting the number of satisfying assignments is even harder. The following hardness is well-known, and we sketch its proof only for completeness:

       Fact 2.1

      There is a linear-time reduction from #3SAT to counting the number of satisfying assignments of a LABEL COVER instance.

       Proof

      Construct a vertex in A for each variable and a vertex in Β for each clause. Set ΣA ≜ {0, 1} and let ΣB ≜ {0, 1}3 \ (000) (i.e., ΣB is the set of satisfying assignments for a 3SAT clause, after applying negations). Now if variable x appears in clause C, add a constraint that the assignments to x and C are consistent (taking into account the sign of x in C). Notice that: (i) any assignment to A corresponds to a unique assignment to the 3SAT formula; and (ii) if the 3SAT formula is satisfied, this assignment uniquely defines a satisfying assignment to Β. Therefore there is a one-to-one correspondence between satisfying assignments to the 3SAT formula and to the instance of LABEL COVER.

      ■

      For a universe (or ground set) U, a concept C is simply a subset of U and a concept class C is a collection of concepts. For convenience, we sometimes relax the definition and allow the concepts to not be subsets of U; all definitions here extend naturally to this case.

      The VC and Littlestone’s Dimensions can be defined as follows.

       Definition 2.8

      VC Dimension [Vapnik and Chervonenkis 1971]. A subset S ⊆ U is said to be shattered by a concept class C if, for every TS, there exists a concept C ∈ C such that T = SC.

      The VC Dimension VC-dim(C, U) of a concept class C with respect to the universe U is the largest d such that there exists a subset S ⊆ U of size d that is shattered by C.

       Definition 2.9

      Mistake tree and Littlestone’s Dimension [Littlestone 1987]. A depth-d instance-labeled tree of U is a full binary tree of depth d such that every internal node of the tree is assigned an element of U. For convenience, we will identify each node in the tree canonically by a binary string s of length at most d.

      A depth-d mistake tree (aka shattered tree [Ben-David et al. 2009]) for a universe U and a concept class C is a depth-d instance-labeled tree of U such that, if we let vs ∈ U denote the element assigned to the vertex s for every s ∈ {0, 1}<d, then, for every leaf ∈ {0, 1}d, there exists a concept C ∈ C that agrees with the path from root to it, i.e., that, for every imageC iff i+1 = 1 where ≤i denotes the prefix of of length i.

      The Littlestone’s Dimension L-dim(C, U) of a concept class C with respect to the universe U is defined as the maximum d such that there exists a depth-d mistake tree for U, C.

      An equivalent formulation of Littlestone’s Dimension is through mistakes made in online learning, as stated below. This interpretation will be useful in the proof of Theorem 14.4.

       Definition 2.10

      Mistake bound. An online algorithm A is an algorithm that, at time step i, is given an element xi ∈ U and the algorithm outputs a prediction pi ∈ {0, 1} whether x is in the class. After the prediction, the algorithm is told the correct answer hi ∈ {0, 1}. For a sequence (x1, h1),…, (xn, hn), a prediction mistake of A is defined as the number of incorrect predictions, i.e., Σi∈n 1[pihi]. The mistake bound of A for a concept class C is defined as the maximum prediction mistake of A over all the sequences (x1, h1),…, (xn, hn), which corresponds to a concept C ∈ C (i.e., hi = 1[xiC] for all i ∈ [n]).

       Theorem 2.3

      Littlestone [1987]. For any universe U and any concept class C, L-dim(C, U) is equal to the minimum mistake bound of C, U over all online algorithms.

      The following facts are well-known and follow easily from the above definitions.

       Fact 2.2

      For any universe U and concept class C, we have

image

       Fact 2.3

      For any two universes U1, U2 and any concept class C,

image

      In this section, we introduce information-theoretic quantities used mostly in Chapter 12. For a more thorough introduction, the reader should refer to Cover and Thomas [2012]. Unless stated otherwise, all log’s in this book paper are base-2.

       Definition 2.11

      Let μ be a probability distribution on sample space Ω. The Shannon entropy (or just entropy) of μ, denoted by H(μ), is defined as image.

       Definition 2.12

      Binary entropy function. For p ∈ [0, 1], the binary entropy function is defined as follows (with a slight abuse of notation): H(p) := − p log p − (1 − p) log(1 − p).

       Fact 2.4

      Concavity of binary entropy. Let μ be a distribution on [0, 1], and let p ~ μ. Then image.

      For a random variable A, we shall write H(A) to denote the entropy of the induced distribution on the support of A. We use the same abuse of notation for other information-theoretic quantities appearing later in this section.

       Definition 2.13

      The conditional entropy of a random variable A conditioned on Β is defined as

image

       Fact 2.5

Скачать книгу