Given a distribution $\mu$, the goal of the distributional 20 questions game is to construct a strategy that identifies an unknown element drawn from $\mu$ using as few yes/no queries as possible on average. Huffman’s algorithm constructs an optimal strategy, but the questions it asks can be arbitrary.
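For context, Huffman's algorithm itself is easy to sketch. The following minimal Python snippet (an illustration, not code from the paper) computes the optimal expected number of questions for a given distribution by repeatedly merging the two least likely subtrees:

```python
import heapq

def huffman_cost(probs):
    """Expected number of yes/no questions under an optimal (Huffman) strategy.

    Each merge of the two smallest weights adds their combined probability
    to the total expected depth. Assumes probs sums to 1.
    """
    heap = list(probs)
    heapq.heapify(heap)
    cost = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        cost += a + b
        heapq.heappush(heap, a + b)
    return cost

# A dyadic distribution is identified at a cost equal to its entropy.
print(huffman_cost([0.5, 0.25, 0.125, 0.125]))  # 1.75
```

For non-dyadic distributions the cost lies strictly between $H(\mu)$ and $H(\mu)+1$, which is the benchmark the restricted question sets below aim to match.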
Given a parameter $n$, we ask how large a set of questions $Q$ needs to be so that for each distribution supported on $[n]$ there is a good strategy which uses only questions from $Q$.
Our first major result is that a linear number of questions (corresponding to binary comparison search trees) suffices to recover the $H(\mu)+1$ performance of Huffman’s algorithm. As a corollary, we deduce that the number of questions needed to guarantee a cost of at most $H(\mu)+r$ (for integer $r$) is asymptotic to $rn^{1/r}$.
Our second major result is that (roughly) $1.25^n$ questions are sufficient to match the performance of Huffman’s algorithm exactly, and this is tight for infinitely many $n$.
We also determine the number of questions sufficient to match the performance of Huffman’s algorithm up to an additive $r$, showing that it is $n^{\Theta(1/r)}$.
The second part has been published in Combinatorica. We hope to publish the first part at some point in an information theory journal.
The full version incorporates a third part (since relegated to a different paper), in which we show that the set of questions used to obtain the bound $H(\mu)+1$ performs better when the maximal probability of $\mu$ is small, bounding the performance between 0.5011 and 0.58607.
The full version also contains an extensive literature review, as well as many open questions.
See also follow-up work which addresses two of the open questions raised in the paper.
The Johnson and Kneser graphs are defined on the $k$-sets of $[n]$, a vertex set known as a slice of the Boolean cube. Two sets are connected in the Johnson graph if they have Hamming distance two, and in the Kneser graph if they are disjoint. Both graphs belong to the Bose–Mesner algebra of the Johnson association scheme; this just means that whether an edge connects two sets $S,T$ depends only on $|S \cap T|$.
All graphs belonging to the Bose–Mesner algebra have the same eigenspaces, and these are well-known, arising from certain representations of the symmetric group. The multiplicity of the $k$th eigenspace is rather large, ${n \choose k} - {n \choose k-1}$. As far as we can tell, prior to our work no explicit orthogonal basis for these eigenspaces had been exhibited.
We present a simple orthogonal basis for the eigenspaces of the Bose–Mesner algebra of the Johnson association scheme, arising from Young’s orthogonal basis for the symmetric group. Our presentation is completely elementary and makes no mention of the symmetric group.
As an application, we restate Wimmer’s proof of Friedgut’s theorem for the slice. The original proof makes heavy use of computations over the symmetric group. We are able to do these computations directly in the slice using our basis.
Update: Qing Xiang and his student Rafael Plaza pointed out that the same basis has been constructed by Murali K. Srinivasan in his paper Symmetric chains, Gelfand–Tsetlin chains, and the Terwilliger algebra of the binary Hamming scheme.
Another update: The proof of Theorem 3.1 was clarified according to suggestions of Bruno Loff. See also this writeup for another approach, relying on the work of Murali K. Srinivasan mentioned above.
We present an optimal, combinatorial $1-1/e$ approximation algorithm for monotone submodular maximization subject to a matroid constraint. Compared to the continuous greedy algorithm of Calinescu, Chekuri, Pál and Vondrák, our algorithm is extremely simple and requires no rounding. It consists of the greedy algorithm followed by local search. Both phases are run not on the actual objective function, but on a related auxiliary potential function, which is also monotone submodular.
In our previous work on maximum coverage (the preceding paper), the potential function gives more weight to elements covered multiple times. We generalize this approach from coverage functions to arbitrary monotone submodular functions. When the objective function is a coverage function, both definitions of the potential function coincide.
Our approach generalizes to the case where the monotone submodular function has restricted curvature. For any curvature $c$, we adapt our algorithm to produce a $(1-e^{-c})/c$ approximation. This matches results of Vondrák, who has shown that the continuous greedy algorithm produces a $(1-e^{-c})/c$ approximation when the objective function has curvature $c$ with respect to the optimum, and proved that achieving any better approximation ratio is impossible in the value oracle model.
The paper exists in several different versions; the journal version supersedes the preceding ones.
Extremal combinatorics studies how large a collection of objects can be if it satisfies a given set of restrictions. Inspired by a classical theorem due to Erdős, Ko and Rado, Simonovits and Sós posed the following problem: determine how large a collection of graphs on the vertex set $\{1,\ldots,n\}$ can be, if the intersection of any two of them contains a triangle. They conjectured that the largest possible collection, containing $1/8$ of all graphs, consists of all graphs containing a fixed triangle (a triangle-star). The first major contribution of the thesis is a confirmation of this conjecture. This result first appeared in our paper Triangle-intersecting families of graphs with David Ellis and Ehud Friedgut.
We prove the Simonovits–Sós conjecture in the following strong form: the only triangle-intersecting families of the maximal measure $1/8$ are triangle-stars (uniqueness), and every triangle-intersecting family of measure $1/8-\epsilon$ is $O(\epsilon)$-close to a triangle-star (stability). Our proof uses spectral methods (Hoffman’s bound).
In order to prove the stability part of our theorem, we utilize a structure theorem for Boolean functions on $\{0,1\}^m$ whose Fourier expansion is concentrated on the first $t+1$ levels, due to Kindler and Safra. The second major contribution of this thesis consists of two analogs of this theorem for Boolean functions on $S_m$ whose Fourier expansion is concentrated on the first two levels. These results appear in our papers A quasi-stability result for dictatorships in $S_n$ and A stability result for balanced dictatorships in $S_n$, both with David Ellis and Ehud Friedgut.
In the same way that the Kindler–Safra theorem is useful for studying triangle-intersecting families, our structure theorems are useful for studying intersecting families of permutations, which are families in which any two permutations agree on the image of at least one point. Using one of our theorems, we give a simple proof of the following result of Ellis, Friedgut and Pilpel: an intersecting family of permutations on $S_m$ of size $(1-\epsilon)(m-1)!$ is $O(\epsilon)$-close to a double coset, a family which consists of all permutations sending some point $i$ to some point $j$.
The thesis includes a detailed exposition of Friedgut’s paper On the measure of intersecting families, uniqueness and stability, a proof of the Ahlswede–Khachatrian theorem in the $\mu_p$ setting, and a gentle introduction to the representation theory of $S_n$ from the point of view of class functions.
A survey of Friedgut’s research program in extremal combinatorics. Friedgut uses spectral methods — Hoffman’s eigenvalue bound — to obtain tight bounds on measures of intersecting families. His method has the advantage of implying stability: families of near-maximal measure are similar to families of maximal measure.
Results surveyed:
We consider the NP-complete optimization problem Bandwidth. Anupam Gupta gave an $O(\log^{2.5} n)$ approximation algorithm for trees, and showed that his algorithm has an approximation ratio of $O(\log n)$ on caterpillars, trees composed of a central path and paths emanating from it. We show that the same approximation ratio is obtained on trees composed of a central path and caterpillars emanating from it.
Our result relies on the following lemma.
Definition. A sequence $a_1,\ldots,a_n$ has thickness $\Theta$ if the sum of any $d$ consecutive elements is at most $d\Theta$, for $1 \leq d \leq n$.
Lemma. If a sequence has thickness $\Theta$, then the sequence obtained by ordering the elements in non-decreasing order also has thickness $\Theta$.
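The lemma is easy to probe empirically. The following sketch (illustrative, not from the paper) computes the thickness of a sequence as the maximum average over windows of consecutive elements, and checks on random inputs that sorting never increases it:

```python
import random

def thickness(seq):
    """The least Θ such that every d consecutive elements sum to at most d·Θ,
    i.e. the maximum average over all windows of consecutive elements."""
    n = len(seq)
    return max(sum(seq[i:i + d]) / d
               for d in range(1, n + 1)
               for i in range(n - d + 1))

random.seed(0)
for _ in range(200):
    a = [random.randint(0, 20) for _ in range(random.randint(1, 12))]
    # Sorting into non-decreasing order never increases the thickness.
    assert thickness(sorted(a)) <= thickness(a) + 1e-9
print("lemma holds on all random tests")
```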
We give a novel proof of the FKN theorem using the Berry–Esseen theorem, and a novel proof of the biased FKN theorem using agreement.
We give a structure theorem for Boolean functions on the biased hypercube which are $\epsilon$-close to degree $d$ in $L_2$, showing that they are close to sparse juntas.
Our structure theorem implies that such functions are $O(\epsilon^{C_d} + p)$-close to constant functions. We pinpoint the exact value of the constant $C_d$.
This paper improves on our previous work in several ways:
The previous work applied in the more general $A$-valued setting. The proof in this work applies in that setting as well, but we only formulated it in the Boolean setting. The missing piece is the $A$-valued Kindler–Safra theorem, which follows in a black-box fashion from the usual Kindler–Safra theorem, as outlined in the previous version.
We show that a Boolean degree $d$ function on the slice ${[n]} \choose k$ is a junta if $k \geq 2d$, and that this bound is sharp. We prove a similar result for $A$-valued degree $d$ functions for arbitrary finite $A$, and for functions on an infinite analog of the slice.
One of the questions left open is the restriction threshold, which is the minimal $k$ which guarantees (for large enough $n$) that an $A$-valued degree $d$ function on ${[n]} \choose k$ is the restriction of an $A$-valued degree $d$ function on $\{0,1\}^n$. The paper gives an example in which the restriction threshold is larger than the junta threshold, and conjectures that the two coincide when $A = \{0,1\}$. This short note (joint with Antoine Vinciguerra) confirms this conjecture for all $A$ which are arithmetic progressions (and in particular, for $A = \{0,1\}$).
Every function on the Boolean cube $\{0,1\}^n$ has a unique presentation as a multilinear polynomial. This fails on the $(n,k)$ slice: for example, the multilinear polynomial $\sum_{i=1}^n x_i - k$ vanishes on the entire slice. An old result of Dunkl shows that functions on the slice can be presented uniquely as multilinear polynomials of degree at most $\min(k,n-k)$ satisfying an additional condition known as harmonicity: the sum of all partial derivatives is zero. An example of such a polynomial is $x_i - x_j$, and in fact the polynomials $x_i - x_j$ form a multiplicative basis for all harmonic multilinear polynomials.
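Both facts are easy to verify numerically. The following small sketch (illustrative only, using indices $0,1$ in place of $i,j$) checks that $\sum_i x_i - k$ vanishes on a slice, and that $x_0 - x_1$ is harmonic, using the fact that for a multilinear polynomial the $i$-th partial derivative at $x$ equals $p(x \text{ with } x_i=1) - p(x \text{ with } x_i=0)$:

```python
from itertools import combinations, product

n, k = 5, 2

# The polynomial sum_i x_i - k vanishes on the whole (n, k) slice.
for S in combinations(range(n), k):
    x = [1 if i in S else 0 for i in range(n)]
    assert sum(x) - k == 0

# Harmonicity of p(x) = x_0 - x_1: the partial derivatives sum to zero
# at every point of the cube.
def p(x):
    return x[0] - x[1]

def partial(f, x, i):
    hi, lo = list(x), list(x)
    hi[i], lo[i] = 1, 0
    return f(hi) - f(lo)

for x in product([0, 1], repeat=n):
    assert sum(partial(p, x, i) for i in range(n)) == 0
print("checks passed")
```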
We extend these results to the symmetric group and to the perfect matching scheme (the case of the multislice will be tackled in future work). In both cases, we extend the notion of harmonic to obtain a unique presentation theorem. In the case of the symmetric group, we also describe an explicit multiplicative basis.
The classical Friedgut–Kalai–Naor theorem describes the structure of linear functions on the Boolean cube which are almost Boolean. We describe extensions of this theorem to functions on the biased Boolean cube and on the slice, simplifying our earlier work.
We show that a function $f\colon S_n \to \{0,1\}$ which is close to degree 1 is close to a union of an almost-disjoint family of cosets. Our characterization is tight: any union of an almost-disjoint family of cosets is close to degree 1. This improves on our earlier work with David Ellis and Ehud Friedgut.
We complement this result, which is about the $L_2$ metric, with similar results in the $L_0$ and $L_\infty$ metrics.
A function $f\colon \{0,1\}^n \to \{0,1\}$ is a polymorphism of $g\colon \{0,1\}^m \to \{0,1\}$ if $f \circ g^n = g \circ f^m$. For example, $f$ is a polymorphism of $g(x,y) = x \oplus y$ if $f(x \oplus y) = f(x) \oplus f(y)$. Dokow and Holzman determined all polymorphisms of $g$ for arbitrary $g$.
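For the XOR example, the polymorphism condition can be checked exhaustively for small $n$. This sketch (not from the paper) confirms that for $n=2$ the only solutions of $f(x \oplus y) = f(x) \oplus f(y)$ are the four linear functions:

```python
from itertools import product

def is_xor_polymorphism(f, n):
    """Check f(x ⊕ y) = f(x) ⊕ f(y) for all x, y in {0,1}^n."""
    pts = list(product([0, 1], repeat=n))
    return all(f(tuple(a ^ b for a, b in zip(x, y))) == f(x) ^ f(y)
               for x in pts for y in pts)

# Enumerate all 16 functions f: {0,1}^2 -> {0,1} by truth table.
n = 2
pts = list(product([0, 1], repeat=n))
polys = []
for table in product([0, 1], repeat=len(pts)):
    f = dict(zip(pts, table)).__getitem__
    if is_xor_polymorphism(f, n):
        polys.append(table)

# Exactly the linear maps c1*x1 ⊕ c2*x2 survive: 0, x1, x2, x1 ⊕ x2.
print(len(polys))  # 4
```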
What happens if $f$ only satisfies $f \circ g^n = g \circ f^m$ for most inputs? Extending earlier work on the case $g = \mathsf{AND}$, we show that in most cases, $f$ must be close to an exact polymorphism. When $g = \mathsf{NAND}$ or $g = \mathsf{NOR}$, we show that $f$ must be close to $\mathsf{AND}$ or $\mathsf{OR}$ (respectively).
We also discuss the list-decoding regime, showing that for each $g \neq \mathsf{XOR},\mathsf{NXOR}$ there is a constant $s_g < 1$ such that if $f \circ g^n = g \circ f^m$ holds with probability above $s_g$, then $f$ correlates with some low-degree character. When $g = \mathsf{AND}$, we determine the optimal value of $s_g$ to be roughly 0.815.
Hypercontractivity is the secret sauce in Boolean function analysis. Yet in many domains, hypercontractivity is useless for general functions. In recent work, Keevash, Lifshitz, Long and Minzer showed that in such cases, hypercontractivity does hold for global functions, which are functions whose expectation doesn’t change significantly when restricted to subdomains of small codimension.
In this work, we extend this theory to functions on the symmetric group. As applications, we bound the size of global, product-free sets in the alternating group (via a level $k$ inequality), and prove a robust version of the Kruskal–Katona theorem.
We extend the theory of complexity measures of functions beyond the Boolean cube, to domains such as the symmetric group. We show that complexity measures such as degree, approximate degree, decision tree complexity, certificate complexity, block sensitivity and sensitivity are all polynomially related for many of these domains.
In addition, we characterize Boolean degree 1 functions on the perfect matching scheme, and simplify the proof of uniqueness for $t$-intersecting families of permutations and perfect matchings.
The Kindler–Safra theorem states that a constant degree almost Boolean function on $\{0,1\}^n$ is close to a junta. The theorem holds with respect to the $\mu_p$ measure whenever $p$ is bounded away from 0 and 1. When $p$ is very small, new phenomena emerge. For example, the sum of $\epsilon/p$ coordinates is $O(\epsilon^2)$-close to Boolean, yet is not close to a junta. My paper Friedgut–Kalai–Naor theorem for slices of the Boolean cube shows that this is essentially the only example in degree 1.
In this paper we generalize the Kindler–Safra theorem to the small $p$ regime. We show that a constant degree almost Boolean function is close to a constant degree sparse junta, a sparse polynomial having the property that on a random input, with high probability only a constant number of monomials are non-zero. As an application, we prove a large deviation bound, and show that constant degree Boolean functions are almost constant (though sparse juntas approximate them even better). Finally, we use our ideas to provide a new proof of the classical Kindler–Safra theorem.
A preliminary version of this work appeared in SODA’19. Unfortunately, that version relied on an agreement theorem whose proof contains a mistake (since corrected). The current version no longer relies on that agreement theorem. Instead, it employs a self-contained junta agreement theorem, following a suggestion of Dor Minzer.
This version is superseded by this new version.
If an $n$-bit Boolean function $f$ satisfies $f(x \land y) = f(x) \land f(y)$ then it is either constant or an AND. Nehama showed that if this equation holds most of the time, then $f$ is close to a constant or an AND. However, his bounds deteriorate with $n$.
We give a bound which is independent of $n$. This can be seen as a one-sided version of linearity testing, that should perhaps be called oligarchy testing.
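The exact version of the statement is easy to confirm by brute force for small $n$. The following sketch (illustrative only) enumerates all functions on two bits and finds exactly the constants and the ANDs:

```python
from itertools import product

def satisfies_and_equation(table, pts):
    """Check f(x ∧ y) = f(x) ∧ f(y) for all pairs of inputs."""
    f = dict(zip(pts, table))
    return all(f[tuple(a & b for a, b in zip(x, y))] == (f[x] & f[y])
               for x in pts for y in pts)

n = 2
pts = list(product([0, 1], repeat=n))
solutions = [t for t in product([0, 1], repeat=len(pts))
             if satisfies_and_equation(t, pts)]

# For n = 2 the exact solutions are the constants 0 and 1, the two
# dictators x1 and x2, and x1 ∧ x2 — five functions in total.
print(len(solutions))  # 5
```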
A classical result of Nisan and Szegedy states that a Boolean degree $d$ function is a junta depending on at most $d2^{d-1}$ coordinates. We extend this result to the slice.
The multislice is a generalization of the slice to several colors.
We prove a Friedgut–Kalai–Naor theorem for balanced multislices. Our proof is inductive, and uses as a base case our Friedgut–Kalai–Naor theorem for balanced slices.
As an application, we prove stability versions of the edge-isoperimetric inequality for the multislices for settings of parameters in which the optimal set depends on a single coordinate.
The KKL theorem shows that every Boolean function can be biased by a coalition of $o(n)$ players. Russell, Saks and Zuckerman extended this result to multiround protocols having $o(\log^* n)$ rounds.
We extend both results to arbitrary product distributions on the Boolean hypercube.
The KKL theorem fails for highly biased coordinates. Indeed, such distributions exhibit qualitatively different behavior. Whereas unbiased functions can be biased both ways with respect to the uniform distribution, the same doesn’t hold for highly biased distributions (for example, consider the OR function with respect to $\mu_p$ for $p=1/n$). This especially complicates the inductive step in the multiround setting. Our proof uses a novel boosting argument to overcome this difficulty.
We initiate the study of Boolean function analysis on high-dimensional expanders, and more generally, on weighted simplicial complexes.
We identify a linear-algebraic condition under which there is a notion of Fourier expansion for functions on the facets of the complex, a notion similar to the unique harmonic multilinear expansion for functions on the slice or Johnson scheme, and sharing many of its properties.
We prove a rudimentary FKN theorem for high-dimensional expanders, utilizing a recent agreement theorem of Dinur and Kaufman.
Our work also gives a novel definition of high-dimensional expansion. We also generalize this notion (work in progress) to the Grassmann scheme.
The classical invariance principle of Mossel, O’Donnell and Oleszkiewicz states that the distribution of low-degree, low-influence multilinear polynomials under a product distribution essentially depends only on the first two moments of this distribution.
In recent work with Kindler and Wimmer, we extended this to harmonic multilinear polynomials on the slice. In that work we proved invariance with respect to the following three distributions: the uniform distribution on a slice, the matching skewed distribution on the Boolean cube, and the corresponding Gaussian slice (or Gaussian space; the distributions are the same). That invariance principle requires the function to have low degree and low influences.
While the condition of low influences is necessary when comparing discrete distributions to Gaussian space (consider the polynomial $x_1$), such a condition is no longer necessary when comparing the uniform distribution on a slice to the matching skewed distribution on the Boolean cube. In this paper we prove an invariance principle for these two distributions without any condition on the influences. Using the classical invariance principle, we can easily derive the more general invariance principle that we had proved with Kindler and Wimmer. Our new proof is completely different, and uses a martingale approach.
Complementing the invariance principle, we reprove several properties of harmonic multilinear polynomials. While most of these properties have been proven earlier in my paper using an explicit orthogonal basis for the slice, the proofs appearing in this paper are much simpler and do not require the basis. We hope that the new proofs are easier to generalize to other settings.
The multislice is a generalization of the slice. Whereas the slice consists of all vectors in $\{0,1\}^n$ of fixed Hamming weight, the multislice consists of all vectors in $[\ell]^n$ of fixed histogram.
The log-Sobolev inequality is a fundamental inequality in Boolean Function Analysis, equivalent to hypercontractivity. Lee and Yau determined (up to a constant factor) the optimal constant in the log-Sobolev inequality for the slice. We give a clear exposition of their argument, and extend it to the multislice.
As applications, we derive versions of the Kahn–Kalai–Linial, Friedgut’s junta, Kruskal–Katona, and Nisan–Szegedy theorems.
Our log-Sobolev inequality is tight only for constant $\ell$. Justin Salez has since proved a tight log-Sobolev inequality for all multislices.
An elementary result states that a Boolean degree 1 function on the hypercube is a dictator, and a similar result holds on the slice and on the symmetric group (a non-trivial result due to Ellis, Friedgut and Pilpel).
We explore this question on other domains. Our major result is a characterization of all Boolean degree 1 functions on the Grassmann scheme for $q=2,3,4,5$. A Boolean degree 1 function on these domains is either the indicator of a point, the indicator of a hyperplane, a combination of both, or the complement of one of the previous functions.
The classical invariance principle of Mossel, O’Donnell and Oleszkiewicz states that the distribution of low-degree, low-influence multilinear polynomials under Bernoulli random variables is similar to their distribution under Gaussian random variables with the same expectation and variance.
We prove an invariance principle for functions on the slice (all vectors in the Boolean cube having a fixed Hamming weight). The main difficulty is that the variables are no longer independent.
As corollaries, we prove a version of majority is stablest, a Bourgain tail bound, and a weak version of the Kindler–Safra theorem. The Kindler–Safra theorem implies a stability result for t-intersecting families along the lines of Friedgut.
In follow-up work, we improve the invariance principle by removing the condition of low influences (when appropriate).
The Friedgut–Kalai–Naor theorem is a fundamental result in the analysis of Boolean functions. It states that if a Boolean function $f$ is close to an affine function, then $f$ is close to an affine Boolean function, which must depend on at most one coordinate. We prove an analog of this theorem for slices of the Boolean cube (a slice consists of all vectors having a given Hamming weight). In the small error regime, our theorem shows that $f$ is close to a function depending on at most one coordinate, and in general we show that $f$ or its negation is close to a maximum of a small number of coordinates (this corresponds to a union of stars, families consisting of all elements containing some fixed element).
See also our later simplified account.
We prove that Boolean functions on $S_n$ whose Fourier transform is highly concentrated on irreducible representations indexed by partitions of $n$ whose largest part has size at least $n-t$ are close to being unions of cosets of stabilizers of $t$-tuples. We also obtain an edge-isoperimetric inequality for the transposition graph on $S_n$ which is asymptotically sharp for sets of measure $1/\mathit{poly}(n)$. We then combine both results to obtain a best-possible edge-isoperimetric inequality for sets of size $(n-t)!$ where $n$ is large compared to $t$, confirming a conjecture of Ben-Efraim in these cases.
It is well-known that if $f$ is a Boolean function of degree $d$ then its total influence is bounded by $d$. There are several ways of extending the definition of influence for non-Boolean functions. The usual way is to define the influence of the $i$th variable as the $L_2$ norm of the discrete derivative in direction $i$. Under this definition, the total influence of a bounded function (bounded by 1 in magnitude) is still upper-bounded by the degree.
Aaronson and Ambainis asked whether total $L_1$ influence can be bounded polynomially by the degree, and this was answered affirmatively by Bačkurs and Bavarian, who showed an upper bound of $O(d^3)$ for general functions, and $O(d^2)$ for homogeneous functions. We improve their results by giving an upper bound of $d^2$ in the general case and $O(d\log d)$ in the homogeneous case. Our proofs are also much simpler. We also give an almost optimal bound for monotone functions, $d/2\pi + o(d)$.
We prove that a balanced Boolean function on $S_n$ whose Fourier transform is highly concentrated on the first two irreducible representations of $S_n$ is close in structure to a dictatorship, a function which is determined by the image or pre-image of a single element. As a corollary, we obtain a stability result concerning extremal isoperimetric sets in the Cayley graph on $S_n$ generated by the transpositions.
Our proof works in the case where the expectation of the function is bounded away from 0 and 1. In contrast, the preceding paper deals with Boolean functions of expectation $O(1/n)$ whose Fourier transform is highly concentrated on the first two irreducible representations of $S_n$. These need not be close to dictatorships; rather, they must be close to a union of a constant number of cosets of point-stabilizers.
We prove that Boolean functions on $S_n$ whose Fourier transform is highly concentrated on the first two irreducible representations of $S_n$ are close to being unions of cosets of point-stabilizers. We use this to give a natural proof of a stability result on intersecting families of permutations, originally conjectured by Cameron and Ku, and first proved by David Ellis. We also use it to prove a ‘quasi-stability’ result for an edge-isoperimetric inequality in the transposition graph on $S_n$, namely that subsets of $S_n$ with small edge-boundary in the transposition graph are close to being unions of cosets of point-stabilizers.
Kelley and Meka recently proved strong bounds on the size of subsets of $\mathbb{Z}_N$ or $\mathbb{F}_q^n$ that do not contain 3-term arithmetic progressions. We use their techniques to prove similar bounds for subsets of $\mathbb{F}_q^n$ that do not contain non-degenerate instances of affine binary linear systems whose underlying graph is 2-degenerate. We show that if a subset of $\mathbb{F}_q^n$ contains an atypical number of instances of an affine binary linear 2-degenerate system, then it has a constant density increment inside an affine subspace of polylogarithmic codimension. We give a counterexample showing that this kind of result does not hold for linear systems whose true complexity exceeds 1. Using the same techniques, we obtain a counting lemma for sparse quasirandom graphs, improving on the classical result of Chung, Graham, and Wilson (Combinatorica 1989), which is only nontrivial for dense graphs.
A function $f\colon \{0,1\}^n \to \{0,1\}$ is a polymorphism of a predicate $P \subseteq \{0,1\}^m$ if whenever $x^{(1)},\dots,x^{(n)} \in P$, then also $f(x^{(1)},\dots,x^{(n)}) = (f(x^{(1)}_1,\dots,x^{(n)}_1),\dots,f(x^{(1)}_m,\dots,x^{(n)}_m)) \in P$. Predicates and their polymorphisms are classified by Post’s lattice.
A special case of this framework is the truth-functional setting, in which $P = \{(x,y) : y = g(x)\}$ for some function $g\colon \{0,1\}^{m-1} \to \{0,1\}$. Stated differently (and switching from $m-1$ to $m$), a function $f$ is a polymorphism of $g$ if for every $n \times m$ array filled with $0,1$ entries, the following two operations always give the same result: applying $g$ to each row and then $f$ to the resulting column of values, or applying $f$ to each column and then $g$ to the resulting row of values.
It is known that the only “polymorphic pairs” $f,g$ are ANDs, ORs and XORs. A similar result holds if we allow $f$ to depend on the column, that is, when there are $m+1$ different $f$’s (the extra one appears when first computing $g$ on each row). In symbols, we can express this as $f_0 \circ g = g \circ (f_1,\dots,f_m)$.
In this work, we solve the more general problem in which $f$ depends on the column and $g$ depends on the row: $f_0 \circ (g_1,\dots,g_n) = g_0 \circ (f_1,\dots,f_m)$. The space of solutions becomes substantially more complicated. To prove the characterization, we think of both sides of the identity as functions from $\{0,1\}^{nm}$ to $\{0,1\}$, and consider their Fourier expansions, which must coincide.
We consider partitions of $\{0,1\}^n$ into subcubes, giving several examples of constructions which are irreducible in the sense that no nontrivial subpartition is a partition of a subcube. Our work leaves many interesting questions open.
The work is part of a three paper series. The first part is about partitions of $\mathbb{F}_q^n$ into affine subspaces, and the third part is about relations to proof complexity.
We consider partitions of $\mathbb{F}_q^n$ into affine subspaces, giving several examples of constructions which are irreducible in the sense that no nontrivial subpartition is a partition of an affine subspace. Our work leaves many interesting questions open.
The work is part of a three paper series. The second part is about partitions of $\{0,1\}^n$ into subcubes, and the third part is about relations to proof complexity.
We characterize the largest $2$-intersecting families of permutations of $\{1,2,\ldots,n\}$ and of perfect matchings of the complete graph $K_{2n}$ for all $n \geq 2$: they consist, respectively, of all permutations mapping $i_1$ to $j_1$ and $i_2 \neq i_1$ to $j_2 \neq j_1$, and of all perfect matchings containing two fixed edges.
The characterization uses techniques from our work Complexity measures on the symmetric group and beyond, and answers open questions in Meagher and Razafimahatratra, 2-intersecting permutations and Fallat, Meagher and Shirazi, The Erdős–Ko–Rado theorem for 2-intersecting families of perfect matchings.
We give simpler algebraic proofs of uniqueness for several Erdős–Ko–Rado results, i.e., that the canonically intersecting families are the only largest intersecting families. Using these techniques, we characterize the largest partially 2-intersecting families of perfect hypermatchings, resolving a recent conjecture of Meagher, Shirazi, and Stevens.
In the Twenty Questions game, Bob picks a number $x$ according to a distribution $\mu$, and Alice, who knows $\mu$, attempts to find $x$ by asking Bob Yes/No questions, which Bob answers truthfully. Alice’s goal is to minimize the expected number of questions. The optimal strategy for the game is given by a Huffman code.
A set of questions is optimal if using only questions from this set, for every $\mu$ there exists a strategy for Alice with the optimal expected number of questions. In previous work, we have shown that for every $n$ there is an optimal set of questions of size roughly $1.25^n$, and moreover, this is optimal for infinitely many $n$ (up to subexponential factors), namely, $n \approx 1.25 \cdot 2^m$.
In this paper, we prove that the optimal number of questions for $n$ of the form $\beta \cdot 2^m$ (where $\beta \in [1,2)$) is $2^{-G(\beta)n}$ (up to subexponential factors), where $G(\beta)$ is an explicit function which is strictly larger than $-\log_2 1.25$ for all $\beta \neq 1.25$.
We also extend the entire setup to $d$-ary questions, showing that the analog of the magic constant $1.25$ in this case is $1 + (d-1)/d^{d/(d-1)}$.
We generalize the Hoffman bound to simplicial complexes. We demonstrate our result by giving spectral proofs for Mantel’s theorem and for an optimal bound on $s$-wise intersecting families due to Frankl and Tokushige. As a new application, we give tight bounds on a question of Frankl about hypergraphs without extended triangles.
In the classical twenty questions game, Bob guesses an element from 1 to $n$, and Alice’s goal is to find the element using as few Yes/No questions as possible.
The game becomes more interesting when Bob chooses the element according to a distribution $\mu$ known to both players, and Alice attempts to minimize the expected number of questions. The optimal strategy is given by a Huffman code.
Rényi and Ulam asked what happens when Bob is allowed to lie. Of the several different ways of quantifying lies, we consider the setting in which Bob is allowed to lie at most $k$ times.
Rivest et al. showed that in the setting of the classical twenty questions game, allowing Bob to lie a fixed number of times requires Alice to ask $\log\log n$ additional questions per lie.
We extend the result of Rivest et al. to the distributional setting, in which the penalty is $H_2(\mu)$ per lie, where $H_2(\mu)=\sum_x \mu(x) \log\log (1/\mu(x))$.
As an application, we extend the result of Moran and Yehudayoff about distributional sorting to the setting of lies.
We extend the weighted Ahlswede–Khachatrian theorem to several new settings.
First, we extend the theorem to families on infinitely many points, showing that for most settings of the parameters, no new behavior is encountered.
Second, we extend the theorem to the Hamming scheme $\mathbb{Z}_m^n$, in which a family is $t$-intersecting if any two vectors have $t$ coordinates on which the values differ by less than $s$ (this corresponds to $p=s/m$).
Third, we extend the theorem to the continuous analog of the Hamming scheme, a power of the unit circle, in which a family is $t$-intersecting if any two vectors have $t$ coordinates on which the values differ by less than $p$.
A dyadic distribution is one in which all probabilities are negative powers of 2. If $\mu$ is a dyadic distribution on a finite domain, then Huffman’s algorithm produces a code whose average codeword length is $H(\mu)$. We can interpret this code as a decision tree.
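To spell out the claim (a sketch from the definitions, not the paper's code): for dyadic $\mu$ the lengths $\log_2(1/\mu(x))$ are integers satisfying Kraft's inequality with equality, so a prefix code, equivalently a decision tree, with exactly these depths exists, and its average depth equals $H(\mu)$.

```python
from math import log2

def dyadic_lengths(probs):
    """Optimal codeword lengths for a dyadic distribution: exactly log2(1/p)."""
    return [round(log2(1 / p)) for p in probs]

def average_depth(probs):
    return sum(p * l for p, l in zip(probs, dyadic_lengths(probs)))

def entropy(probs):
    return sum(p * log2(1 / p) for p in probs if p > 0)
```

For $\mu = (1/2, 1/4, 1/8, 1/8)$ the lengths are $(1,2,3,3)$, the Kraft sum $\sum_x 2^{-\ell(x)}$ is exactly $1$, and the average depth is $H(\mu) = 1.75$.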
A dyadic distribution can have many different optimal decision trees. We are interested in the following question: given $n$, what is the smallest list of queries that suffices to implement optimal decision trees for all dyadic distributions on $n$ elements?
We show that $1.25^{n+o(1)}$ queries suffice, and moreover, $1.25^{n-o(1)}$ queries are necessary for infinitely many $n$.
We also discuss how the number of queries scales as we allow slight deviations from optimality.
This paper is extracted from a longer STOC paper.
We extend the classical Sauer–Shelah–Perles lemma to lattices, using an argument which also appears in an unpublished monograph of Babai and Frankl on the linear algebra method in combinatorics.
The basic argument works only for lattices with nonvanishing Möbius function, but we are able to extend it to a slightly wider class of lattices.
The Sauer–Shelah–Perles lemma (in the strong form due to Pajor) fails for lattices with an induced copy of the lattice $0<1<2$. We conjecture that this is the only obstruction.
In a groundbreaking paper, Ellis, Friedgut and Pilpel showed that $t$-intersecting families of permutations contain at most $(n-t)!$ permutations, for large enough $n$. They also identified the extremal families: they are all $t$-cosets.
Unfortunately, Section 5 of the paper, which identifies the extremal families, is wrong. Fortunately, the identification follows from a different paper of Ellis, which is already mentioned in the original paper. A different proof follows from our work on complexity measures on the symmetric group.
We provide a complete proof of the Ahlswede–Khachatrian theorem in the $\mu_p$ setting: for all values of $n,t,p$, we determine the maximum $\mu_p$-measure of a $t$-intersecting family on $n$ points, and describe all optimal families (except for a few exceptional parameter settings). Our proof is based on several different articles of Ahlswede and Khachatrian.
The full version below includes more details, as well as some follow-up work.
Suppose that an element $x$ is sampled according to a known distribution $\mu$. Gilbert and Moore showed how to find $x$ using $H(\mu)+2$ comparison queries (in expectation).
In contrast, the classical Shannon–Fano code can recover $x$ using only $H(\mu)+1$ queries, but these queries can be arbitrary. Can we recover this result using a more restricted set of queries?
In this paper we show that by allowing equality queries, we can recover the performance of the Shannon–Fano code. We also determine the optimal number of queries needed to guarantee that $x$ can be recovered using $H(\mu)+r$ queries, for arbitrary $r\geq 1$. (Smaller $r$ cannot be achieved in general.)
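For comparison, the $H(\mu)+1$ guarantee of the Shannon–Fano code comes from rounding the ideal lengths up (a standard sketch of the unrestricted code, not the restricted-queries construction of the paper):

```python
from math import ceil, log2

def shannon_fano_lengths(probs):
    # Lengths ceil(log2(1/p)) satisfy Kraft's inequality, so a prefix code
    # (i.e., a strategy of arbitrary Yes/No queries) with these depths exists.
    return [ceil(log2(1 / p)) for p in probs]

def average_depth(probs):
    # Each length exceeds log2(1/p) by less than 1, so the average is < H(mu) + 1.
    return sum(p * l for p, l in zip(probs, shannon_fano_lengths(probs)))
```

For $\mu = (0.4, 0.3, 0.2, 0.1)$ the lengths are $(2,2,3,4)$ and the average depth is $2.4$, within $1$ of $H(\mu) \approx 1.846$.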
This manuscript forms the first part of our more extensive STOC paper.
Given a distribution $\mu$ and a set of queries $Q$ (for example, all comparison queries), the cost of $Q$ on $\mu$ is the average depth of a decision tree that locates an item in the support of $\mu$ using only queries from $Q$.
The cost of any set of queries on $\mu$ is always at least the entropy $H(\mu)$, and the redundancy of $Q$ is the maximal gap between the cost of $Q$ on a distribution and its entropy.
The prolixity of a set of queries is defined in the same way, with the entropy replaced by the cost of the optimal unrestricted decision tree.
The redundancy and prolixity of a set of queries are often achieved on degenerate distributions which have very low entropy, and this suggests studying their asymptotic variants, in which we consider distributions whose min-entropy tends to infinity.
Gallager showed that the asymptotic redundancy of unrestricted decision trees is 0.086.
We obtain bounds on, and in some cases determine, the non-asymptotic and asymptotic redundancy and prolixity of several sets of queries, including comparison queries, comparison and equality queries, and interval queries.
The results in this paper are extracted from the full version of our STOC paper.
A family of graphs is said to be triangle-intersecting if the intersection of any two graphs in the family contains a triangle. A conjecture of Simonovits and Sós from 1976 states that the largest triangle-intersecting families of graphs on a fixed set of $n$ vertices are those obtained by fixing a specific triangle and taking all graphs containing it, resulting in a family containing 1/8 of all graphs. We prove this conjecture and some generalizations (for example, we prove that the same is true of odd-cycle-intersecting families, and we obtain best possible bounds on the size of the family under different, not necessarily uniform, measures). We also obtain stability results, showing that almost-largest triangle-intersecting families have approximately the same structure.
The arXiv version corrects a mistake in the proof of the stability part of Corollary 2.3. The original proof assumed that the approximating family $G$ is odd-cycle-agreeing, and deduced that $G$ must be a triangle-junta from uniqueness. However, showing that $G$ is odd-cycle-agreeing requires a separate argument, found in the new version.
An alternative presentation of the results in this paper can be found in my PhD thesis.
Van de Snepscheut came up with the following puzzle. There is a table with four coins at the center of the four sides. The goal is to get all of the coins in the same orientation, that is, all heads or all tails. You never get to see the coins (you are blindfolded), but you can turn one or two of them. Prior to each move, the table is rotated by an arbitrary, unknown multiple of 90 degrees. How many moves do you need to reach the goal?
Dijkstra generalized the solution to $2^n$ sides. We consider an even more general problem, in which the table has $n$ sides, and the coins have $m$ “states” which are modified by addition modulo $m$ (more generally, one could think of a finite Abelian group for the state of the coins). We show that:
For similar results and more, see Rotating-table games and derivatives of words by Bar Yehuda, Etzion and Moran.
A function $f\colon \{0,1\}^n \to \{0,1\}$ is a polymorphism of $g\colon \{0,1\}^m \to \{0,1\}$ if $f \circ g^n = g \circ f^m$. For example, $f$ is a polymorphism of $g(x,y) = x \oplus y$ if $f(x \oplus y) = f(x) \oplus f(y)$. Dokow and Holzman determined all polymorphisms of $g$ for arbitrary $g$.
What happens if $f$ only satisfies $f \circ g^n = g \circ f^m$ for most inputs? Extending earlier work on the case $g = \mathsf{AND}$, we show that in most cases, $f$ must be close to an exact polymorphism. When $g = \mathsf{NAND}$ or $g = \mathsf{NOR}$, we show that $f$ must be close to $\mathsf{AND}$ or $\mathsf{OR}$ (respectively).
We also discuss the list-decoding regime, showing that for each $g \neq \mathsf{XOR},\mathsf{NXOR}$ there is a constant $s_g < 1$ such that if $f \circ g^n = g \circ f^m$ holds with probability above $s_g$, then $f$ correlates with some low-degree character. When $g = \mathsf{AND}$, we determine the optimal value of $s_g$ to be roughly 0.815.
If an $n$-bit Boolean function $f$ satisfies $f(x \land y) = f(x) \land f(y)$ then it is either constant or an AND. Nehama showed that if this equation holds most of the time, then $f$ is close to a constant or an AND. However, his bounds deteriorate with $n$.
We give a bound which is independent of $n$. This can be seen as a one-sided version of linearity testing, that should perhaps be called oligarchy testing.
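The exact (error-free) case is easy to verify by brute force for small $n$: the only solutions of $f(x \wedge y) = f(x) \wedge f(y)$ are the constant $0$ and the ANDs over subsets of coordinates (the empty AND being the constant $1$), i.e., $2^n + 1$ functions in all. A sketch of such a check (illustrative, not from the paper):

```python
from itertools import product

def is_and_polymorphism(table, pts):
    """Does f(x AND y) == f(x) AND f(y) hold for all pairs of inputs?"""
    return all(
        table[tuple(a & b for a, b in zip(x, y))] == (table[x] & table[y])
        for x in pts for y in pts
    )

def count_exact_polymorphisms(n):
    # Enumerate all 2^(2^n) truth tables and count the exact solutions.
    pts = list(product([0, 1], repeat=n))
    return sum(
        is_and_polymorphism(dict(zip(pts, values)), pts)
        for values in product([0, 1], repeat=len(pts))
    )
```

For $n = 2$ this counts $5$ functions ($0$, $1$, $x$, $y$, $x \wedge y$) and for $n = 3$ it counts $9$, matching $2^n + 1$.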
In the classical agreement test, we are given a black box which accepts a subset $S$ of $\{1,\ldots,n\}$ of size $k$, and outputs a binary vector of length $k$. The task is to determine whether the answers of the black box correspond to a global binary vector of length $n$. Dinur and Steurer showed that this property can be tested by picking two sets $S_1,S_2$ with intersection $k/2$, and comparing the answers of the black box on their intersection.
In this paper we generalize the results of Dinur and Steurer to the high-dimensional case, in which the black box outputs a bit for any $d$-tuple of coordinates, and the task is to determine whether the answers of the black box correspond to a 2-coloring of the complete $d$-uniform hypergraph on $n$ vertices. We show that for any constant $d$, the test proposed by Dinur and Steurer generalizes to this setting as well.
At the heart of our proof is a new hypergraph pruning lemma of independent interest. The lemma states that given a probability $p$, any $d$-uniform hypergraph $H$ can be pruned to a subhypergraph $H'$ such that $q_{=1}(H')=\Omega(q_{>0}(H))$, where $q_{=1}(H')$ is the probability that the subhypergraph of $H'$ induced by a random subset of the vertices chosen according to $\mu_p$ contains exactly one hyperedge, and $q_{>0}(H)$ is defined analogously. Crucially, the hidden constant depends on $d$ but not on $p$.
In the companion work Low degree almost Boolean functions are sparse juntas, we apply the new agreement test to prove a Kindler–Safra theorem for the biased Boolean cube. The merged version of both papers also contains another application: a generalization of the low degree test for the biased Boolean cube.
The conference version contains a mistake, which we have since corrected.
Lifting theorems are used for transferring lower bounds between Boolean function complexity measures. Given a lower bound on a complexity measure $A$ for some function $f$, we compose $f$ with a carefully chosen gadget $g$ and get essentially the same lower bound on a complexity measure $B$ for the lifted function $f \diamond g$. Lifting theorems have a number of applications in many different areas such as circuit complexity, communication complexity, proof complexity, and so on.
One of the main questions in the context of lifting is how to choose a suitable gadget $g$. Generally, to get better results, that is, to minimize the loss when transferring lower bounds, we need the gadget to be of constant size (number of inputs). Unfortunately, in many settings we only know lifting results for gadgets whose size grows with the size of $f$, and it is unclear whether it can be improved to a constant size gadget. This motivates us to identify the properties of gadgets that make lifting possible.
In this paper, we systematically study the question “For which gadgets does the lifting result hold?” in the following four settings:
In all cases, we give a complete classification of gadgets by exposing the properties of gadgets that make lifting results hold. The structure of the results shows that there are no intermediate cases—for every gadget there is either a polynomial lifting or no lifting at all. As a byproduct of our studies, we prove the log-rank conjecture for the class of functions that can be represented as $f \diamond \mathrm{OR} \diamond \mathrm{XOR}$ for some function $f$.
We consider the following question of bounded simultaneous messages (BSM) protocols: Can computationally unbounded Alice and Bob evaluate a function $f(x,y)$ of their inputs by sending polynomial-size messages to a computationally bounded Carol? The special case where $f$ is the mod-2 inner-product function and Carol is bounded to $\mathsf{AC}^0$ has been studied in previous works. The general question can be broadly motivated by applications in which distributed computation is more costly than local computation, including secure two-party computation.
In this work, we initiate a more systematic study of the BSM model, with different functions $f$ and computational bounds on Carol. In particular, we give evidence against the existence of BSM protocols with polynomial-size Carol for naturally distributed variants of NP-complete languages.
Viola considered the complexity of (approximately) sampling from a given distribution. We consider the uniform distribution over vectors in $\{0,1\}^n$ of weight $k$. We show that when $k$ is constant, any approximate sampler must have locality $\tilde\Omega(\log n)$, almost matching the upper bound $O(\log n)$.
Beyersdorff et al. considered the complexity of generating from a given set. One natural example is the set of vectors in $\{0,1\}^n$ with a majority of ones. They gave a “proof system” with locality $O(\log^2 n)$, and proved a lower bound of $\Omega(\log^* n)$. We improve the lower bound to $\Omega(\sqrt{\log n})$.
Braverman’s celebrated theorem states that $\mathsf{polylog}(n)$-independent sources fool $\mathsf{AC^0}$. In contrast, since the approximate degree of $\mathsf{OR}$ is $\Theta(\sqrt{n})$, we can construct $\Theta(\sqrt{n})$-indistinguishable sources which can be told apart by $\mathsf{OR}$.
In this work, we attempt to bridge this gap by considering what happens when the $k$-indistinguishable sources are simple: have low degree over $\mathbb{F}_2$, or can be sampled in low depth.
Our main positive result constructs $\Theta(\sqrt{n})$-indistinguishable sources distinguishable by $\mathsf{OR}$ which are samplable (a) by polynomial-size decision trees (and so by DNFs/CNFs), and (b) by polynomials of degree $O(\log n)$.
In contrast, we show that $O(1)$-indistinguishable sources of constant degree do fool $\mathsf{OR}$, and $O(\log^{10}(n/\epsilon))$-indistinguishable quadratic sources $\epsilon$-fool unambiguous DNFs, even if only one of the two sources is simple. We also show that $O(\log\log(n/\epsilon))$-indistinguishable depth 1 sources are $\epsilon$-close in statistical distance.
Lower bounds on concrete functions are very hard to prove. The best known lower bound on the circuit complexity of any concrete function is only linear. In contrast, we have a nearly cubic bound on the formula complexity of a concrete function, namely Andreev’s function. This lower bound is proved via shrinkage, and was first obtained in this strong form by Håstad. Later on, Avishay Tal reproved the result using different techniques, shaving some lower order factors.
Shrinkage is the phenomenon in which a de Morgan formula shrinks by a factor of $O(p^2)$ (in expectation and with high probability) when subjected to a random restriction which leaves only a $p$-fraction of its values alive. In this regard, shrinkage is similar to the switching lemma, which is about simplification of DNFs and CNFs. However, in contrast to the switching lemma, which is known to hold for a variety of distributions, shrinkage has only been considered for standard random restrictions.
In this paper we prove a more general switching lemma, which works for two types of random projections (generalizing random restrictions) which we call fixing and hiding. As an application, we prove a (nearly) cubic formula lower bound for an Andreev-like function in $\mathsf{AC^0}$. Using a different kind of random restriction is necessary in this case, since standard random restrictions are known to drastically simplify functions in $\mathsf{AC^0}$, by the switching lemma. Our proof closely follows Håstad’s, and could serve as an exposition of his proof.
We construct an explicit family of 3XOR instances which are hard for $O(\sqrt{\log n})$ levels of Sum-of-Squares. Our constructions are based on the LSV complexes, and rely on two of their crucial properties: cosystolic expansion and local nonpositive curvature (via Gromov’s filling inequality, which generalizes the isoperimetric inequality in $\mathbb{R}^n$).
In contrast to many other constructions, our variables correspond to edges in the complex. Curiously, Alev, Jeronimo and Tulsiani showed that if variables correspond to vertices, then instances based on high-dimensional expanders are easy.
Using a different chain complex, Max Hopkins and Ting-Chun Lin were able to produce an instance which is hard for $\Omega(n)$ levels.
It is well-known that $\mathsf{AC^0}$ circuits cannot compute inner product (since parity is hard for $\mathsf{AC^0}$).
What if we allow each of the parties to preprocess their input?
If we could show that bounded depth circuits of quasipolynomial size cannot compute inner product even with arbitrary polynomial length preprocessing, then this would imply that inner product is outside the polynomial hierarchy of communication complexity ($\mathsf{PH^{cc}}$).
We show that $\mathsf{AC^0}$ circuits cannot compute inner product even if one party has unlimited preprocessing, and the other one is limited to preprocessing of length $n+n/\log^{\omega(1)} n$.
Our lower bound also applies to pseudorandom functions.
In ongoing work, we extend these results to correlation bounds.
Coppersmith and Winograd gave an $O(n^{2.376})$ algorithm for matrix multiplication in 1990. Their algorithm relies on an identity known as the Coppersmith–Winograd identity. Analyzing the identity as-is using Strassen’s laser method and an ingenious construction, Coppersmith and Winograd obtained an $O(n^{2.388})$ algorithm. The tensor square of the basic identity leads to the improved algorithm.
Recently there has been a surge of activity in the area. Stothers, Vassilevska-Williams and Le Gall studied higher and higher tensor powers of the basic identity, culminating in Le Gall’s $O(n^{2.3728639})$ algorithm. How far can this approach go?
We describe a framework, laser method with merging, which encompasses all the algorithms just described, and is at once more general and amenable to analysis. We show that taking the $N$th tensor power for an arbitrary $N$ cannot obtain an algorithm with running time $O(n^{2.3725})$ for the exact identity used in state-of-the-art algorithms.
An approximate computation of a Boolean function $f$ by a circuit or switching network $M$ is a computation in which $M$ computes $f$ correctly on the majority of the inputs (rather than on all of them). Besides being interesting in their own right, lower bounds for approximate computation have proved useful in many subareas of complexity theory such as cryptography and derandomization. Lower bounds for approximate computation are also known as correlation bounds or average case hardness.
We obtain the first average case monotone depth lower bounds for a function in monotone $\mathsf{P}$. We tolerate errors up to $1/2 - 1/n^{1/3-\delta}$. Specifically, we prove average case exponential lower bounds on the size of monotone switching networks for the GEN function. As a corollary, we establish that for every $i$ there are functions that can be computed with no error in monotone $\mathsf{NC}^{i+1}$ but that cannot be computed without large error by monotone circuits in $\mathsf{NC}^i$. We provide a similar separation between monotone $\mathsf{NC}$ and monotone $\mathsf{P}$.
Our proof extends and simplifies the Fourier-analytic technique due to Potechin and further developed by Chan and Potechin.
As a corollary of our main lower bound, we prove that the communication complexity approach for monotone depth lower bounds does not naturally generalize to the average case setting.
In 1990 Subramanian defined the complexity class $\mathsf{CC}$ as the set of problems log-space reducible to the comparator circuit value problem (CCV). He and Mayr showed that $\mathsf{NL}\subseteq \mathsf{CC}\subseteq \mathsf{P}$, and proved that in addition to CCV several other problems are complete for $\mathsf{CC}$, including the stable marriage problem, and finding the lexicographically first maximal matching in a bipartite graph. Although the class has not received much attention since then, we are interested in $\mathsf{CC}$ because we conjecture that it is incomparable with the parallel class $\mathsf{NC}$ which also satisfies $\mathsf{NL}\subseteq \mathsf{NC}\subseteq \mathsf{P}$; this implies that $\mathsf{CC}$-complete problems don’t have an efficient polylog time parallel algorithm. We provide evidence for our conjecture by giving oracle settings in which relativized $\mathsf{CC}$ and relativized $\mathsf{NC}$ are incomparable.
We give several alternative definitions of $\mathsf{CC}$, including (among others) the class of problems computed by uniform polynomial-size families of comparator circuits supplied with copies of the input and its negation, the class of problems $\mathsf{AC^0}$-reducible to CCV, and the class of problems computed by uniform $\mathsf{AC^0}$ circuits with CCV gates. We also give a machine model for $\mathsf{CC}$ which corresponds to its characterizations as log-space uniform polynomial-size families of comparator circuits. The various characterizations show that $\mathsf{CC}$ is a robust class. Our techniques also show that the corresponding function class FCC is closed under composition. The main technical tool we employ is universal comparator circuits.
Other results include a simpler proof of $\mathsf{NL}\subseteq \mathsf{CC}$, a more careful analysis showing that the lexicographically first maximal matching problem and its variants are $\mathsf{CC}$-complete under $\mathsf{AC^0}$ many-one reductions, and an explanation of the relation between the Gale–Shapley algorithm and Subramanian’s algorithm for stable marriage.
This paper continues previous work of Cook, Lê and Ye which focused on Cook–Nguyen style uniform proof complexity, answering several open questions raised in that paper.
The preprint contains more results than the arXiv version, and the presentation is different.
We prove a query-to-communication lifting theorem in both the deterministic and randomized settings, using inner product as the lifting gadget.
Our result improves on the lifting theorem of Göös, Pitassi and Watson for randomized protocols, which used the indexing gadget.
Moreover, we reprove the lifting theorem of Chattopadhyay, Koucký, Loff and Mukhopadhyay for deterministic protocols using inner product.
Whereas Chattopadhyay et al. proved their result using thickness as the pseudorandomness notion (following the seminal work of Raz and McKenzie), we are able to use blockwise min-entropy, the same pseudorandomness notion used by Göös et al. to prove their randomized lifting theorem.
We discuss the following general question: How does the information complexity of a function change if we allow non-negligible error?
Our answers have implications to the communication complexity of set disjointness with non-negligible error.
We find the optimal protocol (from the point of view of information complexity) for the AND function in a multiparty setting, extending the buzzer protocol described by Braverman et al.
We also fill in some gaps in the arguments of Braverman et al.
Tree-like Resolution proves the unsatisfiability of a CNF $\varphi$ by giving a decision tree for the falsified clause problem. The leaves of the tree form a partition of $\{0,1\}^n$ into “monochromatic” subcubes, each of which is a strengthening of a negation of a term of $\varphi$.
We consider the HITTING proof system, in which a CNF is refuted by giving a partition of $\{0,1\}^n$ into monochromatic subcubes, and analyze its relation to other proof systems. We also consider a linear analog of HITTING which is a generalization of Tree-like Resolution over linear forms.
The work is part of a three-paper series. The first part is about partitions of $\mathbb{F}_q^n$ into affine subspaces, and the second part is about partitions of $\{0,1\}^n$ into subcubes.
MaxSAT Resolution is a version of Resolution designed to solve the MaxSAT problem. However, it is also possible to use it as a refutation system.
We analyze MaxSAT Resolution (MaxRes), MaxRes with weakening (MaxResW), and a related proof system, SubCubeSums, which is a special case of Sherali–Adams:
Alekhnovich and Razborov (Lower bounds for polynomial calculus: non-binomial case) gave a sophisticated degree lower bound for proofs in the polynomial calculus proof system. However, their proof is somewhat opaque. We present a different, more intuitive proof. We also adapt our proof to obtain related results of Galesi and Lauria.
Work done while visiting KTH in December 2012.
In 2003, Atserias and Dalmau resolved a major open question about the resolution proof system by establishing that the space complexity of formulas is always an upper bound on the width needed to refute them. Their proof is beautiful but somewhat mysterious in that it relies heavily on tools from finite model theory.
We give an alternative, completely elementary, proof that works by simple syntactic manipulations of resolution refutations. As a by-product, we develop a “black-box” technique for proving space lower bounds via a “static” complexity measure that works against any resolution refutation — previous techniques have been inherently adaptive.
We conclude by showing that the related question for polynomial calculus (i.e., whether space is an upper bound on degree) seems unlikely to be resolvable by similar methods.
During the last decade, an active line of research in proof complexity has been into the space complexity of proofs and how space is related to other measures. By now these aspects of resolution are fairly well understood, but many open problems remain for the related but stronger polynomial calculus (PC/PCR) proof system. For instance, the space complexity of many standard “benchmark formulas” is still open, as well as the relation of space to size and degree in PC/PCR.
We prove that if a formula requires large resolution width, then making XOR substitution yields a formula requiring large PCR space, providing some circumstantial evidence that degree might be a lower bound for space. More importantly, this immediately yields formulas that are very hard for space but very easy for size, exhibiting a size-space separation similar to what is known for resolution. Using related ideas, we show that if a graph has good expansion and in addition its edge set can be partitioned into short cycles, then the Tseitin formula over this graph requires large PCR space. In particular, Tseitin formulas over random 4-regular graphs almost surely require space at least $\Omega(\sqrt{n})$.
Our proofs use techniques recently introduced in [Bonacina-Galesi ’13]. Our final contribution, however, is to show that these techniques provably cannot yield non-constant space lower bounds for the functional pigeonhole principle, delineating the limitations of this framework and suggesting that we are still far from characterizing PC/PCR space.
Cutting planes is a proof system in which lines are linear inequalities. It has two main variants: syntactic cutting planes, in which specific derivation rules are given, and semantic cutting planes, in which any bounded fan-in derivation which is semantically correct (for all zero-one assignments to variables) is allowed. Only the syntactic version is a Cook–Reckhow proof system, since verifying a semantic cutting planes proof is coNP-complete.
Extending earlier work of Pudlák, we give an exponential lower bound for semantic cutting planes.
We also show that semantic cutting planes is exponentially stronger than syntactic cutting planes, and exhibit two contradictory lines which take exponentially long to refute in syntactic cutting planes.
This work is a combination of two earlier preprints: a preprint of Pavel Hrubeš proving the exponential lower bound for semantic cutting planes, and a preprint of Massimo Lauria and myself proving the exponential separation between semantic and syntactic cutting planes.
During the last decade, an active line of research in proof complexity has been to study space complexity and time-space trade-offs for proofs. Besides being a natural complexity measure of intrinsic interest, space is also an important issue in SAT solving, and so research has mostly focused on weak systems that are used by SAT solvers.
There has been a relatively long sequence of papers on space in resolution, which is now reasonably well understood from this point of view. For other natural candidates to study, however, such as polynomial calculus or cutting planes, very little has been known. We are not aware of any nontrivial space lower bounds for cutting planes, and for polynomial calculus the only lower bound has been for CNF formulas of unbounded width in Alekhnovich et al., where the space lower bound is smaller than the initial width of the clauses in the formulas. Thus, in particular, it has been consistent with current knowledge that polynomial calculus could be able to refute any $k$-CNF formula in constant space.
We prove several new results on space in polynomial calculus (PC), and in the extended proof system polynomial calculus resolution (PCR) studied by Alekhnovich et al.:
We give a general transformation which turns polynomial-size Frege proofs into subexponential-size $\mathsf{AC^0}$-Frege proofs. This indicates that proving exponential lower bounds for $\mathsf{AC^0}$-Frege is hard, since it is a longstanding open problem to prove super-polynomial lower bounds for Frege. Our construction is optimal for tree-like proofs.
As a consequence of our main result, we are able to shed some light on the question of weak automatizability for bounded-depth Frege systems. First, we present a simpler proof of the results of Bonet et al. showing that under cryptographic assumptions, bounded-depth Frege proofs are not weakly automatizable. Second, because our proof is more general, we show that under the right cryptographic assumptions it could resolve the weak automatizability question for lower-depth Frege systems.
The proceedings version contains several small mistakes, corrected in the preprint version. These mistakes slightly affect the constants in the main theorem.
The greedy algorithm gives a $1-1/e$ approximation for maximum coverage subject to a cardinality constraint, and this is known to be optimal (unless P=NP). The same algorithm also gives a $1-1/e$ approximation for the more general problem of monotone submodular maximization subject to a cardinality constraint, and this is also known to be optimal in the value oracle model.
What happens when the cardinality constraint $k$ is a fixed fraction $c$ of the total number of sets $n$, that is, $k = cn$? A random solution gives a $c$ approximation, which already improves on $1-1/e$ when $c > 1-1/e$. We show that for every $c > 0$, both approximation algorithms can be improved.
In the case of monotone submodular maximization subject to a cardinality constraint, we show that the measured continuous greedy algorithm gives a $1-(1-c)^{1/c}$ approximation, which is tight when $1/c$ is an integer; we conjecture it to be tight for all $c$.
In the case of maximum coverage subject to a cardinality constraint, we first show that the natural LP gives a $1-(1-c)^{1/c}$ approximation when $1/c$ is an integer and a better approximation when $1/c$ is fractional. We give an integrality gap which matches our rounding scheme. The integrality gap also translates to a value oracle hardness for monotone submodular maximization subject to a cardinality constraint.
We then show how to improve on the LP using an SDP for $c = 1/2$ (provably) and for $1/2 < c < 1$ (conjecturally and numerically). This separates the two problems, showing that maximum coverage is more approximable than monotone submodular maximization in this setting. To the best of our knowledge, this is the first such separation in a natural setting.
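The ratios quoted above are easy to compare numerically; here is a minimal check in Python, using only the formulas stated above:

```python
import math

def ratio(c):
    """The approximation ratio 1 - (1 - c)^(1/c) quoted above
    (tight when 1/c is an integer)."""
    return 1 - (1 - c) ** (1 / c)

baseline = 1 - 1 / math.e  # the classical greedy guarantee

# For c = 1/2: 1 - (1/2)^2 = 3/4, already well above 1 - 1/e = 0.632...
```

For every $c > 0$ the ratio exceeds $1-1/e$, since $(1-c)^{1/c}$ increases toward $e^{-1}$ as $c \to 0$.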
We study the problem of maximizing XOS functions, which are functions that can be written as a maximum of linear functions. The number of linear functions is known as the width.
We show that the inapproximability threshold is $\Theta(n/\log n)$ in the general case, and $k-1$ for width-$k$ XOS functions (where $k \geq 2$).
We analyze random order greedy, showing that its approximation ratio is at least 0.5096 for any number of parts.
We analyze random order greedy, a variant of the greedy algorithm for partition matroids, when the number of parts is small.
We show that the approximation ratio in the case of 2 parts is 2/3, and in the case of 3 parts is 7/12. We also give bounds on the approximation ratio in the case of 4 parts.
We present an optimal, combinatorial $1-1/e$ approximation algorithm for monotone submodular maximization over a matroid constraint. Compared to the continuous greedy algorithm due to Calinescu, Chekuri, Pál and Vondrák, our algorithm is extremely simple and requires no rounding. It consists of the greedy algorithm followed by local search. Both phases are run not on the actual objective function, but on a related auxiliary potential function, which is also monotone submodular.
In our previous work on maximum coverage (the preceding paper), the potential function gives more weight to elements covered multiple times. We generalize this approach from coverage functions to arbitrary monotone submodular functions. When the objective function is a coverage function, both definitions of the potential function coincide.
Our approach generalizes to the case where the monotone submodular function has restricted curvature. For any curvature $c$, we adapt our algorithm to produce a $(1-e^{-c})/c$ approximation. This matches results of Vondrák, who has shown that the continuous greedy algorithm produces a $(1-e^{-c})/c$ approximation when the objective function has curvature $c$ with respect to the optimum, and proved that achieving any better approximation ratio is impossible in the value oracle model.
The paper exists in several different versions:
The journal version supersedes the preceding versions.
We present an optimal, combinatorial $1-1/e$ approximation algorithm for Maximum Coverage over a matroid constraint, using non-oblivious local search. Calinescu, Chekuri, Pál and Vondrák have given an optimal $1-1/e$ approximation algorithm for the more general problem of monotone submodular maximization over a matroid constraint. The advantage of our algorithm is that it is entirely combinatorial, and in many circumstances also faster, as well as conceptually simpler.
Following previous work on satisfiability problems by Alimonti and by Khanna, Motwani, Sudan and Vazirani, our local search algorithm is non-oblivious. That is, our algorithm uses an auxiliary linear objective function to evaluate solutions. This function gives more weight to elements covered multiple times. We show that the locality ratio of the resulting local search procedure is at least $1-1/e$. Our local search procedure only considers improvements of size 1. In contrast, we show that oblivious local search, guided only by the problem’s objective function, achieves an approximation ratio of only $\frac{n-1}{2n-1-k}$ when improvements of size $k$ are considered.
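The local search guided by the auxiliary objective can be sketched as follows, here for a plain cardinality constraint; the weights in the potential are illustrative placeholders (the paper derives the exact coefficients via linear programming):

```python
from collections import Counter

def coverage(sets, sol):
    """Number of ground elements covered by the chosen sets."""
    return len(set().union(*(sets[i] for i in sol))) if sol else 0

def potential(sets, sol, w=(1.0, 0.4, 0.2)):
    """Auxiliary objective: an element covered j times contributes
    w[0] + ... + w[j-1].  These weights are illustrative only; the
    paper derives the exact coefficients via linear programming."""
    counts = Counter(x for i in sol for x in sets[i])
    return sum(sum(w[:j]) for j in counts.values())

def non_oblivious_local_search(sets, k):
    """Size-1 swap local search guided by the potential rather than by
    coverage itself (a minimal sketch, cardinality constraint only)."""
    sol = set(range(k))  # arbitrary initial solution of size k
    improved = True
    while improved:
        improved = False
        for i in list(sol):
            for j in (j for j in range(len(sets)) if j not in sol):
                cand = (sol - {i}) | {j}
                if potential(sets, cand) > potential(sets, sol):
                    sol, improved = cand, True
                    break
            if improved:
                break
    return sol
```

Since the potential strictly increases with every accepted swap and there are finitely many solutions, the sketch always terminates.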
In general, our local search algorithm could take an exponential amount of time to converge to an exact local optimum. We address this situation by using a combination of approximate local search and the same partial enumeration techniques used by Calinescu et al., resulting in a clean $(1-1/e)$-approximation algorithm running in polynomial time.
We obtained our auxiliary linear objective function using linear programming. This is detailed in Ward’s thesis.
We consider the NP-complete optimization problem Bandwidth. Anupam Gupta gave an $O(\log^{2.5} n)$ approximation algorithm for trees, and showed that his algorithm has an approximation ratio of $O(\log n)$ on caterpillars, trees composed of a central path and paths emanating from it. We show that the same approximation ratio is obtained on trees composed of a central path and caterpillars emanating from it.
Our result relies on the following lemma.
Definition. A sequence $a_1,\ldots,a_n$ has thickness $\Theta$ if the sum of any $d$ consecutive elements is at most $d\Theta$, for $1 \leq d \leq n$.
Lemma. If a sequence has thickness $\Theta$, then the sequence obtained by ordering the elements in non-decreasing order also has thickness $\Theta$.
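A brute-force check of the definition and lemma (an illustration of mine; the least $\Theta$ satisfying the definition is the maximum average over windows of consecutive elements):

```python
def thickness(seq):
    """Least Θ such that every run of d consecutive elements sums to at
    most d·Θ — equivalently, the maximum average over all windows of
    consecutive elements."""
    best = float("-inf")
    for i in range(len(seq)):
        s = 0
        for d, x in enumerate(seq[i:], start=1):
            s += x
            best = max(best, s / d)
    return best

# The lemma: sorting in non-decreasing order cannot increase the thickness.
a = [5, 1, 4, 2, 8, 1]
```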
A function $f\colon \{0,1\}^n \to \{0,1\}$ is a polymorphism of a predicate $P \subseteq \{0,1\}^m$ if whenever $x^{(1)},\dots,x^{(n)} \in P$, then also $f(x^{(1)},\dots,x^{(n)}) = (f(x^{(1)}_1,\dots,x^{(n)}_1),\dots,f(x^{(1)}_m,\dots,x^{(n)}_m)) \in P$. Predicates and their polymorphisms are classified by Post’s lattice.
A special case of this framework is the truth-functional setting, in which $P = \{(x,y) : y = g(x)\}$, for some function $g\colon \{0,1\}^{m-1} \to \{0,1\}$. Stated differently (and switching from $m-1$ to $m$), a function $f$ is a polymorphism of $g$ if for every $n \times m$ array filled with $0,1$ entries, the following two operations always give the same result: applying $g$ to each row and then $f$ to the resulting column of values, or applying $f$ to each column and then $g$ to the resulting row of values.
It is known that the only “polymorphic pairs” $f,g$ are ANDs, ORs and XORs. A similar result holds if we allow $f$ to depend on the column, that is, there are $m+1$ different $f$’s (the extra one is applied to the column of values obtained by computing $g$ on each row). In symbols, we can express this as $f_0 \circ g = g \circ (f_1,\dots,f_m)$.
In this work, we solve the more general problem in which $f$ depends on the column and $g$ depends on the row: $f_0 \circ (g_1,\dots,g_n) = g_0 \circ (f_1,\dots,f_m)$. The space of solutions becomes substantially more complicated. To prove the characterization, we think of both sides of the identity as functions $\{0,1\}^{nm} \to \{0,1\}$, and consider their Fourier expansions, which must coincide.
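The row/column identity is easy to verify exhaustively for small arrays; a sketch (the function names are mine):

```python
from functools import reduce
from itertools import product

def xor(bits):
    return reduce(lambda a, b: a ^ b, bits)

def conj(bits):
    return reduce(lambda a, b: a & b, bits)

def commutes(f, g, n, m):
    """Check f(g(row_1),...,g(row_n)) == g(f(col_1),...,f(col_m)) for
    every n-by-m 0/1 array (brute force, small n and m only)."""
    for flat in product((0, 1), repeat=n * m):
        rows = [flat[i * m:(i + 1) * m] for i in range(n)]
        cols = [flat[j::m] for j in range(m)]
        if f([g(r) for r in rows]) != g([f(c) for c in cols]):
            return False
    return True
```

For example, XOR is a polymorphism of XOR (both sides compute the XOR of all entries), while XOR paired with AND fails already on $2 \times 2$ arrays.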
We study the distribution of Shapley values in weighted voting games. The Shapley value measures voting power in collective decision-making systems. While easy to estimate empirically given the parameters of a weighted voting game, the Shapley values are hard to reason about analytically.
We propose a probabilistic approach, in which the agent weights are drawn i.i.d. from some known exponentially decaying distribution. We provide a general closed-form characterization of the highest and lowest expected Shapley values in such a game, as a function of the parameters of the underlying distribution. To do so, we give a novel reinterpretation of the stochastic process that generates the Shapley variables as a renewal process. We demonstrate the use of our results on the uniform and exponential distributions.
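Empirical estimation is indeed straightforward; a minimal permutation-sampling sketch for a weighted voting game (parameters are mine):

```python
import random

def shapley_estimate(weights, quota, samples=20000, seed=0):
    """Monte Carlo estimate of Shapley values in a weighted voting game:
    an agent's value is the probability that it is pivotal (its weight
    pushes the running total past the quota) in a random ordering."""
    rng = random.Random(seed)
    n = len(weights)
    pivots = [0] * n
    for _ in range(samples):
        order = list(range(n))
        rng.shuffle(order)
        total = 0
        for agent in order:
            if total < quota <= total + weights[agent]:
                pivots[agent] += 1
                break
            total += weights[agent]
    return [p / samples for p in pivots]

# Three symmetric agents with quota 2: each has Shapley value 1/3.
est = shapley_estimate([1, 1, 1], 2)
```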
We study the Shapley value in weighted voting games. The Shapley value has been used as an index for measuring the power of individual agents in decision-making bodies and political organizations, where decisions are made by a majority vote process. We characterize the impact of changing the quota (i.e., the minimum number of seats in the parliament that are required to form a coalition) on the Shapley values of the agents. Contrary to previous studies, which assumed that the agent weights (corresponding to the size of a caucus or a political party) are fixed, we analyze new domains in which the weights are stochastically generated, modeling, for example, election processes.
We examine a natural weight generation process: the Balls and Bins model, with uniform as well as exponentially decaying probabilities. We also analyze weights that form a super-increasing sequence, answering several open questions pertaining to the Shapley values in such games.
Our results for the balls and bins model with exponentially decaying probabilities rely on a formula for the Shapley values of super-increasing sequences. Curiously, this formula gives rise to a continuous function reminiscent of Minkowski’s question mark function.
Many voting rules require the voters to give a complete preference order over the candidates. This is cumbersome, leading to the notion of top-$k$ voting, in which the voters only give the length-$k$ prefixes of their rankings. The question that we ask in this paper is: for a given voting rule, what value of $k$ makes it possible to predict the overall winner with high probability from the length-$k$ prefixes alone, given enough voters?
We first consider the case of an impartial culture, in which the voters choose their preference profiles uniformly at random over all permutations. For positional scoring rules (like Borda) we give a nearly-tight threshold theorem for $k$. We also prove a strong, though non-optimal, lower bound for Copeland.
When the preference profiles are drawn from a biased distribution, such as the Mallows distribution, we show that the candidate toward which the distribution is biased wins the elections, for both positional scoring rules and Copeland, with high probability.
Finally, we consider adversarially-chosen preference distributions. We show that for positional scoring rules with geometrically decaying scores, $k = O(\log n)$ suffices to predict the winner with high probability.
Top-$k$ voting is an especially natural form of partial vote elicitation in which only length-$k$ prefixes of rankings are elicited. We analyze the ability of top-$k$ vote elicitation to correctly determine true winners with high probability, given probabilistic models of voter preferences and candidate availability. We provide bounds on the minimal value of $k$ required to determine the correct winner under the plurality and Borda voting rules, considering both worst-case preference profiles and profiles drawn from the impartial culture and Mallows probabilistic models. We also derive conditions under which the special case of zero elicitation (i.e., $k=0$) produces the correct winner. We provide empirical results that confirm the value of top-$k$ voting.
The proof of Theorem 10 is incomplete, but the issue is fixed in subsequent work.
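Prediction from prefixes under Borda can be sketched as follows; the scoring convention for unranked candidates is an assumption of this sketch, not necessarily the one used in the paper:

```python
from collections import defaultdict

def borda_winner_from_prefixes(prefixes, m):
    """Predict the Borda winner from length-k prefixes.  Convention
    (an assumption of this sketch): the candidate in position i of a
    prefix scores m-1-i, and unranked candidates score 0."""
    score = defaultdict(int)
    for prefix in prefixes:
        for i, cand in enumerate(prefix):
            score[cand] += m - 1 - i
    return max(range(m), key=lambda c: score[c])
```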
The problem of influence maximization deals with choosing the optimal set of nodes in a social network so as to maximize the resulting spread of a technology (opinion, product ownership and so on), given a model of diffusion of influence in a network. A natural extension is a competitive setting, in which the goal is to maximize the spread of our technology in the presence of one or more competitors.
We suggest several natural extensions to the well-studied linear threshold model, showing that the original greedy approach cannot be used.
Furthermore, we show that for a broad family of competitive influence models, it is NP-hard to achieve an approximation that is better than the square root of the optimal solution; the same proof can also be applied to give a negative result for a conjecture in Carnes et al. about a general cascade model for competitive diffusion.
Finally, we suggest a natural model that is amenable to the greedy approach.
Consider the domain of multiclass classification within the adversarial online setting. What is the price of relying on bandit feedback as opposed to full information? To what extent can an adaptive adversary amplify the loss compared to an oblivious one? To what extent can a randomized learner reduce the loss compared to a deterministic one? We study these questions in the mistake bound model and provide nearly tight answers.
We demonstrate that the optimal mistake bound under bandit feedback is at most $O(k)$ times higher than the optimal mistake bound in the full information case, where $k$ represents the number of labels. This bound is tight and provides an answer to an open question previously posed and studied by Daniely and Helbertal [’13] and by Long [’17, ’20], who focused on deterministic learners.
Moreover, we present nearly optimal bounds of $\tilde\Theta(k)$ on the gap between randomized and deterministic learners, as well as between adaptive and oblivious adversaries in the bandit feedback setting. This stands in contrast to the full information scenario, where adaptive and oblivious adversaries are equivalent, and the gap in mistake bounds between randomized and deterministic learners is a constant multiplicative factor of 2.
In addition, our results imply that in some cases the optimal randomized mistake bound is approximately the square root of its deterministic counterpart. Previous results show that this is essentially the smallest it can get.
The Littlestone dimension of a hypothesis class captures the optimal mistake bound in online learning when the learner is deterministic.
In this work, we define a related parameter, the randomized Littlestone dimension, which captures the optimal mistake bound when the learner is randomized.
Using the new parameter, we prove nearly optimal bounds on prediction with expert advice when the learner is randomized, complementing past work on the deterministic setting.
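For context, the classical randomized weighted majority algorithm for prediction with expert advice can be sketched as follows (the standard algorithm, not the construction from the paper):

```python
import random

def randomized_weighted_majority(expert_preds, outcomes, eta=0.5, seed=0):
    """Classical randomized weighted majority: follow a random expert
    with probability proportional to its weight, and multiply the
    weight of every mistaken expert by (1 - eta).  Returns the
    learner's (random) number of mistakes."""
    rng = random.Random(seed)
    n = len(expert_preds[0])
    weights = [1.0] * n
    mistakes = 0
    for preds, outcome in zip(expert_preds, outcomes):
        chosen = rng.choices(range(n), weights=weights)[0]
        if preds[chosen] != outcome:
            mistakes += 1
        for i, p in enumerate(preds):
            if p != outcome:
                weights[i] *= 1 - eta
    return mistakes
```

The expected number of mistakes is within a constant factor of the best expert's, plus a term logarithmic in the number of experts.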
Given a learning task where the data is distributed among several parties, communication is one of the fundamental resources which the parties would like to minimize.
We present a distributed boosting algorithm which is resilient to a limited amount of noise.
Our algorithm is similar to classical boosting algorithms, although it is equipped with a new component, inspired by Impagliazzo’s hard-core lemma (Impagliazzo, 1995), which adds robustness to the algorithm.
We also complement this result by showing that resilience to any asymptotically larger noise is not achievable by a communication-efficient algorithm.
An integer sequence $(a_n)_{n \in \mathbb{N}}$ is MC-finite if for every $m \ge 1$, the sequence $a_n \bmod m$ is eventually periodic. We discuss two methods for proving MC-finiteness: exhibiting a suitable recurrence relation, and the Specker–Blatter theorem. We also give an interesting example of an integer sequence $a_n$ such that $a_n \bmod m$ is eventually periodic iff $m$ is odd, namely the sequence A086714.
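The recurrence method is easy to illustrate: any sequence satisfying a linear recurrence with integer coefficients, such as Fibonacci, is MC-finite. A brute-force periodicity check (an illustration of mine, not the paper's example A086714):

```python
def eventual_period(seq):
    """Smallest (preperiod, period) witnessing eventual periodicity of
    the given finite window (brute force; needs a long enough window)."""
    n = len(seq)
    for start in range(n):
        for p in range(1, (n - start) // 2 + 1):
            if all(seq[i] == seq[i + p] for i in range(start, n - p)):
                return start, p
    return None

def fib_mod(m, length=200):
    """The Fibonacci sequence reduced mod m."""
    a, b, out = 0, 1, []
    for _ in range(length):
        out.append(a % m)
        a, b = b, (a + b) % m
    return out

# Fibonacci mod m is purely periodic; its period is the Pisano period.
```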
We analyze the complexity of Conflict-Based Search by setting up a bivariate recurrence and estimating its asymptotic rate of growth using the methods of Pemantle and Wilson.
In a paper on graph coloring, Grimmett and McDiarmid described a heuristic that finds a large clique in a $G(n,1/2)$ random graph. The heuristic simply scans the vertices in arbitrary order, adding any vertex adjacent to all vertices previously chosen.
Grimmett and McDiarmid showed that with high probability this produces a clique whose size is asymptotically $\log_2 n$, compared to the maximum clique whose size is asymptotically $2\log_2 n$.
We determine the asymptotic distribution of the size of the clique produced by the algorithm, which is obtained by taking the logarithm of an infinite sum of exponential random variables.
Prodinger mentions that the size of the clique has the same distribution as that of the Morris counter, analyzed by Flajolet. In particular, our formulas appear (in a different form) in Flajolet’s paper.
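The heuristic is simple to simulate; a sketch (the parameters are mine):

```python
import random

def greedy_clique(n, p=0.5, seed=0):
    """Grimmett–McDiarmid heuristic on G(n, p): scan the vertices in
    order, adding any vertex adjacent to every vertex chosen so far."""
    rng = random.Random(seed)
    adj = [[False] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            adj[u][v] = adj[v][u] = rng.random() < p
    clique = []
    for v in range(n):
        if all(adj[v][u] for u in clique):
            clique.append(v)
    return clique, adj

# For n = 512 the clique found is typically close to log2(512) = 9.
clique, adj = greedy_clique(512)
```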
We study various aspects of the Bhattacharya–Mesner rank of third order hypermatrices.
We study the spectral theory of hypermatrices, initiated by Gnang, Elgammal and Retakh.
Here are some of our results:
Ellul, Krawetz, Shallit and Wang prove an exponential lower bound on the size of any context-free grammar generating the language of all permutations over some alphabet. We generalize their method and obtain exponential lower bounds for many other languages, among them the set of all squares of given length, and the set of all words containing each symbol at most twice.
The version below corrects two typos in the proof of Proposition 6: $w_1 \sim w_2$ should be $x(w_1) \sim x(w_2)$, and in the following sentence $N^{-1}(A)$ should be $x(N^{-1}(A))$; and a typo in the statement of Theorem 9: the exponent should be $t$ rather than $n$.
A code of the natural numbers is a uniquely-decodable binary code with non-decreasing codeword lengths which satisfies Kraft’s inequality tightly. We define a natural partial order on the set of codes, and show how to construct effectively a code better than a given sequence of codes, in a certain precise sense. As an application, we prove that the existence of a scale of codes (a well-ordered set of codes which contains a code better than any given code) is independent of ZFC.
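The Elias gamma code is a concrete example of such a code (my illustration, not necessarily one discussed in the paper): its codeword lengths $2\lfloor \log_2 n \rfloor + 1$ are non-decreasing, and the Kraft sum equals 1 exactly.

```python
def elias_gamma_length(n):
    """Codeword length of the Elias gamma code of n >= 1:
    2*floor(log2 n) + 1, non-decreasing in n."""
    return 2 * (n.bit_length() - 1) + 1

# Kraft sum over 1..2^12-1: each length 2k+1 occurs 2^k times, so the
# partial sum is 1 - 2^{-12}; the full sum is exactly 1 (Kraft tight).
kraft = sum(2.0 ** -elias_gamma_length(n) for n in range(1, 2 ** 12))
```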
We devise a method for proving inequalities on submodular functions, with a term rewriting flavor. Our method consists of the following steps:
The crucial third step is non-constructive, since it uses compactness of the dual cone of submodular functions. Its proof uses the classical uncrossing technique with a quadratic potential function.
We prove several inequalities using our method, and use them to tightly analyze the performance of two natural (but non-optimal) algorithms for submodular maximization, the random set algorithm and local search.
In this demonstration, we showcase the technologies that we are building at Yahoo! for web-scale information extraction. Given any new website, containing semi-structured information about a pre-specified set of schemas, we show how to populate objects in the corresponding schema by automatically extracting information from the website.
Work done while I was a summer intern at Yahoo! Tel Aviv.