Online learning is a classical task in which a student is challenged by her teacher to answer a sequence of Yes/No questions. After each reply, the student is told the correct answer, and her goal is to minimize the number of mistakes that she makes. In order to make the task solvable, we assume that the teacher’s answers are consistent with some hypothesis class.
In 1988, Littlestone defined a combinatorial dimension, since named after him, which captures the optimal mistake bound as a function of the hypothesis class. His work assumed that the student chooses her answers deterministically.
We extend Littlestone’s work to the more realistic scenario in which the student is allowed to toss coins. As an application, we determine the optimal randomized mistake bound for the classical problem of prediction with expert advice.
Joint work with Steve Hanneke, Idan Mehalel, and Shay Moran.
Littlestone dimension gives the optimal mistake bound in online learning. The randomized Littlestone dimension gives the optimal mistake bound when the learner is randomized.
Whereas the Littlestone dimension is the maximum minimum depth of a shattered tree, the randomized Littlestone dimension is half the maximum expected depth of a shattered tree, where the expectation is with respect to the random process which starts at the root and, at each step, chooses a child at random.
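For example, if a hypothesis class shatters a complete binary tree of depth $d$, then the random process always reaches depth $d$, so the expected depth is $d$, and the randomized Littlestone dimension is at least $d/2$. In particular, randomization can save at most a factor of two over the deterministic mistake bound.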
Joint work with Steve Hanneke, Idan Mehalel, and Shay Moran.
A family of permutations $\mathcal{F} \subseteq S_n$ is $2$-intersecting if any two permutations agree on at least $2$ points.
Ellis, Friedgut and Pilpel showed that for large enough $n$, such families contain at most $(n-2)!$ permutations, and Ellis characterized the extremal families.
Meagher and Razafimahatratra extended this to all $n \geq 2$, but were unable to characterize the extremal families. Using ideas borrowed from theoretical computer science, we characterize the extremal families.
Joint work with Gilad Chase, Neta Dafni, Noam Lifshitz, Nathan Lindzey, and Marc Vinyals.
A linear function on the symmetric group $S_n$ is specified by an $n \times n$ real matrix $A$, and its value at a permutation $\pi$ is
\[ \sum_i A(i,\pi(i)). \]
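For example, when $A$ is the identity matrix, the value at $\pi$ is the number of fixed points of $\pi$.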
Such functions arise naturally in the analysis of the Erdős–Ko–Rado theorem on the symmetric group.
In this talk, we address the following questions:
Based on this paper.
Joint work with Nathan Lindzey (Technion).
Consider an $n\times n$ real matrix, and suppose that we understand the distribution of the sum of the values on a random “generalized diagonal” (a choice of one entry from each row and column).
What can we say about the matrix?
Is this really the correct question to ask?
Bogdanov, Ishai, Viola, and Williamson constructed a pair of $\sqrt{n}$-indistinguishable sources $X,Y$ which OR tells apart.
In contrast, Braverman showed that if $X,Y$ are $\mathrm{polylog}(n)$-indistinguishable and $Y$ is the uniform distribution, then $X,Y$ fool all of $\mathsf{AC^0}$.
For which sources $Y$ beside the uniform distribution does a Braverman–style theorem hold?
Braverman’s theorem states that $\mathrm{polylog}(n)$-wise independence fools $\mathsf{AC^0}$. In contrast, since the approximate degree of OR is $\sqrt{n}$, there are two $\sqrt{n}$-wise indistinguishable distributions which do not even fool OR!
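(Recall that two distributions $X,Y$ over $\{0,1\}^n$ are $k$-wise indistinguishable if their marginal distributions agree on every set of $k$ coordinates, and that $X,Y$ fool a function $f$ if $|\mathbb{E}[f(X)] - \mathbb{E}[f(Y)]|$ is small.)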
Motivated by cryptographic applications, and continuing earlier work of Bogdanov, Ishai, Viola and Williamson, we ask whether Braverman-style results can be recovered for simple distributions, such as ones samplable locally or using low degree polynomials.
We relate these questions to the notorious problem of computing inner product using $\mathsf{AC^0}$ circuits with parity gates at the bottom. This suggests that proving Braverman-style results for $\mathsf{AC^0}$ is hard even for linear sources, and so we turn to consider the OR function.
We prove a Braverman-style theorem for OR which holds for constant degree sources and for local sources; in contrast to Braverman’s theorem, we only require $O(1)$-wise indistinguishability. On the other hand, we show that logarithmic degree suffices to sample a pair of $\sqrt{n}$-wise indistinguishable sources which can be distinguished by OR. Our construction also results in a new visual secret sharing scheme.
Joint work with Andrej Bogdanov (CUHK), Krishnamoorthy Dinesh (CUHK), Yuval Ishai (Technion), Avi Kaplan (Technion), Akshayaram Srinivasan (TIFR).
Let $X,Y$ be $\mathrm{polylog}(n)$-indistinguishable. Braverman famously showed that if $Y$ is uniform then $X,Y$ fool $\mathsf{AC^0}$. In contrast, there exists a pair of $\Theta(\sqrt{n})$-indistinguishable distributions which do not even fool OR.
Under what conditions on $X,Y$ can we recover a Braverman-style theorem?
Joint work with Andrej Bogdanov, K. Dinesh, Yuval Ishai, Avi Kaplan, and Akshay Srinivasan.
Analysis of Boolean functions is a topic at the intersection of computer science and combinatorics, which uses tools and techniques from probability theory and functional analysis.
It has become part of the fundamental toolkit of theoreticians working in computational complexity theory. Traditionally, the functions being studied live on the Boolean hypercube $\{0,1\}^n$.
Recently, attention has been drawn to other domains, with some spectacular applications, such as the recent proof of the 2-to-2 conjecture (a weaker variant of Khot’s Unique Games Conjecture).
In the talk, I will survey some elements of the classical theory, and then describe some recent progress.
Boolean function analysis traditionally studies Boolean functions on the Boolean cube, using Fourier analysis on the group $\mathbb{Z}_2^n$. Other domains of interest include the biased Boolean cube, other abelian groups, and Gaussian space. In all cases, the focus is on results which are independent of the dimension.
Recently, Boolean function analysis has expanded in scope to include other domains such as the symmetric group, the “slice”, and the Grassmann scheme. The latter has figured in the recent proof of the 2-to-1 conjecture (a step toward the unique games conjecture) by Dinur, Khot, Kindler, Minzer, and Safra.
In this talk, we systematically survey Boolean function analysis beyond the Boolean cube, focusing on two main points of view: association schemes and differential posets. We also highlight work on the slice, which is the most widely studied non-product domain.
Boolean function analysis traditionally studies Boolean functions on the Boolean cube, using Fourier analysis on the group $\mathbb{Z}_2^n$. Other domains of interest include the biased Boolean cube, other abelian groups, and Gaussian space. In all cases, the focus is on results which are independent of the dimension.
Recently, Boolean function analysis has expanded in scope to include other domains such as the symmetric group, the “slice”, and the Grassmann scheme. The latter has figured in the recent proof of the 2-to-1 conjecture (a step toward the unique games conjecture) by Dinur, Khot, Kindler, Minzer, and Safra.
In this talk, we first survey classical Boolean function analysis, to set the scene. We then describe in some detail the recent work on the 2-to-1 conjecture, focusing on the connection to Boolean function analysis.
The $(n,k)$-slice is the subset of the Boolean cube $\{0,1\}^n$ consisting of all vectors of weight $k$.
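Equivalently, it is the collection of indicator vectors of $k$-element subsets of $\{1,\dots,n\}$.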
It is the natural home for objects such as uniform intersecting families and $G(n,m)$ random graphs.
Fourier analysis has proven itself very useful on the Boolean cube.
Can we extend it to the slice?
Does it teach us anything new about the cube?
How different is the slice from the cube, after all?
(For the answers, you will have to attend the talk.)
Analysis of Boolean functions is the study of 0/1-valued functions using spectral and analytic techniques. Lying at the interface of combinatorics, probability theory, functional analysis and theoretical computer science, it has been applied to random graph theory, percolation theory, coding theory, social choice theory, extremal combinatorics, and theoretical computer science.
Traditionally the functions being considered are on the Boolean cube $\{0,1\}^n$, or more rarely on other product domains, usually finite. We explore functions on more exotic domains such as finite groups and association schemes, concentrating on two examples: the symmetric group and the Johnson association scheme (the “slice”), which consists of all vertices in the Boolean cube with a specified weight.
We will survey a few classical results in analysis of Boolean functions on the Boolean cube and their generalizations to more exotic domains. On the way, we will explore questions such as: Which functions on the symmetric group are “dictatorships” (depend on one “coordinate”) or “juntas” (depend on a few “coordinates”)? Is there a “Fourier expansion” for functions on the slice? Is the middle slice a “representative” section of the Boolean cube?
Joint work with David Ellis, Ehud Friedgut, Guy Kindler, Elchanan Mossel, and Karl Wimmer.
Motivated by Kalai’s approximate Arrow’s theorem, Friedgut, Kalai and Naor proved the fundamental FKN theorem, which states that Boolean functions on the Boolean cube $\{0, 1\}^n$ which are close to degree 1 with respect to the uniform measure are close to Boolean degree 1 functions.
In the first lecture, we discussed this application and provided a new proof of the FKN theorem, using the Berry–Esseen theorem.
In the second lecture, we stated and proved the biased FKN theorem, which extends the FKN theorem to the Boolean cube under biased product measures $\mu_p$ for small $p$.
In the third lecture, we completed the proof of the biased FKN theorem by proving the junta agreement theorem, and we described the FKN theorem for the symmetric group.
In the fourth and final lecture, we discussed generalizations of the foregoing to degree $d$.
The FKN theorem states that if a Boolean function on the Boolean cube is close to degree 1, then it is close to a dictator. Similarly, the Kindler–Safra theorem states that if a Boolean function on the Boolean cube is close to degree $d$, then it is close to a junta.
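In Fourier-analytic terms, a common formulation of the FKN theorem reads: if $f \colon \{-1,1\}^n \to \{-1,1\}$ satisfies
\[ \sum_{|S| > 1} \hat{f}(S)^2 \le \varepsilon, \]
then $f$ is $O(\varepsilon)$-close (in the fraction of points of disagreement) to a constant function or to $\pm x_i$ for some coordinate $i$.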
Another setting we might discuss is functions on the symmetric group. If a Boolean function is $\varepsilon$-close to degree 1 then it is $O(\sqrt{\varepsilon})$-close to a dictator (suitably defined), and $O(\varepsilon)$-close (up to negation) to a union of “mostly disjoint” cosets. A similar statement should hold for degree $d$, where the function ought to be close to a (suitably defined) decision tree of depth $\mathit{poly}(d)$.
Joint work with Irit Dinur (Weizmann institute) and Prahladh Harsha (TIFR).
Boolean function analysis studies (mostly) Boolean functions on $\{0,1\}^n$. Two basic concepts in the field are degree and junta. A function has degree $d$ if it can be written as a degree $d$ polynomial. A function is a $d$-junta if it depends on at most $d$ coordinates. Clearly, a $d$-junta has degree $d$. What about the converse (for Boolean functions)? What if the Boolean function is only close to degree $d$?
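For example, the AND of $d$ coordinates, $x_1 x_2 \cdots x_d$, is a $d$-junta of degree exactly $d$.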
The questions above were answered by Nisan–Szegedy, Friedgut–Kalai–Naor, and Kindler–Safra. We consider what happens when we ask the same questions on different domains. The domains of interest include the biased Boolean cube, the slice, the Grassmann scheme, the symmetric group, high-dimensional expanders, and others.
Joint work with Yotam Dikstein, Irit Dinur, Prahladh Harsha, and Ferdinand Ihringer.
A talk covering Friedgut–Kalai–Naor theorems for the symmetric group (EFF1, EFF2) and for the slice.
The talk also covers applications to Erdős–Ko–Rado theory.
The classical Friedgut–Kalai–Naor (FKN) theorem states that if most of the Fourier spectrum of a Boolean function on $\{0,1\}^n$ is concentrated on the first two levels then the function essentially depends on one coordinate. We extend this result to Boolean functions on $S_n$ whose Fourier spectrum is concentrated on the first two irreducible representations. When the function is balanced, we show that it essentially depends on either the image or the inverse image of one point. When the function is skewed, we show that it is close to a disjunction of double cosets (indicators of events of the type “$i$ maps to $j$”).
A function $f$ which satisfies $f(x \oplus y) = f(x) \oplus f(y)$ for all $x,y$ is an XOR of a subset of the coordinates.
Moreover, if a function $f$ satisfies this for most inputs, then it is close to an XOR — a classical stability result known in theoretical computer science as linearity testing.
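To make this concrete, here is a minimal sketch of the BLR linearity test in Python (an illustration of the classical test; the function name and the toy example are ours):

```python
import random

def blr_test(f, n, trials=10000):
    """Estimate how often f(x ^ y) == f(x) ^ f(y) for random n-bit x and y.

    By the BLR linearity testing theorem, a high acceptance rate implies
    that f is close to an XOR of a subset of the coordinates.
    """
    accepted = 0
    for _ in range(trials):
        x = random.getrandbits(n)
        y = random.getrandbits(n)
        if f(x ^ y) == f(x) ^ f(y):
            accepted += 1
    return accepted / trials

# Example: f is the XOR of coordinates 0 and 2, so it always passes.
f = lambda z: (z ^ (z >> 2)) & 1
print(blr_test(f, n=8))  # 1.0
```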
We discuss what happens if we replace ⊕ with a different operation (not necessarily binary), showing that stability still holds (with a wrinkle).
The proof involves Jones’ regularity lemma (a regularity lemma for Boolean functions) as well as the It Ain’t Over Till It’s Over theorem.
Joint work with Gilad Chase, Dor Minzer, Elchanan Mossel, and Nitin Saurabh.
An $n$-bit function $f$ is a polymorphism of an $m$-bit function $g$ if $f \circ g^n = g \circ f^m$.
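(Here the $n \cdot m$ input bits are arranged in an $n \times m$ matrix: $f \circ g^n$ applies $g$ to each of the $n$ rows and then $f$ to the $n$ results, while $g \circ f^m$ applies $f$ to each of the $m$ columns and then $g$ to the $m$ results.)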
For example, an $n$-bit function $f$ is a polymorphism of the 2-bit function $g=\mathsf{XOR}$ if $f(x + y) = f(x) + f(y)$ for all $x,y$.
It is known that all exact polymorphisms of XOR are XORs, and furthermore, all approximate polymorphisms of XOR are close to XORs — this is classical linearity testing.
We determine all approximate polymorphisms of $g$ for an arbitrary function $g$.
In addition, we consider “list decoding” variants of this question.
Linearity testing states that if $\Pr[f(x \oplus y) = f(x) \oplus f(y)] \approx 1$ then $f \approx \mathsf{XOR}$.
Together with Noam Lifshitz, Dor Minzer and Elchanan Mossel, we showed that if $\Pr[f(x \land y) = f(x) \land f(y)] \approx 1$ then $f \approx \mathsf{AND}$.
Together with Gilad Chase, Dor Minzer and Nitin Saurabh, we extended this to arbitrary inner functions (replacing $\oplus$ or $\land$).
I’m thinking of a person in the audience. How long will it take you to find out who it is, using only Yes/No questions?
We consider this puzzle, which underlies search theory, with a twist:
I’m choosing the person according to a known distribution, and your goal is to minimize the expected number of questions.
How does the performance of your strategy depend on the type of question you’re allowed to ask? On the type of distribution I am allowed to choose? What happens if I can lie?
Joint work with Yuval Dagan (MIT), Ariel Gabizon (Zcash), Daniel Kane (UCSD), Shay Moran (IAS).
Joint work with Yuval Dagan (Technion), Ariel Gabizon (Zcash) and Shay Moran (UCSD).
An optimal strategy for Bob could potentially use arbitrary questions.
What happens when we restrict the set of allowed questions?
Joint work with Yuval Dagan (Technion), Ariel Gabizon (Zerocash) and Shay Moran (UCSD).
Huffman coding has a search-theoretic interpretation as the optimal strategy for the twenty questions game.
In this game, Alice chooses $x \in \{1,\dots,n\}$ according to a distribution $\mu$, and Bob identifies $x$ using yes/no questions. Bob’s goal is to use the minimum number of questions in expectation. A strategy for Bob corresponds to a prefix code for $\{1,\dots,n\}$, and this shows that Bob’s optimal strategy uses a Huffman code for $\mu$. However, this strategy could use arbitrary yes/no questions. What happens when we restrict the set of yes/no questions that Bob is allowed to use?
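(As a basic example, when $\mu$ is uniform over $n = 2^k$ elements, the Huffman code is a complete binary tree of depth $k$, so Bob needs exactly $k = H(\mu)$ questions; comparison questions “is $x \le i$?” already achieve this, via binary search.)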
Joint work with Yuval Dagan, Ariel Gabizon, and Shay Moran.
Joint work with Yuval Dagan (Technion), Ariel Gabizon (Zcash) and Shay Moran (IAS).
A classic topic in Information Theory is that of minimum redundancy encoding. Given a distribution $\mu$, how many bits are needed on average to encode data distributed according to $\mu$?
One optimal solution to this problem is given by Huffman’s algorithm, which produces a code whose average codeword length is at most $H(\mu) + 1$, where $H(\mu)$ is the entropy of $\mu$.
The process of encoding data using a prefix-free code can be represented by a decision tree. Codes produced by Huffman’s algorithm, while having minimum redundancy, use arbitrary queries at the nodes of the resulting decision tree.
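For concreteness, here is a minimal sketch of Huffman’s algorithm (standard material; the interface is ours), which repeatedly merges the two least likely subtrees:

```python
import heapq

def huffman_codeword_lengths(probs):
    """Given a list of probabilities, return the codeword length of each
    item in a Huffman code (its depth in the corresponding decision tree)."""
    # Heap entries: (probability, unique tiebreaker, list of item indices).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        # Merge the two least likely subtrees; every item inside them
        # moves one level deeper in the tree.
        p1, _, items1 = heapq.heappop(heap)
        p2, _, items2 = heapq.heappop(heap)
        for i in items1 + items2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, items1 + items2))
        counter += 1
    return lengths

probs = [0.4, 0.3, 0.2, 0.1]
lengths = huffman_codeword_lengths(probs)
print(lengths)                                     # [1, 2, 3, 3]
print(sum(p * l for p, l in zip(probs, lengths)))  # average length 1.9
```

On this toy distribution the entropy is roughly 1.85, so the average length 1.9 indeed lies below $H(\mu) + 1$.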
In this talk we ask what happens when we restrict the set of queries.
We answer the following two questions, among others:
On the way, we tightly analyze the redundancy of binary comparison search trees.
Joint work with Yuval Dagan (Technion), Ariel Gabizon (Technion), and Shay Moran (Technion & UCSD).
Ever since Strassen’s $O(n^{2.81})$ matrix multiplication algorithm stunned the mathematical community, the quest for fast matrix multiplication algorithms has been a holy grail in computer science. At first progress was fast, culminating in Coppersmith and Winograd’s $O(n^{2.376})$ algorithm of 1987. Recently interest in the area has reawakened due to work of Stothers, Vassilevska-Williams and Le Gall, who managed to improve the exponent slightly from 2.376 to 2.373.
Roughly speaking, Coppersmith and Winograd constructed an edifice turning an “identity” (some kind of mathematical object) into a fast matrix multiplication algorithm. Applying their method to the identity $T_{CW}$, they obtained an $O(n^{2.388})$ algorithm. Applying it to $T_{CW}^2$ (identities can be squared!), they obtained their $O(n^{2.376})$ algorithm. They stopped there since computing was a lot more cumbersome in the 1980s. Using modern computers, Stothers, Vassilevska-Williams and Le Gall were able to analyze higher powers of $T_{CW}$ (up to $T_{CW}^{32}$), and so reduced the exponent to 2.373.
Our talk answers the following question:
What is the limit of this approach?
We show that this approach cannot yield an exponent smaller than 2.372.
No prior knowledge will be assumed.
Joint work with Andris Ambainis (University of Latvia) and François Le Gall (University of Tokyo).
Ever since Strassen’s $O(n^{2.81})$ matrix multiplication algorithm stunned the mathematical community, the quest for fast matrix multiplication algorithms has been a holy grail in computer science. At first progress was fast, culminating in Coppersmith and Winograd’s $O(n^{2.376})$ algorithm of 1987. Recently interest in the area has reawakened due to work of Stothers, Vassilevska-Williams and Le Gall, who managed to improve the exponent slightly from 2.376 to 2.373.
Roughly speaking, Coppersmith and Winograd constructed an edifice turning an “identity” (some kind of mathematical object) into a fast matrix multiplication algorithm. Applying their method to the identity $T_{CW}$, they obtained an $O(n^{2.388})$ algorithm. Applying it to $T_{CW}^2$ (identities can be squared!), they obtained their $O(n^{2.376})$ algorithm. They stopped there since computing was a lot more cumbersome in the 1980s. Using modern computers, Stothers, Vassilevska-Williams and Le Gall were able to analyze higher powers of $T_{CW}$ (up to $T_{CW}^{32}$), and so reduced the exponent to 2.373.
Our talk answers the following question:
What is the limit of this approach?
We show that this approach cannot yield an exponent smaller than 2.372.
No prior knowledge will be assumed.
Joint work with Andris Ambainis (University of Latvia) and François Le Gall (University of Tokyo).
Based on our work together with Andris Ambainis and François Le Gall.
Feige showed in 1998 that the greedy algorithm is optimal for Set Cover and for its dual Max Cover. The goal of Max Cover is to choose a given number of sets which together cover the maximum number of elements. Partition Max Cover is the variant in which sets have a color, and we need to choose one set of each color; this generalizes Max SAT. Greedy is no longer optimal, and standard approaches use an LP or other relaxation followed by rounding. We describe an optimal combinatorial algorithm, which uses local search with foresight. Our algorithm can also be adapted to the more general case of monotone submodular optimization.
Joint work with Justin Ward (QMUL) from 2014.
The greedy algorithm gives a $1-1/e$ approximation to Maximum Coverage, which is best possible unless P=NP (Feige).
In contrast, the greedy algorithm only gives a 1/2 approximation to MSMM (maximizing a monotone submodular function subject to a matroid constraint).
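For reference, here is a minimal sketch of the greedy algorithm for Maximum Coverage (a standard illustration; the function name and the toy instance are ours):

```python
def greedy_max_coverage(sets, k):
    """Pick k of the given sets, greedily maximizing the number of newly
    covered elements at each step; covers a (1 - 1/e)-fraction of the
    optimum."""
    covered = set()
    chosen = []
    for _ in range(k):
        # Choose the set covering the most still-uncovered elements.
        best = max(sets, key=lambda s: len(s - covered))
        chosen.append(best)
        covered |= best
    return chosen, covered

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
chosen, covered = greedy_max_coverage(sets, k=2)
print(covered)  # {1, 2, 3, 4, 5, 6}
```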
Joint work with Justin Ward (EPFL).
Maximizing a monotone submodular function over a matroid constraint is a fundamental combinatorial optimization problem, generalizing problems such as Maximum Coverage and MAX-SAT.
Călinescu, Chekuri, Pál and Vondrák devised the continuous greedy algorithm, which gives the optimal $1-1/e$ approximation ratio.
Together with Justin Ward, we have come up with an alternative algorithm which is more combinatorial in nature. We first devised an algorithm for the special case of maximum coverage, and then extended it to the general case.
Lecture notes for two guest lectures given at Prahladh Harsha’s class (and a summary of a third one).
The first lecture, on February 14, focused on the Hoffman/Lovász bound, with an application to the traffic light problem.
The second lecture, on February 20, discussed triangle-intersecting families.
The third lecture, on February 21, was on the Planted Clique problem.
A family of graphs (sharing the same finite vertex set) is triangle-intersecting if the intersection of any two graphs in the family contains a triangle. Simonovits and Sós conjectured in 1976 that such a family can contain at most 1/8 of the graphs; this bound is achieved by taking a “triangle-junta” (all graphs containing a fixed triangle).
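(Indeed, the triangle-junta contains exactly $1/8 = 2^{-3}$ of the graphs, since it constrains the presence of exactly three edges.)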
Prior to the present work, the best known upper bound on the fraction of graphs in a triangle-intersecting family was 1/4 (Chung, Graham, Frankl & Shearer 1981). We prove the conjecture, showing moreover that the only families achieving the bound are triangle-juntas.
The result follows a line of work by Friedgut and his coauthors. The proof uses rudimentary discrete Fourier analysis, and some “entry-level” graph theory; the talk will focus on the former.
Joint work with David Ellis (Cambridge) and Ehud Friedgut (Hebrew University).
An exposition of Triangle-intersecting families of graphs, our joint work with David Ellis and Ehud Friedgut, in which we show that if $\mathcal{F}$ is a family of graphs on $n$ points such that any two graphs in $\mathcal{F}$ contain a triangle, then $\mathcal{F}$ contains at most $1/8$ of the graphs.
The proof uses the spectral approach of Hoffman and Lovász.
We give an overview of Hoffman’s bound and its application to Erdős–Ko–Rado theory.
How many arithmetic operations does it take to multiply two square matrices? This question has captured the imagination of computer scientists ever since Strassen showed in 1969 that $O(n^{2.81})$ operations suffice. We survey the classical theory of fast matrix multiplication, starting with Strassen’s algorithm and culminating with the state-of-the-art Coppersmith–Winograd algorithm and its recent improvements. We also describe Coppersmith’s $O(n^2\log^2n)$ algorithm for multiplying rectangular matrices, used by Ryan Williams in his separation of ACC and NEXP.
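To make Strassen’s starting point concrete, recall his identity for multiplying $2 \times 2$ (block) matrices using seven multiplications instead of eight:
\[
\begin{aligned}
M_1 &= (A_{11} + A_{22})(B_{11} + B_{22}), \\
M_2 &= (A_{21} + A_{22})B_{11}, \\
M_3 &= A_{11}(B_{12} - B_{22}), \\
M_4 &= A_{22}(B_{21} - B_{11}), \\
M_5 &= (A_{11} + A_{12})B_{22}, \\
M_6 &= (A_{21} - A_{11})(B_{11} + B_{12}), \\
M_7 &= (A_{12} - A_{22})(B_{21} + B_{22}),
\end{aligned}
\]
from which
\[
C_{11} = M_1 + M_4 - M_5 + M_7, \quad
C_{12} = M_3 + M_5, \quad
C_{21} = M_2 + M_4, \quad
C_{22} = M_1 - M_2 + M_3 + M_6.
\]
Applied recursively, seven multiplications of half-size matrices give the recurrence $T(n) = 7T(n/2) + O(n^2)$, whose solution is $O(n^{\log_2 7}) = O(n^{2.81})$.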
This part of the two-lecture series, based on my earlier TSS talks, discusses the work of Coppersmith and Winograd, and its follow-ups.
How many arithmetic operations does it take to multiply two square matrices? This question has captured the imagination of computer scientists ever since Strassen showed in 1969 that $O(n^{2.81})$ operations suffice. We survey the classical theory of fast matrix multiplication, starting with Strassen’s algorithm and culminating with the state-of-the-art Coppersmith–Winograd algorithm and its recent improvements. We also describe Coppersmith’s $O(n^2\log^2n)$ algorithm for multiplying rectangular matrices, used by Ryan Williams in his separation of ACC and NEXP.
This part of the two-lecture series, based on my earlier TSS talks, culminates with Schönhage’s asymptotic sum inequality.
Bonet, Pitassi and Raz and, independently, Krajíček came up with the idea of using feasible interpolation as a vehicle for lower bounds on proof systems. Bonet et al. presented their construction in a somewhat ad hoc manner, and Krajíček used a theorem of Razborov generalizing Karchmer–Wigderson games. We provide a different interpretation of their arguments in terms of games which are more suitable than the ones considered by Razborov.
Our account includes three flavors of arguments: ones following Bonet et al., ones following Krajíček, and ones following a suggestion of Neil Thapen. The different arguments prove lower bounds for slightly different formulas.
Introduction to my research, focusing on three topics: computational complexity, Boolean function analysis, and combinatorics.
Meant for prospective graduate students.
How difficult is it to determine whether a planar graph can be colored using 2 colors? 3 colors? 4 colors?
How difficult is it to convince somebody that a graph cannot be 3-colored? Can we do it without enumerating all possible 3-colorings?
Talk given at the Technion Math Club for an audience of high-schoolers.