Online learning is a classical task in which a student is challenged by her teacher to answer a sequence of Yes/No questions. After each reply, the student is told the correct answer, and her goal is to minimize the number of mistakes that she makes. In order to make the task solvable, we assume that the teacher’s answers are consistent with some hypothesis class.

Littlestone defined in 1988 a combinatorial dimension named after him which captures the optimal mistake bound as a function of the hypothesis class. His work assumed that the student chooses her answer deterministically.

We extend Littlestone’s work to the more realistic scenario in which the student is allowed to toss coins.As an application, we determine the optimal randomized mistake bound for the classical problem of prediction with expert advice.

Joint work with Steve Hanneke, Idan Mehalel, and Shay Moran.

Littlestone dimension gives the optimal mistake bound in online learning. The randomized Littlestone dimension gives the optimal mistake bound when the learner is randomized.

Whereas the Littlestone dimension is the maximum minimum depth of a shattered tree, the randomized Littlestone dimension is half the maximum *expected* depth of a shattered tree, where the expectation is with respect to the random process which starts at the root and at each step, chooses a child at random.

Joint work with Steve Hanneke, Idan Mehalel, and Shay Moran.

17 July 2024, Technion combinatorics seminar: Intersection theorems using certificates abstractnotes

A family of permutations $\mathcal{F} \subseteq S_n$ is *$2$-intersecting* if any two permutations agree on at least $2$ points.

Ellis, Friedgut and Pilpel showed that for large enough $n$, such families contain at most $(n-2)!$ permutations, and Ellis characterized the extremal families.

Meagher and Razafimahatratra extended this to all $n \geq 2$, but were unable to characterize the extremal families. Using ideas borrowed from theoretical computer science, we characterize the extremal families.

Joint work with Gilad Chase, Neta Dafni, Noam Lifshitz, Nathan Lindzey, and Marc Vinyals.

24 May 2023, Technion combinatorics seminar: Nearly linear functions on the symmetric group abstractnotes

A linear function on the symmetric group $S_n$ is specified by an $n \times n$ real matrix $A$, and its value at a permutation $\pi$ is

\[ \sum_i A(i,\pi(i)). \]

Such functions arise naturally in the analysis of the Erdős–Ko–Rado theorem on the symmetric group.

In this talk, we address the following questions:

- What can we say about $f$ if $f(\pi) \in \{0,1\}$ for all $\pi$?
- What can we say about $f$ if $f(\pi) \in \{0,1\}$ for most $\pi$?
- What can we say about $f$ if $f(\pi)$ is close to $\{0,1\}$ for all $\pi$?
- What can we say about $f$ if $f(\pi)$ is close to $\{0,1\}$ on average (in $L_2$)?

Based on this paper.

15 May 2023, AGT online seminar: Orthogonal basis of eigenvectors for the Johnson and Kneser graphs abstractnotes

The Johnson and Kneser graphs have the same eigenspaces.

How explicitly can we describe these eigenspaces?

We describe an explicit orthogonal basis for each eigenspace, which coincides with the Gelfand–Tsetlin basis. We also discuss related work on other graphs, including many open questions.

Joint work with Nathan Lindzey (Technion).

Boolean function analysis studies functions on the hypercube $\{-1,1\}^n$.

Its starting point is the *Fourier expansion*, which is the unique way to express a given function as a multilinear polynomial.

What happens if we study functions on other domains, such as the “slice” or the symmetric group?

The talk is based on this paper on the slice and this paper on the symmetric group.

Joint work with Nathan Lindzey (Technion).

Boolean function analysis studies functions on the hypercube $\{-1,1\}^n$.

Its starting point is the *Fourier expansion*, which is the unique way to express a given function as a multilinear polynomial.

What happens if we study functions on other domains, such as the “slice” or the symmetric group?

The talk is based on this paper on the slice and this paper on the symmetric group.

Joint work with Nathan Lindzey (Technion).

22 November 2021, Technion: Square matrices where a random generalized diagonal is close to 0/1 abstract

Consider an $n\times n$ real matrix such that if we sum the values on a random “generalized diagonal” (a choice of one entry from each row and column), then

- on average, it is close to $\{0,1\}$, in the sense that the expected squared distance is small; or
- with probability close to 1, it is equal to 0 or 1; or
- it is always within some small distance of 0 or 1.

What can we say about the matrix?

Is this really the correct question to ask?

18 October 2021, HUJI: When is a linear function on the symmetric group almost Boolean? abstractnotes

A real-valued function on $S_n$ is *linear* if it is a linear combination of indicators of the form “$i$ goes to $j$”.

Ellis, Friedgut and Pilpel showed that if a linear function on $S_n$ is Boolean then it is a *dictator*, that is, it depends on the image of some $i$ or on the inverse image of some $j$.

What can we say about linear functions on $S_n$ which are *almost* Boolean?

We answer this question completely for several notions of “almost”, improving on earlier results together with Ellis and Friedgut.

Talk based on this paper.

Suppose that $X,Y$ are two distributions on $n$ bits.

What amount of indistinguishability is needed to fool $\mathsf{AC^0}$?

If $X,Y$ are arbitrary sources, then $\sqrt{n}$-indistinguishability doesn’t even suffice to fool OR.

In contrast, if $Y$ is the uniform distribution, then Braverman showed that $\mathrm{polylog}(n)$-indistinguishability suffices to fool all of $\mathsf{AC^0}$.

We explore what happens when $Y$ is a simple distribution, say samplable in low degree or locality.

Joint work with Andrej Bogdanov & K. Dinesh (CUHK), Yuval Ishai and Avi Kaplan (Technion), and Akshay Srinivasan (TIFR).

Bogdanov, Ishai, Viola, and Williamson constructed a pair of $\sqrt{n}$-indistinguishable sources $X,Y$ which OR tells apart.

In contrast, Braverman showed that if $X,Y$ are $\mathrm{polylog}(n)$-indistinguishable and $Y$ is the uniform distribution, then $X,Y$ fool all of $\mathsf{AC^0}$.

For which sources $Y$ beside the uniform distribution does a Braverman–style theorem hold?

Braverman’s theorem states that $\mathrm{polylog}(n)$-wise independence fools $\mathsf{AC^0}$. In contrast, since the approximate degree of OR is $\sqrt{n}$, there are two $\sqrt{n}$-wise indistinguishable distributions which do not even fool OR!

Motivated by cryptographic applications, and continuing earlier work of Bogdanov, Ishai, Viola and Williamson, we ask whether Braverman-style results can be recovered for simple distributions, such as ones samplable locally or using low degree polynomials.

We relate these questions to the notorious problem of computing inner product using $\mathsf{AC^0}$ circuits with parity gates at the bottom. This suggests that proving Braverman-style results for $\mathsf{AC^0}$ is hard even for linear sources, and so we turn to consider the OR function.

We prove a Braverman-style theorem for OR which holds for constant degree sources and for local sources; in contrast to Braverman’s theorem, we only require $O(1)$-wise indistinguishability. On the other hand, we show that logarithmic degree suffices to sample a pair of $\sqrt{n}$-wise indistinguishable sources which can be distinguished by OR. Our construction also results in a new visual secret sharing scheme.

Joint work with Andrej Bogdanov (CUHK), Krishnamoorthy Dinesh (CUHK), Yuval Ishai (Technion), Avi Kaplan (Technion), Akshayaram Srinivasan (TIFR).

Let $X,Y$ be $\mathrm{polylog}(n)$-indistinguishable. Braverman famously showed that if $Y$ is uniform then $X,Y$ fool $\mathsf{AC^0}$. In contrast, there exists a pair of $\Theta(\sqrt{n})$-indistinguishable distributions which do not even fool OR.

Under what conditions on $X,Y$ can we recover a Braverman-style theorem?

Joint work with Andrej Bogdanov, K. Dinesh, Yuval Ishai, Avi Kaplan, and Akshay Srinivasan.

22 January 2019, CAALM (CMI): Discrete harmonic analysis: beyond the Boolean hypercube abstractslides

Analysis of Boolean functions is a topic at the intersection of computer science and combinatorics, which uses tools and techniques from probability theory and functional analysis.

It has become part of the fundamental topic of theoreticians working in computational complexity theory. Traditionally, the functions being studied live on the Boolean hypercube $\{0,1\}^n$.

Recently, attention has been drawn to other domains, with some spectacular applications, such as the recent proof of the 2-to-2 conjecture (a weaker variant of Khot’s Unique Games Conjecture).

In the talk, I will survey some elements of the classical theory, and then describe some recent progress.

Boolean function analysis traditionally studies Boolean functions on the Boolean cube, using Fourier analysis on the group $\mathbb{Z}_2^n$. Other domains of interest include the biased Boolean cube, other abelian groups, and Gaussian space. In all cases, the focus is on results which are independent of the dimension.

Recently, Boolean function analysis has expanded in scope to include other domains such as the symmetric group, the “slice”, and the Grassmann scheme. The latter has figured in the recent proof of the 2-to-1 conjecture (a step toward the unique games conjecture) by Dinur, Khot, Kindler, Minzer, and Safra.

In this talk, we systematically survey Boolean function analysis beyond the Boolean cube, focusing on two main points of view: association schemes and differential posets. We also highlight work on the slice, which is the most widely studied non-product domain.

Boolean function analysis traditionally studies Boolean functions on the Boolean cube, using Fourier analysis on the group $\mathbb{Z}_2^n$. Other domains of interest include the biased Boolean cube, other abelian groups, and Gaussian space. In all cases, the focus is on results which are independent of the dimension.

Recently, Boolean function analysis has expanded in scope to include other domains such as the symmetric group, the “slice”, and the Grassmann scheme. The latter has figured in the recent proof of the 2-to-1 conjecture (a step toward the unique games conjecture) by Dinur, Khot, Kindler, Minzer, and Safra.

In this talk, we first survey classical Boolean function analysis, to set the scene. We then describe in some detail the recent work on the 2-to-1 conjecture, focusing on the connection to Boolean function analysis.

9 June 2016, Bar Ilan: Fourier analysis on the slice (?)

The $(n,k)$-slice is the subset of the Boolean cube $\{0,1\}^n$ consisting of all vectors of weight $k$.

It is the natural home for objects such as uniform intersecting families and $G(n,m)$ random graphs.

Fourier analysis has proven itself very useful on the Boolean cube.

Can we extend it to the slice?

Does it teach us anything new about the cube?

How different is the slice from the cube, after all?

(For the answers, you will have to attend the talk.)

Analysis of Boolean functions is the study of 0/1-valued functions using spectral and analytic techniques. Lying at the interface of combinatorics, probability theory, functional analysis and theoretical computer science, it has been applied to random graph theory, percolation theory, coding theory, social choice theory, extremal combinatorics, and theoretical computer science.

Traditionally the functions being considered are on the Boolean cube $\{0,1\}^n$, or more rarely on other product domains, usually finite. We explore functions on more exotic domains such as finite groups and association schemes, concentrating on two examples: the symmetric group and the Johnson association scheme (the “slice”), which consists of all vertices in the Boolean cube with a specified weight.

We will survey a few classical results in analysis of Boolean functions on the Boolean cube and their generalizations to more exotic domains. On the way, we will explore questions such as: Which functions on the symmetric group are “dictatorships” (depend on one “coordinate”) or “juntas” (depend on a few “coordinates”)? Is there a “Fourier expansion” for functions on the slice? Is the middle slice a “representative” section of the Boolean cube?

Joint work with David Ellis, Ehud Friedgut, Guy Kindler, Elchanan Mossel, and Karl Wimmer.

27 February 2015, Toronto notes

Boolean function analysis concerns functions $f\colon \{0,1\}^n \to \{0,1\}$.

We often consider the behavior of $f$* *when its input is chosen uniformly at random from $\{0,1\}^n$.

Sometimes it is more natural to consider a biased input — the classical example being the random graph model $G(n,p)$.

Analyzing functions with respect to a biased measure often reveals phenomena that do not occur under an unbiased measure.

We describe one such example: the structure of functions which are almost of low degree.

Joint work with Irit Dinur (Weizmann institute) and Prahladh Harsha (TIFR).

The FKN theorem states that if a Boolean function on the Boolean cube is close to degree 1, then it is close to a dictator. Similarly, the Kindler–Safra theorem states that if a Boolean function on the Boolean cube is close to degree $d$, then it is close to a junta.

The picture becomes more interesting when we study functions on the $p$-biased Boolean cube.

If a Boolean function is $\varepsilon$-close to degree 1 then (up to negation) it is $O(\varepsilon)$-close to a maximum of $O(\sqrt{\varepsilon}/p+1)$ coordinates, and so $O(\sqrt{\varepsilon}+p)$-close to a constant function.

A similar statement holds for functions close to degree *d*, but the corresponding structure is more complicated.

Another setting we might discuss is functions on the symmetric group. If a Boolean function is 𝜀-close to degree 1 then it is $O(\sqrt{\varepsilon})$-close to a dictator (suitably defined), and $O(\varepsilon)$-close (up to negation) to a union of “mostly disjoint” cosets. A similar statement should hold for degree $d$, where the function ought to be close to a (suitably defined) decision tree of depth $\mathit{poly}(d)$.

Joint work with Irit Dinur (Weizmann institute) and Prahladh Harsha (TIFR).

10 January 2019, Bar Ilan: Almost Boolean functions of degree 1 on the symmetric group abstractnotes

One of the first results in Boolean function analysis is the FKN theorem, which states that an almost Boolean function of degree 1 on the hypercube is close to a dictator.

The question becomes more challenging in other domains, or indeed, even on the hypercube with respect to a biased measure.

In this talk we will focus on the symmetric group.

If we think of the symmetric group as the group of permutation matrices, then a function has degree 1 if it is a linear combination of input entries.

We show that in the balanced case, an analog of the FKN theorem holds for the symmetric group as well, with the correct definition of “dictator”.

Joint work with David Ellis and Ehud Friedgut, published in 2015 (EFF2).

24 March 2019, Tel Aviv: Structure of almost Boolean low degree functions abstract

11 March 2019, Hebrew University: Structure of (almost) low-degree Boolean functions abstract

Boolean function analysis studies (mostly) Boolean functions on $\{0,1\}^n$. Two basic concepts in the field are *degree* and *junta*. A function has degree $d$ if it can be written as a degree $d$ polynomial. A function is a $d$-junta if it depends on d coordinates. Clearly, a $d$-junta has degree $d$. What about the converse (for Boolean functions)? What if the Boolean function is only *close* to degree $d$?

The questions above were answered by Nisan–Szegedy, Friedgut–Kalai–Naor, and Kindler–Safra. We consider what happens when we ask the same questions on different domains. The domains of interest include the biased Boolean cube, the slice, the Grassmann scheme, the symmetric group, high-dimensional expanders, and others.

Joint work with Yotam Dikstein, Irit Dinur, Prahladh Harsha, and Ferdinand Ihringer.

A talk covering Friedgut–Kalai–Naor theorems for the symmetric group (EFF1, EFF2) and for the slice.

The talk also covers applications to Erdős–Ko–Rado theory.

2 December 2013, Simons institute: Friedgut–Kalai–Naor for Boolean functions on $S_n$ videoabstractnotes

The classical Friedgut–Kalai–Naor (FKN) theorem states that if most of the Fourier spectrum of a boolean function on $\{0,1\}^n$ is concentrated on the first two levels then the function essentially depends on one coordinate. We extend this result to boolean functions on $S_n$ whose Fourier spectrum is concentrated on the first two irreducible representations. When the function is balanced, we show that it essentially depends on either the image or the inverse image of one point. When the function is skewed, we show that it is close to a disjunction of double cosets (indicators of events of the type “$i$ maps to $j$”).

A Boolean function $f\colon \{0,1\}^n \to \{0,1\}$ satisfies $f(x \oplus y) = f(x) \oplus f(y)$ for all $x,y$ iff it is linear.

A classical result, *linearity testing*, states that if $f(x \oplus y) = f(x) \oplus f(y)$ holds for *most* $x,y$, then $f$ is close to a linear function.

We extend this result to arbitrary functions beyond XOR.

Joint work with Gilad Chase, Noam Lifshitz, Dor Minzer, Elchanan Mossel, and Nitin Saurabh.

3 November 2021, Technion: Approximate Polymorphisms

Which Boolean functions $f\colon \{0,1\}^n \to \{0,1\}$ satisfy $f(x⊕y)=f(x)⊕f(y)$

… for all $x,y$? linear functions (100% regime).

… for almost all $x,y$? functions close to linear functions (99% regime).

… for many $x,y$, more than random functions? functions correlating with a linear function (51% regime).

All of these results are classical.

We study what happens when the XOR function is replaced by an arbitrary Boolean function.

Based on joint work with Gilad Chase (Technion), Dor Minzer (MIT), Elchanan Mossel (MIT), Nitin Saurabh (IIIT Hyderabad).

An $n$-bit function $f$ is a polymorphism of an $m$-bit function $g$ if $f \circ g^n = g o f^m$.

For example, an $n$-bit function $f$ is a polymorphism of the 2-bit function $g=\mathsf{XOR}$ if $f(x + y) = f(x) + f(y)$ for all $x,y$.

It is known that all exact polymorphisms of XOR are XORs, and furthermore, all approximate polymorphisms of XOR are close to XORs — this is the classical linearity testing.

We determine all approximate polymorphisms of $g$ for an arbitrary function $g$.

In addition, we consider “list decoding” variants of this question.

Call $f\colon \{0,1\}^n \to \{0,1\}$ *linear* if $f(x\oplus y) = f(x)\oplus f(y)$ for all $x,y$, and *multiplicative* if $f(x\cdot y) = f(x)\cdot f(y)$ for all $x,y$.

A function $f$ is linear if it is an XOR of a subset of inputs, and multiplicative if it is an AND of a subset of inputs (or zero).

Blum, Luby and Rubinfeld (1993) showed that we can test whether $f$ is linear by checking the condition $f(x\oplus y) = f(x)\oplus f(y)$ for random $x,y$.

Answering (in a sense) a question of Parnas, Ron and Samorodnitsky (2002), and improving on results of Ilan Nehama (2013), we show that a similar statement holds for testing whether *f* is multiplicative.

Joint work (2020) with Noam Lifshitz (HUJI), Dor Minzer (MIT), and Elchanan Mossel (MIT).

A Boolean function $f\colon \{0,1\}^n \to \{0,1\}$ is a *polymorphism of XOR* if $f(x+y) = f(x)+f(y)$ for all $x,y$ (addition is modulo 2).

It is well-known that a function is a polymorphism of XOR iff it is an XOR of a subset of the coordinates.

A celebrated result in theoretical computer science (“linearity testing”) extends this to *approximate* polymorphisms of XOR:

if $f(x+y) = f(x)+f(y)$ for *most* $x,y$, then $f$ is close to an exact polymorphism.

We show that similar results hold for other functions, such as AND and Majority:

approximate polymorphisms of AND are close to ANDs, and approximate polymorphisms of Majority essentially depend on a single coordinate.

Joint work with Noam Lifshitz (HUJI), Dor Minzer (MIT), and Elchanan Mossel (MIT).

Arrow’s impossibility theorem states that the only voting rule satisfying certain natural requirements is a dictatorship.

Gil Kalai showed that even if we relax some of these requirements so that they only hold with high probability, we are not getting genuine new voting rules.

Arrow’s theorem is just one of many paradoxes in social choice theory.

Another example, from judgment aggregation, states that the only voting rules satisfying the equation $ f(x \land y) = f(x) \land f(y)$ for all $x,y \in \{0,1\}^n$ are oligarchies (ANDs).

Ilan Nehama asked whether relaxing this requirement so that it only holds with high probability can help.

He proved that the answer is negative – voting rules satisfying the relaxed condition must be close to an AND.

However, his result degrades with $n$, the number of voters.

We prove a similar result which is independent of $n$.

Our result can also be seen as the analog of linearity testing when changing $+$ to $\times$.

Joint work with Noam Lifshitz (HUJI), Dor Minzer (IAS) and Elchanan Mossel (MIT).

17 March 2020, Technion: Property Testing meets Universal Algebra: Oligarchy Testing slides

It is well-known that the only functions satisfying $f(x \oplus y) = f(x) \oplus f(y)$ are XORs, and moreover this holds robustly: if $f(x \oplus y) = f(x) \oplus f(y)$ for most $x,y$, then $f$ is close to an XOR.

We show that a similar statement holds with XOR replaced by AND.

Both of these results are examples of *robust judgment aggregation*, which also includes such seemingly unrelated statements as Kalai’s robust version of Arrow’s impossibility theorem.

Joint work with Noam Lifshitz (Hebrew University), Dor Minzer (IAS, Princeton) and Elchanan Mossel (MIT).

9 October 2019, Warwick: One-sided linearity testing

A classical property testing result states that if a Boolean function $f$ satisfies $f(x \oplus y) = f(x) \oplus f(y)$ for most inputs $x,y$, then $f$ is close to an XOR.

Prompted by an application to approximate judgment aggregation, we discuss what happens when replacing XOR with AND.

The solution involves a one-sided analog of the familiar noise operator of Boolean Function Analysis.

Joint work with Noam Lifshitz, Dor Minzer, and Elchanan Mossel.

Linearity testing states that if $\Pr[f(x \oplus y) = f(x) \oplus f(y)] \approx 1$ then $f \approx \mathsf{XOR}$.

Together with Noam Lifshitz, Dor Minzer and Elchanan Mossel, we showed that if $\Pr[f(x \land y) = f(x) \land f(y)] \approx 1$ then $f \approx \mathsf{AND}$.

Together with Gilad Chase, Dor Minzer and Nitin Saurabh, we extended this for arbitrary inner functions (replacing $\oplus$ or $\land$).

I am thinking of a number between 1 and 1000000.

How many Yes/No questions do you need to reveal my number?

This is the famous Twenty Questions game, usually not played with numbers.

To spice up the game, I am choosing my number according to some distribution which you know, and your goal is to minimize the *expected* number of questions.

This problem was famously solved by Huffman in 1951.

However, the algorithm his method implies involves rather cumbersome questions.

What can we accomplish using simple questions?

What if I am allowed to lie a few times?

Joint work with Yuval Dagan, Ariel Gabizon, Daniel Kane, and Shay Moran.

10 June 2019, Weizmann Institute abstract

20 questions is one of the simplest examples of a *combinatorial search game*: Alice thinks of an English word, and Bob’s job is to figure it out by asking Yes/No questions.

The worst-case complexity of the game is clearly $\log n$, so to spice things up, we assume that Alice chooses her input according to some distribution known to Bob, and Bob now aims to minimize the *expected* number of questions.

An optimal solution to this problem was found already in 1952.

However, the optimal strategy could ask arbitrarily complex Yes/No questions.

We ask what happens when we constrain Bob to asking only “simple” questions, and what happens if Alice is allowed to lie a few times.

Joint work with Yuval Dagan (MIT), Ariel Gabizon (?), Daniel Kane (UCSD) and Shay Moran (IAS).

29 January 2019, TIFR: Twenty Questions: Distributional Search Theory abstract

I’m thinking of a person in the audience. How long will it take you to find whom, using only Yes/No questions?

We consider this puzzle, which underlies search theory, with a twist:

I’m choosing the person according to a known distribution, and your goal is to minimize the *expected* number of questions.

How does the performance of your strategy depend on the type of question you’re allowed to ask? On the type of distribution I am allowed to choose? What happens if I can lie?

Joint work with Yuval Dagan (MIT), Ariel Gabizon (Zcash), Daniel Kane (UCSD), Shay Moran (IAS).

6 June 2018, HALG (Amsterdam) slides

13 June 2017, TAU abstract

The twenty questions game is a two-player game at the basis of combinatorial search theory.

Alice thinks of a number from 1 to $n$ according to a distribution known to Bob,

and Bob identifies the number by asking Alice several Yes/No questions.

Bob’s goal is to minimize the expected number of questions he asks.

Bob’s optimal strategy, which is often taught in undergraduate algorithms courses,

uses arbitrary Yes/No questions, which is a bit unrealistic.

What happens if we restrict Bob to a small “repertoire” of questions?

Can we recover the performance of the optimal algorithm with a smaller repertoire?

Joint work with Yuval Dagan (Technion), Ariel Gabizon (Zcash) and Shay Moran (UCSD).

4 June 2017, Ariel abstract

In the twenty questions game, Alice chooses an object according to a known distribution, and Bob aims to discover it using as few Yes/No questions on average as possible.

An optimal strategy for Bob could potentially use arbitrary questions.

What happens when we restrict the set of allowed questions?

Joint work with Yuval Dagan (Technion), Ariel Gabizon (Zerocash) and Shay Moran (UCSD).

28 May 2017, IMU abstract

Huffman coding has a search-theoretic interpretation as the optimal strategy for the twenty questions game.

In this game, Alice chooses $x \in \{1,\dots,n\}$ according to a distribution $\mu$, and Bob identifies $x$ using yes/no questions. Bob’s goal is to use the minimum number of questions in expectation. A strategy for Bob corresponds to a prefix code for $\{1,\dots,n\}$, and this shows that Bob’s optimal strategy uses a Huffman code for $\mu$. However, this strategy could use arbitrary yes/no questions. What happens when we restrict the set of yes/no questions that Bob is allowed to use?

Joint work with Yuval Dagan, Ariel Gabizon, and Shay Moran.

29 March 2017, Open University abstract

The twenty questions game is a two-player game at the basis of combinatorial search theory.

Alice thinks of a number from 1 to $n$ according to a distribution known to Bob, and Bob identifies the number by asking Alice several Yes/No questions.

Bob’s goal is to minimize the expected number of questions he asks.

Bob’s optimal strategy, which is often taught in undergraduate algorithms courses, uses arbitrary Yes/No questions, which is a bit unrealistic.

What happens if we restrict Bob to a small collection of questions?

Can we recover the performance of the optimal algorithm with a smaller collection?

Joint work with Yuval Dagan (Technion), Ariel Gabizon (Zcash) and Shay Moran (IAS).

The game of 20 questions is a search-theoretic interpretation of entropy.

Alice chooses a number $x \in \{1,\dots,n\}$ according to a known distribution $\mu$, and Bob identifies $x$ using Yes/No questions. Bob’s goal is to minimize the expected number of questions until x is identified, and his optimal strategy uses about $H(\mu)$ questions on average. However, it might invoke arbitrary Yes/No questions.

We explore this question from several different angles, uncovering a connection to binary comparison search trees, a variant of binary search trees.

Joint work with Yuval Dagan, Ariel Gabizon and Shay Moran, to appear in STOC 2017.

A classic topic in Information Theory is that of minimum redundancy

encoding. Given a distribution $\mu$, how many bits are needed on average to encode data distributed according to $\mu$?

One optimal solution to this problem is given by Huffman’s algorithm,

which produces a code whose average codeword length is at most $H(\mu) + 1$, where $H(\mu)$ is the entropy of $\mu$.

The process of encoding data using a prefix-free code can be represented by a decision tree. Codes produced by Huffman’s algorithm, while having minimum redundancy, use arbitrary queries at the nodes of the resulting decision tree.

In this talk we ask what happens when we restrict the set of queries.

We answer the following two questions, among others:

- How big should the set of allowed queries be to obtain codes whose cost is at most $H(\mu) + 1$?
- How big should the set of allowed queries be to match the

performance of Huffman codes?

On the way, we tightly analyze the redundancy of binary comparison

search trees.

Joint work with Yuval Dagan (Technion), Ariel Gabizon (Technion), and Shay Moran (Technion & UCSD).

10 May 2016, Ben-Gurion University: On the Coppersmith–Winograd approach to matrix multiplication abstract

Ever since Strassen’s $O(n^{2.81})$ matrix multiplication algorithm stunned the mathematical community, the quest for fast matrix multiplication algorithms has been a holy grail in computer science. At first progress was fast, culminating in Coppersmith and Winograd’s $O(n^{2.376})$ algorithm of 1987. Recently interest in the area has reawakened due to work of Stothers, Vassilevska-Williams and Le Gall, who managed to improve the exponent slightly from 2.376 to 2.373.

Roughly speaking, Coppersmith and Winograd constructed an edifice turning an “identity” (some kind of mathematical object) into a fast matrix multiplication algorithm. Applying their method on the identity $T_{CW}$, they obtained an $O(n^{2.388})$ algorithm. Applying it to $T_{CW}^2$ (identities can be squared!), they obtained their $O(n^{2.376})$ algorithm. They stopped there since computing was a lot more cumbersome in the 1980s. Using modern computers, Stothers, Vassilevska-Williams and Le Gall were able to analyze higher powers of $T_{CW}$ (up to $T_{CW}^{32}$), and so reduced the exponent to 2.373.

Our talk answers the following question:

*What is the limit of this approach?*

We show that this approach cannot yield an exponent smaller than 2.372.

No prior knowledge will be assumed.

Joint work with Andris Ambainis (University of Latvia) and François Le Gall (University of Tokyo).

6 April 2016, Ariel: On the Coppersmith–Winograd approach to matrix multiplication abstract

Ever since Strassen’s $O(n^{2.81})$ matrix multiplication algorithm stunned the mathematical community, the quest for fast matrix multiplication algorithms has been a holy grail in computer science. At first progress was fast, culminating in Coppersmith and Winograd’s $O(n^{2.376})$ algorithm of 1987. Recently interest in the area has reawakened due to work of Stothers, Vassilevska-Williams and Le Gall, who managed to improve the exponent slightly from 2.376 to 2.373.

Roughly speaking, Coppersmith and Winograd constructed an edifice turning an “identity” (some kind of mathematical object) into a fast matrix multiplication algorithm. Applying their method on the identity $T_{CW}$, they obtained an $O(n^{2.388})$ algorithm. Applying it to $T_{CW}^2$ (identities can be squared!), they obtained their $O(n^{2.376})$ algorithm. They stopped there since computing was a lot more cumbersome in the 1980s. Using modern computers, Stothers, Vassilevska-Williams and Le Gall were able to analyze higher powers of $T_{CW}$ (up to $T_{CW}^{32}$), and so reduced the exponent to 2.373.

Our talk answers the following question:

*What is the limit of this approach?*

We show that this approach cannot yield an exponent smaller than 2.372.

No prior knowledge will be assumed.

Joint work with Andris Ambainis (University of Latvia) and François Le Gall (University of Tokyo).

4 January 2015, Tel Aviv

How fast can we multiply two $n \times n$ matrices?

Strassen (1969) showed how to beat the trivial $O(n^3)$ algorithm: he gave an $O(n^{2.81})$ algorithm.

Until recently, the fastest known algorithm, due to Coppersmith and Winograd (1987), ran in time $O(n^{2.376})$.

Recent improvements, extending the work of Coppersmith and Winograd, have culminated in an algorithm of Le Gall (2014) running in time $O(n^{2.3729})$.

We show that this approach cannot yield an algorithm running faster than $O(n^{2.3725})$.

Coppersmith and Winograd use Strassen’s laser method together with an identity and its square to obtain their algorithm.

The recent improvements analyze higher powers of the identity, the 32nd resulting in Le Gall’s algorithm.

We introduces a new framework, the laser method with merging, which can be used to analyze all powers of the identity at once.

Using our new framework, we can show that no power of the identity can result in an $O(n^{2.3725})$ algorithm or better.

Our new framework can potentially lead to even faster algorithms not corresponding to powers of the identity.

Joint work with Andris Ambainis (University of Latvia) and Francois Le Gall (University of Tokyo).

No familiarity with algebraic complexity theory will be assumed.

24 December 2014, Hebrew University

Based on our work together with Andris Ambainis and François Le Gall.

Feige showed in 1998 that the greedy algorithm is optimal for Set Cover and for its dual Max Cover. The goal of Max Cover is to choose a given number of sets which together cover the maximum number of elements. Partition Max Cover is the variant in which sets have a color, and we need to choose one set of each color; this generalizes Max SAT. Greedy is no longer optimal, and standard approaches use an LP or other relaxation followed by rounding. We describe an optimal combinatorial algorithm, which uses local search with foresight. Our algorithm can also be adapted to the more general case of monotone submodular optimization.

Joint work with Justin Ward (QMUL) from 2014.

8 June 2016, Ben-Gurion University abstract

Monotone submodular maximization over a matroid (MSMM) is a generalization of the classical NP-complete problem of Maximum Coverage, in which the objective coverage function is replaced by an arbitrary monotone submodular function, and the cardinality constraint is replaced by an arbitrary matroid constraint.

The greedy algorithm gives a $1-1/e$ approximation to Maximum Coverage, which is best possible unless P=NP (Feige).

In contrast, the greedy algorithm only gives a 1/2 approximation to MSMM.

Vondrák and his coauthors came up with an elegant $1-1/e$ approximation algorithm for MSMM called Continuous Greedy, which is an “interior point” algorithm.

We describe a “simplex” algorithm that also gives a $1-1/e$ approximation for MSMM, using the neglected technique of non-oblivious local search.

Joint work with Justin Ward (EPFL).

22 January 2014, DIMACS summary

We present an optimal $1-1/e$ approximation algorithm for maximum coverage over a matroid constraint using non-oblivious local search.

Calinescu, Chekuri, Pál and Vondrák have given an optimal $1-1/e$ approximation algorithm for the more general problem of monotone submodular optimization over a matroid constraint. Our algorithm is much simpler and faster, and is completely combinatorial.

We are working on extending our algorithm to handle arbitrary monotone submodular functions.

Joint work with Justin Ward (University of Toronto).

Maximizing a monotone submodular function over a matroid constraint is a fundamental combinatorial optimization problem, generalizing problems such as Maximum Coverage and MAX-SAT.

Calienscu, Chekuri, Pál and Vondrák devised the continuous greedy algorithm, which gives the optimal $1-1/e$ approximation ratio.

Together with Justin Ward, we have come up with an alternative algorithm which is more combinatorial in nature. We first devised an algorithm for the special case of maximum coverage, and then extended it to the general case.

Lecture notes for two guest lectures given at Prahladh Harsha’s class (and a summary of a third one).

The first lecture, on February 14, focused on the Hoffman/Lovász bound, with an application to the traffic light problem.

The second lecture, on February 20, discussed triangle-intersecting families.

The third lecture, on February 21, was on the Planted Clique problem.

5 December 2015, CMS winter meeting (Montréal) slides

29 January 2015, Yale

4 February 2014, Rutgers notes

8 November 2013, Simons institute notes

A family of graphs (sharing the same finite vertex set) is *triangle-intersecting* if the intersection of any two graphs in the family contains a triangle. Simonovits and Sós conjectured in 1976 that such a family can contain at most 1/8 of the graphs; this bound is achieved by taking a “triangle-junta” (all graphs containing a fixed triangle).

Up to the present work, the best known upper bound on the size of a triangle-intersecting family was 1/4 (Chung, Graham, Frankl & Shearer 1981). We prove the conjecture, showing moreover that the only families achieving the bound are triangle-juntas.

The result follows a line of work by Friedgut and his coauthors. The proof uses rudimentary discrete Fourier analysis, and some “entry-level” graph theory; the talk will focus on the former.

Joint work with David Ellis (Cambridge) and Ehud Friedgut (Hebrew University).

An exposition of Triangle-intersecting families of graphs, our joint work with David Ellis and Ehud Friedgut, in which we show that if $\mathcal{F}$ is a family of graphs on $n$ points such that any two graphs in $\mathcal{F}$ contain a triangle, then $\mathcal{F}$ contains at most $1/8$ of the graphs.

The proof uses the spectral approach of Hoffman and Lovász.

A subcube partition is a partition of $\{0,1\}^n$ into axis-aligned subcubes.

A partition is irreducible if the union of no nontrivial subpartition is itself a subcube.

How small can an irreducible subcube partition be? How large?

Do their exist irreducible subcube partitions consisting only of edges?

What if we replace “subcube” with “affine subspace”?

Joint work with Edward Hirsch (Ariel), Sascha Kurz (Bayreuth), Ferdinand Ihringer (Gent), Artur Riazanov (EPFL), Alexander Smal (Technion), Marc Vinyals (Auckland, NZ).

Alice holds an input $x$ and Bob holds an input $y$. Each party only initially knows their own input.

They wish to compute a function $f(x,y)$ depending on both inputs (allowing a negligible error), minimizing the amount of bits exchanged between the parties.

The minimum amount of communication needed to compute $f$ is known as its *communication complexity*, which is a prominent concept in computational complexity.

Braverman and Rao (2011) showed that if Alice and Bob want to compute the value of $f$ on *many* different inputs, then the amortized communication complexity is given by a measure called *information complexity*, which had been defined by Bar-Yossef, Jayram, Kumar and Sivakumar (2004) as a lower bound technique.

Perhaps the most spectacular application of information complexity is the work of Braverman, Garg, Pankratov and Weinstein (2012), in which they determined the communication complexity of set disjointness (given two subsets $x,y$ of $\{1,\dots,n\}$, determine whether they are disjoint) to be $cn + o(n)$, where $c \approx 0.4827$ is an explicitly computable constant.

Information complexity has also been applied to other areas of computational complexity, though I won’t have time to describe these connections.

Some examples include extended formulations (Braverman and Moitra, 2012) and parallel repetition (Braverman and Garg, 2014).

We give an overview of Hoffman’s bound and its application to Erdős–Ko–Rado theory.

The Sauer–Shelah–Perles (SSP) lemma is a fundamental result in VC theory, with important applications in statistical learning theory.

It bounds the number of sets in a family in terms of the size of the universe and the VC dimension.

We generalize the SSP lemma to some lattices, such as the lattice of subspaces of a finite-dimensional vector space over a finite field.

The SSP lemma fails for some lattices, and we identify a local obstruction which we conjecture is the only reason for such failure.

The talk will not assume any familiarity with VC theory or with statistical learning theory.

Joint work with Stijn Cambie (Raboud University Nijmegen), Bogdan Chornomaz (Vanderbilt University), Zeev Dvir (Princeton University) and Shay Moran (Technion).

Noam Nisan conjectured in 1991 that the degree of a Boolean function on {0,1}^n can be polynomially bounded in terms of its sensitivity.

Last year, Hao Huang proved this conjecture using a short but cunning argument.

As a result, we now know that many different complexity measures for Boolean functions on $\{0,1\}^n$ are polynomially related.

These include degree, approximate degree, decision tree complexity, certificate complexity, sensitivity, block sensitivity, and many more.

Do these questions make sense for functions on other domains?

We show that the entire theory can be generalized to the symmetric group and many other domains.

Joint work with Noam Lifshitz (HUJI), Nathan Lindzey (CU Boulder), Marc Vinyals (Technion).

Many questions in extremal combinatorics can be framed as asking for the maximum independent sets in a graph.

The prototypical example is the Erdős-Ko-Rado theorem and many of its generalizations.

The Hoffman bound, closely related to the Lovász theta function, has been used to solve many of these questions.

Other questions correspond to maximum independent sets in *hypergraphs*.

Examples include the Erdős matching conjecture and *s*-wise intersecting families.

Bachoc, Gundert and Passuello generalized the theta function to complexes.

We give a different generalization, and use it to solve Frankl’s triangle problem, as well as give new proofs to several known results such as Mantel’s theorem.

Our bound is reminiscent of Garland’s method, especially in its version due to Alev and Lau (arXiv:2001.02827, Theorem 1.5).

Joint work with Konstantin Golubev and Noam Lifshitz.

Razborov showed that the randomized communication complexity of set disjointness is $\Theta(n)$.

What is the hidden constant? Amazingly, Braverman et al. were able to calculate it to be $0.4827\ldots$, using information complexity.

For their upper bound, Braverman et al. introduced the buzzer protocol, an information-optimal continuous-time protocol.

Following up on their ideas, we show how the buzzer protocol – indeed, any protocol – can be profitably viewed as a random walk.

As an application, we show that the randomized communication complexity of set disjointness with error epsilon is $0.4827\ldots n – \Theta(h(\epsilon) n)$.

Our work suggests the following challenge: show that there is an information-optimal random walk protocol for *every* function.

Joint work from 2017 with Yuval Dagan (then Technion, now MIT), Hamed Hatami (McGill) and Yaqiao Li (McGill).

How many arithmetic operations does it take to multiply two square matrices? This question has captured the imagination of computer scientists ever since Strassen showed in 1969 that $O(n^{2.81})$ operations suffice. We survey the classical theory of fast matrix multiplication, starting with Strassen’s algorithm and culminating with the state-of-the-art Coppersmith–Winograd algorithm and its recent improvements. We also describe Coppersmith’s $O(n^2\log^2n)$ algorithm for multiplying rectangular matrices, used by Ryan Williams in his separation of ACC and NEXP.

This part of the two-lecture series, based on my earlier TSS talks, discusses the work of Coppersmith and Winograd, and its follow-ups.

How many arithmetic operations does it take to multiply two square matrices? This question has captured the imagination of computer scientists ever since Strassen showed in 1969 that $O(n^{2.81})$ operations suffice. We survey the classical theory of fast matrix multiplication, starting with Strassen’s algorithm and culminating with the state-of-the-art Coppersmith–Winograd algorithm and its recent improvements. We also describe Coppersmith’s $O(n^2\log^2n)$ algorithm for multiplying rectangular matrices, used by Ryan Williams in his separation of ACC and NEXP.

This part of the two-lecture series, based on my earlier TSS talks, culminates with Schönhage’s asymptotic sum inequality.

Bonet, Pitassi and Raz and, independently, Krajíček came up with the idea to use feasible interpolation as a vehicle for lower bounds on proof systems. Bonet et al. presented their construction in a somewhat ad hoc manner, and Krajíček used a theorem of Razborov generalizing Karchmer–Wigderson games. We provide a different interpretation of their arguments in terms games which are more suitable than the ones considered by Razborov.

Our account includes three flavors of arguments: ones following Bonet et al., ones following Krajíček, and ones following a suggestion of Neil Thapen. The different arguments prove lower bounds for slightly different formulas.

Introduction to my research, focusing on three topics: computational complexity, Boolean function analysis, and combinatorics.

Meant for prospective graduate students.

How difficult is it to determine whether a planar graph can be colored using 2 colors? 3 colors? 4 colors?

How difficult is it to convince somebody that a graph cannot be 3-colored? Can we do it without enumerating all possible 3-colorings?

Talk given at the Technion Math Club for an audience of high-schoolers.