Talks

Randomized Littlestone dimension

12 March 2024, BGU colloquium: Randomized online learning abstractslides

Online learning is a classical task in which a student is challenged by her teacher to answer a sequence of Yes/No questions. After each reply, the student is told the correct answer, and her goal is to minimize the number of mistakes that she makes. In order to make the task solvable, we assume that the teacher’s answers are consistent with some hypothesis class.

Littlestone defined in 1988 a combinatorial dimension named after him which captures the optimal mistake bound as a function of the hypothesis class. His work assumed that the student chooses her answer deterministically.

We extend Littlestone’s work to the more realistic scenario in which the student is allowed to toss coins.As an application, we determine the optimal randomized mistake bound for the classical problem of prediction with expert advice.

Joint work with Steve Hanneke, Idan Mehalel, and Shay Moran.

Littlestone dimension gives the optimal mistake bound in online learning. The randomized Littlestone dimension gives the optimal mistake bound when the learner is randomized.

Whereas the Littlestone dimension is the maximum minimum depth of a shattered tree, the randomized Littlestone dimension is half the maximum expected depth of a shattered tree, where the expectation is with respect to the random process which starts at the root and at each step, chooses a child at random.

Joint work with Steve Hanneke, Idan Mehalel, and Shay Moran.

Boolean function analysis on the symmetric group

17 July 2024, Technion combinatorics seminar: Intersection theorems using certificates abstractnotes

A family of permutations $\mathcal{F} \subseteq S_n$ is $2$-intersecting if any two permutations agree on at least $2$ points.

Ellis, Friedgut and Pilpel showed that for large enough $n$, such families contain at most $(n-2)!$ permutations, and Ellis characterized the extremal families.

Meagher and Razafimahatratra extended this to all $n \geq 2$, but were unable to characterize the extremal families. Using ideas borrowed from theoretical computer science, we characterize the extremal families.

Joint work with Gilad Chase, Neta Dafni, Noam Lifshitz, Nathan Lindzey, and Marc Vinyals.

24 May 2023, Technion combinatorics seminar: Nearly linear functions on the symmetric group abstractnotes

A linear function on the symmetric group $S_n$ is specified by an $n \times n$ real matrix $A$, and its value at a permutation $\pi$ is
\[ \sum_i A(i,\pi(i)). \]
Such functions arise naturally in the analysis of the Erdős–Ko–Rado theorem on the symmetric group.

In this talk, we address the following questions:

What can we say about $f$ if $f(\pi) \in \{0,1\}$ for all $\pi$?
What can we say about $f$ if $f(\pi) \in \{0,1\}$ for most $\pi$?
What can we say about $f$ if $f(\pi)$ is close to $\{0,1\}$ for all $\pi$?
What can we say about $f$ if $f(\pi)$ is close to $\{0,1\}$ on average (in $L_2$)?

Based on this paper.

15 May 2023, AGT online seminar: Orthogonal basis of eigenvectors for the Johnson and Kneser graphs abstractnotes

23 April 2023, Bar-Ilan: "Fourier expansion" for functions on the symmetric group abstractnotes

Boolean function analysis studies functions on the hypercube $\{-1,1\}^n$.

Its starting point is the Fourier expansion, which is the unique way to express a given function as a multilinear polynomial.

What happens if we study functions on other domains, such as the “slice” or the symmetric group?

The talk is based on this paper on the slice and this paper on the symmetric group.

Joint work with Nathan Lindzey (Technion).

17 April 2023, HUJI: "Fourier expansion" for functions on the symmetric group abstractnotes

Boolean function analysis studies functions on the hypercube $\{-1,1\}^n$.

Its starting point is the Fourier expansion, which is the unique way to express a given function as a multilinear polynomial.

What happens if we study functions on other domains, such as the “slice” or the symmetric group?

The talk is based on this paper on the slice and this paper on the symmetric group.

Joint work with Nathan Lindzey (Technion).

22 November 2021, Technion: Square matrices where a random generalized diagonal is close to 0/1 abstract

Consider an $n\times n$ real matrix such that if we sum the values on a random “generalized diagonal” (a choice of one entry from each row and column), then

on average, it is close to $\{0,1\}$, in the sense that the expected squared distance is small; or
with probability close to 1, it is equal to 0 or 1; or
it is always within some small distance of 0 or 1.

What can we say about the matrix?

Is this really the correct question to ask?

18 October 2021, HUJI: When is a linear function on the symmetric group almost Boolean? abstractnotes

A real-valued function on $S_n$ is linear if it is a linear combination of indicators of the form “$i$ goes to $j$”.

Ellis, Friedgut and Pilpel showed that if a linear function on $S_n$ is Boolean then it is a dictator, that is, it depends on the image of some $i$ or on the inverse image of some $j$.

What can we say about linear functions on $S_n$ which are almost Boolean?

We answer this question completely for several notions of “almost”, improving on earlier results together with Ellis and Friedgut.

Talk based on this paper.

Bounded indistinguishability for simple sources

6 April 2021, TAU Theory Seminar abstractslides

Suppose that $X,Y$ are two distributions on $n$ bits.

What amount of indistinguishability is needed to fool $\mathsf{AC^0}$?

If $X,Y$ are arbitrary sources, then $\sqrt{n}$-indistinguishability doesn’t even suffice to fool OR.

In contrast, if $Y$ is the uniform distribution, then Braverman showed that $\mathrm{polylog}(n)$-indistinguishability suffices to fool all of $\mathsf{AC^0}$.

We explore what happens when $Y$ is a simple distribution, say samplable in low degree or locality.

Joint work with Andrej Bogdanov & K. Dinesh (CUHK), Yuval Ishai and Avi Kaplan (Technion), and Akshay Srinivasan (TIFR).

25 March 2021, Dagstuhl complexity wokshop abstractslides

11 March 2021, Oxford–Warwick Complexity Meetings abstractslides

Braverman’s theorem states that $\mathrm{polylog}(n)$-wise independence fools $\mathsf{AC^0}$. In contrast, since the approximate degree of OR is $\sqrt{n}$, there are two $\sqrt{n}$-wise indistinguishable distributions which do not even fool OR!

Motivated by cryptographic applications, and continuing earlier work of Bogdanov, Ishai, Viola and Williamson, we ask whether Braverman-style results can be recovered for simple distributions, such as ones samplable locally or using low degree polynomials.

We relate these questions to the notorious problem of computing inner product using $\mathsf{AC^0}$ circuits with parity gates at the bottom. This suggests that proving Braverman-style results for $\mathsf{AC^0}$ is hard even for linear sources, and so we turn to consider the OR function.

We prove a Braverman-style theorem for OR which holds for constant degree sources and for local sources; in contrast to Braverman’s theorem, we only require $O(1)$-wise indistinguishability. On the other hand, we show that logarithmic degree suffices to sample a pair of $\sqrt{n}$-wise indistinguishable sources which can be distinguished by OR. Our construction also results in a new visual secret sharing scheme.

Joint work with Andrej Bogdanov (CUHK), Krishnamoorthy Dinesh (CUHK), Yuval Ishai (Technion), Avi Kaplan (Technion), Akshayaram Srinivasan (TIFR).

Let $X,Y$ be $\mathrm{polylog}(n)$-indistinguishable. Braverman famously showed that if $Y$ is uniform then $X,Y$ fool $\mathsf{AC^0}$. In contrast, there exists a pair of $\Theta(\sqrt{n})$-indistinguishable distributions which do not even fool OR.

Under what conditions on $X,Y$ can we recover a Braverman-style theorem?

Joint work with Andrej Bogdanov, K. Dinesh, Yuval Ishai, Avi Kaplan, and Akshay Srinivasan.

Analysis of Boolean function on exotic domains

22 January 2019, CAALM (CMI): Discrete harmonic analysis: beyond the Boolean hypercube abstractslides

Analysis of Boolean functions is a topic at the intersection of computer science and combinatorics, which uses tools and techniques from probability theory and functional analysis.

It has become part of the fundamental topic of theoreticians working in computational complexity theory. Traditionally, the functions being studied live on the Boolean hypercube $\{0,1\}^n$.

Recently, attention has been drawn to other domains, with some spectacular applications, such as the recent proof of the 2-to-2 conjecture (a weaker variant of Khot’s Unique Games Conjecture).

In the talk, I will survey some elements of the classical theory, and then describe some recent progress.

6 March 2018, IAS: Boolean function analysis: beyond the Boolean cube (2/2) videoabstractnotes

Boolean function analysis traditionally studies Boolean functions on the Boolean cube, using Fourier analysis on the group $\mathbb{Z}_2^n$. Other domains of interest include the biased Boolean cube, other abelian groups, and Gaussian space. In all cases, the focus is on results which are independent of the dimension.

Recently, Boolean function analysis has expanded in scope to include other domains such as the symmetric group, the “slice”, and the Grassmann scheme. The latter has figured in the recent proof of the 2-to-1 conjecture (a step toward the unique games conjecture) by Dinur, Khot, Kindler, Minzer, and Safra.

In this talk, we systematically survey Boolean function analysis beyond the Boolean cube, focusing on two main points of view: association schemes and differential posets. We also highlight work on the slice, which is the most widely studied non-product domain.

5 March 2018, IAS: Boolean function analysis: beyond the Boolean cube (1/2) videoabstractslides

In this talk, we first survey classical Boolean function analysis, to set the scene. We then describe in some detail the recent work on the 2-to-1 conjecture, focusing on the connection to Boolean function analysis.

9 June 2016, Bar Ilan: Fourier analysis on the slice (?)

8 March 2016, Tel Aviv: Analysis on the slice abstractnotes

The $(n,k)$-slice is the subset of the Boolean cube $\{0,1\}^n$ consisting of all vectors of weight $k$.
It is the natural home for objects such as uniform intersecting families and $G(n,m)$ random graphs.
Fourier analysis has proven itself very useful on the Boolean cube.
Can we extend it to the slice?
Does it teach us anything new about the cube?
How different is the slice from the cube, after all?
(For the answers, you will have to attend the talk.)

7 December 2015, CMS winter meeting (Montréal) abstractslides

Analysis of Boolean functions is the study of 0/1-valued functions using spectral and analytic techniques. Lying at the interface of combinatorics, probability theory, functional analysis and theoretical computer science, it has been applied to random graph theory, percolation theory, coding theory, social choice theory, extremal combinatorics, and theoretical computer science.

Traditionally the functions being considered are on the Boolean cube $\{0,1\}^n$, or more rarely on other product domains, usually finite. We explore functions on more exotic domains such as finite groups and association schemes, concentrating on two examples: the symmetric group and the Johnson association scheme (the “slice”), which consists of all vertices in the Boolean cube with a specified weight.

We will survey a few classical results in analysis of Boolean functions on the Boolean cube and their generalizations to more exotic domains. On the way, we will explore questions such as: Which functions on the symmetric group are “dictatorships” (depend on one “coordinate”) or “juntas” (depend on a few “coordinates”)? Is there a “Fourier expansion” for functions on the slice? Is the middle slice a “representative” section of the Boolean cube?

Joint work with David Ellis, Ehud Friedgut, Guy Kindler, Elchanan Mossel, and Karl Wimmer.

27 February 2015, Toronto notes

23 September 2014, IAS: Analysis of Boolean functions on association schemes videoabstract

Structure theorems for low-degree functions

16–19 September 2024, HIM School (Bonn): Structure Theorems abstractvideosnotes

Motivated by Kalai’s approximate Arrow’s theorem, Friedgut, Kalai and Naor proved the fundamental FKN theorem, which states that Boolean functions on the Boolean cube $\{0, 1\}^n$ which are close to degree 1 with respect to the uniform measure are close to Boolean degree 1 functions.

In the first lecture, we discussed this application and provided a new proof of the FKN theorem, using the Berry–Esseen theorem.

In the second lecture, we stated and proved the biased FKN theorem, which extends the FKN theorem to the Boolean cube under biased product measures $\mu_p$ for small $p$.

In the third lecture, we completed the proof of the biased FKN theorem by proving the junta agreement theorem, and we described the FKN theorem for the symmetry group.

In the fourth and final lecture, we discussed generalizations of the foregoing to degree $d$.

5 February 2024, Probability and Analysis Webinar: Sparse "juntas" videoabstractslides

Boolean function analysis concerns functions $f\colon \{0,1\}^n \to \{0,1\}$.

We often consider the behavior of $f$ when its input is chosen uniformly at random from $\{0,1\}^n$.

Sometimes it is more natural to consider a biased input — the classical example being the random graph model $G(n,p)$.

Analyzing functions with respect to a biased measure often reveals phenomena that do not occur under an unbiased measure.

We describe one such example: the structure of functions which are almost of low degree.

Joint work with Irit Dinur (Weizmann institute) and Prahladh Harsha (TIFR).

28 July 2023, Simons institute: Low degree functions which are almost Boolean videoabstractnotes

The FKN theorem states that if a Boolean function on the Boolean cube is close to degree 1, then it is close to a dictator. Similarly, the Kindler–Safra theorem states that if a Boolean function on the Boolean cube is close to degree $d$, then it is close to a junta.

The picture becomes more interesting when we study functions on the $p$-biased Boolean cube.

If a Boolean function is $\varepsilon$-close to degree 1 then (up to negation) it is $O(\varepsilon)$-close to a maximum of $O(\sqrt{\varepsilon}/p+1)$ coordinates, and so $O(\sqrt{\varepsilon}+p)$-close to a constant function.

A similar statement holds for functions close to degree d, but the corresponding structure is more complicated.

Another setting we might discuss is functions on the symmetric group. If a Boolean function is 𝜀-close to degree 1 then it is $O(\sqrt{\varepsilon})$-close to a dictator (suitably defined), and $O(\varepsilon)$-close (up to negation) to a union of “mostly disjoint” cosets. A similar statement should hold for degree $d$, where the function ought to be close to a (suitably defined) decision tree of depth $\mathit{poly}(d)$.

Joint work with Irit Dinur (Weizmann institute) and Prahladh Harsha (TIFR).

10 January 2019, Bar Ilan: Almost Boolean functions of degree 1 on the symmetric group abstractnotes

One of the first results in Boolean function analysis is the FKN theorem, which states that an almost Boolean function of degree 1 on the hypercube is close to a dictator.

The question becomes more challenging in other domains, or indeed, even on the hypercube with respect to a biased measure.

In this talk we will focus on the symmetric group.

If we think of the symmetric group as the group of permutation matrices, then a function has degree 1 if it is a linear combination of input entries.

We show that in the balanced case, an analog of the FKN theorem holds for the symmetric group as well, with the correct definition of “dictator”.

Joint work with David Ellis and Ehud Friedgut, published in 2015 (EFF2).

24 March 2019, Tel Aviv: Structure of almost Boolean low degree functions abstract

11 March 2019, Hebrew University: Structure of (almost) low-degree Boolean functions abstract

Boolean function analysis studies (mostly) Boolean functions on $\{0,1\}^n$. Two basic concepts in the field are degree and junta. A function has degree $d$ if it can be written as a degree $d$ polynomial. A function is a $d$-junta if it depends on d coordinates. Clearly, a $d$-junta has degree $d$. What about the converse (for Boolean functions)? What if the Boolean function is only close to degree $d$?

The questions above were answered by Nisan–Szegedy, Friedgut–Kalai–Naor, and Kindler–Safra. We consider what happens when we ask the same questions on different domains. The domains of interest include the biased Boolean cube, the slice, the Grassmann scheme, the symmetric group, high-dimensional expanders, and others.

Joint work with Yotam Dikstein, Irit Dinur, Prahladh Harsha, and Ferdinand Ihringer.

8 April 2014, Columbia: Structure theorems for almost low degree functions abstractslides

2 December 2013, Simons institute: Friedgut–Kalai–Naor for Boolean functions on $S_n$ videoabstractnotes

The classical Friedgut–Kalai–Naor (FKN) theorem states that if most of the Fourier spectrum of a boolean function on $\{0,1\}^n$ is concentrated on the first two levels then the function essentially depends on one coordinate. We extend this result to boolean functions on $S_n$ whose Fourier spectrum is concentrated on the first two irreducible representations. When the function is balanced, we show that it essentially depends on either the image or the inverse image of one point. When the function is skewed, we show that it is close to a disjunction of double cosets (indicators of events of the type “$i$ maps to $j$”).

Papers covered: EFF1, EFF2.

Approximate polymorphisms

8 Oct 2024, HIM: Approximate Polymorphisms videoabstractnotes

A function $f$ which satisfies $f(x \oplus y) = f(x) \oplus f(y)$ for all $x,y$ is an XOR of a subset of the coordinates.

Moreover, if a function $f$ satisfies this for most inputs, then it is close to an XOR — a classical stability result known in theoretical computer science as linearity testing.

We discuss what happens if we replace ⊕ with a different operation (not necessarily binary), showing that stability still holds (with a wrinkle).

The proof involves Jones’ regularity lemma (a regularity lemma for Boolean functions) as well as the It Ain’t Over Till It’s Over theorem.

Joint work with Gilad Chase, Dor Minzer, Elchanan Mossel, and Nitin Saurabh.

2 May 2022, BGU: Approximate Polymorphisms abstractnotes

A Boolean function $f\colon \{0,1\}^n \to \{0,1\}$ satisfies $f(x \oplus y) = f(x) \oplus f(y)$ for all $x,y$ iff it is linear.

A classical result, linearity testing, states that if $f(x \oplus y) = f(x) \oplus f(y)$ holds for most $x,y$, then $f$ is close to a linear function.

We extend this result to arbitrary functions beyond XOR.

Joint work with Gilad Chase, Noam Lifshitz, Dor Minzer, Elchanan Mossel, and Nitin Saurabh.

3 November 2021, Technion: Approximate Polymorphisms

13 Aug 2021, CU Boulder: Approximate Polymorphisms abstractslides

Which Boolean functions $f\colon \{0,1\}^n \to \{0,1\}$ satisfy $f(x⊕y)=f(x)⊕f(y)$

… for all $x,y$? linear functions (100% regime).

… for almost all $x,y$? functions close to linear functions (99% regime).

… for many $x,y$, more than random functions? functions correlating with a linear function (51% regime).

All of these results are classical.

We study what happens when the XOR function is replaced by an arbitrary Boolean function.

Based on joint work with Gilad Chase (Technion), Dor Minzer (MIT), Elchanan Mossel (MIT), Nitin Saurabh (IIIT Hyderabad).

27 May 2021, Bar-Ilan: Approximate Polymorphisms abstractnotes

An $n$-bit function $f$ is a polymorphism of an $m$-bit function $g$ if $f \circ g^n = g o f^m$.
For example, an $n$-bit function $f$ is a polymorphism of the 2-bit function $g=\mathsf{XOR}$ if $f(x + y) = f(x) + f(y)$ for all $x,y$.

It is known that all exact polymorphisms of XOR are XORs, and furthermore, all approximate polymorphisms of XOR are close to XORs — this is the classical linearity testing.

We determine all approximate polymorphisms of $g$ for an arbitrary function $g$.

In addition, we consider “list decoding” variants of this question.

12 April 2021, MIT online seminar videoabstractslides

Call $f\colon \{0,1\}^n \to \{0,1\}$ linear if $f(x\oplus y) = f(x)\oplus f(y)$ for all $x,y$, and multiplicative if $f(x\cdot y) = f(x)\cdot f(y)$ for all $x,y$.

A function $f$ is linear if it is an XOR of a subset of inputs, and multiplicative if it is an AND of a subset of inputs (or zero).

Blum, Luby and Rubinfeld (1993) showed that we can test whether $f$ is linear by checking the condition $f(x\oplus y) = f(x)\oplus f(y)$ for random $x,y$.

Answering (in a sense) a question of Parnas, Ron and Samorodnitsky (2002), and improving on results of Ilan Nehama (2013), we show that a similar statement holds for testing whether f is multiplicative.

Joint work (2020) with Noam Lifshitz (HUJI), Dor Minzer (MIT), and Elchanan Mossel (MIT).

7 March 2021, Tel Aviv: Approximate Polymorphisms abstractslides

A Boolean function $f\colon \{0,1\}^n \to \{0,1\}$ is a polymorphism of XOR if $f(x+y) = f(x)+f(y)$ for all $x,y$ (addition is modulo 2).

It is well-known that a function is a polymorphism of XOR iff it is an XOR of a subset of the coordinates.

A celebrated result in theoretical computer science (“linearity testing”) extends this to approximate polymorphisms of XOR:

if $f(x+y) = f(x)+f(y)$ for most $x,y$, then $f$ is close to an exact polymorphism.

We show that similar results hold for other functions, such as AND and Majority:

approximate polymorphisms of AND are close to ANDs, and approximate polymorphisms of Majority essentially depend on a single coordinate.

Joint work with Noam Lifshitz (HUJI), Dor Minzer (MIT), and Elchanan Mossel (MIT).

25 May 2020, HUJI: Oligarchy testing abstractnotes

Arrow’s impossibility theorem states that the only voting rule satisfying certain natural requirements is a dictatorship.

Gil Kalai showed that even if we relax some of these requirements so that they only hold with high probability, we are not getting genuine new voting rules.

Arrow’s theorem is just one of many paradoxes in social choice theory.

Another example, from judgment aggregation, states that the only voting rules satisfying the equation $ f(x \land y) = f(x) \land f(y)$ for all $x,y \in \{0,1\}^n$ are oligarchies (ANDs).

Ilan Nehama asked whether relaxing this requirement so that it only holds with high probability can help.

He proved that the answer is negative – voting rules satisfying the relaxed condition must be close to an AND.

However, his result degrades with $n$, the number of voters.

We prove a similar result which is independent of $n$.

Our result can also be seen as the analog of linearity testing when changing $+$ to $\times$.

Joint work with Noam Lifshitz (HUJI), Dor Minzer (IAS) and Elchanan Mossel (MIT).

17 March 2020, Technion: Property Testing meets Universal Algebra: Oligarchy Testing slides

28 February 2020, TIFR: Universal Algebra meets Property Testing: Oligarchy Testing abstractnotes

It is well-known that the only functions satisfying $f(x \oplus y) = f(x) \oplus f(y)$ are XORs, and moreover this holds robustly: if $f(x \oplus y) = f(x) \oplus f(y)$ for most $x,y$, then $f$ is close to an XOR.

We show that a similar statement holds with XOR replaced by AND.

Both of these results are examples of robust judgment aggregation, which also includes such seemingly unrelated statements as Kalai’s robust version of Arrow’s impossibility theorem.

Joint work with Noam Lifshitz (Hebrew University), Dor Minzer (IAS, Princeton) and Elchanan Mossel (MIT).

9 October 2019, Warwick: One-sided linearity testing

27 September 2019, Oxford: One-sided linearity testing abstractnotes

A classical property testing result states that if a Boolean function $f$ satisfies $f(x \oplus y) = f(x) \oplus f(y)$ for most inputs $x,y$, then $f$ is close to an XOR.

Prompted by an application to approximate judgment aggregation, we discuss what happens when replacing XOR with AND.

The solution involves a one-sided analog of the familiar noise operator of Boolean Function Analysis.

Joint work with Noam Lifshitz, Dor Minzer, and Elchanan Mossel.

Linearity testing states that if $\Pr[f(x \oplus y) = f(x) \oplus f(y)] \approx 1$ then $f \approx \mathsf{XOR}$.

Together with Noam Lifshitz, Dor Minzer and Elchanan Mossel, we showed that if $\Pr[f(x \land y) = f(x) \land f(y)] \approx 1$ then $f \approx \mathsf{AND}$.

Together with Gilad Chase, Dor Minzer and Nitin Saurabh, we extended this for arbitrary inner functions (replacing $\oplus$ or $\land$).

Twenty (simple) questions

26 April 2021, HUJI Computer Science seminar videoabstractslides

I am thinking of a number between 1 and 1000000.

How many Yes/No questions do you need to reveal my number?

This is the famous Twenty Questions game, usually not played with numbers.

To spice up the game, I am choosing my number according to some distribution which you know, and your goal is to minimize the expected number of questions.

This problem was famously solved by Huffman in 1951.

However, the algorithm his method implies involves rather cumbersome questions.

What can we accomplish using simple questions?

What if I am allowed to lie a few times?

Joint work with Yuval Dagan, Ariel Gabizon, Daniel Kane, and Shay Moran.

10 June 2019, Weizmann Institute abstract

20 questions is one of the simplest examples of a combinatorial search game: Alice thinks of an English word, and Bob’s job is to figure it out by asking Yes/No questions.

The worst-case complexity of the game is clearly $\log n$, so to spice things up, we assume that Alice chooses her input according to some distribution known to Bob, and Bob now aims to minimize the expected number of questions.

An optimal solution to this problem was found already in 1952.

However, the optimal strategy could ask arbitrarily complex Yes/No questions.

We ask what happens when we constrain Bob to asking only “simple” questions, and what happens if Alice is allowed to lie a few times.

Joint work with Yuval Dagan (MIT), Ariel Gabizon (?), Daniel Kane (UCSD) and Shay Moran (IAS).

29 January 2019, TIFR: Twenty Questions: Distributional Search Theory abstract

I’m thinking of a person in the audience. How long will it take you to find whom, using only Yes/No questions?

We consider this puzzle, which underlies search theory, with a twist:
I’m choosing the person according to a known distribution, and your goal is to minimize the expected number of questions.
How does the performance of your strategy depend on the type of question you’re allowed to ask? On the type of distribution I am allowed to choose? What happens if I can lie?

Joint work with Yuval Dagan (MIT), Ariel Gabizon (Zcash), Daniel Kane (UCSD), Shay Moran (IAS).

6 June 2018, HALG (Amsterdam) slides

13 June 2017, TAU abstract

The twenty questions game is a two-player game at the basis of combinatorial search theory.

Alice thinks of a number from 1 to $n$ according to a distribution known to Bob,

and Bob identifies the number by asking Alice several Yes/No questions.

Bob’s goal is to minimize the expected number of questions he asks.

Bob’s optimal strategy, which is often taught in undergraduate algorithms courses,

uses arbitrary Yes/No questions, which is a bit unrealistic.

What happens if we restrict Bob to a small “repertoire” of questions?

Can we recover the performance of the optimal algorithm with a smaller repertoire?

Joint work with Yuval Dagan (Technion), Ariel Gabizon (Zcash) and Shay Moran (UCSD).

4 June 2017, Ariel abstract

28 May 2017, IMU abstract

Huffman coding has a search-theoretic interpretation as the optimal strategy for the twenty questions game.

In this game, Alice chooses $x \in \{1,\dots,n\}$ according to a distribution $\mu$, and Bob identifies $x$ using yes/no questions. Bob’s goal is to use the minimum number of questions in expectation. A strategy for Bob corresponds to a prefix code for $\{1,\dots,n\}$, and this shows that Bob’s optimal strategy uses a Huffman code for $\mu$. However, this strategy could use arbitrary yes/no questions. What happens when we restrict the set of yes/no questions that Bob is allowed to use?

Joint work with Yuval Dagan, Ariel Gabizon, and Shay Moran.

29 March 2017, Open University abstract

The twenty questions game is a two-player game at the basis of combinatorial search theory.

Alice thinks of a number from 1 to $n$ according to a distribution known to Bob, and Bob identifies the number by asking Alice several Yes/No questions.

Bob’s goal is to minimize the expected number of questions he asks.

Bob’s optimal strategy, which is often taught in undergraduate algorithms courses, uses arbitrary Yes/No questions, which is a bit unrealistic.

What happens if we restrict Bob to a small collection of questions?

Can we recover the performance of the optimal algorithm with a smaller collection?

Joint work with Yuval Dagan (Technion), Ariel Gabizon (Zcash) and Shay Moran (IAS).

2 March 2017, MIT abstractnotes

The game of 20 questions is a search-theoretic interpretation of entropy.

Alice chooses a number $x \in \{1,\dots,n\}$ according to a known distribution $\mu$, and Bob identifies $x$ using Yes/No questions. Bob’s goal is to minimize the expected number of questions until x is identified, and his optimal strategy uses about $H(\mu)$ questions on average. However, it might invoke arbitrary Yes/No questions.

Are there restricted sets of questions that match the optimal strategy, either exactly or approximately?

We explore this question from several different angles, uncovering a connection to binary comparison search trees, a variant of binary search trees.

Joint work with Yuval Dagan, Ariel Gabizon and Shay Moran, to appear in STOC 2017.

23 September 2016, Toronto: Complexity-aware encoding abstractnotes

A classic topic in Information Theory is that of minimum redundancy
encoding. Given a distribution $\mu$, how many bits are needed on average to encode data distributed according to $\mu$?
One optimal solution to this problem is given by Huffman’s algorithm,
which produces a code whose average codeword length is at most $H(\mu) + 1$, where $H(\mu)$ is the entropy of $\mu$.

The process of encoding data using a prefix-free code can be represented by a decision tree. Codes produced by Huffman’s algorithm, while having minimum redundancy, use arbitrary queries at the nodes of the resulting decision tree.
In this talk we ask what happens when we restrict the set of queries.

We answer the following two questions, among others:

How big should the set of allowed queries be to obtain codes whose cost is at most $H(\mu) + 1$?
How big should the set of allowed queries be to match the
performance of Huffman codes?

On the way, we tightly analyze the redundancy of binary comparison
search trees.

Joint work with Yuval Dagan (Technion), Ariel Gabizon (Technion), and Shay Moran (Technion & UCSD).

Based on our work with Yuval Dagan, Ariel Gabizon and Shay Moran, and on our work with Yuval Dagan, Daniel Kane and Shay Moran.

Limitations of the Coppersmith–Winograd technique

10 May 2016, Ben-Gurion University: On the Coppersmith–Winograd approach to matrix multiplication abstract

Ever since Strassen’s $O(n^{2.81})$ matrix multiplication algorithm stunned the mathematical community, the quest for fast matrix multiplication algorithms has been a holy grail in computer science. At first progress was fast, culminating in Coppersmith and Winograd’s $O(n^{2.376})$ algorithm of 1987. Recently interest in the area has reawakened due to work of Stothers, Vassilevska-Williams and Le Gall, who managed to improve the exponent slightly from 2.376 to 2.373.

Roughly speaking, Coppersmith and Winograd constructed an edifice turning an “identity” (some kind of mathematical object) into a fast matrix multiplication algorithm. Applying their method on the identity $T_{CW}$, they obtained an $O(n^{2.388})$ algorithm. Applying it to $T_{CW}^2$ (identities can be squared!), they obtained their $O(n^{2.376})$ algorithm. They stopped there since computing was a lot more cumbersome in the 1980s. Using modern computers, Stothers, Vassilevska-Williams and Le Gall were able to analyze higher powers of $T_{CW}$ (up to $T_{CW}^{32}$), and so reduced the exponent to 2.373.

Our talk answers the following question:

What is the limit of this approach?

We show that this approach cannot yield an exponent smaller than 2.372.
No prior knowledge will be assumed.

Joint work with Andris Ambainis (University of Latvia) and François Le Gall (University of Tokyo).

6 April 2016, Ariel: On the Coppersmith–Winograd approach to matrix multiplication abstract

Our talk answers the following question:

What is the limit of this approach?

We show that this approach cannot yield an exponent smaller than 2.372.
No prior knowledge will be assumed.

Joint work with Andris Ambainis (University of Latvia) and François Le Gall (University of Tokyo).

4 January 2015, Tel Aviv

31 December 2014, Technion abstractslides

How fast can we multiply two $n \times n$ matrices?

Strassen (1969) showed how to beat the trivial $O(n^3)$ algorithm: he gave an $O(n^{2.81})$ algorithm.

Until recently, the fastest known algorithm, due to Coppersmith and Winograd (1987), ran in time $O(n^{2.376})$.

Recent improvements, extending the work of Coppersmith and Winograd, have culminated in an algorithm of Le Gall (2014) running in time $O(n^{2.3729})$.

We show that this approach cannot yield an algorithm running faster than $O(n^{2.3725})$.

Coppersmith and Winograd use Strassen’s laser method together with an identity and its square to obtain their algorithm.

The recent improvements analyze higher powers of the identity, the 32nd resulting in Le Gall’s algorithm.

We introduces a new framework, the laser method with merging, which can be used to analyze all powers of the identity at once.

Using our new framework, we can show that no power of the identity can result in an $O(n^{2.3725})$ algorithm or better.

Our new framework can potentially lead to even faster algorithms not corresponding to powers of the identity.

Joint work with Andris Ambainis (University of Latvia) and Francois Le Gall (University of Tokyo).

No familiarity with algebraic complexity theory will be assumed.

24 December 2014, Hebrew University

Based on our work together with Andris Ambainis and François Le Gall.

Monotone submodular maximization via non-oblivious local search

15 January 2020, FeigeFest: Foresight in submodular optimization abstractslides

Feige showed in 1998 that the greedy algorithm is optimal for Set Cover and for its dual Max Cover. The goal of Max Cover is to choose a given number of sets which together cover the maximum number of elements. Partition Max Cover is the variant in which sets have a color, and we need to choose one set of each color; this generalizes Max SAT. Greedy is no longer optimal, and standard approaches use an LP or other relaxation followed by rounding. We describe an optimal combinatorial algorithm, which uses local search with foresight. Our algorithm can also be adapted to the more general case of monotone submodular optimization.

Joint work with Justin Ward (QMUL) from 2014.

8 June 2016, Ben-Gurion University abstract

Monotone submodular maximization over a matroid (MSMM) is a generalization of the classical NP-complete problem of Maximum Coverage, in which the objective coverage function is replaced by an arbitrary monotone submodular function, and the cardinality constraint is replaced by an arbitrary matroid constraint.

The greedy algorithm gives a $1-1/e$ approximation to Maximum Coverage, which is best possible unless P=NP (Feige).

In contrast, the greedy algorithm only gives a 1/2 approximation to MSMM.

Vondrák and his coauthors came up with an elegant $1-1/e$ approximation algorithm for MSMM called Continuous Greedy, which is an “interior point” algorithm.

We describe a “simplex” algorithm that also gives a $1-1/e$ approximation for MSMM, using the neglected technique of non-oblivious local search.

Joint work with Justin Ward (EPFL).

5 October 2015, HIM (Bonn) videoslides

7 October 2014, IAS videonotes

22 January 2014, DIMACS summary

10 October 2011, China Theory Week (Aarhus) abstractslides

We present an optimal $1-1/e$ approximation algorithm for maximum coverage over a matroid constraint using non-oblivious local search.

Calinescu, Chekuri, Pál and Vondrák have given an optimal $1-1/e$ approximation algorithm for the more general problem of monotone submodular optimization over a matroid constraint. Our algorithm is much simpler and faster, and is completely combinatorial.

We are working on extending our algorithm to handle arbitrary monotone submodular functions.

Joint work with Justin Ward (University of Toronto).

Maximizing a monotone submodular function over a matroid constraint is a fundamental combinatorial optimization problem, generalizing problems such as Maximum Coverage and MAX-SAT.

Calienscu, Chekuri, Pál and Vondrák devised the continuous greedy algorithm, which gives the optimal $1-1/e$ approximation ratio.

Together with Justin Ward, we have come up with an alternative algorithm which is more combinatorial in nature. We first devised an algorithm for the special case of maximum coverage, and then extended it to the general case.

Triangle-intersecting families of graphs

20 February 2019, TIFR: Applications of the Hoffman/Lovász bound abstractnotes

5 December 2015, CMS winter meeting (Montréal) slides

29 January 2015, Yale

4 February 2014, Rutgers notes

8 November 2013, Simons institute notes

6 May 2011, Ontario Combinatorics Workshop (Ryerson) abstractslides

A family of graphs (sharing the same finite vertex set) is triangle-intersecting if the intersection of any two graphs in the family contains a triangle. Simonovits and Sós conjectured in 1976 that such a family can contain at most 1/8 of the graphs; this bound is achieved by taking a “triangle-junta” (all graphs containing a fixed triangle).

Up to the present work, the best known upper bound on the size of a triangle-intersecting family was 1/4 (Chung, Graham, Frankl & Shearer 1981). We prove the conjecture, showing moreover that the only families achieving the bound are triangle-juntas.

The result follows a line of work by Friedgut and his coauthors. The proof uses rudimentary discrete Fourier analysis, and some “entry-level” graph theory; the talk will focus on the former.

Joint work with David Ellis (Cambridge) and Ehud Friedgut (Hebrew University).

An exposition of Triangle-intersecting families of graphs, our joint work with David Ellis and Ehud Friedgut, in which we show that if $\mathcal{F}$ is a family of graphs on $n$ points such that any two graphs in $\mathcal{F}$ contain a triangle, then $\mathcal{F}$ contains at most $1/8$ of the graphs.

The proof uses the spectral approach of Hoffman and Lovász.

Miscellaneous

3 September 2023, IMU: Irreducible subcube partitions abstractslides

A subcube partition is a partition of $\{0,1\}^n$ into axis-aligned subcubes.

A partition is irreducible if the union of no nontrivial subpartition is itself a subcube.

How small can an irreducible subcube partition be? How large?

Do their exist irreducible subcube partitions consisting only of edges?

What if we replace “subcube” with “affine subspace”?

Joint work with Edward Hirsch (Ariel), Sascha Kurz (Bayreuth), Ferdinand Ihringer (Gent), Artur Riazanov (EPFL), Alexander Smal (Technion), Marc Vinyals (Auckland, NZ).

28 July 2022, Dagstuhl: Information complexity abstractslides

Alice holds an input $x$ and Bob holds an input $y$. Each party only initially knows their own input.

They wish to compute a function $f(x,y)$ depending on both inputs (allowing a negligible error), minimizing the amount of bits exchanged between the parties.

The minimum amount of communication needed to compute $f$ is known as its communication complexity, which is a prominent concept in computational complexity.

Braverman and Rao (2011) showed that if Alice and Bob want to compute the value of $f$ on many different inputs, then the amortized communication complexity is given by a measure called information complexity, which had been defined by Bar-Yossef, Jayram, Kumar and Sivakumar (2004) as a lower bound technique.

Perhaps the most spectacular application of information complexity is the work of Braverman, Garg, Pankratov and Weinstein (2012), in which they determined the communication complexity of set disjointness (given two subsets $x,y$ of $\{1,\dots,n\}$, determine whether they are disjoint) to be $cn + o(n)$, where $c \approx 0.4827$ is an explicitly computable constant.

Information complexity has also been applied to other areas of computational complexity, though I won’t have time to describe these connections.

Some examples include extended formulations (Braverman and Moitra, 2012) and parallel repetition (Braverman and Garg, 2014).

6 December 2021, AIM: Intersecting families and Hoffman's bound abstractslides

24 November 2020, UCLA: Sauer–Shelah–Perles lemma for lattices videoabstractslides

The Sauer–Shelah–Perles (SSP) lemma is a fundamental result in VC theory, with important applications in statistical learning theory.

It bounds the number of sets in a family in terms of the size of the universe and the VC dimension.

We generalize the SSP lemma to some lattices, such as the lattice of subspaces of a finite-dimensional vector space over a finite field.

The SSP lemma fails for some lattices, and we identify a local obstruction which we conjecture is the only reason for such failure.

The talk will not assume any familiarity with VC theory or with statistical learning theory.

Joint work with Stijn Cambie (Raboud University Nijmegen), Bogdan Chornomaz (Vanderbilt University), Zeev Dvir (Princeton University) and Shay Moran (Technion).

24 June 2020, HUJI: Degree and sensitivity on the symmetric group abstractnotes

Noam Nisan conjectured in 1991 that the degree of a Boolean function on {0,1}^n can be polynomially bounded in terms of its sensitivity.

Last year, Hao Huang proved this conjecture using a short but cunning argument.

As a result, we now know that many different complexity measures for Boolean functions on $\{0,1\}^n$ are polynomially related.

These include degree, approximate degree, decision tree complexity, certificate complexity, sensitivity, block sensitivity, and many more.

Do these questions make sense for functions on other domains?

We show that the entire theory can be generalized to the symmetric group and many other domains.

Joint work with Noam Lifshitz (HUJI), Nathan Lindzey (CU Boulder), Marc Vinyals (Technion).

30 January 2020, Princeton: Hoffman bound for complexes abstractnotes

Many questions in extremal combinatorics can be framed as asking for the maximum independent sets in a graph.

The prototypical example is the Erdős-Ko-Rado theorem and many of its generalizations.

The Hoffman bound, closely related to the Lovász theta function, has been used to solve many of these questions.

Other questions correspond to maximum independent sets in hypergraphs.

Examples include the Erdős matching conjecture and s-wise intersecting families.

Bachoc, Gundert and Passuello generalized the theta function to complexes.

We give a different generalization, and use it to solve Frankl’s triangle problem, as well as give new proofs to several known results such as Mantel’s theorem.

Our bound is reminiscent of Garland’s method, especially in its version due to Alev and Lau (arXiv:2001.02827, Theorem 1.5).

Joint work with Konstantin Golubev and Noam Lifshitz.

27 December 2018, Bar Ilan: Information complexity: a random walk perspective abstractnotes

Razborov showed that the randomized communication complexity of set disjointness is $\Theta(n)$.

What is the hidden constant? Amazingly, Braverman et al. were able to calculate it to be $0.4827\ldots$, using information complexity.

For their upper bound, Braverman et al. introduced the buzzer protocol, an information-optimal continuous-time protocol.

Following up on their ideas, we show how the buzzer protocol – indeed, any protocol – can be profitably viewed as a random walk.

As an application, we show that the randomized communication complexity of set disjointness with error epsilon is $0.4827\ldots n – \Theta(h(\epsilon) n)$.

Our work suggests the following challenge: show that there is an information-optimal random walk protocol for every function.

Joint work from 2017 with Yuval Dagan (then Technion, now MIT), Hamed Hatami (McGill) and Yaqiao Li (McGill).

2 December 2015, Technion: An instance of subset sum hard for cutting planes abstractnotes

4 March 2014, IAS: Fast matrix multiplication (2/2) videoabstractnoteshandout

How many arithmetic operations does it take to multiply two square matrices? This question has captured the imagination of computer scientists ever since Strassen showed in 1969 that $O(n^{2.81})$ operations suffice. We survey the classical theory of fast matrix multiplication, starting with Strassen’s algorithm and culminating with the state-of-the-art Coppersmith–Winograd algorithm and its recent improvements. We also describe Coppersmith’s $O(n^2\log^2n)$ algorithm for multiplying rectangular matrices, used by Ryan Williams in his separation of ACC and NEXP.

This part of the two-lecture series, based on my earlier TSS talks, discusses the work of Coppersmith and Winograd, and its follow-ups.

25 February 2014, IAS: Fast matrix multiplication (1/2) videoabstractnoteshandout

This part of the two-lecture series, based on my earlier TSS talks, culminates with Schönhage’s asymptotic sum inequality.

25 June 2011, LCC (part of LICS): Monotone feasible interpolation as games abstractslidesnotes

Bonet, Pitassi and Raz and, independently, Krajíček came up with the idea to use feasible interpolation as a vehicle for lower bounds on proof systems. Bonet et al. presented their construction in a somewhat ad hoc manner, and Krajíček used a theorem of Razborov generalizing Karchmer–Wigderson games. We provide a different interpretation of their arguments in terms games which are more suitable than the ones considered by Razborov.

Our account includes three flavors of arguments: ones following Bonet et al., ones following Krajíček, and ones following a suggestion of Neil Thapen. The different arguments prove lower bounds for slightly different formulas.

Talks in Hebrew

23 December 2018, Technion: Intro to research abstractslides

19 December 2018, Technion Math Club: Complexity of graph coloring abstractnotes