Chapter 4: Multilinear Extensions
In 1971, the Mariner 9 probe became the first spacecraft to orbit another planet. Its mission: map the surface of Mars. But transmitting high-resolution images across 100 million miles of static-filled space was a nightmare. A single burst of cosmic noise could turn a crater into a glitch.
NASA didn't send raw pixels. They used a code developed years earlier by Irving Reed and David Muller: treat the pixel data as evaluations of a multivariate polynomial. The Reed-Muller code could correct up to seven bit errors per 32-bit word. When Mariner 9 arrived to find Mars engulfed in a planet-wide dust storm, mission control reprogrammed the spacecraft from Earth and waited. When the dust cleared, the code delivered 7,329 images, mapping 85% of the Martian surface.
Why not Reed-Solomon? In Chapter 2, we encoded $n$ values as a univariate polynomial of degree $n - 1$. That works when $n$ is modest. But Mariner's data was indexed by bit positions: a 32-bit word has $2^{32}$ bit combinations, an $n$-bit memory address space has $2^n$ locations, a boolean formula with 100 variables has $2^{100}$ possible assignments. Encoding $2^{100}$ values as a univariate polynomial means degree $2^{100} - 1$. Impossible.
The solution: let each bit be its own variable. A 100-bit index becomes 100 coordinates, each 0 or 1. The polynomial has 100 variables instead of degree $2^{100}$. Data lives not on a line but on a hypercube. This chapter develops that theory.
In Chapter 2, we turned data into polynomials via Lagrange interpolation: given $n$ values, construct the unique degree-$(n-1)$ univariate polynomial passing through them. That was interpolation over a line.
Now we need interpolation over a hypercube. The data lives at $2^n$ vertices, indexed by $n$-bit strings. The polynomial must agree with the data at these vertices and extend smoothly to all of $\mathbb{F}^n$. The construction is analogous to univariate Lagrange, but the geometry is different, and the efficiency implications are dramatic.
This chapter develops the theory of multilinear extensions: the canonical way to extend functions from the Boolean hypercube $\{0,1\}^n$ to polynomials over $\mathbb{F}$. These extensions are the workhorses of sum-check-based proof systems, encoding everything from circuit wire values to constraint satisfaction.
The Boolean Hypercube
Consider the set $\{0,1\}^n$, all $n$-bit binary strings. This is the Boolean hypercube, and it contains exactly $2^n$ points.
n = 2:
          (1,1)
         /     \
    (0,1)       (1,0)
         \     /
          (0,0)

n = 3: A cube with 8 vertices
Any function $f : \{0,1\}^n \to \mathbb{F}$ assigns a field element to each vertex of this hypercube. There are $2^n$ vertices, so $f$ is essentially a table of $2^n$ values.
For example:
- A vector $v \in \mathbb{F}^{2^n}$ can be viewed as $f(x) = v_{\mathrm{int}(x)}$, where $\mathrm{int}(x)$ converts the bit string $x$ to an index
- The output values of a layer of circuit gates
- A database of $2^n$ records indexed by $n$-bit keys
Why does the hypercube matter? Because computation is fundamentally boolean. A memory address is a bit string. A circuit's inputs are bits. A satisfying assignment to a boolean formula is a point in $\{0,1\}^n$. When we want to verify a computation, the objects we care about (wire values, memory contents, constraint satisfaction) are naturally indexed by binary strings. The hypercube is where computational problems live.
But polynomials live over fields, not just $\{0, 1\}$. We want a polynomial that agrees with $f$ on the hypercube but extends smoothly to all of $\mathbb{F}^n$. This extension is what lets us apply the algebraic machinery (Schwartz-Zippel, sum-check) that makes verification efficient.
Why Multilinear?
In Chapter 2, we used univariate polynomials (Reed-Solomon). Why switch to multivariate now?
The problem with univariate encoding is degree: if you encode $2^{20}$ (about a million) data points into a single-variable polynomial, that polynomial has degree about one million. Manipulating degree-million polynomials is expensive, requiring heavy FFT operations.
Multilinear polynomials avoid this. If you encode the same $2^{20}$ points into a 20-variable multilinear polynomial, the degree in each variable is just 1. The total degree is only 20. By increasing the number of variables, we drastically lower the per-variable degree. This tradeoff (more variables, lower degree) enables the linear-time prover algorithms that power modern systems like HyperPlonk and Lasso, avoiding the expensive FFTs required by univariate approaches.
A polynomial in $n$ variables has terms like $c \cdot X_1^{e_1} X_2^{e_2} \cdots X_n^{e_n}$ with various exponents. The degree in variable $X_i$ is the maximum exponent of $X_i$ across all terms.
A polynomial is multilinear if its degree in every variable is at most 1. Every term looks like a coefficient times a product of distinct variables (a subset of the variables, each to the first power):
$$g(X_1, \dots, X_n) = \sum_{S \subseteq \{1, \dots, n\}} c_S \prod_{i \in S} X_i.$$
We write $\tilde{f}$ (with a tilde) to denote the multilinear extension of a function $f : \{0,1\}^n \to \mathbb{F}$.
For example, with $n = 2$:
$$g(X_1, X_2) = c_{\emptyset} + c_{\{1\}} X_1 + c_{\{2\}} X_2 + c_{\{1,2\}} X_1 X_2.$$
There are $2^n$ possible subsets $S \subseteq \{1, \dots, n\}$, hence $2^n$ coefficients. A multilinear polynomial in $n$ variables is fully specified by $2^n$ numbers, exactly matching the number of points in the hypercube.
This is not a coincidence. It's the key theorem:
Theorem (Multilinear Extension). For any function $f : \{0,1\}^n \to \mathbb{F}$, there exists a unique multilinear polynomial $\tilde{f} : \mathbb{F}^n \to \mathbb{F}$ such that $\tilde{f}(x) = f(x)$ for all $x \in \{0,1\}^n$.
The function $\tilde{f}$ is called the multilinear extension (MLE) of $f$.
Constructing the Multilinear Extension
The theorem claims uniqueness. How do we actually construct $\tilde{f}$?
The Lagrange Basis
For each point $b = (b_1, \dots, b_n) \in \{0,1\}^n$, define the Lagrange basis polynomial:
$$\chi_b(X_1, \dots, X_n) = \prod_{i=1}^{n} \big(b_i X_i + (1 - b_i)(1 - X_i)\big).$$
Here $b$ is a fixed boolean vector, where each $b_i \in \{0, 1\}$. You can read $b$ as the binary representation of an index from 0 to $2^n - 1$, addressing one of the $2^n$ vertices of the hypercube. Meanwhile $X = (X_1, \dots, X_n)$ is a vector of formal variables, where each $X_i$ ranges over all of $\mathbb{F}$. Geometrically, $b$ lives at a corner of the unit hypercube, while $X$ can be any point in $\mathbb{F}^n$, including points "between" corners. The polynomial $\chi_b$ is defined over all of $\mathbb{F}^n$, but it has a special property on the hypercube: it equals 1 at $b$ and 0 at every other boolean point.
To see why, consider what happens at the point $X = b$:
- If $b_i = 1$: the factor is $b_i X_i + (1 - b_i)(1 - X_i) = X_i$, which evaluates to $b_i = 1$
- If $b_i = 0$: the factor is $1 - X_i$, which evaluates to $1 - b_i = 1$

Every factor equals 1, so $\chi_b(b) = 1$.
At any other boolean point $c \neq b$:
- There exists some coordinate $i$ where $c_i \neq b_i$
- If $b_i = 1$ and $c_i = 0$: the factor $X_i$ evaluates to $0$
- If $b_i = 0$ and $c_i = 1$: the factor $1 - X_i$ evaluates to $0$

One factor is zero, so $\chi_b(c) = 0$.
The Extension Formula
The multilinear extension is now simply:
$$\tilde{f}(X_1, \dots, X_n) = \sum_{b \in \{0,1\}^n} f(b) \cdot \chi_b(X_1, \dots, X_n).$$
At any hypercube point $c \in \{0,1\}^n$:
$$\tilde{f}(c) = \sum_{b \in \{0,1\}^n} f(b) \cdot \chi_b(c) = f(c),$$
since only the $b = c$ term survives. The extension agrees with $f$ on the hypercube. Since it's a sum of multilinear terms (each $\chi_b$ is multilinear), $\tilde{f}$ is multilinear.
Uniqueness
Claim: If a multilinear polynomial $g$ vanishes on all of $\{0,1\}^n$, then $g = 0$ identically.
Proof by induction on $n$:
Base case ($n = 1$): A multilinear polynomial in one variable has the form $g(X_1) = a + b X_1$. If $g(0) = 0$ and $g(1) = 0$, then $a = 0$ and $a + b = 0$, so $b = 0$. Thus $g = 0$.
Inductive step: Write $g(X_1, \dots, X_n) = g_0(X_2, \dots, X_n) + X_1 \cdot g_1(X_2, \dots, X_n)$, where $g_0, g_1$ are multilinear in $n - 1$ variables. Evaluating at $X_1 = 0$: $g(0, X_2, \dots, X_n) = g_0(X_2, \dots, X_n)$. Since $g$ vanishes on all of $\{0,1\}^n$, in particular $g_0$ vanishes on $\{0,1\}^{n-1}$. By induction, $g_0 = 0$. Similarly, $g(1, X_2, \dots, X_n) = g_0 + g_1 = g_1$ vanishes on $\{0,1\}^{n-1}$, so $g_1 = 0$. Thus $g = 0$.
Corollary: If two multilinear polynomials agree on $\{0,1\}^n$, their difference vanishes there, hence is identically zero, so they are equal.
The Equality Polynomial
One polynomial, built from the same pattern as the Lagrange basis, deserves special attention: the equality polynomial
$$\widetilde{eq}(X_1, \dots, X_n, Y_1, \dots, Y_n) = \prod_{i=1}^{n} \big(X_i Y_i + (1 - X_i)(1 - Y_i)\big).$$
This is the MLE of the equality function
$$eq(x, y) = \begin{cases} 1 & \text{if } x = y \\ 0 & \text{otherwise} \end{cases}$$
for $x, y \in \{0,1\}^n$.
The Lagrange basis polynomials are just the equality polynomial with one input fixed:
$$\chi_b(X) = \widetilde{eq}(b, X).$$
The equality polynomial appears constantly in sum-check-based protocols, through the identity:
$$\tilde{f}(r) = \sum_{x \in \{0,1\}^n} \widetilde{eq}(r, x) \cdot f(x).$$
This follows directly from the Lagrange formula: $\chi_x(r) = \widetilde{eq}(r, x)$. Summing $f(x)$ weighted by $\widetilde{eq}(r, x)$ over the hypercube gives the MLE of $f$ evaluated at $r$. This means evaluating an MLE at a random challenge $r$ reduces to a sum-check on the product $\widetilde{eq}(r, x) \cdot f(x)$.
This immediately gives a powerful zero test. Suppose the verifier wants to check that $f$ vanishes on the entire Boolean hypercube. By the identity above, checking that all $2^n$ values $f(x)$ are zero is the same as checking that $\tilde{f}$ is the zero polynomial. The verifier picks a random $r \in \mathbb{F}^n$ and runs sum-check on:
$$\sum_{x \in \{0,1\}^n} \widetilde{eq}(r, x) \cdot f(x) = \tilde{f}(r).$$
This is a random linear combination of all $2^n$ values. If $f$ truly vanishes on the hypercube, then $\tilde{f} = 0$ (by the uniqueness theorem above), so the sum is always 0. If even one value $f(x) \neq 0$, then $\tilde{f}$ is a nonzero multilinear polynomial, and Schwartz-Zippel guarantees $\tilde{f}(r) \neq 0$ with probability at least $1 - n/|\mathbb{F}|$. Over a 254-bit field, the failure probability is negligible. This "zero-on-hypercube" test is the foundation of Spartan and related sum-check-based proof systems.
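The random-linear-combination test can be sketched in a few lines. Everything here is illustrative: the prime `P`, the helper names, and the direct summation are our choices (a real verifier would run sum-check rather than touch the whole table; this sketch only illustrates the identity):

```python
import random

P = 2**61 - 1  # a Mersenne prime standing in for a large field (our choice)

def eq_weight(r, x):
    """eq(r, x) = product of (r_i * x_i + (1 - r_i) * (1 - x_i)), mod P."""
    w = 1
    for r_i, x_i in zip(r, x):
        w = w * (r_i * x_i + (1 - r_i) * (1 - x_i)) % P
    return w

def zero_test(table, n, rng=random):
    """Check sum_x eq(r, x) * f(x) == 0 at a random r; the sum equals MLE(r)."""
    r = [rng.randrange(P) for _ in range(n)]
    total = 0
    for idx, val in enumerate(table):
        x = [(idx >> (n - 1 - k)) & 1 for k in range(n)]  # bits of idx, MSB first
        total = (total + eq_weight(r, x) * val) % P
    return total == 0

assert zero_test([0, 0, 0, 0], 2)      # the all-zero table always passes
assert not zero_test([0, 0, 1, 0], 2)  # a single nonzero value is caught w.h.p.
```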
Worked Example: A 2-Variable Function
Let's trace through a complete example.
Consider $f : \{0,1\}^2 \to \mathbb{F}$ defined by the table:
$$f(0,0) = 3, \quad f(0,1) = 7, \quad f(1,0) = 2, \quad f(1,1) = 5.$$
The Lagrange basis polynomials are:
$$\chi_{00} = (1 - X_1)(1 - X_2), \quad \chi_{01} = (1 - X_1) X_2, \quad \chi_{10} = X_1 (1 - X_2), \quad \chi_{11} = X_1 X_2.$$
The multilinear extension is then:
$$\tilde{f}(X_1, X_2) = 3 \chi_{00} + 7 \chi_{01} + 2 \chi_{10} + 5 \chi_{11}.$$
Expanding:
$$\tilde{f}(X_1, X_2) = 3 - X_1 + 4 X_2 - X_1 X_2.$$
We can verify this matches the table:
- $\tilde{f}(0,0) = 3$ (matches)
- $\tilde{f}(0,1) = 3 + 4 = 7$ (matches)
- $\tilde{f}(1,0) = 3 - 1 = 2$ (matches)
- $\tilde{f}(1,1) = 3 - 1 + 4 - 1 = 5$ (matches)
What happens at a non-boolean point? Evaluating at $(0.4, 0.7)$:
$$\tilde{f}(0.4, 0.7) = 3 - 0.4 + 4 \cdot 0.7 - 0.4 \cdot 0.7 = 5.12.$$
This value has no "meaning" on the hypercube; $(0.4, 0.7)$ isn't a Boolean point. But this is exactly what we want: the polynomial is defined everywhere, and random evaluation is the key to probabilistic verification.
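The same number falls out of the Lagrange sum directly. A minimal sketch (the helper name `mle_naive` is ours; the table is indexed with $x_1$ as the most significant bit):

```python
def mle_naive(table, r):
    """Sum f(b) * chi_b(r) over all boolean points b: O(n * 2^n) work."""
    n = len(r)
    total = 0
    for idx, val in enumerate(table):
        b = [(idx >> (n - 1 - k)) & 1 for k in range(n)]  # bits of idx, MSB first
        weight = 1
        for b_i, r_i in zip(b, r):
            weight *= b_i * r_i + (1 - b_i) * (1 - r_i)   # per-coordinate selector
        total += val * weight
    return total

table = [3, 7, 2, 5]  # f(0,0)=3, f(0,1)=7, f(1,0)=2, f(1,1)=5
assert abs(mle_naive(table, (0.4, 0.7)) - 5.12) < 1e-9
assert mle_naive(table, (0, 1)) == 7  # agrees with the table at boolean points
```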
Efficient Evaluation
Given the table of $2^n$ values and a query point $r \in \mathbb{F}^n$, how fast can we compute $\tilde{f}(r)$?
The naive approach sums over all $2^n$ terms:
$$\tilde{f}(r) = \sum_{b \in \{0,1\}^n} f(b) \cdot \chi_b(r).$$
Each $\chi_b(r)$ takes $O(n)$ time to compute. Total: $O(n \cdot 2^n)$.
We can do better with streaming evaluation: $\tilde{f}(r)$ is computable in $O(2^n)$ time with the following observation.
Define $f_k$ as the "partial extension" using only the first $k$ coordinates of $r$:
$$f_k(x_{k+1}, \dots, x_n) = \tilde{f}(r_1, \dots, r_k, x_{k+1}, \dots, x_n).$$
At $k = 0$: $f_0 = f$ (the original table).
At $k = n$: $f_n = \tilde{f}(r_1, \dots, r_n)$ (a single value).
The recursion from $f_k$ to $f_{k+1}$:
$$f_{k+1}(x_{k+2}, \dots, x_n) = (1 - r_{k+1}) \cdot f_k(0, x_{k+2}, \dots, x_n) + r_{k+1} \cdot f_k(1, x_{k+2}, \dots, x_n).$$
Each step halves the table size. Total work: $2^n + 2^{n-1} + \cdots + 1 = O(2^n)$.
This is linear in the table size, optimal for any algorithm that must touch all $2^n$ values.
Worked Example: Streaming Evaluation
Let's trace through this algorithm with our earlier function $f(0,0) = 3$, $f(0,1) = 7$, $f(1,0) = 2$, $f(1,1) = 5$:
We want to compute $\tilde{f}(r)$ at the point $r = (0.4, 0.7)$.
Step 0: Initialize $f_0$
$f_0$ is just the original table, a function of both variables: $f_0(x_1, x_2) = f(x_1, x_2)$.
Think of it as four values indexed by $(x_1, x_2)$: $[3, 7, 2, 5]$.
Step 1: Compute $f_1$ by "folding in" $r_1 = 0.4$
The recursion says:
$$f_1(x_2) = (1 - r_1) \cdot f_0(0, x_2) + r_1 \cdot f_0(1, x_2).$$
This is a weighted combination of the two rows, using $1 - r_1 = 0.6$ and $r_1 = 0.4$:
$$f_1(0) = 0.6 \cdot 3 + 0.4 \cdot 2 = 2.6, \qquad f_1(1) = 0.6 \cdot 7 + 0.4 \cdot 5 = 6.2.$$
The table has shrunk from 4 values to 2 values: $[2.6, 6.2]$.
Step 2: Compute $f_2$ by "folding in" $r_2 = 0.7$
$$f_2 = (1 - r_2) \cdot f_1(0) + r_2 \cdot f_1(1) = 0.3 \cdot 2.6 + 0.7 \cdot 6.2 = 0.78 + 4.34 = 5.12.$$
The table has shrunk from 2 values to 1 value. This single value is $\tilde{f}(0.4, 0.7) = 5.12$.
We can verify using the explicit formula $\tilde{f}(X_1, X_2) = 3 - X_1 + 4 X_2 - X_1 X_2$:
$$3 - 0.4 + 4 \cdot 0.7 - 0.4 \cdot 0.7 = 2.6 + 2.8 - 0.28 = 5.12.$$
This works because the Lagrange basis polynomial factorizes into independent pieces, one per coordinate:
$$\chi_b(r) = \prod_{i=1}^{n} \chi_{b_i}(r_i),$$
where $\chi_0(t) = 1 - t$ and $\chi_1(t) = t$ are univariate selectors. This factorization holds because the multilinear Lagrange formula is a product over coordinates:
$$\chi_b(X_1, \dots, X_n) = \prod_{i=1}^{n} \big(b_i X_i + (1 - b_i)(1 - X_i)\big).$$
Each factor depends only on one coordinate of $b$ and one coordinate of $X$. So evaluating at $r$ gives a product of $n$ independent terms.
The algorithm exploits this factorization. The MLE evaluation is:
$$\tilde{f}(r) = \sum_{b \in \{0,1\}^n} f(b) \prod_{i=1}^{n} \chi_{b_i}(r_i).$$
Rearranging the sum (grouping by $b_1$):
$$\tilde{f}(r) = (1 - r_1) \sum_{b_2, \dots, b_n} f(0, b_2, \dots, b_n) \prod_{i=2}^{n} \chi_{b_i}(r_i) \;+\; r_1 \sum_{b_2, \dots, b_n} f(1, b_2, \dots, b_n) \prod_{i=2}^{n} \chi_{b_i}(r_i).$$
The inner sums are exactly what Step 1 sets up: for each value of $(b_2, \dots, b_n)$, it combines the cases $b_1 = 0$ and $b_1 = 1$ using weights $1 - r_1$ and $r_1$. The result has half as many entries. Step 2 then folds in the weights $1 - r_2$ and $r_2$ similarly.
An analogy helps here: think of a single-elimination tournament with $2^n$ players. In each round, pairs compete and half are eliminated. After $n$ rounds, one champion remains. The streaming algorithm works the same way: $2^n$ table entries enter, each round uses a random weight to combine pairs, and after $n$ rounds a single evaluation emerges. The tournament bracket is the structure of multilinear computation.
This pattern of using a random challenge to collapse pairs of values and halving the problem size will reappear throughout this book. In Chapter 10 (FRI), we'll name it folding and see it as one of the central techniques in zero-knowledge proofs.
Code: Streaming MLE Evaluation
The algorithm above translates directly to code. Each coordinate of folds the table in half.
```python
def mle_eval(table, r):
    """
    Evaluate the multilinear extension of `table` at point `r`.

    Args:
        table: List of 2^n field elements (the function values on the hypercube,
               with the first variable as the most significant index bit)
        r: Tuple of n coordinates (r_1, ..., r_n)
    Returns: The value of the MLE at r
    """
    T = table.copy()
    for r_i in r:
        half = len(T) // 2
        # Fold the first remaining variable:
        # T'[j] = (1 - r_i) * T[j] + r_i * T[j + half]
        T = [(1 - r_i) * T[j] + r_i * T[j + half]
             for j in range(half)]
    return T[0]  # Single value remains

# Example from the worked example above
table = [3, 7, 2, 5]  # f(0,0)=3, f(0,1)=7, f(1,0)=2, f(1,1)=5
r = (0.4, 0.7)
result = mle_eval(table, r)
print(f"Streaming: MLE({r}) = {result:.2f}")

# Verify against explicit formula: f(X1,X2) = 3 - X1 + 4*X2 - X1*X2
explicit = 3 - 0.4 + 4*0.7 - 0.4*0.7
print(f"Explicit: MLE({r}) = {explicit:.2f}")
```

Output:

```
Streaming: MLE((0.4, 0.7)) = 5.12
Explicit: MLE((0.4, 0.7)) = 5.12
```
The streaming algorithm touches each entry of the original table exactly once, in the first fold. For a table of size $2^n$, total work across all folds is $2^n + 2^{n-1} + \cdots + 1 = O(2^n)$.
Tensor Product Structure
The factorization we used in the streaming algorithm generalizes to any number of variables. For $b \in \{0,1\}^n$ and $r \in \mathbb{F}^n$:
$$\chi_b(r) = \prod_{i=1}^{n} \chi_{b_i}(r_i),$$
where $\chi_0(t) = 1 - t$ and $\chi_1(t) = t$.
This is a tensor product structure. To see what this means concretely, consider $n = 2$. Define the vectors:
$$u = (1 - r_1,\; r_1), \qquad v = (1 - r_2,\; r_2).$$
Their tensor product is the $2 \times 2$ matrix (or equivalently, length-4 vector) of all pairwise products:
$$u \otimes v = \big((1 - r_1)(1 - r_2),\; (1 - r_1) r_2,\; r_1 (1 - r_2),\; r_1 r_2\big).$$
Reading off the entries: $\chi_{00}(r)$, $\chi_{01}(r)$, $\chi_{10}(r)$, $\chi_{11}(r)$. The tensor product is the vector of Lagrange evaluations.
For general $n$, the vector of all $2^n$ Lagrange evaluations is:
$$\big(\chi_b(r)\big)_{b \in \{0,1\}^n} = (1 - r_1,\, r_1) \otimes (1 - r_2,\, r_2) \otimes \cdots \otimes (1 - r_n,\, r_n).$$
The streaming algorithm exploits this tensor structure. Instead of computing all $2^n$ Lagrange values separately (expensive), it processes one coordinate at a time, folding the tensor product incrementally. This is why MLE evaluation costs $O(2^n)$ instead of $O(n \cdot 2^n)$. The same tensor structure enables:
- Efficient prover algorithms for sum-check (Chapter 19)
- Recursive proof constructions
- Memory-efficient streaming over large tables
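For the $n = 2$ case, the tensor product can be checked numerically. A minimal sketch (the names `tensor`, `u`, `v` are ours):

```python
def tensor(u, v):
    """All pairwise products of two vectors, in row-major order."""
    return [a * b for a in u for b in v]

r1, r2 = 0.4, 0.7
u = [1 - r1, r1]     # (chi_0(r1), chi_1(r1))
v = [1 - r2, r2]     # (chi_0(r2), chi_1(r2))
chis = tensor(u, v)  # [chi_00(r), chi_01(r), chi_10(r), chi_11(r)]

assert abs(sum(chis) - 1.0) < 1e-12  # basis values sum to 1

# MLE evaluation is the dot product of the table with this tensor vector
table = [3, 7, 2, 5]
value = sum(t * c for t, c in zip(table, chis))
assert abs(value - 5.12) < 1e-9      # matches the worked example
```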
Multilinear Extensions of Functions on Larger Domains
What if our function isn't defined on $\{0,1\}^n$?
Suppose $f : \{0, 1, \dots, 2^n - 1\} \to \mathbb{F}$ for some $n$. We can interpret the domain as $\{0,1\}^n$ via binary encoding: identify each index $i$ with its $n$-bit binary representation $(i_1, \dots, i_n)$.
Any function on a power-of-two domain has a natural multilinear extension.
For domains not of size $2^n$, we can pad with zeros or use more sophisticated encodings. The key insight: as long as the domain is finite, we can always encode it in binary and take the MLE.
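The zero-padding step is one line of bookkeeping. A sketch (the helper name `to_mle_table` is ours, not a standard API):

```python
def to_mle_table(values):
    """Pad a list of field elements with zeros up to the next power of two.

    Returns the padded table and n, the number of boolean variables.
    """
    n = max(1, (len(values) - 1).bit_length())
    return values + [0] * (2**n - len(values)), n

table, n = to_mle_table([5, 1, 4])  # domain {0, 1, 2}, padded to {0,1}^2
assert table == [5, 1, 4, 0] and n == 2
```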
Connection to Sum-Check
The sum-check protocol (Chapter 3) proves claims of the form:
$$\sum_{x \in \{0,1\}^n} g(x) = S$$
for some polynomial $g$. When $g$ is the multilinear extension $\tilde{f}$ of a function $f$, this sum equals $\sum_{x \in \{0,1\}^n} f(x)$, the sum of all function values on the hypercube.
As an example, suppose we want to prove that a vector $v = (v_0, \dots, v_{2^n - 1})$ with entries in $\mathbb{F}$ sums to a claimed value $S$.
Let $\tilde{v}$ be the MLE encoding the vector. Then:
$$\sum_{x \in \{0,1\}^n} \tilde{v}(x) = \sum_{i=0}^{2^n - 1} v_i = S.$$
Sum-check verifies this identity without the verifier seeing all of $v$. The protocol reduces the sum to a single random evaluation $\tilde{v}(r)$, which the prover supplies (with a commitment proof).
This is the bridge from "data" to "proof": encode data as an MLE, verify properties via sum-check, bind via polynomial commitment.
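The identity itself can be sanity-checked numerically: summing the MLE over all boolean points recovers the plain vector sum. A sketch (the folding evaluator repeats the streaming technique from earlier; the names are ours):

```python
from itertools import product

def mle_eval_at(table, point):
    """Fold one coordinate at a time: T'[j] = (1 - p) * T[j] + p * T[j + half]."""
    T = list(table)
    for p in point:
        half = len(T) // 2
        T = [(1 - p) * T[j] + p * T[j + half] for j in range(half)]
    return T[0]

v = [4, 9, 1, 6]  # the vector being committed, as a 2-variable table
S = sum(v)        # the claimed sum
hypercube_sum = sum(mle_eval_at(v, x) for x in product([0, 1], repeat=2))
assert hypercube_sum == S  # the MLE agrees with v on the hypercube
```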
Evaluations and Coefficients
A perspective that clarifies many constructions:
A multilinear polynomial $\tilde{f}$ has $2^n$ coefficients (the values $c_S$ in the monomial expansion $\sum_S c_S \prod_{i \in S} X_i$). These coefficients live in an abstract "coefficient space."
But $\tilde{f}$ also has $2^n$ evaluations on the hypercube. These evaluations are just $f(x)$ for $x \in \{0,1\}^n$, the original table values you started with.
These are not the same numbers. The table entry $f(0,1) = 7$ in our worked example is not a coefficient of the polynomial. The polynomial $3 - X_1 + 4 X_2 - X_1 X_2$ has coefficients $(3, -1, 4, -1)$, while the table values are $(3, 7, 2, 5)$. They're related by the Lagrange interpolation formula.
For multilinear polynomials, the evaluation table is a complete description. You can recover coefficients from evaluations and vice versa. They're just two bases for the same $2^n$-dimensional vector space.
The transformation between bases is exactly the Lagrange interpolation formula and its inverse. Both can be computed in $O(n \cdot 2^n)$ time.
This means:
- Committing to a multilinear polynomial = committing to its evaluation table
- Evaluating at a random point = a linear combination of table entries
- Sum-check over an MLE = verifying global properties through local queries
The table has $2^n$ entries. The verifier touches only $O(n)$ field elements. The polynomial is what bridges the gap: it's a compressed representation that can be probed at random points, and those random probes reveal whether the full table satisfies the claimed property. Extension creates redundancy; redundancy enables compression; compression enables succinctness.
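The change of basis between evaluations and coefficients is an in-place butterfly, one pass per variable, $O(n \cdot 2^n)$ in total. A sketch under our indexing convention ($x_1$ as the most significant bit; the function names are ours):

```python
def evals_to_coeffs(table):
    """Convert hypercube evaluations to monomial coefficients (per-variable butterfly)."""
    a = list(table)
    bit = 1
    while bit < len(a):
        for j in range(len(a)):
            if j & bit == 0:
                a[j | bit] -= a[j]  # subtract the x_i = 0 partner
        bit <<= 1
    return a

def coeffs_to_evals(coeffs):
    """Inverse transform: monomial coefficients back to hypercube evaluations."""
    a = list(coeffs)
    bit = 1
    while bit < len(a):
        for j in range(len(a)):
            if j & bit == 0:
                a[j | bit] += a[j]
        bit <<= 1
    return a

coeffs = evals_to_coeffs([3, 7, 2, 5])
# coeffs == [3, 4, -1, -1]: constant 3, +4*X2, -1*X1, -1*X1*X2 (index bits = x1 x2),
# matching the worked example f = 3 - X1 + 4*X2 - X1*X2
assert coeffs == [3, 4, -1, -1]
assert coeffs_to_evals(coeffs) == [3, 7, 2, 5]
```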
Polynomial Evaluation as Inner Product
There's a beautiful way to see this algebraically: polynomial evaluation is an inner product.
For a multilinear polynomial, the evaluation at any point $r$ is:
$$\tilde{f}(r) = \langle f, \chi(r) \rangle = \sum_{b \in \{0,1\}^n} f(b) \cdot \chi_b(r),$$
where $f$ is the table of $2^n$ values and $\chi(r) = \big(\chi_b(r)\big)_{b \in \{0,1\}^n}$ is the vector of Lagrange basis evaluations at $r$.
This linear algebra perspective is surprisingly powerful. For decades, sum-check was seen as a beautiful theoretical result with limited practical use. Then came the realization: polynomial evaluation is an inner product, and inner products interact beautifully with commitment schemes. No FFTs, no trusted setups, just vectors and dot products. Systems like Spartan, HyperPlonk, and Lasso all exploit this insight. Chapter 19 tells the full story of this "Sum-Check Renaissance."
The consequences are immediate:
- Commitment: Committing to $\tilde{f}$ means committing to the vector $f$ of table values
- Evaluation proof: Proving $\tilde{f}(r) = y$ means proving the inner product claim $\langle f, \chi(r) \rangle = y$
- The verifier knows $\chi(r)$: Given $r$, anyone can compute the Lagrange evaluations $\chi_b(r)$
This reduces polynomial evaluation proofs to inner product proofs, and inner products interact beautifully with homomorphic commitments. We'll exploit this connection in Chapters 6 and 9.
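The inner-product view works for any $n$: build the vector of Lagrange basis values one coordinate at a time (an iterated tensor product), then take the dot product with the table. A minimal sketch (names are ours):

```python
def chi_vector(r):
    """All 2^n Lagrange basis values chi_b(r), as an iterated tensor product."""
    chis = [1]
    for r_i in r:
        # extend by one coordinate: each entry splits into (1 - r_i) and r_i parts
        chis = [c * s for c in chis for s in (1 - r_i, r_i)]
    return chis

def mle_inner_product(table, r):
    """Evaluate the MLE as the inner product <table, chi(r)>."""
    return sum(t * c for t, c in zip(table, chi_vector(r)))

table = [3, 7, 2, 5]
assert abs(mle_inner_product(table, (0.4, 0.7)) - 5.12) < 1e-9
```

At a boolean point, `chi_vector` degenerates to a one-hot vector, so the inner product simply reads off the corresponding table entry.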
Key Takeaways
- The Boolean hypercube $\{0,1\}^n$ is the natural domain for multilinear polynomials. It has $2^n$ points.
- Multilinear extension (MLE): The unique polynomial $\tilde{f}$ of degree at most 1 in each variable that agrees with $f$ on the hypercube.
- Lagrange basis polynomials $\chi_b$ equal 1 at $b$ and 0 elsewhere on the hypercube. The MLE is $\tilde{f}(X) = \sum_b f(b) \cdot \chi_b(X)$.
- The equality polynomial $\widetilde{eq}(X, Y)$ is the MLE of the equality indicator. Lagrange bases are $\chi_b(X) = \widetilde{eq}(b, X)$.
- Tensor product structure: $\chi_b(r) = \prod_i \chi_{b_i}(r_i)$. The basis factorizes, enabling fast algorithms.
- Efficient evaluation: Given the table and a point, compute the MLE in $O(2^n)$ time via streaming.
- Sum over the hypercube: $\sum_{x \in \{0,1\}^n} \tilde{f}(x) = \sum_{x \in \{0,1\}^n} f(x)$. Sum-check verifies such sums efficiently.
- Evaluations and coefficients: For MLEs, the table of $2^n$ values completely determines the polynomial. Evaluations and coefficients are dual representations.
- Binary encoding: Any function on $\{0, \dots, 2^n - 1\}$ can be encoded as a function on $\{0,1\}^n$, then extended multilinearly.
- The bridge to proofs: MLEs encode data; sum-check verifies properties; polynomial commitment binds the prover. This trinity underlies sum-check-based SNARKs.