Chapter 19: Fast Sum-Check Proving
This chapter, along with Chapters 20 and 21, forms Part VI on prover optimization. These three chapters are optional on a first read: the rest of the book stands without them. They are essential for anyone designing or implementing a prover, and reward repeat reading once the foundations feel solid.
Most chapters in this book can be read with pencil and paper. This one assumes you've already internalized the sum-check protocol (Chapter 3) and multilinear extensions (Chapter 4), not as definitions to look up, but as tools you can wield. If those still feel foreign, consider this chapter a preview of where the road leads, and return when the foundations feel solid.
In 1992, the sum-check protocol solved the problem of succinct verification. Lund, Fortnow, Karloff, and Nisan had achieved something that sounds impossible: verifying a sum over $2^n$ terms while the verifier performs only $O(n)$ work. Exponential compression in verification time. The foundation of succinct proofs.
Then, for three decades, almost nobody used it.
Why? Because everyone thought the prover was too slow. The total work across all rounds sums to $O(2^n)$ (as Chapter 3 showed via the geometric series), but achieving this requires the prover to evaluate the partially-fixed polynomial efficiently at each round. Without a way to reuse work across rounds, each round's evaluations require going back to the original $2^n$-entry table, inflating the cost to $O(n \cdot 2^n)$. For $n = 30$, that's over 30 billion operations per proof. Researchers chased other paths: PCPs, pairing-based SNARKs, trusted setups. Groth16 and PLONK took univariate polynomials, quotient-based constraints, FFT-driven arithmetic. Sum-check remained a theoretical marvel, admired in complexity circles but dismissed as impractical.
They were wrong.
It turned out that a simple algorithmic trick, available since the 90s but overlooked, made the prover linear time. With the right algorithms, sum-check proving runs in $O(2^n)$ time, linear in the number of terms. For sparse sums where only $m \ll 2^n$ terms are non-zero, prover time drops to $O(m)$ up to logarithmic factors. These are not approximations or heuristics; they're exact algorithms exploiting algebraic structure that was always present.
When this was rediscovered and popularized by Justin Thaler in the late 2010s, it triggered a revolution. The field realized it had been sitting on the "Holy Grail" of proof systems for three decades without noticing. This chapter explains the trick that woke up the giant, and then shows how it enables Spartan, the SNARK that proved sum-check alone suffices for practical zero-knowledge proofs. No univariate encodings. No pairing-based trusted setup. Just multilinear polynomials, sum-check, and a commitment scheme.
Why This Matters: The zkVM Motivation
These techniques find their most compelling application in zkVMs (zero-knowledge virtual machines): SNARKs that prove correct execution of arbitrary programs over an instruction set like RISC-V. A million CPU cycles at 50 constraints each yields 50 million constraints. At this scale, $O(n \cdot 2^n)$ versus $O(2^n)$ proving is the difference between minutes and seconds. Even the constant factor matters. Fast sum-check proving is what makes zkVMs practical.
The Prover's Apparent Problem
Let's examine the naive prover cost more carefully.
The sum-check protocol proves:

$$H = \sum_{x \in \{0,1\}^n} g(x),$$

where $g$ is an $n$-variate polynomial. The prover begins by sending the claimed sum (this is $H$). Then in round $i$, the prover sends a univariate polynomial $g_i$ capturing the partial sum with $X_i$ left as a formal variable:

$$g_i(X_i) = \sum_{x_{i+1}, \dots, x_n \in \{0,1\}} g(r_1, \dots, r_{i-1}, X_i, x_{i+1}, \dots, x_n).$$
The polynomial $g_i$ is univariate in $X_i$, with degree equal to the degree of $g$ in that variable. Call this degree $d$. A degree-$d$ univariate is determined by $d + 1$ evaluations, but the consistency check $g_i(0) + g_i(1) = g_{i-1}(r_{i-1})$ (where $g_{i-1}(r_{i-1})$ is the claim from the previous round) lets the verifier derive one evaluation for free, so the prover sends only $d$ values.
For simplicity, assume $g$ has individual degree $d$ in every variable (the common case in practice). Computing $g_i$ requires evaluating it at $d + 1$ points, and each evaluation sums over $2^{n-i}$ terms of the form $g(r_1, \dots, r_{i-1}, t, x_{i+1}, \dots, x_n)$.
Here is the problem. In round 1, no variables have been fixed to challenges yet, so each term in the sum for $g_1$ has the form $g(t, x_2, \dots, x_n)$ with all remaining coordinates Boolean. For $t \in \{0, 1\}$, these are values of $g$ on the hypercube, which the prover computed before the protocol began. For $t \in \{2, \dots, d\}$ (the non-Boolean evaluation points needed to determine $g_1$), the prover must interpolate, but only in the first variable. Round 1 is manageable. But from round 2 onward, the first $i - 1$ variables are fixed to non-Boolean challenges $r_1, \dots, r_{i-1}$. The values $g(r_1, \dots, r_{i-1}, t, x_{i+1}, \dots, x_n)$ were never precomputed. Without a way to access them cheaply, the prover must recompute them from scratch each round by interpolating over the full $2^n$ Boolean evaluations. This costs $O(2^n)$ per round, and over $n$ rounds the total is $O(n \cdot 2^n)$.
Notice, however, that round $i$ only sums over $2^{n-i}$ terms. The work should shrink by half each round, and the geometric series gives:

$$\sum_{i=1}^{n} 2^{n-i} = 2^{n-1} + 2^{n-2} + \dots + 1 < 2^n$$

(using the geometric series identity $\sum_{i \ge 0} r^i = 1/(1-r)$ with $r = 1/2$). The bottleneck is not the number of terms but access: can the prover obtain the partially-fixed values $g(r_1, \dots, r_{i-1}, t, x_{i+1}, \dots, x_n)$ for round $i$ without recomputing them from the original $2^n$ values each time? If not, each round costs $\Theta(2^n)$ regardless of how few terms it sums, and the total remains $\Theta(n \cdot 2^n)$.
The Halving Trick
The answer to the access problem is a single identity from Chapter 4: multilinear folding. After each challenge $r_i$, the prover can update a multilinear polynomial's table of Boolean evaluations in place, producing the restricted polynomial's table in half the space. No recomputation from scratch.
But folding applies to multilinear polynomials, and in the interesting sum-check instances $g$ has degree > 1 in each variable, so $g$ itself is not multilinear. The trick is that $g$ does not need to be multilinear as long as it decomposes into multilinear factors. If $g = p \cdot q$ for multilinear $p$ and $q$ (or more generally a sum of products of MLEs), the prover folds each factor's table independently and recomputes $g$'s values from the folded factors each round. This covers essentially all practical cases: GKR's layer reductions (Chapter 18) use products of MLEs with the equality polynomial, R1CS verification uses $\widetilde{\mathrm{eq}}$ multiplied by products of matrix-vector MLEs, and Spartan (later in this chapter) reduces to the same form.
We develop the algorithm for the simplest case: $g(x) = p(x) \cdot q(x)$, a product of two multilinear polynomials over $n$ variables.
Since multilinear polynomials have degree at most 1 in each variable, their product has degree at most 2. So $d = 2$, and each round the prover sends two field elements (say $g_i(0)$ and $g_i(2)$); the verifier recovers $g_i(1)$ from the consistency check.
The Multilinear Folding Identity
Recall from Chapter 4 the streaming evaluation identity: for any multilinear polynomial $p$ and field element $r$,

$$p(r, x_2, \dots, x_n) = (1 - r) \cdot p(0, x_2, \dots, x_n) + r \cdot p(1, x_2, \dots, x_n).$$

This is linear interpolation: restricted to its first variable, $p$ is a line through $p(0, x_2, \dots, x_n)$ and $p(1, x_2, \dots, x_n)$, given by $t \mapsto (1 - t) \cdot p(0, \cdot) + t \cdot p(1, \cdot)$. The identity evaluates that line at any $r$. Chapter 4 used it with challenges $r \in \mathbb{F}$ to evaluate MLEs in $O(2^n)$ time. Here we also need non-Boolean points: setting $r = 2$ gives $p(2, \cdot) = 2 \cdot p(1, \cdot) - p(0, \cdot)$, extrapolating the line beyond its defining points using only stored Boolean evaluations.
This fact enables folding: after receiving challenge $r_1$, we can compute the restricted polynomial $p(r_1, x_2, \dots, x_n)$ from the unrestricted polynomial in linear time.
The Algorithm
Initialization. Store all evaluations $p(x)$ and $q(x)$ for $x \in \{0,1\}^n$ in arrays $P$ and $Q$.
Round 1. Compute three evaluations of $g_1$:

$$g_1(t) = \sum_{x \in \{0,1\}^{n-1}} p(t, x) \cdot q(t, x) \quad \text{for } t \in \{0, 1, 2\}.$$

For $t = 0$ and $t = 1$, we read $p(t, x)$ and $q(t, x)$ directly from the stored arrays. For $t = 2$, apply the folding identity with $r = 2$: $p(2, x) = 2 \cdot p(1, x) - p(0, x)$, and similarly for $q$.
Each evaluation sums over $2^{n-1}$ terms, so the three evaluations cost $O(2^n)$ operations total. The prover sends two; the verifier recovers the third from $g_1(0) + g_1(1) = H$.
Fold after round 1. Receive challenge $r_1$. Create a new array $P'$ of size $2^{n-1}$, indexed by $x \in \{0,1\}^{n-1}$:

$$P'[x] = (1 - r_1) \cdot P[0, x] + r_1 \cdot P[1, x].$$

Discard the old array and rename $P'$ to $P$. The array now stores the restricted polynomial $p(r_1, \cdot)$ evaluated on the $(n-1)$-dimensional hypercube. Similarly fold $Q$.
Round $i$ (general). After $i - 1$ folds, the arrays $P$ and $Q$ have size $2^{n-i+1}$, storing $p(r_1, \dots, r_{i-1}, \cdot)$ and $q(r_1, \dots, r_{i-1}, \cdot)$ on the remaining Boolean hypercube. The array splits naturally into two halves of size $2^{n-i}$: entries with $x_i = 0$ and entries with $x_i = 1$. Then:
- $g_i(0)$: the sum $\sum_x P[0, x] \cdot Q[0, x]$ over the $x_i = 0$ half
- $g_i(1)$: the sum $\sum_x P[1, x] \cdot Q[1, x]$ over the $x_i = 1$ half
- $g_i(2)$: apply the folding identity with $r = 2$ to get $2 \cdot P[1, x] - P[0, x]$ and $2 \cdot Q[1, x] - Q[0, x]$, then sum the products
Each evaluation sums over $2^{n-i}$ terms, costing $O(2^{n-i})$ operations total. The prover sends two values, then folds $P$ and $Q$ using challenge $r_i$, halving the arrays to size $2^{n-i}$.
The arrays shrink by half each round: $2^n \to 2^{n-1} \to \dots \to 2 \to 1$. By round $n$, the arrays are singletons and the protocol terminates.
Folding solves the access problem. After each challenge $r_i$, the prover updates the arrays in place via the folding identity, producing exactly the partially-fixed values needed for round $i + 1$. No recomputation from scratch. Round $i$ costs $O(2^{n-i})$ for evaluation and $O(2^{n-i})$ for folding, with a constant number of field operations per entry for the product $p \cdot q$. The total is the geometric series from above:

$$\sum_{i=1}^{n} O(2^{n-i}) = O(2^n).$$
This is optimal: any prover must read all $2^n$ inputs at least once.
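The algorithm above fits in a short sketch. The following minimal Python implementation uses a toy 61-bit prime field; the modulus `MOD`, the helper names `round_message`, `fold`, and `prove`, and the array layout (first half of each array holds the $x_i = 0$ entries) are illustrative, not from any particular codebase.

```python
MOD = 2**61 - 1  # toy prime; real systems use 256-bit fields

def round_message(P, Q):
    """Compute g_i(0), g_i(1), g_i(2) from the current tables.

    The first half of each array holds the x_i = 0 entries, the second
    half the x_i = 1 entries."""
    half = len(P) // 2
    g0 = g1 = g2 = 0
    for x in range(half):
        p0, p1 = P[x], P[x + half]
        q0, q1 = Q[x], Q[x + half]
        g0 = (g0 + p0 * q0) % MOD
        g1 = (g1 + p1 * q1) % MOD
        # folding identity at t = 2: p(2, x) = 2*p(1, x) - p(0, x)
        g2 = (g2 + (2 * p1 - p0) * (2 * q1 - q0)) % MOD
    return g0, g1, g2

def fold(T, r):
    """Bind the first free variable to challenge r, halving the table."""
    half = len(T) // 2
    return [((1 - r) * T[x] + r * T[x + half]) % MOD for x in range(half)]

def prove(P, Q, challenges):
    """Run all n rounds; return the claimed sum H and the round messages."""
    H = sum(p * q for p, q in zip(P, Q)) % MOD
    messages = []
    for r in challenges:
        messages.append(round_message(P, Q))
        P, Q = fold(P, r), fold(Q, r)
    return H, messages
```

Each call to `round_message` reads the current (already folded) tables once, and each `fold` halves them, so the per-round cost shrinks geometrically exactly as in the analysis above.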
Worked Example: The Halving Trick with $n = 2$
Let's trace through a complete example. Take $n = 2$ variables and consider the sum-check claim:

$$H = \sum_{x \in \{0,1\}^2} p(x) \cdot q(x).$$
Suppose the tables are:
| $x$ | $p(x)$ | $q(x)$ | Product |
|---|---|---|---|
| $(0,0)$ | 2 | 3 | 6 |
| $(0,1)$ | 5 | 1 | 5 |
| $(1,0)$ | 4 | 2 | 8 |
| $(1,1)$ | 3 | 4 | 12 |
The true sum is $H = 6 + 5 + 8 + 12 = 31$.
Round 1: Compute $g_1(X_1) = \sum_{x_2 \in \{0,1\}} p(X_1, x_2) \cdot q(X_1, x_2)$.
We need three evaluations to specify this degree-2 polynomial:
- $g_1(0) = p(0,0)\,q(0,0) + p(0,1)\,q(0,1) = 6 + 5 = 11$
- $g_1(1) = p(1,0)\,q(1,0) + p(1,1)\,q(1,1) = 8 + 12 = 20$
- $g_1(2)$: First interpolate $p$ and $q$ at $t = 2$: $p(2,0) = 2 \cdot 4 - 2 = 6$, $p(2,1) = 2 \cdot 3 - 5 = 1$, $q(2,0) = 2 \cdot 2 - 3 = 1$, $q(2,1) = 2 \cdot 4 - 1 = 7$. Then $g_1(2) = 6 \cdot 1 + 1 \cdot 7 = 13$.
Verifier checks: $g_1(0) + g_1(1) = 11 + 20 = 31 = H$.
Prover sends $g_1(0) = 11$ and $g_1(2) = 13$. Verifier sends challenge $r_1 = 3$ (in practice a random field element; a small value is used here for readability).
Fold after Round 1.
Update arrays using $r_1 = 3$, via $P[x_2] = (1 - 3) \cdot p(0, x_2) + 3 \cdot p(1, x_2)$:

$$P[0] = -2 \cdot 2 + 3 \cdot 4 = 8, \qquad P[1] = -2 \cdot 5 + 3 \cdot 3 = -1.$$

Similarly for $q$:

$$Q[0] = -2 \cdot 3 + 3 \cdot 2 = 0, \qquad Q[1] = -2 \cdot 1 + 3 \cdot 4 = 10.$$

Arrays now have size 2 (down from 4).
Round 2: Compute $g_2(X_2) = p(3, X_2) \cdot q(3, X_2)$.
The folded arrays give two linear polynomials: $P(t) = (1 - t) \cdot 8 + t \cdot (-1)$ and $Q(t) = (1 - t) \cdot 0 + t \cdot 10 = 10t$. So:
- $g_2(0) = 8 \cdot 0 = 0$
- $g_2(1) = (-1) \cdot 10 = -10$
Verifier checks: $g_2(0) + g_2(1) = 0 + (-10) = -10 \stackrel{?}{=} g_1(3)$.
This is the core consistency check of sum-check. The prover committed to $g_1$ before knowing the challenge $r_1$. Now the verifier demands that $g_2$ (the next round's polynomial) sum to the value $g_1(r_1)$. If the prover lied about $g_1$, the fabricated polynomial almost certainly evaluates incorrectly at the random point $r_1$, and the check fails.
Compute $g_1(3)$ from the degree-2 polynomial through points $(0, 11)$, $(1, 20)$, $(2, 13)$:
Using Lagrange interpolation:

$$g_1(t) = 11 \cdot \frac{(t-1)(t-2)}{2} - 20 \cdot t(t-2) + 13 \cdot \frac{t(t-1)}{2}.$$

At $t = 3$: $g_1(3) = 11 \cdot 1 - 20 \cdot 3 + 13 \cdot 3 = 11 - 60 + 39 = -10$. This matches $g_2(0) + g_2(1) = -10$, so the check passes.
Total operations: Round 1 touched 4 entries; Round 2 touched 2 entries. Total: 6 operations, not $2 \times 4 = 8$ as naive analysis suggests. For larger $n$, the savings compound: $O(2^n)$ instead of $O(n \cdot 2^n)$.
Each round, the arrays halve in size. The total work across all rounds is the geometric series $2^{n-1} + 2^{n-2} + \dots + 1 < 2^n$. This is optimal: any prover must read all $2^n$ inputs at least once.
Beyond Black-Box Arithmetic
The halving trick achieves $O(2^n)$ field operations, which is optimal. For a textbook, the story could end here. But in practice, sum-check provers over 256-bit fields remain slow even at optimal operation count, because each field multiplication carries a different concrete cost depending on the size of its operands. The next three sections (this one, high-degree products, and small-value proving) progressively reduce the concrete cost by exploiting structure that asymptotic analysis ignores. All three build on the same observation: not all field multiplications are equal.
Not all field multiplications are equal. Over a 256-bit prime field (BN254, BLS12-381), multiplying two arbitrary field elements requires multi-limb integer arithmetic plus modular reduction. But when one operand fits in a single 64-bit machine word, the cost drops dramatically. Three classes emerge:
- big-big (bb): two arbitrary field elements. Roughly 8x the cost of sb.
- small-big (sb): one machine-word integer, one field element.
- small-small (ss): two machine-word integers. A single native multiplication, roughly 30x cheaper than bb.
A further optimization, delayed reduction, avoids redundant modular reductions when accumulating a linear combination $\sum_i c_i \cdot a_i$ with small coefficients $c_i$. Instead of reducing each product separately, the prover accumulates unreduced integer products and performs a single reduction at the end. This nearly halves the cost of sb-dominated loops, which is precisely the structure of the sum-check prover's inner loop.
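As a sketch of the idea, compare eager per-term reduction with a single delayed reduction. The 61-bit modulus and function names are illustrative; in a real 256-bit implementation the unreduced accumulator would live in a double-width integer.

```python
MOD = 2**61 - 1  # toy prime modulus

def combine_eager(coeffs, elems):
    """Reduce after every product: one modular reduction per term."""
    acc = 0
    for c, a in zip(coeffs, elems):
        acc = (acc + c * a % MOD) % MOD
    return acc

def combine_delayed(coeffs, elems):
    """Accumulate unreduced integer products; reduce once at the end."""
    acc = 0
    for c, a in zip(coeffs, elems):
        acc += c * a        # plain integer add, no reduction
    return acc % MOD        # single reduction
```

Both functions compute the same value; the delayed variant simply trades many reductions for one, which is where the concrete savings come from on sb-dominated loops.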
Why does this matter for sum-check? In round 1, all evaluations lie on the Boolean hypercube. In a zkVM, these are witness values (register contents, memory entries), typically 32- or 64-bit integers stored in a 256-bit field. Round 1 uses only ss and sb operations. After the verifier sends challenge $r_1$, subsequent rounds involve full-width random elements and require bb multiplications.
Round 1 is not a small fraction of the work. By the geometric series, round 1 accounts for half the total operations ($2^{n-1}$ out of $\sim 2^n$). Rounds 1 and 2 together account for three-quarters. The most expensive rounds are precisely those where values are small. This observation, that the prover's bottleneck rounds coincide with the regime where cheap arithmetic applies, is the starting point for small-value proving.
High-Degree Products
The halving trick as presented handles $g = p \cdot q$, a product of two multilinear factors with degree $d = 2$. Each round, the prover evaluates the product at $d + 1 = 3$ points, spending a constant number of multiplications per summand. The same idea generalizes to $g = p_1 \cdot p_2 \cdots p_k$, a product of $k$ multilinear factors: fold each factor independently, then multiply. The folding is unchanged; what changes is the cost of multiplying $k$ factors together at $k + 1$ evaluation points.
Modern proof systems demand this generalization. In batch-evaluation arguments and lookup protocols, the sum-check polynomial is a product of $k$ multilinear factors where $k$ can be 16 or 32. The naive approach evaluates each factor at $k + 1$ points by extrapolation from the Boolean evaluations at 0 and 1, then multiplies pointwise. This costs $O(k^2)$ bb multiplications per summand. At $k = 32$, that is nearly 1000 bb multiplications per term per round, and the prover's cost balloons to $O(k^2 \cdot 2^n)$.
The question is whether the $k^2$ factor is intrinsic to the problem.
Divide-and-Conquer via Extrapolation
It is not. A recursive algorithm reduces the bb cost from $O(k)$ to $O(\log k)$ per evaluation point.
We work entirely in evaluation representation: each polynomial is stored as its values at a fixed set of points. A linear polynomial (degree 1) is determined by two evaluations (at 0 and 1). A degree-$k$ product needs $k + 1$ evaluations. Multiplying two polynomials in evaluation form is just pointwise multiplication of their values at each point: one field multiplication per point.
Given evaluations of $k$ linear polynomials at the two points $\{0, 1\}$, the goal is to compute evaluations of their product at $k + 1$ points.
- Split the $k$ polynomials into two halves of size $k/2$ and $k/2$.
- Recursively compute the product of each half. Each half-product has degree $k/2$ and is known at $k/2 + 1$ points.
- Extrapolate both half-products from their known points to the full set of $k + 1$ points, using Lagrange interpolation. The interpolation weights are small integers (derived from the evaluation-point coordinates $0, 1, \dots, k$), so each multiplication is sb. Cost: $O(k^2)$ sb multiplications per polynomial.
- Multiply pointwise: at each of the $k + 1$ points, multiply the two half-product values. Both values are arbitrary field elements (results of prior recursion), so each multiplication is bb. Cost: $k + 1$ bb multiplications.
The only source of bb multiplications is step 4. Each level of recursion contributes $k + 1$ pointwise products, and the two recursive calls handle the $k/2$-sized subproblems. Writing $T(k)$ for the total bb count:

$$T(k) = 2\,T(k/2) + O(k) \implies T(k) = O(k \log k).$$
The extrapolation steps contribute $O(k^2)$ sb multiplications total, but since sb is far cheaper than bb, the wall-clock cost is dominated by the $O(k \log k)$ bb multiplications.
This extends to the multivariate case. In the sum-check prover's inner loop, the product involves $k$ multilinear polynomials evaluated on a grid of points. Multivariate extrapolation reduces to repeated univariate extrapolation along each coordinate dimension, and the bb cost per grid point becomes $O(\log k)$ instead of $O(k)$, improving by a factor of $k / \log k$ over the naive approach.
When used as a subroutine in the linear-time sum-check prover, the total bb cost across all rounds drops from $O(k^2 \cdot 2^n)$ to $O(k \log k \cdot 2^n)$. For $k = 32$, this represents roughly a 5x reduction in the dominant arithmetic cost.
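The recursion can be written directly in evaluation form. The sketch below models correctness only (it does not distinguish bb from sb costs); `product_evals`, `extrapolate`, and the toy modulus are illustrative names, not from any particular library.

```python
MOD = 2**61 - 1  # toy prime modulus

def extrapolate(vals, m):
    """Given evaluations of a degree-(len(vals)-1) polynomial at 0..len(vals)-1,
    return its evaluations at 0..m-1 via Lagrange interpolation."""
    d = len(vals) - 1
    out = list(vals)
    for t in range(d + 1, m):
        acc = 0
        for i, v in enumerate(vals):
            num, den = 1, 1
            for j in range(d + 1):
                if j != i:
                    num = num * (t - j) % MOD
                    den = den * (i - j) % MOD
            acc = (acc + v * num * pow(den, -1, MOD)) % MOD
        out.append(acc)
    return out

def product_evals(linears):
    """Evaluations at 0..k of the product of k linear polynomials,
    each given as (value at 0, value at 1). Divide and conquer:
    the only 'bb' products are the final pointwise multiplications."""
    k = len(linears)
    if k == 1:
        v0, v1 = linears[0]
        return [v0 % MOD, v1 % MOD]
    left = product_evals(linears[: k // 2])    # degree k//2, at 0..k//2
    right = product_evals(linears[k // 2 :])   # degree k - k//2
    L = extrapolate(left, k + 1)               # sb-type extrapolation
    R = extrapolate(right, k + 1)
    return [a * b % MOD for a, b in zip(L, R)] # the bb pointwise products
```

A real prover would precompute the small Lagrange weights once per level instead of recomputing them per point, but the recursion structure, and hence the $T(k) = 2T(k/2) + O(k)$ bb count, is the same.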
Small-Value Round-Batching
There is a structural inefficiency hiding in the halving trick. In round 1, all evaluations lie on the Boolean hypercube: the values come from the witness table and fit in machine words. Round 1 uses only ss/sb operations. But the moment the prover binds $r_1$ (a random 256-bit challenge), every subsequent value becomes a full-width field element. From round 2 onward, bb multiplications are unavoidable.
Round 1 accounts for half the total work. Round 2 accounts for a quarter. The most expensive rounds are exactly where values are small. Can we extend the cheap-arithmetic regime beyond a single round?
The idea is delayed binding: instead of binding $r_1$ immediately, treat the first $\ell$ variables as symbolic and precompute the $\ell$-variate polynomial:

$$G(X_1, \dots, X_\ell) = \sum_{x \in \{0,1\}^{n-\ell}} p(X_1, \dots, X_\ell, x) \cdot q(X_1, \dots, X_\ell, x).$$

Every summand has Boolean $x$ and small witness values, so the entire precomputation uses only ss multiplications. The polynomial $G$ is stored as its evaluations on a $\{0, 1, 2\}^\ell$ grid. Once computed, the prover answers rounds 1 through $\ell$ by evaluating $G$ at the received challenges (which does require bb, but only $O(3^\ell)$ work per round instead of $O(2^{n-i})$). After $\ell$ rounds, the prover binds all $\ell$ challenges at once and resumes the standard halving trick on arrays of size $2^{n-\ell}$.
The optimal window size $\ell$ balances the ss precomputation cost against the bb savings. For typical parameters over 256-bit fields, $\ell$ is around 4 or 5 rounds. The asymptotic complexity is unchanged, but the concrete runtime drops substantially because the largest rounds (which dominate the geometric series) now use the cheapest arithmetic.
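A toy end-to-end sketch of a two-round window ($n = 3$, $\ell = 2$): the tables, the modulus, and the challenge value are illustrative, and the grid is computed here by brute-force folding rather than the ss-only scheduling a real prover would use.

```python
MOD = 2**61 - 1  # toy prime modulus

def bind_first(T, t):
    """Folding identity: fix the first free variable of table T to t."""
    half = len(T) // 2
    return [((1 - t) * T[x] + t * T[x + half]) % MOD for x in range(half)]

def lagrange3(v0, v1, v2, t):
    """Evaluate the degree-2 polynomial through (0,v0),(1,v1),(2,v2) at t."""
    l0 = (t - 1) * (t - 2) * pow(2, -1, MOD)  # basis at node 0
    l1 = t * (t - 2) * (MOD - 1)              # basis at node 1: -t(t-2)
    l2 = t * (t - 1) * pow(2, -1, MOD)        # basis at node 2
    return (v0 * l0 + v1 * l1 + v2 * l2) % MOD

P = [2, 5, 4, 3, 1, 6, 2, 7]   # p on {0,1}^3, index x1*4 + x2*2 + x3
Q = [3, 1, 2, 4, 5, 1, 3, 2]   # q on {0,1}^3

# Precompute G(X1, X2) = sum_x3 p(X1,X2,x3)*q(X1,X2,x3) on the {0,1,2}^2 grid.
G = [[sum(a * b for a, b in zip(bind_first(bind_first(P, t1), t2),
                                bind_first(bind_first(Q, t1), t2))) % MOD
      for t2 in range(3)]
     for t1 in range(3)]

# Round 1 answered from the grid alone: g1(t) = G(t, 0) + G(t, 1).
g1 = [(G[t][0] + G[t][1]) % MOD for t in range(3)]

# Round 2 after challenge r1: g2(t2) = G(r1, t2), interpolated in X1.
r1 = 11
g2 = [lagrange3(G[0][t2], G[1][t2], G[2][t2], r1) for t2 in range(3)]
```

Both rounds are served from the $3 \times 3$ grid; the full tables are only touched again when all window challenges are bound at once.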
Streaming provers
Round-batching generalizes beyond the small-value setting. For truly massive computations ($2^{35}$ terms), even $O(2^n)$ memory becomes prohibitive: a terabyte of field elements. The halving trick is optimal in time but demands linear space.
A streaming prover applies round-batching iteratively, processing the input in sequential passes with sublinear memory. Instead of batching only the first $\ell$ rounds, the streaming prover batches every group of $\ell$ rounds into windows. For a window of $\ell$ rounds, the prover scans the relevant terms in one pass, computes an $\ell$-variate polynomial on a grid, answers $\ell$ rounds from it, then moves to the next window. Early windows are small (the input is large). Later windows grow larger (the remaining input shrinks exponentially). A final phase switches to the standard linear-time algorithm.
With a tunable parameter $k$, the streaming prover achieves $O(2^{n/k})$ space and $O(k \cdot 2^n)$ time. For $k = 2$: two passes and $O(2^{n/2})$ memory. This exploits the algebraic structure of sum-check directly, without recursive proof composition.
Sparse Sums
The halving trick solves the dense case: when all $2^n$ terms are present, we achieve optimal $O(2^n)$ proving time. But many applications involve sparse sums, where only $m \ll 2^n$ terms are non-zero, and here the halving trick falls short.
Consider a lookup table with $2^{30}$ possible indices but only $2^{20}$ actual lookups. The halving trick still touches all $2^{30}$ positions, folding arrays of zeros. We're wasting a factor of 1000 in both time and space.
Can the prover exploit sparsity?
Separable Product Structure
A clarification first: "sparse sum" means the input data is sparse (the table on the Boolean hypercube has mostly zeros). The multilinear extension of a sparse vector is typically dense over the continuous domain. Sparsity in the table is what we exploit. Doing so requires a specific factorization.
In lookup arguments and memory-checking protocols, the problem is constructed with a natural variable split: a prefix encodes an address or row index, and a suffix encodes a value or column index. The constraints on addresses and values are independent by design, but a sparse selector connects them: most (address, value) pairs are unused, and only $m$ entries are active. For instance, a memory with $2^{n/2}$ addresses and $2^{n/2}$ possible values has $2^n$ (address, value) pairs, but a program that performs $m$ memory accesses touches only $m$ of them.
This gives the factorization:

$$g(x, y) = S(x, y) \cdot u(x) \cdot v(y),$$

where $S$ is a sparse selector with only $m$ non-zero entries, $u$ depends only on prefix variables (dense, size $2^{n/2}$), and $v$ depends only on suffix variables (dense, size $2^{n/2}$). The separability is what makes sparsity exploitable: to compute an aggregate like $\sum_{x, y} S(x, y) \cdot u(x) \cdot v(y)$, the prover touches only the $m$ positions where $S$ is non-zero.
Two-Stage Proving
Given the separable product structure, we prove the sum in two stages: an outer sum-check over the prefix variables (addresses) and an inner sum-check over the suffix variables (values) to verify the outer stage's evaluation claim. Each stage handles half the variables, building dense arrays of size $2^{n/2}$ by scanning only the $m$ sparse entries.
Stage 1 (outer): Sum-check over prefix variables.
Define aggregated arrays $A$ and $U$, each of size $2^{n/2}$, indexed by prefix bit-vectors $x \in \{0,1\}^{n/2}$:

$$A[x] = \sum_{y \in \{0,1\}^{n/2}} S(x, y) \cdot v(y), \qquad U[x] = u(x).$$

The array $A$ pre-sums all suffix contributions into a single value per prefix. This is the key move: the suffix variables are absorbed before the protocol starts, collapsing the original double sum into a single sum over prefixes $x$. Computing $A$ requires one pass over the non-zero entries: for each non-zero $(x, y)$ pair, add $S(x, y) \cdot v(y)$ to $A[x]$.
To see why this is correct, expand the original claim:

$$\sum_{x, y} S(x, y) \cdot u(x) \cdot v(y) = \sum_{x} u(x) \sum_{y} S(x, y) \cdot v(y) = \sum_{x} u(x) \cdot A[x].$$

So proving the original sum reduces to proving $\sum_{x \in \{0,1\}^{n/2}} \tilde{u}(x) \cdot \tilde{A}(x) = H$, a sum-check with only $n/2$ variables. Here $\tilde{u}$ and $\tilde{A}$ are the multilinear extensions of arrays $U$ and $A$.
Run the dense halving algorithm on these $2^{n/2}$-sized arrays. Time: $O(m)$ to build $A$ from sparse entries, plus $O(2^{n/2})$ for the dense sum-check.
Stage 2 (inner): Sum-check over suffix variables.
Like any sum-check, Stage 1 ends with a final evaluation claim: "I claim $\tilde{u}(r_x) \cdot \tilde{A}(r_x) = c$" for the challenge point $r_x \in \mathbb{F}^{n/2}$. The verifier can check $\tilde{u}(r_x)$ via polynomial commitment. But $\tilde{A}(r_x)$ is itself defined as a sum over suffix variables:

$$\tilde{A}(r_x) = \sum_{y \in \{0,1\}^{n/2}} \tilde{S}(r_x, y) \cdot v(y).$$

This is where Stage 2 comes in: it runs sum-check over the suffix variables to verify this claim. Stage 1 summed out the prefixes; Stage 2 sums out the suffixes. Together they cover all $n$ variables, but each stage operates on arrays of size $2^{n/2}$ instead of $2^n$.
Define arrays $B$ and $W$, each of size $2^{n/2}$, indexed by suffix bit-vectors $y \in \{0,1\}^{n/2}$:

$$B[y] = \tilde{S}(r_x, y), \qquad W[y] = v(y).$$

Here $B[y]$ is the sparse selector with its prefix fixed to the random challenges: it answers "what is the selector's value at address $(r_x, y)$?" The factor $\tilde{u}(r_x)$ is now a constant (computed once from the dense array $U$) that multiplies the entire Stage 2 sum.
Computing $B$ requires the MLE interpolation identity:

$$\tilde{S}(r_x, y) = \sum_{x \in \{0,1\}^{n/2}} \widetilde{\mathrm{eq}}(r_x, x) \cdot S(x, y).$$

For each sparse entry $(x, y)$, we need the Lagrange coefficient $\widetilde{\mathrm{eq}}(r_x, x)$ to weight its contribution to $B[y]$.
(Recall from Chapter 4: $\widetilde{\mathrm{eq}}(r, x)$ is the multilinear Lagrange basis function, and $\tilde{f}(r) = \sum_{x \in \{0,1\}^k} \widetilde{\mathrm{eq}}(r, x) \cdot f(x)$.)
A naive approach computes each $\widetilde{\mathrm{eq}}(r_x, x)$ independently in $O(n/2)$ field ops, giving $O(m \cdot n)$ total. But we can do better: precompute all $2^{n/2}$ values $\widetilde{\mathrm{eq}}(r_x, x)$ for every Boolean $x$ in $O(2^{n/2})$ time using the product structure of $\widetilde{\mathrm{eq}}$. Then each sparse entry requires only a table lookup plus one multiplication. Total: $O(2^{n/2})$ for precomputation, $O(m)$ for the pass over sparse entries.
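The product-structure precomputation is a table-doubling loop: extend one variable at a time, multiplying every existing weight by the two factors for the new variable. A Python sketch (illustrative naming, toy modulus):

```python
MOD = 2**61 - 1  # toy prime modulus

def eq_table(r):
    """table[x] = eq(r, x) for every x in {0,1}^len(r) (x read MSB-first),
    built in O(2^k) multiplications by extending one variable at a time."""
    table = [1]
    for rk in r:
        table = [w * f % MOD for w in table
                 for f in ((1 - rk) % MOD, rk % MOD)]
    return table
```

Each pass doubles the table, so the total multiplication count is $2 + 4 + \dots + 2^k = O(2^k)$, matching the claimed precomputation cost.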
Run the dense halving algorithm on $B$ and $W$ for the remaining $n/2$ rounds. Time: $O(2^{n/2})$ for precomputing $\widetilde{\mathrm{eq}}$ values, $O(m)$ to accumulate into $B$, plus $O(2^{n/2})$ for the dense sum-check.
The structure is two chained sum-checks:
- Stage 1 ($n/2$ rounds): proves the sum equals $H$, ends with evaluation claim about $\tilde{A}$
- Stage 2 ($n/2$ rounds): proves that evaluation claim, ends with evaluations of $\tilde{S}$ and $\tilde{v}$
Together: $n$ rounds, matching the original $n$-variable sum-check. But prover work is only:

$$O(m + 2^{n/2}).$$

Two passes over sparse terms (one per stage), plus two $2^{n/2}$-sized dense sum-checks. With appropriate parameters, this can be much less than $O(2^n)$.
Worked Example: Sparse Sum with $2^n = 16$, $m = 3$
Consider a table of size 16 (so $n = 4$ variables), but only $m = 3$ entries are non-zero. We want to prove:

$$H = \sum_{x \in \{0,1\}^2} \sum_{y \in \{0,1\}^2} S(x, y) \cdot u(x) \cdot v(y),$$

where $x$ is the 2-bit prefix and $y$ is the 2-bit suffix.
Suppose the only non-zero entries are:
| Index | Prefix $x$ | Suffix $y$ | $S(x,y)$ | $u(x)$ | $v(y)$ | Product |
|---|---|---|---|---|---|---|
| 5 | $(0,1)$ | $(0,1)$ | 3 | 2 | 4 | 24 |
| 9 | $(1,0)$ | $(0,1)$ | 5 | 1 | 4 | 20 |
| 14 | $(1,1)$ | $(1,0)$ | 2 | 3 | 7 | 42 |
True sum: $H = 24 + 20 + 42 = 86$.
A dense approach would store all 16 entries and fold arrays of size 16 → 8 → 4 → 2 → 1, touching all 16 positions even though 13 are zero. The sparse two-stage approach avoids this.
Stage 1: Build aggregated prefix array $A$.
Scan the 3 non-zero terms and accumulate $S(x, y) \cdot v(y)$ into $A[x]$:
- Entry $x = (0,1)$, $y = (0,1)$: Add $3 \cdot 4 = 12$ to $A[(0,1)]$
- Entry $x = (1,0)$, $y = (0,1)$: Add $5 \cdot 4 = 20$ to $A[(1,0)]$
- Entry $x = (1,1)$, $y = (1,0)$: Add $2 \cdot 7 = 14$ to $A[(1,1)]$
Result: $A = [0, 12, 20, 14]$ (indexed by prefix $(0,0), (0,1), (1,0), (1,1)$).
Also store $U$:
$U = [0, 2, 1, 3]$ (the unspecified value $u(0,0)$ multiplies $A[(0,0)] = 0$ and does not affect the sum, so we take it to be 0).
Run dense sum-check on $\sum_x \tilde{u}(x) \cdot \tilde{A}(x) = 86$ for 2 rounds.
This is a size-4 sum-check (not size-16). Suppose after rounds 1-2, we get challenges $r_x = (2, 3)$ (small values chosen for readability).
Stage 2: Verify Stage 1's evaluation claim.
Stage 1 ended with the claim "$\tilde{A}(2, 3) = -32$." The verifier can check $\tilde{u}(2, 3)$ via polynomial commitment, but $\tilde{A}(2, 3)$ is defined as a sum:

$$\tilde{A}(2, 3) = \sum_{y \in \{0,1\}^2} \tilde{S}((2, 3), y) \cdot v(y).$$

Stage 2 is a second sum-check to prove this. Define arrays indexed by suffix $y$:

$$B[y] = \tilde{S}((2, 3), y), \qquad W[y] = v(y).$$
To build $B$, first precompute the Lagrange table $\widetilde{\mathrm{eq}}((2,3), x)$ for all 4 Boolean prefixes:

$$\widetilde{\mathrm{eq}}((2,3),(0,0)) = (1-2)(1-3) = 2, \qquad \widetilde{\mathrm{eq}}((2,3),(0,1)) = (1-2) \cdot 3 = -3,$$
$$\widetilde{\mathrm{eq}}((2,3),(1,0)) = 2 \cdot (1-3) = -4, \qquad \widetilde{\mathrm{eq}}((2,3),(1,1)) = 2 \cdot 3 = 6.$$

This takes $O(2^{n/2}) = 4$ field operations. Now scan the 3 sparse entries, looking up weights from the table:
- Entry $x = (0,1)$, $S = 3$: Add $(-3) \cdot 3 = -9$ to $B[(0,1)]$
- Entry $x = (1,0)$, $S = 5$: Add $(-4) \cdot 5 = -20$ to $B[(0,1)]$
- Entry $x = (1,1)$, $S = 2$: Add $6 \cdot 2 = 12$ to $B[(1,0)]$
Result: $B = [0, -29, 12, 0]$.
Building $W$: Just copy from the $v$ values: $W = [0, 4, 7, 0]$ (the unspecified values of $v$ multiply zero entries of $B$ and are taken to be 0).
Run dense sum-check on $B$ and $W$ for 2 rounds to prove $\sum_y \tilde{B}(y) \cdot \tilde{W}(y) = (-29) \cdot 4 + 12 \cdot 7 = -32 = \tilde{A}(2, 3)$.
Work analysis:
- Stage 1: $O(m)$ to build $A$ + $O(2^{n/2})$ for dense sum-check = 3 + 4 = 7 operations
- Stage 2: $O(2^{n/2})$ to precompute the $\widetilde{\mathrm{eq}}$ table + $O(m)$ to build $B$ + $O(2^{n/2})$ for dense sum-check = 4 + 3 + 4 = 11 operations
- Total: $7 + 11 = 18$ operations instead of $\sim 2^n = 16$ for the dense approach
(In this tiny example, sparse isn't faster because $m = 3$ and $2^{n/2} = 4$ are similar to $2^n = 16$. The win comes at scale.)
For realistic parameters ($2^n = 2^{30}$, $m = 2^{20}$), the savings are dramatic: $O(m + 2^{n/2}) \approx 2^{20}$ operations instead of $2^{30}$, a 1000× speedup.
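The whole worked example runs in a few lines. The sketch below hard-codes the three entries and the Stage-1 challenges $(2, 3)$ from the text, takes the unspecified dense values of $u$ and $v$ to be 0, and checks the two stages against each other; all names and the toy modulus are illustrative.

```python
MOD = 2**61 - 1  # toy prime modulus

# (prefix x, suffix y, S, u, v) for the three non-zero entries
entries = [((0, 1), (0, 1), 3, 2, 4),
           ((1, 0), (0, 1), 5, 1, 4),
           ((1, 1), (1, 0), 2, 3, 7)]

def idx(bits):
    """2-bit vector -> array index."""
    return bits[0] * 2 + bits[1]

def eq2(r, x):
    """eq(r, x) over two variables."""
    w = 1
    for rk, xk in zip(r, x):
        w = w * ((rk if xk else (1 - rk)) % MOD) % MOD
    return w

# Stage 1: aggregate suffix contributions into A in one sparse pass.
A, u_tab = [0] * 4, [0] * 4
for x, y, s, u, v in entries:
    A[idx(x)] = (A[idx(x)] + s * v) % MOD
    u_tab[idx(x)] = u
H = sum(u * a for u, a in zip(u_tab, A)) % MOD   # the claimed sum

# Stage 1 challenges (small for readability), and the claim A~(rx).
rx = (2, 3)
A_at_rx = sum(eq2(rx, (x1, x2)) * A[idx((x1, x2))]
              for x1 in range(2) for x2 in range(2)) % MOD

# Stage 2: fold the prefix into B via the eq weights, one sparse pass.
B, v_tab = [0] * 4, [0] * 4
for x, y, s, u, v in entries:
    B[idx(y)] = (B[idx(y)] + eq2(rx, x) * s) % MOD
    v_tab[idx(y)] = v
stage2_sum = sum(b * v for b, v in zip(B, v_tab)) % MOD
```

The final assertion that Stage 2's sum equals $\tilde{A}(r_x)$ is exactly the evaluation claim the second sum-check is responsible for.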
Generalization to Chunks
Split the $n$ variables into $c$ chunks instead of 2. Each stage handles $n/c$ variables:
- Time: $O(c \cdot m + c \cdot 2^{n/c})$
- Space: $O(2^{n/c})$
Choosing $c = \Theta(n)$ yields $O(m \cdot n)$ prover time with polylogarithmic space. The prover runs in time proportional to the number of non-zero terms, with only logarithmic overhead.
Spartan: Sum-Check for R1CS
What's the simplest possible SNARK?
Not in terms of assumptions (transparent or trusted setup, pairing-based or hash-based). In terms of conceptual machinery. What's the minimum set of ideas needed to go from "here's a constraint system" to "here's a succinct proof"?
Spartan (Setty, 2020) provides a surprisingly clean answer: sum-check plus polynomial commitments. Nothing else. No univariate encodings, no FFTs over roots of unity, no quotient polynomials, no PCP constructions. Just the two building blocks we've already developed.
The R1CS Setup
An R1CS instance consists of sparse matrices $A, B, C \in \mathbb{F}^{M \times N}$ and a constraint: find a witness $z \in \mathbb{F}^N$ such that $(Az) \circ (Bz) = Cz$, where $\circ$ denotes the Hadamard (entrywise) product. Each row of this equation is a constraint; the system has $M$ constraints over $N$ variables.
The Multilinear View
Interpret the witness $z$ as evaluations of a multilinear polynomial $\tilde{z}$ over the Boolean hypercube $\{0,1\}^{\log N}$:

$$\tilde{z}(j) = z_j \quad \text{for } j \in \{0,1\}^{\log N}.$$

Similarly, view the matrices as bivariate functions: $A(i, j)$ is the entry at row $i$, column $j$. Their multilinear extensions $\tilde{A}, \tilde{B}, \tilde{C}$ are defined over $\mathbb{F}^{\log M} \times \mathbb{F}^{\log N}$.
The constraint becomes: for every row index $i \in \{0,1\}^{\log M}$,

$$\Bigl(\sum_{j} \tilde{A}(i, j)\, \tilde{z}(j)\Bigr) \cdot \Bigl(\sum_{j} \tilde{B}(i, j)\, \tilde{z}(j)\Bigr) = \sum_{j} \tilde{C}(i, j)\, \tilde{z}(j).$$

Define the error at row $i$:

$$F(i) = \Bigl(\sum_{j} \tilde{A}(i, j)\, \tilde{z}(j)\Bigr) \cdot \Bigl(\sum_{j} \tilde{B}(i, j)\, \tilde{z}(j)\Bigr) - \sum_{j} \tilde{C}(i, j)\, \tilde{z}(j).$$

The R1CS constraint is satisfied iff $F(i) = 0$ for all $i \in \{0,1\}^{\log M}$.
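As a miniature instance, consider the tiny (hypothetical) constraint system $x \cdot x = y$, $x \cdot y = w$ with witness vector $z = (1, x, y, w)$; the matrices, witness values, and toy modulus below are illustrative.

```python
MOD = 2**61 - 1  # toy prime modulus

A = [[0, 1, 0, 0],   # row 0 selects x
     [0, 1, 0, 0]]   # row 1 selects x
B = [[0, 1, 0, 0],   # row 0 selects x
     [0, 0, 1, 0]]   # row 1 selects y
C = [[0, 0, 1, 0],   # row 0 selects y
     [0, 0, 0, 1]]   # row 1 selects w
z = [1, 3, 9, 27]    # x = 3, y = 9, w = 27

def matvec(M_, v):
    return [sum(m * x for m, x in zip(row, v)) % MOD for row in M_]

Az, Bz, Cz = matvec(A, z), matvec(B, z), matvec(C, z)
F = [(a * b - c) % MOD for a, b, c in zip(Az, Bz, Cz)]  # error per row
```

A satisfying witness makes every entry of `F` zero, which is exactly the condition the zero-on-hypercube reduction below will test probabilistically.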
This multilinear view differs from the QAP approach in Chapter 12 (Groth16). There, R1CS matrices become univariate polynomials via Lagrange interpolation over roots of unity. The constraint transforms into a polynomial divisibility condition: $a(X) \cdot b(X) - c(X) = h(X) \cdot Z_H(X)$, where $Z_H$ is the vanishing polynomial over the evaluation domain. Proving satisfaction means exhibiting the quotient $h(X)$.
Spartan takes a different path. Instead of interpolating over roots of unity, it interprets vectors and matrices as multilinear extensions over the Boolean hypercube. Instead of checking divisibility by a vanishing polynomial, it checks that an error polynomial evaluates to zero on all Boolean inputs, via sum-check. No quotient polynomial, no FFT, no roots of unity. Just multilinear algebra and sum-check.
Both approaches reduce R1CS to polynomial claims. QAP reduces to divisibility; Spartan reduces to vanishing on the hypercube. The sum-check approach avoids the FFT costs and the trusted setup of pairing-based SNARKs, at the cost of larger proofs (logarithmic in the constraint count rather than constant).
The Zero-on-Hypercube Reduction
Spartan's R1CS encoding requires checking that $F$ vanishes on the Boolean hypercube, i.e., $F(i) = 0$ for all $i \in \{0,1\}^{\log M}$. The technique that handles this reduces $M$ separate equality checks to a single sum-check, and works for any polynomial, not just R1CS errors.
A natural first attempt is to prove $\sum_{i \in \{0,1\}^{\log M}} F(i) = 0$ via sum-check. If $F$ vanishes on the hypercube, this sum is indeed zero. But the converse fails. Suppose $F(i_1) = 5$ and $F(i_2) = -5$ with all other values zero; then the sum is zero despite $F$ being non-zero at two points. Positive and negative values cancel. A bare sum cannot distinguish "all zeros" from "zeros that happen to add up."
The fix is to weight each term with a pseudorandom coefficient so that accidental cancellation becomes overwhelmingly unlikely. Recall from Chapter 4 the equality polynomial $\widetilde{\mathrm{eq}}$:

$$\widetilde{\mathrm{eq}}(x, y) = \prod_{k=1}^{\mu} \bigl(x_k y_k + (1 - x_k)(1 - y_k)\bigr).$$

On Boolean inputs, each factor equals 1 when $x_k = y_k$ and 0 when they differ, so the product is the indicator $\mathbb{1}[x = y]$. The formula extends smoothly to all field elements: this is the multilinear extension of the equality indicator over $\{0,1\}^\mu \times \{0,1\}^\mu$. By the MLE evaluation formula (Chapter 4), $\tilde{f}(r) = \sum_{x \in \{0,1\}^\mu} \widetilde{\mathrm{eq}}(r, x) \cdot f(x)$ for any $f$ with multilinear extension $\tilde{f}$.
The reduction works as follows. The verifier samples random $\tau \in \mathbb{F}^{\log M}$ and asks the prover to demonstrate:

$$\sum_{i \in \{0,1\}^{\log M}} \widetilde{\mathrm{eq}}(\tau, i) \cdot F(i) = 0.$$

This sum is a random linear combination of the values $\{F(i)\}$, with coefficients determined by $\tau$. If every $F(i) = 0$, the sum is trivially zero. If even one $F(i) \neq 0$, the sum is nonzero with high probability because the pseudorandom weights prevent cancellation. The equality polynomial turns "check $M$ values are all zero" into "check one random linear combination is zero."
We can bound the probability that a cheating prover passes this check. Define $Q(\tau) = \sum_{i} \widetilde{\mathrm{eq}}(\tau, i) \cdot F(i)$; by the MLE evaluation formula, $Q = \tilde{F}$. Let $\mu = \log M$. Then:

$$\Pr_{\tau \in \mathbb{F}^\mu}\bigl[\, Q(\tau) = 0 \mid F(i) \neq 0 \text{ for some } i \,\bigr] \le \frac{\mu}{|\mathbb{F}|}.$$

Proof. If $F(i^*) \neq 0$ for some Boolean $i^*$, then $Q$ is a nonzero multilinear polynomial (since $Q(i^*) = F(i^*) \neq 0$). A nonzero multilinear polynomial in $\mu$ variables has total degree at most $\mu$. By Schwartz-Zippel, a nonzero polynomial of degree $\mu$ over $\mathbb{F}^\mu$ has at most $\mu \cdot |\mathbb{F}|^{\mu - 1}$ roots in $\mathbb{F}^\mu$. Thus the probability of hitting a root is at most $\mu / |\mathbb{F}|$.
This reduces "check $F$ vanishes on $M$ points" to "run sum-check on one random linear combination and verify it equals zero."
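A numeric sketch of the failure and the fix, with a fixed $\tau$ standing in for the verifier's random challenge (toy modulus and values, chosen for illustration):

```python
MOD = 2**61 - 1  # toy prime modulus

# Error values on {0,1}^2: F(0,1) = 5, F(1,0) = -5, others zero.
F = [0, 5, MOD - 5, 0]
hypercube = [(0, 0), (0, 1), (1, 0), (1, 1)]

def eq_weight(tau, x):
    """eq(tau, x): the multilinear equality weight for Boolean x."""
    w = 1
    for tk, xk in zip(tau, x):
        w = w * ((tk if xk else (1 - tk)) % MOD) % MOD
    return w

bare = sum(F) % MOD     # cancels to zero: a bare sum cannot detect the lie
tau = (12345, 67890)    # stand-in for a random verifier challenge
weighted = sum(eq_weight(tau, x) * f
               for x, f in zip(hypercube, F)) % MOD
```

The bare sum is zero even though $F$ is not identically zero; the $\tau$-weighted combination separates the two non-zero values, exactly as the Schwartz-Zippel bound predicts for all but a $\mu/|\mathbb{F}|$ fraction of challenges.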
Spartan's outer sum-check
- Verifier sends random $\tau \in \mathbb{F}^{\log M}$
- Prover claims $\sum_{i \in \{0,1\}^{\log M}} \widetilde{\mathrm{eq}}(\tau, i) \cdot F(i) = 0$, where $F$ is the R1CS error defined above
- Run sum-check on this claim
At the end, the verifier holds a random point $r_x \in \mathbb{F}^{\log M}$ and needs to evaluate $F(r_x)$. This requires three matrix-vector products: $\sum_j \tilde{A}(r_x, j)\, \tilde{z}(j)$, $\sum_j \tilde{B}(r_x, j)\, \tilde{z}(j)$, and $\sum_j \tilde{C}(r_x, j)\, \tilde{z}(j)$.
Spartan's inner sum-checks
Each of these is itself a sum over the hypercube, requiring three more sum-checks. But now the sums are over $\{0,1\}^{\log N}$, and the polynomials have the form $\tilde{A}(r_x, j) \cdot \tilde{z}(j)$ for the fixed $r_x$ from the outer sum-check.
After running these three inner sum-checks (which can be batched into one using random linear combinations), the verifier holds a random point $r_y \in \mathbb{F}^{\log N}$ and needs to check:
- $\tilde{A}(r_x, r_y)$, $\tilde{B}(r_x, r_y)$, $\tilde{C}(r_x, r_y)$: evaluations of the matrix MLEs
- $\tilde{z}(r_y)$: evaluation of the witness MLE
The matrix evaluations are handled by SPARK (below). The witness evaluation is where polynomial commitments enter: the prover opens the committed $\tilde{z}$ at the random point $r_y$, and the verifier checks the opening proof.
This is the full reduction: R1CS satisfaction → zero-on-hypercube (outer sum-check) → matrix-vector products (inner sum-checks) → point evaluations (polynomial commitment openings).
Handling sparse matrices with SPARK
The inner sum-checks end with evaluation claims: the verifier needs $\tilde{A}(r_x, r_y)$, $\tilde{B}(r_x, r_y)$, $\tilde{C}(r_x, r_y)$ at the random points produced by the protocol. But the matrices $A$, $B$, $C$ are $M \times N$, and a dense representation costs $O(M \cdot N)$ space. Committing to them naively would dominate the entire protocol.
R1CS matrices are sparse. A circuit with $M$ constraints typically has only $m = O(M)$ non-zero entries total, not $M \cdot N$. A sparse matrix with $m$ non-zero entries can be stored as a list of $(i, j, \text{value})$ tuples at $O(m)$ cost. The question is how to evaluate the matrix MLEs at $(r_x, r_y)$ from this sparse representation.
Applying the MLE evaluation formula to the bivariate function $A(i, j)$ gives:

$$\tilde{A}(r_x, r_y) = \sum_{i \in \{0,1\}^{\log M}} \sum_{j \in \{0,1\}^{\log N}} \widetilde{\mathrm{eq}}(r_x, i) \cdot \widetilde{\mathrm{eq}}(r_y, j) \cdot A(i, j).$$

Since $A(i, j) = 0$ for most entries, this simplifies to a sum over only the non-zero entries:

$$\tilde{A}(r_x, r_y) = \sum_{(i, j, v) \in \mathrm{nz}(A)} v \cdot \widetilde{\mathrm{eq}}(r_x, i) \cdot \widetilde{\mathrm{eq}}(r_y, j).$$

For each non-zero entry $(i, j, v)$, we need $\widetilde{\mathrm{eq}}(r_x, i)$ and $\widetilde{\mathrm{eq}}(r_y, j)$. Computing each directly from the formula costs $O(\log M + \log N)$. Over $m$ entries, total cost: $O(m \cdot \log(MN))$.
SPARK reduces this to $O(m + M + N)$ by precomputing lookup tables.
-
Precompute row weights. Build a table $E_x[i] = \widetilde{\mathrm{eq}}(r_x, i)$ for all $i \in \{0,1\}^{\log M}$. This costs $O(M)$ using the standard MLE evaluation algorithm (stream through bit-vectors, accumulate products).
-
Precompute column weights. Build a table $E_y[j] = \widetilde{\mathrm{eq}}(r_y, j)$ for all $j \in \{0,1\}^{\log N}$. Cost: $O(N)$.
-
Evaluate via lookups. Initialize a running sum to zero. For each non-zero entry $(i, j, v)$, look up $E_x[i]$ and $E_y[j]$, then add $v \cdot E_x[i] \cdot E_y[j]$ to the running sum. After processing all entries, the sum equals $\tilde{A}(r_x, r_y)$. Cost: $O(m)$.
Total: $O(M + N + m)$, linear in the sparse representation size.
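The three steps fit in a short sketch on a $4 \times 4$ matrix with three non-zero entries, checked against the dense double sum. The table names `Ex`/`Ey` follow the text; the modulus, entries, and challenge points are illustrative.

```python
MOD = 2**61 - 1  # toy prime modulus

def eq_table(r):
    """table[x] = eq(r, x) for all Boolean x, via table doubling."""
    table = [1]
    for rk in r:
        table = [w * f % MOD for w in table
                 for f in ((1 - rk) % MOD, rk % MOD)]
    return table

entries = [(0, 3, 7), (2, 1, 5), (3, 3, 2)]   # (row i, col j, value v)
rx, ry = [9, 4], [6, 8]                       # evaluation point (r_x, r_y)
Ex, Ey = eq_table(rx), eq_table(ry)           # O(M) and O(N) precompute

# One pass over the m non-zero entries: two lookups per entry.
sparse_eval = sum(v * Ex[i] % MOD * Ey[j] for i, j, v in entries) % MOD

# Dense check: the full O(M*N) double sum over the 4x4 matrix.
dense = [[0] * 4 for _ in range(4)]
for i, j, v in entries:
    dense[i][j] = v
dense_eval = sum(dense[i][j] * Ex[i] * Ey[j]
                 for i in range(4) for j in range(4)) % MOD
```

The sparse pass and the dense double sum agree by construction; only the sparse pass scales with $m$ rather than $M \cdot N$.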
The remaining question is who checks the lookups. The prover claims to have read the correct values from the precomputed tables, but the verifier does not have those tables. SPARK resolves this with a memory-checking argument: a protocol that verifies the prover's reads against the table contents by comparing random fingerprints of both. If any lookup is incorrect, the fingerprints mismatch with high probability. Chapter 21 develops this technique in full. The overhead is logarithmic in proof size and verification time, preserving SPARK's linear prover efficiency.
The full Spartan protocol
Putting it together:
-
Commitment phase. The prover commits to the witness polynomial $\tilde{z}$ using a multilinear polynomial commitment scheme. The matrices $A$, $B$, $C$ are public (part of the circuit description), so no commitment is needed for them.
2. Outer sum-check. The verifier sends random $\tau \in \mathbb{F}^{\log n}$. The prover and verifier run sum-check on:

   $$0 = \sum_{x \in \{0,1\}^{\log n}} \mathrm{eq}(\tau, x) \cdot \big( \widetilde{Az}(x) \cdot \widetilde{Bz}(x) - \widetilde{Cz}(x) \big)$$

   This reduces to evaluating $\widetilde{Az}$, $\widetilde{Bz}$, and $\widetilde{Cz}$ at a random point $r_x$.
3. Inner sum-checks. Evaluating $\widetilde{Az}(r_x)$, $\widetilde{Bz}(r_x)$, and $\widetilde{Cz}(r_x)$ requires three matrix-vector products: $\widetilde{Az}(r_x) = \sum_y \widetilde{A}(r_x, y) \cdot \widetilde{z}(y)$, and likewise for $B$ and $C$. The verifier sends random $r_A, r_B, r_C \in \mathbb{F}$, and the parties run a single sum-check on the combined claim:

   $$v = \sum_{y \in \{0,1\}^{\log n}} \big( r_A \widetilde{A}(r_x, y) + r_B \widetilde{B}(r_x, y) + r_C \widetilde{C}(r_x, y) \big) \cdot \widetilde{z}(y)$$

   where $v$ is the prover's claimed value for the batched sum. At the end of sum-check, the verifier holds a random point $r_y$ and a claimed evaluation of the summand polynomial at $r_y$.
4. SPARK. The prover provides claimed values $\widetilde{A}(r_x, r_y)$, $\widetilde{B}(r_x, r_y)$, $\widetilde{C}(r_x, r_y)$, and proves they're consistent with the sparse matrix representation via memory-checking fingerprints.
5. Witness opening. The prover opens $\widetilde{w}$ at $r_y$ using the polynomial commitment scheme. The verifier checks the opening proof and obtains the value $\widetilde{w}(r_y)$, from which it can compute $\widetilde{z}(r_y)$ (the public portion of $z$ is known to the verifier).
6. Final verification. The verifier computes $\big( r_A \widetilde{A}(r_x, r_y) + r_B \widetilde{B}(r_x, r_y) + r_C \widetilde{C}(r_x, r_y) \big) \cdot \widetilde{z}(r_y)$ using the values from steps 4-5, and checks that it equals the final claimed value from the inner sum-check. This is the "reduction endpoint": if the prover cheated anywhere in the sum-check, this equality fails with high probability.
Complexity
| Component | Prover | Verifier | Communication | Technique |
|---|---|---|---|---|
| Outer sum-check | $O(n)$ | $O(\log n)$ | $O(\log n)$ | Halving trick |
| Inner sum-checks | $O(n)$ | $O(\log n)$ | $O(\log n)$ | Halving trick + batching |
| SPARK | $O(m + n)$ | $O(\log n)$ | $O(\log n)$ | Precomputed tables + memory checking |
| Witness commitment | depends on PCS | depends on PCS | depends on PCS | Multilinear PCS (IPA, FRI, etc.) |
Why each step achieves its complexity:
- Outer sum-check $O(n)$: The halving trick from earlier in this chapter. Instead of recomputing $n$ terms each round, fold the evaluation tables after each challenge. Total work across all rounds: $n + n/2 + n/4 + \cdots = O(n)$.
- Inner sum-checks $O(n)$: Same halving trick, but applied to three matrix-vector products at once. Batching with random coefficients $r_A, r_B, r_C$ combines the three sums into one sum-check, avoiding a $3\times$ overhead.
- SPARK $O(m + n)$: Precompute $\mathrm{eq}(i, r_x)$ for all row indices and $\mathrm{eq}(j, r_y)$ for all column indices in $O(n)$ time. Then each of the $m$ non-zero entries requires only two table lookups and a constant number of multiplications, with no logarithmic-cost computations per entry. Memory-checking fingerprints verify the lookups in $O(m + n)$ additional work.
- Verifier $O(\log n)$: The verifier never touches the full tables. In sum-check, it receives $O(1)$ evaluations per round and performs $O(1)$ field operations to check consistency. Over $O(\log n)$ rounds, that's $O(\log n)$ work. SPARK verification adds $O(\log n)$ for the memory-checking fingerprint comparison.
With $m = O(n)$ non-zero matrix entries, total prover work is $O(n)$, linear in the instance size. No trusted setup is required when using IPA or FRI as the polynomial commitment.
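Both the outer and inner sum-checks lean on the same folding primitive. A minimal sketch, assuming a toy prime field and a table indexed so that the variable being bound selects the first versus second half:

```python
# Toy halving-trick fold: bind one variable of a multilinear evaluation
# table to the challenge r, shrinking the table by half. Repeating for
# each round gives the geometric series N + N/2 + ... = O(N) total work.

P = 2**31 - 1  # toy prime modulus

def fold(table, r):
    """T'[x] = (1 - r) * T[0, x] + r * T[1, x], with the bound
    variable as the high-order index bit (first half vs second half)."""
    half = len(table) // 2
    return [((1 - r) * table[x] + r * table[half + x]) % P
            for x in range(half)]
```

Folding an $n$-variable table $n$ times with the round challenges leaves a single entry: the multilinear polynomial's evaluation at the challenge point.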
Why This Matters
Step back and consider what we've built. Spartan proves R1CS satisfaction, the standard constraint system for zkSNARKs, using only sum-check and polynomial commitments. No univariate polynomial encodings (like PLONK's permutation argument). No pairing-based trusted setup (like Groth16). No PCP constructions (like early STARKs).
The architecture is minimal: multilinear polynomials, sum-check, commitment scheme. Three ideas, combined cleanly. This simplicity is not merely aesthetic. It's the reason Spartan became the template for subsequent systems. Lasso added lookup arguments; Jolt extended further to prove virtual machine execution. Each built on the same foundation.
Notice the graph structure emerging. Spartan has two levels: an outer sum-check (over constraints) and inner sum-checks (over matrix-vector products). The outer sum-check ends with a claim; the inner sum-checks prove that claim. This is exactly the depth-two graph from the remark at the chapter's start. More complex protocols like Lasso (for lookups) and Jolt (for full RISC-V execution) extend this graph to dozens of nodes across multiple stages, but the pattern remains: sum-checks reducing claims to other sum-checks, bottoming out at committed polynomials.
When a construction is this simple, it becomes a building block for everything that follows.
The PCP Detour and Sum-Check's Return
Now that we've seen Spartan's architecture (sum-check plus commitments, nothing more), the historical question becomes pressing: why did the field spend two decades pursuing a different path?
The PCP path
In 1990, sum-check arrived. Two years later, the PCP theorem landed: every NP statement has a proof checkable by reading only a constant number of bits. This captured the field's imagination completely.
The PCP theorem seemed to obsolete sum-check. Why settle for logarithmic verification when you could have constant-query verification? Kilian showed how to compile PCPs into succinct arguments: commit to the PCP via Merkle tree, let the verifier query random locations, authenticate responses with hash paths. This became the template for succinct proofs.
Sum-check faded into the background, remembered as a stepping stone rather than a destination.
The redundant indirection
In hindsight, the PCP-based pipeline contained a redundancy. The PCP theorem transforms an interactive proof into a static proof string that the verifier queries non-adaptively. Interaction removed. But the proof string is enormous, so Kilian's construction has the prover commit to it via a Merkle tree and the verifier interactively requests query locations. Interaction reintroduced. Then Fiat-Shamir makes the protocol non-interactive. Interaction removed again.
The transformations: IP → PCP (remove interaction) → Kilian argument (add interaction back) → Fiat-Shamir (remove interaction again). Two removals of interaction. If Fiat-Shamir handles the final step anyway, why not apply it directly to the original interactive proof based on sum-check?
The return
Starting around 2018, the missing pieces fell into place: fast proving algorithms (the halving trick, sparse sums) and polynomial commitment schemes (KZG, FRI, IPA) that could handle multilinear polynomials directly. A wave of systems returned to sum-check:
- Hyrax (2018), Libra (2019): early sum-check-based SNARKs with linear-time provers
- Spartan (2020): sum-check for R1CS without trusted setup
- HyperPlonk (2023): sum-check meets Plonkish arithmetization
- Lasso/Jolt (2023-24): sum-check plus lookup arguments for zkVMs
- Binius (2024): sum-check over binary fields
The pattern: sum-check as the core interactive proof, polynomial commitments for cryptographic binding, Fiat-Shamir applied once.
What the PCP path got right
The architectural redundancy does not mean the PCP path was wasted. It produced STARKs, which remain among the most deployed proof systems. STARKs compile an IOP (the AIR + FRI pipeline from Chapter 15) using only hash functions, no elliptic curves. This gives them a property that sum-check-based systems struggle to match: post-quantum security out of the box.
Sum-check itself is information-theoretic and quantum-safe. But it produces evaluation claims that must be resolved by a polynomial commitment scheme, and the most mature multilinear PCS options (KZG, IPA, Dory) rely on discrete-log assumptions that Shor's algorithm breaks. Post-quantum alternatives exist: hash-based multilinear commitments and lattice-based schemes are active areas of research, but they remain less mature than the FRI-based commitments that STARKs use today.
The practical landscape reflects this. For applications where post-quantum security matters now (long-lived proofs, regulatory environments, sovereign infrastructure), STARKs offer a proven path. For applications where prover speed dominates and classical assumptions suffice, sum-check-based systems like Jolt and Binius achieve prover times closer to the witness computation itself. The two approaches are converging: Binius uses sum-check over binary fields with FRI-based commitments, combining both traditions. Chapter 20 develops the STARK-side optimization story in parallel with this chapter, showing how small-field techniques, NTT optimization, and FRI batching close the gap between STARK proving and witness computation from the other direction.
Key Takeaways
- The halving trick achieves $O(2^n)$ prover time. Fold evaluation tables after each challenge: $T'[x] = (1 - r) \cdot T[0, x] + r \cdot T[1, x]$ via multilinear interpolation. Total work is the geometric series $2^n + 2^{n-1} + \cdots + 1 = O(2^n)$.
- Not all field multiplications are equal. Over 256-bit fields, bb multiplications are roughly 8x more expensive than sb and 30x more expensive than ss. Delayed reduction amortizes modular reduction across linear combinations. These distinctions dominate wall-clock time despite being invisible in $O(\cdot)$ notation.
- High-degree products cost $O(d \log d)$, not $O(d^2)$. A divide-and-conquer algorithm splits the $d$ factors in half, recurses, extrapolates via Lagrange interpolation (sb work), and multiplies pointwise (bb work). Only the pointwise step is expensive.
- Small-value round-batching exploits the geometric series. The first few rounds dominate total work and operate on small witness values. Treating these variables as symbolic replaces bb multiplications with ss, reducing the concrete cost of the most expensive portion of the protocol.
- Streaming provers trade passes for memory. Applying round-batching iteratively gives $O(2^{n/k})$ space for any constant $k$, at the cost of $O(k)$ passes over the witness, without recursive proof composition.
- Sparse sums exploit separable structure. When the polynomial factors into a sparse selector and dense prefix/suffix components, two chained sum-checks over $n/2$ variables each achieve $O(T + 2^{n/2})$ cost instead of $O(2^n)$, where $T$ is the number of non-zero terms.
- Spartan reduces R1CS to sum-check. The zero-on-hypercube reduction converts "$\widetilde{Az} \cdot \widetilde{Bz} - \widetilde{Cz}$ vanishes on $\{0,1\}^{\log n}$" into a single sum-check weighted by $\mathrm{eq}(\tau, x)$, which acts as a random linear combination preventing cancellation. An outer sum-check ($O(n)$) plus batched inner sum-checks ($O(n)$) plus SPARK ($O(m + n)$) handle the full R1CS constraint system.
- Sum-check graphs structure complex protocols. Each sum-check ends with evaluation claims. If the polynomial is committed, open it. If it is virtual, another sum-check proves the evaluation. The result is a DAG where depth determines sequential stages and width enables batching. Chapter 21 develops this perspective.
- The PCP path and the sum-check path are converging. The IP → PCP → Kilian → Fiat-Shamir pipeline contains an architectural redundancy (interaction removed, reintroduced, removed again). Sum-check + Fiat-Shamir skips this. But the PCP lineage produced STARKs, which offer post-quantum security via hash-based commitments. Sum-check systems need a post-quantum PCS to match, and those remain less mature. Binius bridges both traditions: sum-check over binary fields with FRI-based commitments.