
Chapter 25: Frontiers and Open Problems

In 1894, physicist Albert Michelson declared that “the more important fundamental laws and facts of physical science have all been discovered.” Physics was done; all that remained was measuring constants to more decimal places. Eleven years later, Einstein published Special Relativity. Then came quantum mechanics. Michelson had mistaken a plateau for a summit.

In 2020, SNARKs felt similarly settled. We had Groth16 for minimal proofs and PLONK for universal setups. The trade-offs seemed fixed, the design space mapped. Then came the Lookup Revolution (2020), Folding (2021), and Binary Fields (2023). Each reshaped what we thought possible.

This chapter is a reminder that we are not at the end of history. The “fundamental laws” of ZK are still being written. The techniques described here are the Special Relativity moments of our decade: advances that didn’t refine existing theory but rewrote it.

The frontiers span a remarkable range: from the algebraic structure of binary fields to the engineering of GPU kernels, from quantum threat models to the economics of decentralized proving markets. What unites them is a common goal: making proofs smaller, faster, and more trustworthy.

This chapter surveys those frontiers. Some involve hardness assumptions we cannot yet prove. Others involve efficiency gaps between what theory permits and what practice achieves. A few touch questions so deep that resolving them would reshape our understanding of computation itself. Think of what follows as a map of the territory we have not yet conquered.

Small-Field SNARKs and Binius

The Overhead Problem

Every proof system in this book operates over large prime fields (typically 254-bit or 256-bit elements). But most real-world data is small: booleans, bytes, 32-bit integers. Representing a single bit as a 256-bit field element wastes 255 bits of capacity.

This isn’t merely inelegant; it’s expensive. Field multiplications dominate prover time. Each multiplication operates on the full 256 bits even when the “meaningful” data is tiny. For bit-level operations (hashing, AES, bitwise logic) the overhead is a factor of 256×.

Binary Fields: The Natural Solution

Binius takes a radical approach: work over binary fields $\mathbb{F}_{2^k}$ where field elements are actual $k$-bit strings. A boolean is a 1-bit field element. A byte is an 8-bit field element. No padding, no waste.

The Native Language Analogy

Imagine a Spanish speaker forced to express every thought in German, even when talking to other Spanish speakers. Every sentence requires mental translation. Simple ideas become laborious.

This is what traditional SNARKs do. They force computer data (bits and bytes) into prime fields (large numbers). A single bit becomes a 256-bit integer. The computer “thinks” in binary, but the proof system demands a foreign representation.

Binius speaks the computer’s native language. It treats bits as bits. It doesn’t translate “0” into a 256-bit representation of zero; it keeps it as a single bit. No translation, no overhead.

The arithmetic of binary fields differs from prime fields. Addition is XOR (free in hardware). Multiplication uses polynomial arithmetic over $\mathbb{F}_2$. There are no “negative” elements; the field characteristic is 2. This seems like a step backward; binary fields lack the convenient structure of prime-order groups. But Binius recovers efficiency through clever protocol design.
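
To make the arithmetic concrete, here is a minimal Python sketch of $\mathbb{F}_{2^8}$ using the AES reduction polynomial $x^8 + x^4 + x^3 + x + 1$ (chosen only for familiarity; Binius uses a different tower construction internally): addition is XOR, and multiplication is carry-less polynomial multiplication followed by reduction.

```python
# Minimal GF(2^8): elements are 8-bit integers read as polynomials over F_2.
# Reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), the AES choice.

def gf_add(a: int, b: int) -> int:
    # Addition in characteristic 2 is XOR: no carries, free in hardware.
    return a ^ b

def gf_mul(a: int, b: int, modulus: int = 0x11B, width: int = 8) -> int:
    # Carry-less multiplication with reduction after each shift.
    product = 0
    while b:
        if b & 1:
            product ^= a              # XOR in a shifted copy of a
        a <<= 1
        if (a >> width) & 1:          # degree hit `width`: reduce
            a ^= modulus
        b >>= 1
    return product

assert gf_add(0b1010, 0b0110) == 0b1100
assert gf_mul(0x57, 0x83) == 0xC1     # the worked example from FIPS-197
```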

The Binius Architecture

Binius combines several innovations:

Multilinear polynomials over towers of binary fields. Instead of a single large field, use a tower: $\mathbb{F}_2 \subset \mathbb{F}_{2^2} \subset \mathbb{F}_{2^4} \subset \mathbb{F}_{2^8} \subset \ldots$ Each level doubles the extension degree. Small values live in small fields; only the cryptographic randomness requires the full tower height.

What does “operating in a tower” mean concretely? Each level is a field extension: $\mathbb{F}_2$ contains just $\{0,1\}$ with XOR addition; $\mathbb{F}_{2^8}$ contains 8-bit elements (bytes); $\mathbb{F}_{2^{128}}$ provides cryptographic security. The key insight: elements of smaller fields are also elements of larger fields. A bit in $\mathbb{F}_2$ can be viewed as an element of $\mathbb{F}_{2^{128}}$; it’s just a very special element.

This enables a crucial optimization: store witness data in the smallest field that fits (bits stay bits, bytes stay bytes), perform arithmetic at the appropriate level, and only “lift” to the full tower when random challenges enter. The 256× overhead of representing a single bit as a 256-bit field element vanishes.
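
A small illustration of why the lift is cheap, under a simplified model (a challenge in $\mathbb{F}_{2^{128}}$ represented as a 128-bit integer, and the embedding of a bit taken to be the integer 0 or 1): multiplying a random challenge by a witness bit is a conditional select, not a 128-bit field multiplication, and a column of 128 witness bits packs into 16 bytes.

```python
import secrets

# Simplified model: an element of F_{2^128} is a 128-bit integer, and the
# tower embedding of F_2 sends the bit 0 to 0 and the bit 1 to 1.

def mul_challenge_by_bit(challenge: int, bit: int) -> int:
    # Multiplying a big-field challenge by a subfield bit is a select,
    # not a full 128-bit field multiplication.
    return challenge if bit else 0

challenge = secrets.randbits(128)
witness_bits = [secrets.randbelow(2) for _ in range(128)]

# The witness column stays packed: 128 bits fit in 16 bytes, versus
# 128 * 32 bytes if every bit were stored as a 256-bit field element.
packed = sum(b << i for i, b in enumerate(witness_bits))
assert packed.bit_length() <= 128

terms = [mul_challenge_by_bit(challenge, b) for b in witness_bits]
```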

(Note: this “tower” is unrelated to the “tower of proofs” in Chapter 22’s recursion discussion. There, “tower” refers to proofs-of-proofs: $\pi_1 \to \pi_2 \to \pi_3$. Here, “tower” refers to nested field extensions. Both exploit hierarchical structure (recursion avoids re-proving entire computations; field towers avoid doing large-field arithmetic on small values) but the mechanisms are distinct.)

GKR-based multiplication. Binary field multiplication is non-trivial: it’s polynomial multiplication modulo an irreducible polynomial. Rather than encoding this as constraints (expensive), Binius uses the GKR protocol to verify multiplications. The prover commits only to inputs and outputs; intermediate multiplication steps are checked via sum-check.

FRI over binary fields (FRI-Binius). The FRI low-degree test adapts to binary domains, but the standard folding approach doesn’t directly transfer: the squaring map $x \mapsto x^2$ is not 2-to-1 on binary fields as it is over roots of unity. FRI-Binius instead uses subspace vanishing polynomials combined with an additive NTT to achieve the necessary folding structure. The technique draws a connection between the Novel Polynomial Basis and binary field FRI, enabling efficient commitment to polynomials over tiny fields like $\mathbb{F}_2$ with no embedding overhead.

Performance Implications

For bit-intensive computations, Binius achieves order-of-magnitude improvements:

| Operation | Traditional (256-bit field) | Binius |
|---|---|---|
| SHA-256 hash | ~25,000 constraints | ~5,000 constraints |
| AES block | ~10,000 constraints | ~1,000 constraints |
| Bitwise AND | 1 constraint + range check | 1 native operation |

The savings compound: fewer constraints mean smaller polynomials, faster FFTs, smaller proofs.

Current Status and Challenges

Binius is under active development. Polygon and Irreducible have been building production-grade Binius-based systems, with ongoing testing and iteration. But despite the compelling performance numbers, several challenges explain why existing zkVMs haven’t switched wholesale.

The tradeoff: prover speed vs proof size. Binius achieves much faster proving but produces larger proofs and slower verification than FRI-based systems. For on-chain verification where calldata costs dominate, this tradeoff matters. A 5× faster prover doesn’t help if verification gas costs double.

Recursion is harder. Verifying a Binius proof inside another Binius proof requires embedding binary field arithmetic, which is non-trivial when the verifier circuit itself uses binary fields. The algebraic structure that makes Binius fast for computation makes it awkward for recursive self-verification.

The workload mix. Binius shines for bit-intensive operations: hashing, AES, bitwise logic. But zkVMs also do 32/64-bit arithmetic, memory operations, control flow. The benefits are less dramatic for these. Some researchers suggest Binius may be better suited for precompiles (hash functions, signature verification) rather than full VM execution: use Binius where it wins, prime fields elsewhere.

Prover memory. The tower structure requires careful memory management. Naive implementations have high memory overhead.

Tooling gap. Existing circuit languages target prime fields. New frontends that exploit binary field structure are needed.

The broader lesson: matching the proof system’s field to the computation’s natural representation eliminates artificial overhead. Binius is the most developed example, but the principle applies generally. The future likely involves hybrid systems: Binius for hash-heavy components, traditional fields for arithmetic-heavy components, composed via the techniques from Chapter 22.

But field representation is only one axis of adaptation. Another looms larger on the horizon: the cryptographic assumptions themselves.

Post-Quantum SNARKs

The Coming Storm

Shor’s algorithm threatens the foundations of modern cryptography. Running in polynomial time on a quantum computer, it breaks discrete logarithm (the assumption underlying Schnorr, Pedersen, Bulletproofs), integer factoring (the assumption underlying RSA), and elliptic curve discrete log (the assumption underlying all pairing-based SNARKs).

What do these problems share? They all have hidden periodic structure in abelian groups. Factoring $N$ reduces to finding the period of $a^x \mod N$. Discrete log in $\langle g \rangle$ reduces to finding the period of $g^a h^b$. Shor’s algorithm applies the quantum Fourier transform (QFT) to extract this periodicity in polynomial time. The QFT is the key ingredient: it converts quantum superposition over exponentially many values into a measurement that reveals the period. Any problem with hidden abelian group structure falls to this attack.
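
Only the period finding is quantum; the rest of Shor’s factoring algorithm is classical post-processing. A toy sketch, with the period found by brute force (exactly the exponential step the QFT replaces):

```python
from math import gcd

def find_period(a: int, N: int) -> int:
    # Order of a modulo N: smallest r > 0 with a^r = 1 (mod N).
    # Classically this brute force is exponential in the bit length of N;
    # Shor's algorithm finds r in polynomial time via the QFT.
    r, x = 1, a % N
    while x != 1:
        x = x * a % N
        r += 1
    return r

def shor_classical_part(a: int, N: int):
    # Given the period r of a mod N, recover nontrivial factors of N
    # (works when r is even and a^(r/2) != -1 mod N).
    r = find_period(a, N)
    if r % 2 == 1:
        return None
    y = pow(a, r // 2, N)
    if y == N - 1:
        return None
    return gcd(y - 1, N), gcd(y + 1, N)

print(shor_classical_part(7, 15))   # period of 7 mod 15 is 4 -> factors (3, 5)
```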

Hash functions survive because they’re designed to have no exploitable structure: no periodicity, no algebraic relations. Grover’s algorithm provides only a generic $\sqrt{N}$ search speedup (quadratic, not exponential), which doubling the hash output neutralizes.
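
The arithmetic behind “doubling the output neutralizes Grover”: a preimage search over an $n$-bit hash takes roughly

$$\sqrt{2^{\,n}} = 2^{\,n/2} \text{ quantum queries},$$

so a 256-bit output leaves about $2^{128}$ quantum work, on par with the $2^{128}$ classical work already treated as out of reach at the 128-bit security level.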

Every system in Part IV of this book (Groth16, PLONK, KZG-based constructions) will become insecure once cryptographically-relevant quantum computers exist.

Timeline estimates vary wildly: 10 years, 20 years, 30 years, never. But “never” is a dangerous bet for infrastructure with long lifespans. Financial systems, identity infrastructure, archival signatures: these need security guarantees that extend decades into the future.

Current Paths Forward

Hash-based systems. STARKs and FRI rely only on collision-resistant hashing. Since hash functions resist Shor (no hidden periodic structure) and Grover only provides quadratic speedup, STARKs are the current practical choice for post-quantum proofs. Their large proof sizes limit some applications, but they work today.

Lattice-based commitments. Replace Pedersen commitments with schemes based on Module-LWE or similar lattice problems. Why do lattices resist quantum attacks? Because they lack the hidden periodic structure that Shor exploits. The problem “find a short vector in this high-dimensional lattice” has no known abelian group structure for QFT to extract.

The approach: commit to a polynomial $f(X)$ by encoding its coefficients as a lattice point. The “hardness of finding short vectors” ensures binding (can’t open to a different polynomial). Noise flooding or rejection sampling provides hiding.

The algebraic structure is richer than hashes: you can perform homomorphic operations on commitments (add them, sometimes multiply). This enables sum-check-style protocols without the overhead of pure symmetric-key approaches. But there’s a catch: the noise in LWE grows with operations. After too many homomorphic steps, noise overwhelms signal. Managing this requires larger parameters, meaning larger commitments and proofs. Current constructions run 10-100× slower than hash-based alternatives for equivalent security.

Symmetric-key SNARKs. Build entirely from symmetric primitives: hashes, block ciphers, nothing else. The MPC-in-the-head paradigm (Ligero, Limbo, and descendants) follows this path.

The key insight: instead of algebraic assumptions, simulate an MPC protocol inside the prover’s head. The prover imagines $n$ virtual parties holding secret shares of the witness. These virtual parties execute an MPC protocol that verifies the computation. The prover commits (via hash) to all parties’ views: their inputs, randomness, and messages exchanged.

The verifier challenges: “reveal the views of parties $i$ and $j$.” The verifier checks that those two views are consistent with honest execution. If the prover cheated, at least one pair of parties has inconsistent views. With enough challenges, cheating is detected with high probability.
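
A toy Python sketch of this commit-and-open pattern for the simplest possible statement, knowledge of $x$ with $Mx = y$ over $\mathbb{F}_2$, using three virtual parties and XOR secret sharing. Everything here (the party count, the linear-only relation, the salted hash commitments, the single round) is a simplification for illustration; real systems such as ZKBoo, Ligero, and Limbo handle multiplication gates, exchange messages between parties, and run many parallel repetitions.

```python
import hashlib
import secrets

def H(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def mat_vec(M, x):
    # Matrix-vector product over F_2 (entries are 0/1 integers).
    return [sum(m * v for m, v in zip(row, x)) % 2 for row in M]

def xor_vec(a, b):
    return [u ^ v for u, v in zip(a, b)]

def prove_round(M, x):
    # Split the witness into three XOR shares: x = x0 ^ x1 ^ x2.
    x0 = [secrets.randbelow(2) for _ in x]
    x1 = [secrets.randbelow(2) for _ in x]
    x2 = xor_vec(xor_vec(x, x0), x1)
    shares = [x0, x1, x2]
    # Each virtual party computes its output share y_i = M * x_i locally;
    # by linearity over F_2 the three output shares XOR to y = M * x.
    outs = [mat_vec(M, s) for s in shares]
    # Commit to each party's view (here just its input share) with a salt.
    salts = [secrets.token_bytes(16) for _ in range(3)]
    comms = [H(salts[i], bytes(shares[i])) for i in range(3)]
    return shares, outs, salts, comms

def verify_round(M, y, outs, salts, comms, opened):
    # `opened` maps the two revealed party indices to their input shares.
    # 1. The claimed output shares must XOR to the public output y.
    acc = [0] * len(y)
    for o in outs:
        acc = xor_vec(acc, o)
    if acc != y:
        return False
    # 2. Each opened view must match its commitment and its output share.
    for i, share in opened.items():
        if H(salts[i], bytes(share)) != comms[i]:
            return False
        if mat_vec(M, share) != outs[i]:
            return False
    return True

# One round. A cheating prover still escapes with constant probability,
# so real protocols run many repetitions (and use Fiat-Shamir to become
# non-interactive).
M = [[1, 0, 1], [0, 1, 1]]
x = [1, 1, 0]                 # the prover's secret witness
y = mat_vec(M, x)             # public statement: "I know x with Mx = y"

shares, outs, salts, comms = prove_round(M, x)
hidden = secrets.randbelow(3)                       # verifier's challenge
opened = {i: shares[i] for i in range(3) if i != hidden}
assert verify_round(M, y, outs, salts, comms, opened)
```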

Why is this post-quantum? The only cryptographic primitive is the hash-based commitment to parties’ views. No discrete log, no pairings, no lattices. Security reduces to collision resistance of the hash function.

Why is it slow? The MPC protocol has communication overhead. Even efficient protocols require $O(|C|)$ work per multiplication gate, where $|C|$ is the circuit size. The prover must simulate all of this. Ligero improved this with linear-time proving via interleaved Reed-Solomon codes, but constants remain large, typically 10-100× slower than algebraic SNARKs.

See Chapter 24 for the underlying MPC techniques.

Open Problems

The post-quantum SNARK landscape faces three interrelated challenges. First, lattice-based polynomial commitments remain 10-100× slower than hash-based alternatives. Can we close this gap while maintaining rigorous security? Second, security reductions are often loose; the concrete security is much worse than asymptotic claims suggest. Tighter reductions would either increase confidence or reveal that larger parameters are needed. Third, the transition period creates its own problem: can we build hybrid systems secure against both classical and quantum adversaries without paying twice the cost?

The post-quantum transition will reshape the SNARK landscape, but it operates on a timescale of years to decades. Meanwhile, a different revolution is already underway: the race to prove arbitrary computation.

zkVMs: The Universal Prover

The Vision

Every proof system we’ve studied requires translating the computation into a constraint system: R1CS, AIR, PLONKish gates. This translation is a specialized craft. Experts hand-optimize circuits for months; a single bug invalidates the work. The barrier to entry is enormous.

zkVMs invert this relationship. Instead of adapting computations to proof systems, adapt proof systems to computations. Compile any program to a standard virtual machine (RISC-V, EVM, WASM) and prove correct execution. The zkVM handles memory reads and writes, branching and loops, function calls and returns. Write your logic in Rust or any high-level language. Compile to the target ISA. Prove execution. No circuit engineering required.

The Current Race

The zkVM landscape has stratified into distinct architectural approaches, each with different tradeoffs.

Jolt (a16z). The sum-check purist. Built entirely on multilinear polynomials and the Lasso lookup argument from Chapter 20. The philosophy: “Just One Lookup Table.” Implement CPU instructions via lookups into structured tables rather than hand-crafted constraint systems. As of mid-2025, achieves over 1 million RISC-V cycles per second on a 32-core CPU, with ~50KB proof sizes (an order of magnitude smaller than STARK-based alternatives). The lookup-centric architecture sidesteps quotient polynomials and grand products. A streaming prover is under development that will prove arbitrarily long executions in under 2GB RAM, enabling mobile proving with minimal recursion overhead.

RISC Zero. The production workhorse. STARK-based with FRI commitments over the Baby Bear field, targeting RISC-V. Uses continuations to split large computations into bounded segments (~$10^6$ cycles), proves each with STARK, then aggregates via recursion. Final proofs wrap in Groth16 for cheap on-chain verification. The Bonsai network provides prover-as-a-service infrastructure, abstracting away proof generation entirely. R0VM 2.0 (April 2025) reduced Ethereum block proving from 35 minutes to 44 seconds. The “dual-engine” strategy: Bonsai for hosted enterprise proving, Boundless for a decentralized proof marketplace.

Note on continuations: Continuations are a specific flavor of recursion. Instead of proving the entire computation history at each step, you prove only the current segment plus a commitment to the previous segment’s final state (a memory root or hash). This lets you pause and resume computation at arbitrary points, which is critical for programs that run longer than a single proof cycle allows. Think of it as a checkpoint system: each segment proves “I started from this checkpoint and reached that checkpoint,” rather than “I verified everything that came before me.”
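
A structural sketch of the checkpoint idea, with the segment “proofs” as stand-ins assumed to be verified by some underlying SNARK (this is not any particular zkVM’s format): each segment claim names the state commitment it resumed from and the one it reached, and the aggregator only checks that the claims chain.

```python
from dataclasses import dataclass
from hashlib import sha256

def commit_state(state: bytes) -> bytes:
    # Stand-in for a real state commitment (e.g. a Merkle root over memory).
    return sha256(state).digest()

@dataclass
class SegmentClaim:
    start: bytes   # commitment to the state the segment resumed from
    end: bytes     # commitment to the checkpoint it reached
    proof: bytes   # placeholder for a SNARK that execution maps start -> end

def chain_ok(claims: list[SegmentClaim], initial: bytes, final: bytes) -> bool:
    # The aggregator's continuation check: every segment must resume exactly
    # where the previous one stopped; no segment re-proves earlier history.
    if not claims or claims[0].start != initial or claims[-1].end != final:
        return False
    return all(a.end == b.start for a, b in zip(claims, claims[1:]))

states = [commit_state(bytes([i])) for i in range(4)]    # checkpoints s0..s3
claims = [SegmentClaim(states[i], states[i + 1], b"...") for i in range(3)]
assert chain_ok(claims, states[0], states[3])
```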

SP1 (Succinct). The precompile optimizer. Cross-table lookup architecture with a flexible precompile system that accelerates common operations (signature verification, hashing) by 5-10× over raw RISC-V. A precompile is essentially a “cheat code” for the VM: instead of executing a SHA-256 hash step-by-step through thousands of RISC-V instructions, the VM recognizes the operation and delegates it to a specialized, hand-optimized sub-circuit. Think of it as a GPU inside the CPU specifically for heavy cryptographic math. SP1 Hypercube (2025) moved from STARKs to multilinear polynomials, achieving real-time Ethereum proving: 99.7% of L1 blocks proven in under 12 seconds on 16 GPUs. First general-purpose zkVM to eliminate proximity gap conjectures. The team estimates a real-time prover cluster could be built for ~$100K in hardware.

Zisk (Polygon spinoff). The latency minimizer. Spun out of Polygon’s zkEVM team (led by co-founder Jordi Baylina) in June 2025, with all Polygon zkEVM IP transferred to the new entity. Built on RISC-V 64, designed from the ground up for low-latency distributed proving. Features a 1.5GHz zkVM execution engine, highly parallelized proof generation, GPU-optimized code, and advanced aggregation circuits. The architecture targets real-time Ethereum block proving via massive parallelization across prover clusters.

zkWASM and others. Target WebAssembly, enabling proofs of browser-compatible code. Useful when the goal is proving execution of existing web applications rather than purpose-built programs.

The competition is fierce and productive. Notice the convergence: multiple teams moving toward multilinear polynomials (away from univariate STARKs), real-time proving as the target, and precompiles for common operations. Techniques developed for one system often transfer to others.

Architectural Insights from Production Systems

Several zkVM design patterns have emerged that generalize beyond specific implementations:

The stack machine insight. Traditional CPU architectures distinguish registers (fast, few) from memory (slow, large). In ZK circuits, this distinction vanishes: both register access and memory access are polynomial lookups with identical cost. Valida exploited this by eliminating general-purpose registers entirely, using a stack machine instead. The simplification reduces per-cycle constraint count and CPU state complexity. Any zkVM designer should ask: does our architecture carry assumptions from physical hardware that don’t apply in ZK?

Segment-based proving. Long computations face a memory wall: proving $10^9$ cycles requires holding intermediate state for $10^9$ steps. RISC Zero’s approach: split execution into bounded segments (~$10^6$ cycles each), prove each segment independently, then aggregate via recursive composition (lift/join). Peak prover memory stays bounded regardless of total computation length. This is a general pattern: unbounded computation can always be factored into bounded pieces with recursive aggregation.

Challenge-based memory arguments. Memory consistency can be verified two ways: Merkle trees (commit to memory state, prove access via authentication paths) or algebraic challenges (accumulate memory operations into fingerprint polynomials, verify consistency via Schwartz-Zippel). The Merkle approach requires hashing inside the circuit, which is expensive. The challenge approach (used in SP1, related to Twist-and-Shout from Chapter 20) uses only field arithmetic. For memory-heavy workloads, the difference is 10×+.
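
The core primitive behind the challenge approach is a randomized multiset-equality check: compress each memory operation (address, value, timestamp) into one field element using random challenges, then compare grand products; by Schwartz-Zippel, unequal multisets pass only with probability on the order of the number of operations divided by the field size. A minimal sketch over a prime field (the tuple encoding and the toy 61-bit Mersenne prime are illustrative choices, not what any particular zkVM uses):

```python
import secrets

P = 2**61 - 1   # toy prime field; real systems use their proof system's field

def fingerprint(ops, beta, gamma):
    # ops: iterable of (addr, value, timestamp) memory operations.
    # Each tuple is compressed to addr + beta*value + beta^2*timestamp,
    # then folded into a grand product of (gamma - compressed).
    acc = 1
    for addr, val, ts in ops:
        compressed = (addr + beta * val + beta * beta * ts) % P
        acc = acc * ((gamma - compressed) % P) % P
    return acc

def multisets_equal(read_set, write_set):
    # Random challenges play the role of the verifier's coins
    # (Fiat-Shamir in a non-interactive proof).
    beta, gamma = secrets.randbelow(P), secrets.randbelow(P)
    return fingerprint(read_set, beta, gamma) == fingerprint(write_set, beta, gamma)

# Same operations in a different order: equal as multisets, so the check passes.
writes = [(0, 7, 1), (4, 9, 2), (0, 7, 3)]
reads  = [(4, 9, 2), (0, 7, 3), (0, 7, 1)]
assert multisets_equal(reads, writes)

# A tampered value is detected except with negligible probability.
assert not multisets_equal([(0, 8, 1), (4, 9, 2), (0, 7, 3)], writes)
```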

The precompile pattern. Some operations appear frequently and have specialized efficient circuits: SHA256, Keccak, ECDSA, pairing computations. Rather than interpreting these through the general VM, zkVMs expose “precompiles,” direct circuit implementations that run 10-100× faster than interpreted execution. The trade-off: each precompile adds engineering complexity and circuit size. The design question: which operations justify dedicated circuits? Current heuristics target cryptographic primitives, as they’re both expensive (many constraints per operation) and common (appear in most blockchain applications).

Open Problems

The zkVM overhead problem dominates. Current systems run 100-1000× slower than native computation. A program that runs in 1 second requires 100-1000 seconds to prove. Can we approach 10×? Approaching 1× seems impossible with current techniques, but every generation of proof systems has surprised us.

Memory efficiency presents a related challenge. A 4GB address space means $2^{32}$ potential memory cells, far too many to commit individually. Virtual polynomial techniques (Chapter 20) help, but scaling to gigabytes of working memory remains challenging.

Then there’s the precompile selection problem. Adding dedicated circuits for common operations (hashing, signatures) improves performance 10-100× for those operations, but each precompile requires engineering effort. Current systems hand-pick based on blockchain workloads. General-purpose proving may need different choices. Can we automate precompile discovery, identifying hot operations and generating specialized circuits?

A note on circuit-friendly signatures: The ECDSA verification bottleneck has driven adoption of EdDSA over “embedded” curves like BabyJubJub, curves whose base field equals the scalar field of the outer proving curve (typically BN254 or BLS12-381). EdDSA verification becomes native field arithmetic rather than expensive non-native simulation. This design pattern, choosing cryptographic primitives for circuit efficiency, recurs throughout ZK system design.

Finally, parallelization: most zkVMs are inherently sequential, each instruction depending on the previous state. But physical computation increasingly relies on parallelism. How do we prove parallel programs efficiently? How do we exploit prover parallelism for sequential programs?

The zkVM race demonstrates a recurring pattern: performance improvements come from both algorithms (better proof systems, smarter memory arguments) and systems engineering (GPU kernels, distributed proving, careful memory management). The field is young enough that order-of-magnitude gains still come regularly.

But speed means nothing if the proofs are wrong.

Formal Verification

The Invisible Bugs

A bug in a ZK system is uniquely dangerous. Unlike a crash, which announces itself loudly, a soundness bug operates in silence. An attacker exploits it to forge proofs; the verifier accepts; the system behaves as though everything is fine. By the time the compromise is discovered, the damage is done.

High-profile vulnerabilities have been found in deployed systems: missing constraint checks that allowed witnesses to satisfy constraints they shouldn’t, incorrect range assumptions that permitted overflow attacks, field confusion bugs where values were interpreted in the wrong field. These are not hypothetical risks. They have happened. They will happen again.

Current Efforts

Several approaches are gaining traction. Verified compilers prove that compilation from a high-level circuit language to low-level constraints preserves semantics. If the compiler is verified, bugs in the source circuit remain bugs in the source circuit, not silent specification violations introduced by translation.

Machine-checked formal soundness proofs (in Coq, Lean, Isabelle) establish that the protocol is sound by construction. A formal proof eliminates entire classes of bugs.

Static analysis tools detect common vulnerability patterns in circuit code: unconstrained variables, degree violations, missing range checks. These catch bugs before deployment rather than after exploitation.

Open Problems

The hardest verification problems live in the gaps between verified components. You might verify the compiler but not the runtime, the protocol but not the implementation, the circuit but not the witness generator. Bugs hide at the boundaries. End-to-end verification, covering the entire stack from source code to final proof, remains an open problem.

Automated bug finding offers a complementary approach. Can we build fuzzers or symbolic executors specifically designed for ZK circuits? The search space is enormous, but constraint systems have structure that might enable efficient exploration.

Perhaps the hardest challenge: verification of optimized implementations. The fastest code uses hand-tuned assembly, GPU kernels, FPGA bitstreams. These are inherently hard to verify. How do we maintain security guarantees when performance demands low-level optimization?

Formal verification addresses correctness; the next frontier is raw performance.

Hardware Acceleration

The Computational Bottleneck

Prover computation is dominated by a few operations: multi-scalar multiplication (computing $\sum_i s_i \cdot G_i$ for scalar and point vectors), number-theoretic transforms (converting between coefficient and evaluation representations), and hash evaluations (for FRI, Merkle trees, Fiat-Shamir).

These workloads share a common structure: massive parallelism with minimal branching. CPUs are optimized for the opposite: branch prediction, cache locality, and general-purpose control flow. This mismatch creates opportunity.
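
As a reference point for what the hardware is accelerating, here is a textbook radix-2 NTT over a toy prime field; the two recursive halves and every butterfly in the inner loop are independent work items, which is exactly the parallelism GPUs exploit. The tiny prime and brute-force root search are illustrative only; production provers use fields like Goldilocks or BabyBear with hand-tuned kernels.

```python
P = 97                      # toy prime: 97 - 1 = 96 is divisible by 8

def find_root_of_unity(n: int, p: int = P) -> int:
    # Brute-force a primitive n-th root of unity (fine for a toy field).
    for g in range(2, p):
        w = pow(g, (p - 1) // n, p)
        if pow(w, n // 2, p) != 1:   # order is exactly n, not a divisor
            return w
    raise ValueError("no root found")

def ntt(coeffs, omega, p=P):
    # Evaluates the polynomial with the given coefficients at
    # 1, omega, omega^2, ..., omega^(n-1): the number-theoretic analogue
    # of the FFT. Both recursive calls and every butterfly in the loop
    # are independent of each other -- the parallelism hardware exploits.
    n = len(coeffs)
    if n == 1:
        return coeffs[:]
    even = ntt(coeffs[0::2], omega * omega % p, p)
    odd = ntt(coeffs[1::2], omega * omega % p, p)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % p
        out[i] = (even[i] + t) % p
        out[i + n // 2] = (even[i] - t) % p
        w = w * omega % p
    return out

n = 8
omega = find_root_of_unity(n)
coeffs = [3, 1, 4, 1, 5, 9, 2, 6]
evals = ntt(coeffs, omega)
# Cross-check against naive evaluation at the powers of omega.
naive = [sum(c * pow(omega, i * j, P) for j, c in enumerate(coeffs)) % P
         for i in range(n)]
assert evals == naive
```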

Current Approaches

GPUs provide 10-100× speedup for MSM and NTT, which are massively parallel by nature. Libraries like cuSNARK and ICICLE offer CUDA implementations. FPGAs offer energy efficiency and latency advantages, with the ability to implement custom arithmetic units for specific field sizes. ASICs represent the ultimate optimization: custom chips designed specifically for ZK proving. Several companies are developing these, betting that proof generation will become a significant computational market.

Open Problems

Memory bandwidth increasingly dominates. Large circuits require gigabytes of data; memory transfer between CPU and GPU often exceeds computation time. Minimizing data movement requires new algorithms designed with memory hierarchy in mind.

This suggests a deeper opportunity: algorithm-hardware co-design. Current proof systems were designed for CPUs. What if we designed protocols that explicitly exploit GPU or ASIC parallelism? The FFT is elegant but assumes certain memory access patterns. A proof system designed around GPU memory hierarchy might look very different.

Decentralized proving poses its own challenges. How do we distribute proving across many machines without trusted coordination? Recursive composition provides one answer (each machine proves a piece, proofs aggregate), but the coordination overhead is substantial.

Hardware acceleration speeds up individual provers. But often the bottleneck isn’t proving one thing fast enough; it’s proving many things at all.

Aggregation and Batching

The Verification Problem

Blockchain applications face a scaling challenge on the verification side. A rollup might process millions of transactions. A privacy system might have millions of users. Each transaction, each credential, each computation generates a proof. Verifying them individually costs $O(n)$ time. For large $n$, verification becomes the bottleneck even if each individual proof is fast.

Current Techniques

Three main approaches have emerged. Recursive aggregation proves that you verified $n$ proofs; the aggregation proof attests to all underlying proofs, making verification cost constant regardless of $n$. Batch verification uses clever randomization to check $n$ proofs with work sublinear in $n$; pairing-based proofs can sometimes batch to roughly 2× the cost of a single verification. Proof compression wraps a large proof (STARK) in a small proof (Groth16); the inner proof is never revealed, and the outer proof serves as a constant-size attestation.
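
The randomization behind batch verification, in its simplest form: instead of checking $n$ equalities $A_i = B_i$ one by one, check a single random linear combination $\sum_i r_i A_i = \sum_i r_i B_i$; if any one equality is false, the combined check passes with probability only about $1/|\mathbb{F}|$. The sketch below works over plain field elements to illustrate the probabilistic argument; in pairing-based batching the equalities are pairing-product equations and the savings come from sharing the expensive pairings.

```python
import secrets

P = 2**255 - 19   # any large prime field works for the illustration

def batch_check(lhs, rhs):
    # lhs, rhs: the two sides of n claimed equalities, as field elements.
    # One random coefficient per claim; a single combined comparison.
    assert len(lhs) == len(rhs)
    rs = [secrets.randbelow(P) for _ in lhs]
    left = sum(r * a for r, a in zip(rs, lhs)) % P
    right = sum(r * b for r, b in zip(rs, rhs)) % P
    return left == right

claims_good = [(3 * 5 % P, 15), (7 * 11 % P, 77)]
claims_bad = claims_good + [(2 * 2 % P, 5)]        # one false claim

assert batch_check(*zip(*claims_good))
assert not batch_check(*zip(*claims_bad))          # fails except w.p. ~1/P
```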

Open Problems

Recursion adds significant prover overhead since proving verification is expensive. Can we aggregate proofs without proving verification? Some approaches use algebraic structure (e.g., aggregating KZG openings), but general solutions remain elusive.

Incremental aggregation poses another challenge. Given an aggregate of $n$ proofs, how do we add proof $n+1$ without recomputing from scratch? Naive recursion requires touching all previous proofs.

Cross-system aggregation may be the most ambitious goal: can we aggregate proofs from different proof systems into a single attestation? A Groth16 proof and a STARK proof combined without exposing either?

Aggregation addresses the question of how to combine proofs after they exist. But there’s an earlier bottleneck, one that often dominates total proving time.

Witness Generation

The Hidden Cost

Discussions of prover efficiency focus on the cryptographic work: computing commitments, running sum-check, evaluating polynomials. But there’s a step before that. The witness includes not just the prover’s secret input but all intermediate values in the computation. For a circuit with $n$ gates, the witness has $O(n)$ elements. Computing these elements (executing the circuit) can take longer than the proving step itself.

The Hidden Iceberg

Academic papers report “prover time” as the time spent on cryptographic operations: MSMs, FFTs, hashes. But the full cost of generating a proof includes witness generation, and this is often the larger piece.

Think of an iceberg. The cryptographic prover is the visible tip above water. Witness generation is the massive bulk beneath the surface. A paper might report “proving takes 10 seconds” while silently omitting that witness generation took 60 seconds. The total time to produce a proof is 70 seconds, not 10.

This matters because witness generation and proving have different scaling properties. Proving can be parallelized across GPUs; witness generation is often sequential and memory-bound. Optimizing the cryptographic prover by 10× helps less than you’d expect if witness generation dominates.
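
Using the iceberg example’s numbers, the Amdahl-style arithmetic:

$$T = \underbrace{60\text{ s}}_{\text{witness}} + \underbrace{10\text{ s}}_{\text{prove}} = 70\text{ s}, \qquad T' = 60\text{ s} + \frac{10\text{ s}}{10} = 61\text{ s},$$

so a 10× faster cryptographic prover buys only about a 1.15× end-to-end speedup while witness generation dominates.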

This hidden cost explains why benchmarks sometimes mislead. A proof system might advertise “10 million constraints per second,” but if witness generation runs at 1 million constraints per second, the advertised speed is unreachable.

Current Approaches

Several techniques help. Parallel witness generation computes independent portions simultaneously. Witness streaming generates values on-demand rather than materializing everything upfront, reducing peak memory. Witness compression stores only a subset of values and recomputes others when needed, trading computation for space.

Open Problems

For zkVMs, witness generation from traces is a key bottleneck. The execution trace (the sequence of states the CPU passed through) already exists. Translating it into the format the prover needs is expensive. This translation often dominates end-to-end proving time.

Incremental witness update matters for interactive applications. When the input changes slightly, can we update the witness without recomputing entirely? For applications where inputs evolve over time (games, simulations, collaborative editing), this could unlock new use cases.

Witness generation is a systems problem: engineering, not cryptography. But it increasingly determines real-world performance. The most elegant protocol means nothing if witness generation is the bottleneck.

All the techniques we’ve discussed (field choice, post-quantum assumptions, zkVMs, verification, hardware, aggregation, witness generation) serve as building blocks for applications. The most demanding application on the horizon may be machine learning.

Privacy-Preserving Machine Learning

The Intersection

Zero-knowledge proofs enable tantalizing possibilities for machine learning: prove that a model was trained correctly on a claimed dataset without revealing the training data; prove that an inference was computed correctly without revealing the model or the input; prove that a model satisfies certain properties (fairness, robustness) without revealing the model itself.

Privacy-preserving ML could transform healthcare (prove diagnosis without revealing patient data), finance (prove creditworthiness without revealing financial history), and countless other domains.

The Current State

Proof of training remains prohibitively expensive. A GPT-scale model has billions of parameters; training involves trillions of operations. Current SNARKs cannot handle computations of this scale, not remotely close.

Proof of inference is more tractable. Small neural networks (thousands to tens of thousands of parameters) have been proven. But even here, the overhead is substantial: 100× or more compared to native inference.

Open Problems

Scaling to large models is the central challenge. Can we prove inference for models with millions of parameters? The constraint count is daunting, but perhaps structure (repeated layers, sparse activations) can be exploited.

Non-linearities create a specific bottleneck. ReLU, softmax, and other activation functions are expensive in arithmetic circuits. Approximating them efficiently, or designing “ZK-friendly” architectures that use amenable non-linearities, could unlock significant improvements.

Current approaches often require hand-optimized circuits for specific model architectures. Model-agnostic techniques that work for any architecture without manual optimization remain elusive.

ZK is not the only path to privacy-preserving ML. Fully homomorphic encryption (FHE) enables encrypted inference: the model owner computes on encrypted data without seeing the inputs. The trust model differs: FHE hides inputs from the server, while ZK proves correctness to the client. Hybrid approaches combining ZK and FHE are under active research. See Chapter 26 for a broader discussion of how ZK relates to FHE and other programmable cryptography primitives.

Privacy-preserving ML is perhaps the most ambitious application of ZK proofs. Success would require advances across nearly every frontier we’ve discussed: better proof systems, faster hardware, efficient witness generation, formal verification. It’s a stress test for the entire field.

We’ve surveyed the practical frontiers. But the practical advances rest on theoretical foundations, and those foundations have their own open questions.

Theoretical Foundations

Fundamental Questions

Beyond the engineering challenges, deep theoretical questions remain open.

What is the optimal soundness error for a given proof size? We have constructions, but we lack matching lower bounds. Perhaps dramatically better systems are possible; perhaps we’re already close to optimal. Tight bounds would tell us whether to keep searching or to focus elsewhere.

Does deep recursion preserve knowledge soundness? Current theory suggests the security reduction degrades with recursion depth. Is this inherent, or an artifact of our proof techniques? The answer matters for the recursive composition that underpins modern zkVMs.

Can we build succinct proofs without any computational assumption, purely from interaction and randomness? The theoretical complexity of IP (interactive proofs) suggests limits, but the exact boundaries remain unclear.

What if multiple provers, each with partial information, jointly convince a verifier? Multi-prover SNARKs might enable new efficiency tradeoffs or trust models.

Connections to Broader Theory

ZK proofs touch fundamental questions in theoretical computer science.

SNARK techniques have implications for complexity theory: questions about circuit lower bounds, algebraic computation, and the structure of NP. Progress on SNARKs sometimes yields progress on these foundational questions, and vice versa.

The assumptions underlying SNARKs (knowledge assumptions, generic group model) are stronger than standard cryptographic assumptions. Are they actually true? Their validity is a matter of ongoing debate.

Information theory asks: what are the fundamental limits of proof compression? How much can you prove with how few bits? These questions connect to the deepest problems in theoretical computer science.

These questions matter beyond intellectual curiosity. Tight lower bounds would tell us whether current systems are close to optimal or whether dramatically better constructions await discovery. The answers will shape the field’s long-term trajectory.

Closing Perspective

The frontiers we’ve surveyed span a remarkable range: from the algebraic structure of binary fields to the engineering of GPU kernels, from quantum threat models to the economics of decentralized proving markets. What unites them is a common goal: making proofs smaller, faster, and more trustworthy.

The field is young. Systems that seemed optimal five years ago have been superseded. Techniques dismissed as impractical have become standard. The gap between theory and practice has narrowed faster than anyone predicted.

Some patterns emerge from the chaos. Post-quantum concerns are driving a shift toward hash-based systems. zkVMs are becoming the default abstraction for provable computation. Multilinear polynomials are displacing univariate encodings. Hardware acceleration is transitioning from optional to essential. Formal verification is gaining recognition as necessary, not nice-to-have.

But predictions are dangerous in a field moving this fast. The most important development of the next five years may not appear on any current research agenda. It may come from an unexpected connection between existing techniques, or from a problem domain no one is currently targeting.

What we can say with confidence: the fundamental primitives work. Sum-check, polynomial commitments, and recursive composition have proven themselves. The remaining questions are about optimization, about engineering, about scaling to real-world demands. Those are the kinds of questions that, historically, get solved.

But ZK proofs are not the only approach to computing on secrets. They’re the first to reach satisfying practicality, but they’re part of a larger landscape: fully homomorphic encryption, program obfuscation, and the convergence of techniques from multiple branches of programmable cryptography. The next chapter steps back to see where ZK fits in that broader picture, and where the paths are beginning to merge.