A simplified version of the quantum OTOC$^{(2)}$ problem
Abstract: This note presents a simplified version of the OTOC$^{(2)}$ problem that was recently experimentally implemented by Google Quantum AI and collaborators. We present a formulation of the problem for growing input size and hope this spurs further theoretical work on the problem.
Explain it Like I'm 14
Overview
This paper introduces a simpler version of a new quantum computing challenge that Google Quantum AI recently tested. The main idea is to define a clear, checkable task for a quantum computer that seems hard for regular (classical) computers. Instead of asking a quantum computer to “sample” random outputs (which are hard to verify), this task asks it to compute a specific number: how strongly two actions on a quantum system affect each other when the system is deeply “mixed” by a random circuit.
Key Objectives
The paper aims to:
- Define a straightforward, scalable problem a quantum computer can solve: estimate a number written as ⟨0ⁿ|C⁴|0ⁿ⟩, where C = U†BUM.
- Explain why this number behaves differently depending on how “deep” the random circuit is, and why that makes the problem interesting.
- Give reasons to believe this problem is easy for quantum computers but hard for classical computers, especially at certain circuit depths (“middle” depths).
- Provide a clean version of the problem that’s easier for theorists to study than the exact hardware setup used in the experiment.
How the Problem Works (Methods and Approach)
Think of a quantum computer as a big set of tiny switches called qubits, arranged on a grid. The computer applies “gates” (actions) to pairs of qubits to mix up their states. The paper uses:
- A random circuit U on an ℓ×ℓ grid of qubits.
- Two simple actions: B (like flipping a specific qubit with a Pauli X) and M (checking a specific qubit with a Pauli Z).
The key quantity is based on how B and M interact after the system is mixed by U:
- First, define C = U†BUM. You can think of U and U† like “unmixing then remaking” the state around the action B, and M as a measurement action somewhere else.
- The main task (called the “OTOC⁽²⁾ problem”) is to estimate the number ⟨0ⁿ|C⁴|0ⁿ⟩ to within a small error.
Why this number? If the circuit isn’t deep, the actions don’t “reach” each other and don’t interfere, so the number is exactly 1. If the circuit is very deep and thoroughly scrambles the system, the number is close to 0. In-between, it varies in interesting, instance-specific ways.
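The shallow-depth case can be checked directly on a handful of qubits: if the light-cone of B never reaches M, then U†BU and M act on disjoint qubits and commute, and since Pauli operators square to the identity, C⁴ = (U†BU)⁴M⁴ = I, so the number is exactly 1. A minimal numpy sketch of this (the 4-qubit layout and helper names are illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(7)

def haar_unitary(dim):
    # QR decomposition of a complex Gaussian matrix yields a Haar-random unitary
    z = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    q, r = np.linalg.qr(z)
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))  # fix column phases

def on_qubits(gate, start, n):
    # Embed `gate` on adjacent qubits beginning at `start` within an n-qubit register
    k = int(np.log2(gate.shape[0]))
    return np.kron(np.kron(np.eye(2**start), gate), np.eye(2**(n - start - k)))

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)

n = 4
# One brickwork layer: independent random 2-qubit gates on pairs (0,1) and (2,3).
U = on_qubits(haar_unitary(4), 0, n) @ on_qubits(haar_unitary(4), 2, n)

B = on_qubits(X, 0, n)        # "butterfly" flip on qubit 0
M = on_qubits(Z, n - 1, n)    # "measurement" on qubit 3
C = U.conj().T @ B @ U @ M

# The light-cone of B covers only qubits {0,1}; M sits on qubit 3.
# They commute, so C^4 = (U^dag B U)^4 M^4 = I and the value is exactly 1.
val = np.linalg.matrix_power(C, 4)[0, 0]
print(val.real)
```

Deepening the circuit so the light cones overlap is what pushes this value away from 1.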
How do you estimate it on a quantum computer? In simple terms:
- Prepare the all-zeros state |0ⁿ⟩.
- Apply C² to get C²|0ⁿ⟩.
- Measure the first qubit to estimate ⟨0ⁿ|C⁴|0ⁿ⟩. Repeating this enough times gives an estimate with small error. The number of gates needed roughly scales like the size of the circuit for C², and the number of repeats scales like 1/ε² if you want error ε (there are standard tricks, like amplitude estimation, that can reduce the repeats to about 1/ε if deeper circuits are available).
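A toy version of this estimator can be checked exactly with numpy: using M|0ⁿ⟩ = |0ⁿ⟩ and the Hermiticity of U†BU, measuring M on the state C²|0ⁿ⟩ reproduces ⟨0ⁿ|C⁴|0ⁿ⟩, and finite-shot sampling then converges at the 1/ε² rate. (The circuit size, depth, and seed below are illustrative choices, not the paper's parameters.)

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_unitary(dim):
    # QR of a complex Gaussian matrix yields a Haar-random unitary
    z = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    q, r = np.linalg.qr(z)
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

def on_qubits(gate, start, n):
    # Embed `gate` on adjacent qubits beginning at `start` within n qubits
    k = int(np.log2(gate.shape[0]))
    return np.kron(np.kron(np.eye(2**start), gate), np.eye(2**(n - start - k)))

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)

n, depth = 4, 6
# Brickwork circuit on a 4-qubit line: alternate layers of random 2-qubit gates.
U = np.eye(2**n, dtype=complex)
for layer in range(depth):
    for s in ([0, 2] if layer % 2 == 0 else [1]):
        U = on_qubits(haar_unitary(4), s, n) @ U

B = on_qubits(X, 0, n)
M = on_qubits(Z, n - 1, n)
C = U.conj().T @ B @ U @ M

exact = np.linalg.matrix_power(C, 4)[0, 0].real   # <0^n|C^4|0^n>

# Protocol: prepare |0^n>, apply C^2, measure the observable M.
e0 = np.zeros(2**n, dtype=complex); e0[0] = 1.0
phi = C @ (C @ e0)
protocol = (phi.conj() @ (M @ phi)).real          # equals `exact` up to float error

# Finite shots: additive error shrinks like 1/sqrt(shots), i.e. 1/eps^2 shots.
probs = np.abs(phi)**2
probs /= probs.sum()
signs = np.real(np.diag(M))                       # +1/-1 outcome for each basis state
shots = 200_000
outcomes = rng.choice(len(probs), size=shots, p=probs)
estimate = signs[outcomes].mean()
print(exact, protocol, estimate)
```

The shot-based estimate approaches the exact value as shots grow; amplitude estimation would trade this 1/ε² sampling for a deeper coherent circuit with 1/ε cost.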
Main Findings and Why They Matter
- At low circuit depths, there is no “interaction” between the two actions (B and M), so the answer is 1. At very high depths, the circuit scrambles everything and the answer is near 0. In the middle, the values differ significantly from “fully random,” creating a strong, instance-specific signal.
- The paper (and the related experiment) gives evidence that estimating ⟨0ⁿ|C⁴|0ⁿ⟩ at these middle depths is hard for classical algorithms. In particular, some popular classical methods (like Monte Carlo) run into “sign problems,” which get worse as you consider higher powers like ⟨C^{2k}⟩.
- The authors conjecture that, for large grids and certain depths proportional to the grid side length ℓ, there is no classical algorithm that can efficiently estimate this quantity to small error.
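The three regimes can be seen numerically on a small system. The sketch below uses a 6-qubit 1D chain as a stand-in for the paper's 2D grid (sizes, depths, and seed are illustrative): shallow instances all give exactly 1, while deeper instances scatter from instance to instance.

```python
import numpy as np

rng = np.random.default_rng(3)

def haar_unitary(dim):
    # QR of a complex Gaussian matrix yields a Haar-random unitary
    z = rng.standard_normal((dim, dim)) + 1j * rng.standard_normal((dim, dim))
    q, r = np.linalg.qr(z)
    return q * (np.diagonal(r) / np.abs(np.diagonal(r)))

def on_qubits(gate, start, n):
    # Embed `gate` on adjacent qubits beginning at `start` within n qubits
    k = int(np.log2(gate.shape[0]))
    return np.kron(np.kron(np.eye(2**start), gate), np.eye(2**(n - start - k)))

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)

n = 6                               # short 1D chain instead of an lxl grid
B = on_qubits(X, 0, n)              # butterfly at one end of the chain
M = on_qubits(Z, n - 1, n)          # measurement at the other end

def brickwork(depth):
    # Alternating layers of Haar-random 2-qubit gates on a line of n qubits
    U = np.eye(2**n, dtype=complex)
    for layer in range(depth):
        starts = range(0, n - 1, 2) if layer % 2 == 0 else range(1, n - 1, 2)
        for s in starts:
            U = on_qubits(haar_unitary(4), s, n) @ U
    return U

def otoc2(U):
    C = U.conj().T @ B @ U @ M
    return np.linalg.matrix_power(C, 4)[0, 0].real

results = {}
for depth in (1, 3, 6, 12):
    results[depth] = np.array([otoc2(brickwork(depth)) for _ in range(20)])
    print(depth, results[depth].mean().round(3), results[depth].std().round(3))
# Shallow depths: every instance is exactly 1 (the light cones never meet).
# Deeper circuits: the values spread out from instance to instance.
```

At scale, it is exactly this instance-to-instance spread at intermediate depth that the hardness conjecture targets.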
Why is this important? Unlike sampling problems, this is a specific number you can aim to verify. Even if classical verification is tough, other quantum devices (or future, less noisy ones) could check the result, and in physics simulations you can compare with nature itself.
Implications and Potential Impact
- This task is a step toward practical demonstrations of quantum advantage that are more verifiable than pure sampling.
- It could guide the development of new quantum algorithms and theory, especially around how information spreads (“scrambles”) in quantum systems.
- It provides a clean benchmark problem for researchers, potentially leading to better understanding of where classical methods fail and quantum methods shine.
Key Terms Explained
- Qubit: The basic unit of quantum information, like a tiny switch that can be 0, 1, or a mix of both at once.
- Gate: An operation that changes the state of one or more qubits.
- Random circuit: A sequence of gates chosen randomly, which tends to mix or “scramble” the system.
- Pauli X and Z: Very simple gates; X flips a qubit (like turning 0 into 1), and Z checks its phase (a kind of directional property of its state).
- Unitary U and U†: A unitary U is a reversible transformation; U† is its exact undoing.
- Expectation value ⟨·⟩: The average measurement outcome you’d get if you repeated an experiment many times.
- “Light-cone”: How far the effect of an action can spread after several layers of gates; if it doesn’t reach another action, they don’t interfere.
- Haar-random: “Truly uniform” randomness over all possible unitary transformations; like picking a completely random shuffle.
- Monte Carlo and sign problem: A classical random-sampling technique; “sign problems” mean positive and negative contributions cancel in messy ways, making estimates noisy and inefficient.
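The sign-problem entry above can be made concrete with a toy sum (the numbers are invented for illustration and unrelated to the paper's circuits): when large positive and negative terms nearly cancel, a Monte Carlo estimate's noise is set by the term magnitudes, not by the tiny net value, so the relative error blows up.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 100_000
weights = rng.random(N)                    # term magnitudes
signs = rng.choice([1.0, -1.0], size=N)    # random signs -> massive cancellation
terms = signs * weights

true_sum = terms.sum()                     # tiny: positives and negatives cancel
total_magnitude = np.abs(terms).sum()      # huge by comparison

# Monte Carlo: sample 2,000 terms uniformly and rescale by N.
idx = rng.integers(0, N, size=2_000)
mc = N * terms[idx].mean()
mc_std = N * terms[idx].std() / np.sqrt(len(idx))

print(f"true sum        ~ {true_sum:.1f}")
print(f"sum of |terms|  ~ {total_magnitude:.1f}")
print(f"MC estimate     ~ {mc:.1f} +/- {mc_std:.1f}")
# The error bar dwarfs the target: the noise scales with the magnitudes,
# while the quantity being estimated is near zero.
```

In the OTOC setting the claim is analogous: classical path-sum estimators for ⟨C⁴⟩ and higher moments accumulate cancelling contributions, so their variance swamps the small signal.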
How This Simplified Problem Differs From the Real Experiment
The experiment used:
- 65 qubits on an irregular grid and 23 layers of two-qubit gates.
- A measurement operator M that is a Pauli Z, and a butterfly operator B that acts on three qubits with a Pauli.
- A specific gate set (iSWAP-like two-qubit gates plus random single-qubit gates), not fully Haar-random.
- A harder quantity than ⟨0ⁿ|C⁴|0ⁿ⟩: they subtract an easier “diagonal” part and assess performance using a correlation measure (similar to Pearson correlation) rather than a uniform additive error. The authors suspect that achieving an additive error around 10⁻³ would already be very challenging for classical algorithms.
In short, the paper strips away hardware details to define a clean, general problem that theorists can analyze, while the experiment tackled a tougher, real-world version and found signs of quantum advantage.
Knowledge Gaps
Below is a single, consolidated list of what remains uncertain or unexplored in the paper and what future work could concretely address.
- Formal average-case hardness: No rigorous proof that estimating ⟨0ⁿ|C⁴|0ⁿ⟩ to additive error ε = 1/poly(n) is classically hard on average over the stated circuit ensembles; provide a proof under standard complexity assumptions or reductions (e.g., to known average-case hard problems) and specify the precise distributional assumptions required.
- Precise hardness regime in 2D grids: The claim that hardness occurs at depth d = Θ(ℓ) (with n = ℓ²) lacks a concrete threshold and finite-size scaling analysis; characterize the critical depth and its dependence on ℓ, including fluctuations and scaling of the signal as n grows.
- Distribution mismatch with experiment: The theoretical ensemble uses Haar-random 2-qubit gates, whereas the experiment uses fixed iSWAP-like 2-qubit gates plus specific single-qubit randomness; determine whether hardness and signal properties provably transfer across these ensembles and quantify any robustness to gate-set changes.
- Role of butterfly and measurement operators: The theory uses single-qubit B and M, while the experiment uses a 3-qubit B; analyze how the locality, support size, and choice of B and M affect signal strength, scrambling onset, and classical hardness.
- From OTOC⁽¹⁾ to OTOC⁽²⁾: It is stated that some classical methods can efficiently approximate OTOC⁽¹⁾ (⟨C²⟩) in certain cases; establish whether there exist classical algorithms that also approximate OTOC⁽²⁾ (⟨C⁴⟩) efficiently for broad parameter regimes, and delineate the boundary between tractable and intractable regimes.
- Quantitative characterization of instance-to-instance fluctuations: The conjecture of inverse-polynomial fluctuations at intermediate depth is not proven; derive rigorous bounds (mean, variance, concentration) for ⟨0ⁿ|C⁴|0ⁿ⟩ across the ensemble and identify the regime where fluctuations are reliably detectable.
- Exact mapping to unitary designs: The paper suggests higher moments ⟨C^{2k}⟩ quantify deviation from a 2k-design but provides no formal connection; formalize the relationship between ⟨C^{2k}⟩ and 2k-design properties and determine the minimal k needed to render classical approximation hard.
- Higher-moment scaling and sign problems: Claims that Monte Carlo methods face worsening sign problems as k increases are heuristic; prove lower bounds on variance/complexity of classical Monte Carlo estimators for growing k, or construct instances that amplify these sign problems.
- Choice of error metric: The experiment optimizes a Pearson correlation/SNR metric rather than uniform additive error; develop a precise translation between SNR/correlation metrics and additive-error guarantees for ⟨0ⁿ|C⁴|0ⁿ⟩ and validate whether ε ≈ 10⁻³ corresponds to the reported empirical hardness.
- Robustness to noise and SPAM errors: The theoretical formulation ignores hardware noise; quantify how realistic noise (gate errors, decoherence, SPAM) biases/varies the estimator for ⟨0ⁿ|C⁴|0ⁿ⟩, and specify error-mitigation strategies with provable guarantees on bias and overhead.
- Practical quantum resource estimates: The claimed gate complexity per circuit repetition and O(1/ε²) sample count (or O(1/ε) with amplitude estimation) lack detailed resource accounting; provide full depth, qubit, coherence-time, and error-budget analyses for scalable implementations, including the cost of preparing |0ⁿ⟩ and applying U and U†.
- Verification pathways: The paper motivates “quantum verifiability” but does not propose concrete cross-check protocols; design and analyze inter-device verification schemes (e.g., cross-platform consistency tests, randomized benchmarking variants) that can detect systematic errors in estimating ⟨0ⁿ|C⁴|0ⁿ⟩.
- Explicit spectral/light-cone analysis: The commuting-to-scrambling transition is described qualitatively; compute exact light-cone growth, Lieb–Robinson-type bounds, and commutator norms for the chosen gate ensemble to precisely locate the onset of non-commutation and quantify operator spreading.
- Instance selection and anti-concentration: For average-case hardness, clarify how instances are sampled to avoid trivial near-zero signals and ensure anti-concentration of ⟨0ⁿ|C⁴|0ⁿ⟩; derive anti-concentration bounds and certify that typical instances yield measurable signal.
- Relationship to Random Circuit Sampling: The connection between this function-estimation task and RCS-style hardness is not formalized; investigate whether known RCS hardness assumptions imply hardness for OTOC⁽²⁾, or whether new assumptions are required.
- Mixed-state variant: The proposed extension to the maximally mixed state, Tr(C^{2k})/2ⁿ, is not analyzed; determine whether this variant is easier/harder to estimate classically or quantumly and whether it admits more robust verification under noise.
- Alternative classical baselines: Beyond Monte Carlo, the paper does not assess tensor-network, Clifford+noise approximations, quasi-probability methods, or machine-learning surrogates; systematically benchmark and bound these methods for the stated ensembles and depth regimes.
- Precise relationship to “harder quantity” in the experiment: The experiment subtracts an easier “diagonal” part to isolate a harder component; formalize the computational relationship between the raw ⟨0ⁿ|C⁴|0ⁿ⟩ and the subtracted quantity, including whether hardness proofs (or evidence) for one imply hardness for the other.
- Finite-size effects at n = 65: The experiment’s non-square connectivity and specific layout may affect scrambling and estimator variance; analyze how irregular graphs alter the depth threshold, signal strength, and classical simulability compared to ideal square grids.
- Generalization across geometries and dimensions: The theory focuses on 2D grids; extend the analysis to 1D chains, higher-dimensional lattices, and non-local connectivities and characterize how geometry impacts hardness and the commuting-to-scrambling transition.
- Sensitivity to gate distributions: Within the “Haar-random 2-qubit gate” assumption, quantify sensitivity to deviations (biased angles, restricted subgroups, correlated randomness) and identify minimal randomness needed to retain hardness.
- Tight bounds on sample complexity: The estimator uses O(1/ε²) samples (or O(1/ε) via amplitude estimation) without variance analysis; derive tight bounds on estimator variance and optimal sampling strategies (e.g., importance sampling, control variates) for this observable.
- Explicit hardness assumptions: The conjecture lacks an explicit complexity-theoretic framework (e.g., non-collapse of the polynomial hierarchy, IQP hardness, average-case conjectures); state and justify the minimal set of assumptions under which classical hardness would follow.
Practical Applications
Below is an overview of practical applications suggested by the paper’s findings and methods, organized into near-term (deployable now) and longer-term (requiring further development). Each item notes relevant sectors, likely tools/products/workflows, and key assumptions or dependencies affecting feasibility.
Immediate Applications
- Benchmarking and calibration of quantum hardware via C⁴-based correlators
- Sector: hardware, software
- Application: Use the simplified OTOC⁽²⁾ problem (estimating ⟨0ⁿ|C⁴|0ⁿ⟩) to score device performance across depth/size, tune entangling gates, detect crosstalk, and identify the onset of scrambling.
- Tools/workflows: “OTOC Benchmark Suite” implementing brickwork circuits, automated runs at varied depths, device scoring via additive error or correlation metrics; integration into device calibration pipelines.
- Assumptions/dependencies: Access to 50–100+ qubits with coherence over d≈Θ(ℓ), reliable preparation of |0ⁿ⟩, mapping B and M to distant qubits on available topology, stable gate calibration, measurement fidelity.
- Cross-device “quantum verifiability” of function outputs
- Sector: industry (QC providers), academia
- Application: Run identical OTOC⁽²⁾ instances across different quantum platforms to verify consistency of expectation values, enabling trust without classical verification; supports claims of quantum advantage in function computation.
- Tools/workflows: Cross-platform instance registry, transpilation templates for brickwork circuits, Pearson correlation/SNR reporting; dashboards for inter-device comparisons.
- Assumptions/dependencies: Comparable gate sets or robust transpilation, reproducible shallow-to-intermediate depths, consensus error metrics (e.g., ε≈10⁻³), coordinated access to multiple devices.
- Validation of analog quantum simulators using digital correlator checks
- Sector: academia/physics labs
- Application: Digitally estimate OTOC-like moments on a gate-based QC and compare against analog simulators (e.g., cold atoms, trapped ions) to validate operator spreading and scrambling.
- Tools/workflows: Experimental protocols mapping analog dynamics to effective circuits; selection of butterfly/measurement operators; cross-lab data sharing.
- Assumptions/dependencies: Feasible mapping from analog evolution to circuit instances, controlled noise and decoherence at intermediate depths, synchronized timing and calibration.
- Stress-testing classical Monte Carlo and tensor-network methods (sign-problem benchmarks)
- Sector: academia, HPC software
- Application: Use OTOC⁽²⁾ and higher-moment instances as benchmark datasets to evaluate classical algorithms’ robustness to sign problems, spurring algorithmic improvements.
- Tools/workflows: Public instance suites with ground-truth QC data; evaluation harnesses with standardized metrics (ε targets, SNR, runtime scaling).
- Assumptions/dependencies: Availability of QC data at challenging depths, reproducible instance generation, fair baselines, compute resources for classical runs.
- Pseudorandom unitary quality assessment (design-proximity scoring)
- Sector: software (compilers, circuit synthesis), security
- Application: Measure deviations of ⟨C^{2k}⟩ from Haar-random values to grade pseudo-random unitary generators and circuit synthesis strategies; inform compiler passes that aim for design-like scrambling.
- Tools/workflows: “Unitary-design score” within circuit toolchains, regression tests for PRU libraries, CI hooks checking higher moments.
- Assumptions/dependencies: Practical k (e.g., k=2) measurable on current devices, reliable error mitigation, consistent baselines for Haar deviation.
- Education and training modules on quantum scrambling and OTOCs
- Sector: education
- Application: Classroom/lab exercises using the simplified setup to teach operator spreading, light-cone effects, and measurement of correlators; cloud QC labs.
- Tools/workflows: Jupyter notebooks, cloud QC integrations, visualizations of ⟨C²⟩ vs ⟨C⁴⟩ behaviors across depth, guided parameter sweeps.
- Assumptions/dependencies: Access to modest- to mid-scale cloud QCs or accurate simulators for shallow depths; prebuilt circuit templates.
- Communication standards for quantum advantage claims (function-computation metrics)
- Sector: policy, industry communications
- Application: Report advantage using additive-error targets (e.g., ε≈10⁻³) and correlation/SNR metrics rather than raw sampling outputs; improves transparency and reproducibility.
- Tools/workflows: Reporting guidelines, standardized plots and tables, public repositories of instances and results.
- Assumptions/dependencies: Community buy-in on metrics; alignment with journal and conference standards; availability of peer-verifiable datasets.
Long-Term Applications
- Certified “Quantum Verified Advantage” programs and standards
- Sector: policy/standards, industry
- Application: Formal certification regimes using OTOC⁽²⁾ and higher-moment tests to validate beyond-classical performance under average-case hardness assumptions.
- Tools/products: Open certification test suites, auditing services, compliance badges.
- Assumptions/dependencies: Rigorous average-case hardness results, robust cross-platform comparability, governance and consensus.
- Manufacturing QA for quantum devices via scrambling metrics
- Sector: hardware manufacturing
- Application: Use OTOC-based acceptance tests during production to ensure consistent entangling capacity and noise characteristics across batches.
- Tools/workflows: Automated factory test rigs, parametric dashboards tracking ⟨C^{2k}⟩ across device units.
- Assumptions/dependencies: Scalable measurement protocols, uniform device topology, sufficient coherence for Θ(ℓ) depths.
- Hybrid verification for practical quantum simulations (materials, chemistry)
- Sector: materials, energy, healthcare (drug discovery)
- Application: Embed correlator measurements within simulation workflows to verify aspects of dynamics (e.g., operator growth, correlation decay) against experiments or alternate QCs, increasing trust in simulation outputs.
- Tools/workflows: Workflow nodes for correlator estimation, data fusion with experimental observables, error-mitigated amplitude estimation.
- Assumptions/dependencies: Mappings from physical Hamiltonians to circuits, larger and cleaner QCs or error correction, domain-relevant choices of B and M.
- Cryptographic primitives/infrastructure leveraging unitary design hardness
- Sector: security/cryptography
- Application: Explore PRNGs, commitments, or proof-of-work-like tasks based on hardness of estimating ⟨C^{2k}⟩; potential “quantum advantage cryptography” constructs.
- Tools/products: APIs offering correlator-based challenges, libraries with security reductions.
- Assumptions/dependencies: Formal reductions and average-case hardness proofs, careful threat modeling, access to QCs for generation/verification.
- Fault-tolerant amplitude estimation pipelines for precision correlators
- Sector: software, hardware
- Application: Leverage O(1/ε) scaling to achieve high-precision correlator measurement at scale (n≫10²), enabling metrology-grade studies of scrambling and design proximity.
- Tools/workflows: Error-corrected circuits, advanced amplitude estimation, resource estimation and scheduling.
- Assumptions/dependencies: Mature fault tolerance (logical qubits, T-gate budgets), long coherent depths, effective error mitigation.
- Large-scale studies of quantum chaos, thermalization, and phase transitions
- Sector: academia (condensed matter, high-energy), energy materials
- Application: Systematically map operator spreading across architectures and interaction graphs, identify transitions, and inform design of strongly correlated materials or devices.
- Tools/workflows: High-throughput correlator campaigns, data-driven modeling, integration with theoretical frameworks.
- Assumptions/dependencies: Scalable devices, reproducible higher-moment estimation, sustained funding for large experiments.
- Cloud products: “OTOC-as-a-Service” and benchmarking APIs
- Sector: software/cloud
- Application: Offer managed services for correlator estimation, device scoring, and PRU certification; integrate with DevOps for quantum workloads.
- Tools/products: Public endpoints with instance catalogs, SLAs on ε, dashboards for customers.
- Assumptions/dependencies: Stable QC backends, standardization of instances and metrics, market demand.
- Cross-platform transpilation and comparability tooling
- Sector: software tooling
- Application: Toolchains that preserve correlator properties while mapping brickwork circuits (with B, M placement) to diverse hardware topologies, enabling fair comparisons.
- Tools/workflows: Gate-set aware transpilers, fidelity-preserving layouts, equivalence testing suites.
- Assumptions/dependencies: Mature compiler infrastructure, detailed device models, continual calibration data.
- Advancements in classical algorithms mitigating sign problems
- Sector: academia/HPC, materials
- Application: New Monte Carlo or tensor-network methods inspired by OTOC⁽²⁾ might reduce sign problems in broader domains (e.g., finite-density QFT, strongly correlated electrons).
- Tools/workflows: Algorithm research, benchmark sharing, cross-validation with QC data.
- Assumptions/dependencies: Significant research progress; access to large compute; iterative feedback with QC benchmarks.
Notes on general assumptions shared across applications:
- Average-case classical hardness remains valid for relevant parameter regimes (n large, d≈Θ(ℓ), ε=1/poly(n)), and ε≈10⁻³ is challenging for classical methods.
- Adequate qubit counts, connectivity matching 2D grids, and coherence for intermediate depths.
- Reliable measurement and state preparation; error mitigation or correction where needed.
- Community consensus on reporting metrics (additive error vs correlation/SNR) and standardized instance distributions.
Glossary
- Additive error: An absolute-error tolerance indicating the allowed difference between an estimate and the true value. "estimate (to some additive error ) the expectation of some simple single-qubit observable "
- Amplitude estimation: A quantum algorithm that accelerates estimation of amplitudes/expectations using phase estimation, improving sample complexity. "(If deeper circuits are available this can be improved to O(1/ε) using amplitude estimation.)"
- Average-case quantum advantage: Quantum performance exceeding classical methods on typical random instances, not just worst-case inputs. "(an expectation value problem that appears to have average-case quantum advantage)"
- BQP-completeness: A classification indicating problems as hard as the most difficult in BQP (Bounded-Error Quantum Polynomial time). "Worst-case hardness, such as BQP-completeness, is insufficient by itself"
- Brickwork pattern: A layered, staggered arrangement of two-qubit gates across a qubit grid. "laid out in a brickwork pattern of 4 layers"
- Butterfly operator: In scrambling/OTOC studies, the localized operator whose evolution probes the spread of information. "where B (called the ``butterfly'' operator) could be the Pauli X operator"
- Gate complexity: The count of quantum gates needed to run an algorithm or implement a circuit. "A quantum algorithm can solve this problem with gate complexity "
- Haar-random: Drawn uniformly from the Haar measure over the unitary group; the standard notion of randomness for quantum operations. "We sample Haar-random 2-qubit gates"
- Haar-random value: The value expected if an operator behaves like a draw from the Haar distribution (i.e., a random unitary). "exhibit deviations from the Haar-random value."
- iSWAP-like gates: A family of two-qubit gates related to iSWAP, commonly used in superconducting qubit devices. "involves fixed 2-qubit gates (``iSWAP-like gates'')"
- Light-cone: The causally reachable region for an operator under finite-depth circuit evolution. "if the depth of U is too low and the light-cone of B does not reach M"
- Maximally mixed state: The uniform (maximum-entropy) state over an n-qubit Hilbert space, with density matrix I/2ⁿ. "the maximally mixed state, Tr(C^{2k})/2ⁿ"
- Measurement operator: The operator representing the observable measured to extract outcomes after evolution. "M (the ``measurement'' operator) could be the Pauli Z operator"
- Monte Carlo: Stochastic numerical methods that estimate quantities via random sampling. "classical algorithms based on Monte Carlo appear to encounter ``sign problems''"
- Out-of-Time-Order Correlator (OTOC): A correlator that quantifies scrambling by comparing operators in non-chronological (out-of-time) order. "where OTOC stands for Out-of-Time-Order Correlator."
- Pauli X: A single-qubit Pauli operator that flips computational basis states; a fundamental quantum operation. "Pauli X operator"
- Pauli Z: A single-qubit Pauli operator that applies a relative phase to computational basis states. "Pauli Z operator"
- Pearson correlation: A normalized measure of linear correlation between two data sets. "equivalent to a correlation measure called Pearson correlation"
- Quantum advantage: Evidence that quantum devices perform a task faster or more accurately than classical computers. "demonstration of quantum advantage using sampling problems"
- Quantumly verifiable: Verification by another quantum computer or by physical systems (‘nature’) rather than by classical proofs. "will be quantumly verifiable in the manner described above"
- Random Circuit Sampling: Sampling bitstrings from the output distribution of random quantum circuits. "experimentally implemented the Random Circuit Sampling problem"
- Random unitary: A unitary operator behaving like a draw from a uniform distribution over the unitary group. "behaves like a random unitary"
- Signal-to-noise ratio: The ratio quantifying the strength of a signal relative to background noise. "a signal-to-noise ratio is computed"
- Time-ordered correlator: A correlator with operators ordered chronologically, used to study dynamics and responses. "well-studied time-ordered correlator "
- Unitary: An operator U with U†U = UU† = I, representing reversible quantum evolution. "a random n-qubit unitary U"