
Affine Normal Directions via Log-Determinant Geometry: Scalable Computation under Sparse Polynomial Structure

Published 1 Apr 2026 in math.OC, math.AG, math.DG, and math.NA | (2604.01163v1)

Abstract: Affine normal directions provide intrinsic affine-invariant descent directions derived from the geometry of level sets. Their practical use, however, has long been hindered by the need to evaluate third-order derivatives and invert tangent Hessians, which becomes computationally prohibitive in high dimensions. In this paper, we show that affine normal computation admits an exact reduction to second-order structure: the classical third-order contraction term is precisely the gradient of the log-determinant of the tangent Hessian. This identity replaces explicit third-order tensor contraction by a matrix-free formulation based on tangent linear solves, Hessian-vector products, and log-determinant gradient evaluation. Building on this reduction, we develop exact and stochastic matrix-free procedures for affine normal evaluation. For sparse polynomial objectives, the algebraic closure of derivatives further yields efficient sparse kernels for gradients, Hessian-vector products, and directional third-order contractions, leading to scalable implementations whose cost is governed by the sparsity structure of the polynomial representation. We establish end-to-end complexity bounds showing near-linear scaling with respect to the relevant sparsity scale under fixed stochastic and Krylov budgets. Numerical experiments confirm that the proposed MF-LogDet formulation reproduces the original autodifferentiation-based affine normal direction to near machine precision, delivers substantial runtime improvements in moderate and high dimensions, and exhibits empirical near-linear scaling in both dimension and sparsity. These results provide a practical computational route for affine normal evaluation and reveal a new connection between affine differential geometry, log-determinant curvature, and large-scale structured optimization.

Summary

  • The paper introduces a reduction of costly third-order contraction terms to a matrix-free, second-order structure using log-determinant identities.
  • It leverages stochastic Hutchinson trace estimation and Krylov methods to achieve near-linear complexity in high-dimensional, sparse polynomial settings.
  • Empirical results confirm near machine-precision accuracy and scalable performance, enabling practical integration of affine-invariant optimization techniques.

Scalable Affine Normal Directions via Log-Determinant Geometry in Sparse Polynomial Optimization

Overview

This paper introduces a computational framework for evaluating affine normal directions, an affine-invariant geometric construct relevant to optimization, using a log-determinant geometric representation. The main technical advance is a reduction of the classical third-order contraction term—previously the main computational bottleneck—to a second-order structure, thereby enabling scalable, matrix-free evaluation in high-dimensional and sparse polynomial optimization settings. The authors rigorously analyze the cost and approximation properties of the approach, demonstrate near machine-precision agreement with explicit autodifferentiation-based formulas, and empirically establish near-linear complexity in both ambient dimension and polynomial sparsity.

Log-Determinant Reduction of the Affine Normal

Affine normal directions arise from affine differential geometry and provide descent directions for optimization that are affine-invariant, in contrast to Euclidean gradient or Newton directions, which capture only second-order local curvature. The classical representation involves third-order derivatives and costly tensor contractions, which are prohibitive in high dimensions. The central mathematical result is the identification:

$$f^{pq} f_{pqi} = \partial_i \log \det H_T,$$

where $H_T$ is the tangent Hessian block, $f^{pq}$ its inverse, and $f_{pqi}$ the third-derivative tensor. This log-determinant identity (see Lemma 2) allows explicit third-order contractions to be replaced by matrix-free, second-order operator evaluations, specifically gradients of the log-determinant of $H_T$.
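The identity rests on the classical matrix-calculus fact $\frac{d}{dt} \log \det A(t) = \operatorname{tr}\!\left(A(t)^{-1} A'(t)\right)$ for a smooth positive-definite curve $A(t)$. A quick numerical sanity check of that underlying fact (our own illustration, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
S = rng.standard_normal((n, n))
S = S + S.T                          # symmetric perturbation direction

def A(t):
    # smooth symmetric-positive-definite curve near t = 0
    return M @ M.T + n * np.eye(n) + t * S

# Analytic derivative via the trace identity: tr(A^{-1} A')
A0 = A(0.0)
analytic = np.trace(np.linalg.solve(A0, S))

# Central finite difference of log det A(t) at t = 0
h = 1e-6
fd = (np.linalg.slogdet(A(h))[1] - np.linalg.slogdet(A(-h))[1]) / (2 * h)

print(abs(fd - analytic))            # agreement to roughly FD accuracy
```

The paper's Lemma 2 applies this mechanism to the tangent Hessian $H_T$, turning the contraction $f^{pq} f_{pqi}$ into a log-determinant gradient.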

Implication: Affine normal computation is thus reorganized around Hessian–vector products, tangent-space projections, Krylov subspace linear solves, and stochastic trace estimation, rather than explicit formation of large derivative tensors.

Figure 1: Accuracy of MF-LogDet relative to the original AD-based affine normal computation. The normalized direction error remains around $10^{-9}$, resolving the computational bottleneck associated with third-order terms.

Matrix-Free Methods for High-Dimensional Sparse Polynomial Objectives

For polynomial objectives, all derivatives remain within the same monomial family, and the structure of sparsity can be exploited for computational efficiency. The authors develop explicit matrix-free computational kernels for:

  • Gradient and Hessian evaluation via sparse monomial contractions.
  • Efficient computation of Hessian–vector products (Prop. 3).
  • Evaluation of directional third-order contractions for required trace and log-determinant computations (Prop. 4).
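A minimal sketch of such sparse monomial kernels, with our own illustrative data layout (a polynomial as a list of `(coefficient, {variable: power})` monomials); the paper's actual kernels and indexing scheme may differ:

```python
import numpy as np

# f(x) = 2 x0^2 + x0 x1 - 3 x2^4, stored monomial by monomial
poly = [(2.0, {0: 2}), (1.0, {0: 1, 1: 1}), (-3.0, {2: 4})]

def mono_val(mono, x, skip=()):
    """Product of x_k^{e_k} over the monomial's support, omitting indices in `skip`."""
    val = 1.0
    for k, e in mono.items():
        if k not in skip:
            val *= x[k] ** e
    return val

def grad(poly, x):
    """Sparse gradient: each monomial only touches variables in its own support."""
    g = np.zeros_like(x)
    for c, mono in poly:
        for i, p in mono.items():
            g[i] += c * p * x[i] ** (p - 1) * mono_val(mono, x, skip=(i,))
    return g

def hvp(poly, x, v):
    """Matrix-free Hessian-vector product, accumulated monomial by monomial."""
    out = np.zeros_like(x)
    for c, mono in poly:
        for i, p in mono.items():
            for j, q in mono.items():
                if i == j:
                    if p < 2:
                        continue     # second derivative of a linear factor vanishes
                    d2 = c * p * (p - 1) * x[i] ** (p - 2) * mono_val(mono, x, skip=(i,))
                else:
                    d2 = (c * p * q * x[i] ** (p - 1) * x[j] ** (q - 1)
                          * mono_val(mono, x, skip=(i, j)))
                out[i] += d2 * v[j]
    return out

x = np.array([1.0, 2.0, 0.5])
print(grad(poly, x))                              # gradient (6, 1, -1.5)
print(hvp(poly, x, np.array([1.0, 1.0, 1.0])))    # Hessian-vector product (5, 1, -9)
```

The cost of both kernels is governed by the number of monomials and their support sizes, not by the ambient dimension, which is the structural point exploited in the paper's complexity analysis.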

This enables the affine normal direction at a point $x$ to be computed via:

  1. Construction of an orthonormal tangent basis $T$ to the level set.
  2. Assembly of the tangent Hessian operator $H_T = T^\top \nabla^2 f(x)\, T$.
  3. Matrix-free estimation of $\nabla_t \log \det H_T$ through stochastic Hutchinson trace estimation, requiring only Hessian–vector and third-order directional evaluations.
  4. Solution of a shifted tangent linear system via Krylov methods.
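Steps 1, 2, and 4 above can be sketched with standard matrix-free primitives. The objective, the finite-difference Hessian–vector product, and the right-hand side below are our own illustrative choices, not the paper's:

```python
import numpy as np
from scipy.linalg import null_space
from scipy.sparse.linalg import LinearOperator, cg

def grad_f(x):
    # gradient of the illustrative objective f(x) = sum_i x_i^4 + 0.5 ||x||^2
    return 4.0 * x**3 + x

def hvp_f(x, v, eps=1e-6):
    # matrix-free Hessian-vector product via central differences of the gradient
    return (grad_f(x + eps * v) - grad_f(x - eps * v)) / (2.0 * eps)

x = np.array([0.5, -1.0, 2.0, 0.3])
g = grad_f(x)

# Step 1: orthonormal basis T of the tangent space {v : g^T v = 0}
T = null_space(g.reshape(1, -1))                  # shape (n, n-1)

# Step 2: tangent Hessian H_T = T^T H T as an operator; H is never formed
n_t = T.shape[1]
H_T = LinearOperator((n_t, n_t), matvec=lambda u: T.T @ hvp_f(x, T @ u))

# Step 4: Krylov (CG) solve of a tangent linear system H_T u = b
b = T.T @ x                                       # illustrative right-hand side
u, info = cg(H_T, b)                              # info == 0 on convergence
residual = np.linalg.norm(H_T.matvec(u) - b)
```

Because the Hessian of this objective is positive definite, its tangent restriction is as well, so conjugate gradients applies directly; the paper's shifted systems can be handled by the same `LinearOperator` pattern.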

These steps can all be parallelized and executed efficiently on modern hardware.

Figure 2: Speedup of affine normal computation as a function of the dimension $n$; substantial gains manifest once dimensionality increases beyond moderate values.

Figure 3: Average runtime per affine normal evaluation for AD and MF-LogDet as a function of the dimension $n$. While AD is competitive in very low dimensions, its runtime grows much more rapidly.

Stochastic Trace Estimation and Complexity Analysis

The computation of the log-determinant gradient is accomplished by stochastic Hutchinson trace estimation, approximating $\operatorname{tr}\!\left(H_T^{-1}\, \partial_t H_T\right)$ via randomized Rademacher probe averages. Each probe involves a tangent linear-system solve and a third-order directional evaluation, both implemented with matrix-free kernels.
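A hedged sketch of the Hutchinson step: estimating $\operatorname{tr}(A^{-1} B)$ with Rademacher probes $z$ (so $\mathbb{E}[z z^\top] = I$), one linear solve per probe. Dense solves on a random SPD stand-in play the role of the paper's matrix-free Krylov solves on $H_T$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)          # SPD stand-in for the tangent Hessian
B = rng.standard_normal((n, n))
B = B + B.T                          # symmetric stand-in for a Hessian derivative

exact = np.trace(np.linalg.solve(A, B))

def hutchinson(A, B, num_probes, rng):
    """Estimate tr(A^{-1} B) as the average of z^T A^{-1} B z over Rademacher z."""
    acc = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=A.shape[0])   # Rademacher probe
        acc += z @ np.linalg.solve(A, B @ z)           # one solve per probe
    return acc / num_probes

est = hutchinson(A, B, 200, rng)
print(exact, est)
```

The estimator's variance shrinks as $1/p$ in the probe count $p$, which is the accuracy–cost tradeoff examined in Figures 4–6.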

Theoretical analysis (Theorem 7) yields:

  • Overall Complexity: $O(p\,k\,m\,\bar{s})$, where $p$ is the probe count, $k$ the number of Krylov iterations, $m$ the number of monomials, and $\bar{s}$ the average support size. For bounded $p$ and $k$, this yields linear complexity $O(m\bar{s})$ in the ambient dimension.
  • Error Analysis: Stability theorems rigorously bound the propagation of errors from stochastic trace estimation and inexact Krylov solves through to the normalized affine normal direction (Corollaries 8 and 9).

Figure 4: Effect of the Hutchinson probe count $p$ on affine normal accuracy. Increasing $p$ yields a monotonic improvement in the normalized direction error, in line with the variance bounds of the estimator.

Figure 5: Runtime of the stochastic trace approximation as a function of $p$ matches the theoretical linear cost model, confirming the anticipated scalability properties.

Figure 6: Runtime comparison between stochastic Hutchinson trace approximation and exact evaluation; the stochastic variant yields substantial runtime reductions at moderate probe counts and high dimension.

Numerical Experiments

Extensive experiments address four principal questions:

  1. Agreement with Explicit AD Formulas: MF-LogDet matches the AD-based affine normal computation to near machine precision (normalized direction errors around $10^{-9}$), confirming its exactness.
  2. Stochastic Trace Accuracy–Cost Tradeoffs: With moderate probe counts, normalized direction errors remain small even in large-scale problems; runtime scales linearly in the probe count, and large savings over exact evaluation materialize at high dimension.
  3. Scaling in Dimension: For sparse polynomial quartics with bounded sparsity parameters, empirical fits give a runtime exponent near unity (1.01), matching theoretical predictions.
  4. Scaling in Sparsity: At fixed dimension, scaling with the structural sparsity parameter yields an observed exponent close to one, robustly supporting near-linear behavior.

Figure 7: Average runtime per MF-LogDet affine normal evaluation as a function of the dimension $n$; log-log fitting confirms linear scaling.

Figure 8: Operator counts per MF-LogDet affine normal evaluation remain essentially constant as $n$ increases, confirming operator-level stability.

Figure 9: Runtime of one MF-LogDet affine normal evaluation versus the sparsity scale; a log-log fit slope close to one supports near-linear scaling in sparsity.

Implications, Limitations, and Future Directions

The approach renders practical, for the first time, the computation of affine normal directions in high-dimensional settings, especially for structured objectives with polynomial and sparse structure. By leveraging algebraic closure of monomial derivatives and matrix-free computational primitives, the method aligns the per-iteration complexity with that of matrix-free Newton or natural gradient approaches, with an additional (controlled) stochastic trace cost.

Practical implications include:

  • Viable integration of affine-invariant optimization algorithms into large-scale, structure-exploiting applications in machine learning, computational geometry, and physical systems inference.
  • Enabling of further theoretical analysis of affine-invariant methods in high dimensions, given the new class of tractable computational primitives.

Limitations: The main regime of scalability is sparse polynomial objectives; extension to more general function classes, or to settings with less exploitable algebraic structure, remains open. Per-iteration cost may exceed that of Newton-CG if the stochastic trace budget (the probe count) is large or the problem is poorly conditioned.

Future Directions:

  • Development and analysis of preconditioners for tangent linear systems tailored to polynomial sparsity or block-structure.
  • Deployment of the MF-LogDet framework in large-scale machine learning pipelines (e.g., deep learning or kernel machines with polynomial kernels).
  • Extension to non-polynomial but structured objectives, such as trigonometric polynomials or functions on manifolds via coordinate charts.
  • GPU and distributed implementations to further exploit operator-level parallelism.

Conclusion

This work resolves the computational challenge of evaluating affine normal directions in high-dimensional, sparse polynomial optimization. By establishing an exact second-order reduction via log-determinant geometry and developing a matrix-free, stochastic trace-based computational framework, the authors demonstrate both theoretical validity and practical scalability, verified by detailed numerical experiments. These contributions establish new connections between affine differential geometry, log-determinant curvature, and scalable structured optimization, paving the way for practical deployment of affine-invariant algorithms in large-scale settings.
