
Sequential Quadratic Hamiltonian Method

Updated 30 January 2026
  • The Sequential Quadratic Hamiltonian Method is a gradient-free iterative scheme that maximizes high-dimensional log-likelihood functions using coordinate-wise updates and quadratic penalty regularization.
  • It decomposes optimization into one-dimensional subproblems, eliminating the need for explicit gradient or Hessian calculations and ensuring computational efficiency.
  • Empirical studies and real-data applications demonstrate that SQH outperforms nonlinear conjugate gradient methods in accuracy and speed, particularly in cure rate models.

The Sequential Quadratic Hamiltonian (SQH) Method is a gradient-free iterative optimization scheme for the maximum likelihood estimation of model parameters, particularly designed for high-dimensional, smooth log-likelihood functions. SQH augments traditional trust-region methods with a quadratic Hamiltonian penalty, facilitating coordinate-wise maximization without requiring explicit gradient or Hessian calculations. This approach has been demonstrated to achieve superior statistical performance and computational efficiency in applications such as the Box-Cox transformation cure rate model for time-to-event data with cure fractions (Bui et al., 2 May 2025).

1. Mathematical Structure and Coordinate-Wise Optimization

Let $\theta \in T_{\rm ad} \subset \mathbb{R}^d$ denote the vector of parameters for which a smooth log-likelihood $l(\theta)$ is to be maximized. The SQH method introduces an augmented Hamiltonian function at each iteration, defined by

$$\mathcal{H}_\epsilon(\theta; \tilde\theta) = l(\theta) - \epsilon\,\|\theta - \tilde\theta\|_2^2, \quad \epsilon > 0,$$

where $\tilde\theta$ is the previous iterate and $\|\cdot\|_2$ denotes the Euclidean norm. The quadratic penalty serves to (a) stabilize updates by regularizing the step size and (b) enable separability across the individual coordinates of $\theta$.

The next iterate $\theta^{k+1}$ is obtained by

$$\theta^{k+1} = \arg\max_{\theta \in T_{\rm ad}}\, \mathcal{H}_{\epsilon_k}(\theta; \theta^k),$$

which decomposes into a sequence of one-dimensional optimization problems:

$$\theta_i^{k+1} = \arg\max_{v_i \in T_{i,\rm ad}} \left\{ l\big( \theta_1^{k+1}, \ldots, \theta_{i-1}^{k+1}, v_i, \theta_{i+1}^k, \ldots, \theta_d^k \big) - \epsilon_k (v_i - \theta_i^k)^2 \right\}, \quad i = 1, \dots, d.$$

This coordinate-wise update requires only a one-dimensional search for each component, avoiding explicit calculation of gradients or Hessians.
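A single coordinate update of this kind can be sketched in a few lines of Python. The toy log-likelihood `l` and the dense-grid univariate maximizer below are illustrative stand-ins, not the implementation from the paper:

```python
# Minimal sketch of one coordinate-wise SQH update (illustrative helpers).

def l(theta):
    # Toy concave "log-likelihood" with maximizer at (1.0, -2.0).
    return -(theta[0] - 1.0) ** 2 - 0.5 * (theta[1] + 2.0) ** 2

def update_coordinate(l, theta, i, eps, lo=-5.0, hi=5.0, n_grid=2001):
    """Maximize v -> l(theta with theta[i]=v) - eps*(v - theta[i])**2 by grid scan."""
    base = theta[i]
    best_v, best_val = base, l(theta)          # penalty vanishes at v = base
    for k in range(n_grid):
        v = lo + (hi - lo) * k / (n_grid - 1)
        trial = theta[:i] + [v] + theta[i + 1:]
        val = l(trial) - eps * (v - base) ** 2
        if val > best_val:
            best_v, best_val = v, val
    return best_v

theta = [0.0, 0.0]
for i in range(len(theta)):                    # sweep coordinates sequentially
    theta[i] = update_coordinate(l, theta, i, eps=0.1)
print(theta)                                   # moves toward (1.0, -2.0)
```

Note that the sweep is Gauss–Seidel style: each coordinate update sees the components already refreshed in the current sweep.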

2. Algorithmic Framework and Convergence Guarantees

The SQH algorithm is formally described as:

  1. Coordinate-wise Maximization: Update each $\theta_i$ sequentially by one-dimensional maximization of the local augmented Hamiltonian.
  2. Sufficient-Increase Test: Evaluate the log-likelihood gain $\Delta l = l(\theta^{\rm new}) - l(\theta^k)$ and the squared step norm $\Delta\theta^2 = \|\theta^{\rm new} - \theta^k\|_2^2$. If $\Delta l < \rho\,\Delta\theta^2$, increase the penalty, $\epsilon_k \gets \lambda\,\epsilon_k$, and repeat the step; otherwise decrease the penalty, $\epsilon_{k+1} \gets \zeta\,\epsilon_k$, and accept the update.
  3. Stopping Criterion: Stop when $\Delta\theta^2 < \kappa$ or upon reaching the maximum number of iterations $K_{\max}$.
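The three steps above can be assembled into a compact loop. The sketch below is a minimal illustration on a toy concave objective; all tuning constants (`eps0`, `rho`, `lam`, `zeta`, `kappa`) and the grid-based univariate maximizer are assumptions for the example, not values from the paper:

```python
# Sketch of the SQH loop: coordinate sweeps + adaptive quadratic penalty.

def sqh_maximize(l, theta0, eps0=1.0, rho=1e-4, lam=2.0, zeta=0.9,
                 kappa=1e-10, k_max=200, lo=-5.0, hi=5.0, n_grid=2001):
    theta, eps = list(theta0), eps0
    for _ in range(k_max):
        while True:
            # One full coordinate-wise sweep of the augmented Hamiltonian.
            new = list(theta)
            for i in range(len(new)):
                best_v, best_val = new[i], None
                for k in range(n_grid):
                    v = lo + (hi - lo) * k / (n_grid - 1)
                    trial = new[:i] + [v] + new[i + 1:]
                    val = l(trial) - eps * (v - theta[i]) ** 2
                    if best_val is None or val > best_val:
                        best_v, best_val = v, val
                new[i] = best_v
            d2 = sum((a - b) ** 2 for a, b in zip(new, theta))
            if l(new) - l(theta) >= rho * d2:    # sufficient increase: accept
                theta, eps = new, zeta * eps
                break
            eps *= lam                           # otherwise raise penalty, retry
        if d2 < kappa:                           # stopping criterion
            break
    return theta

# Toy concave log-likelihood with maximizer (1.0, -2.0).
l = lambda th: -(th[0] - 1.0) ** 2 - 0.5 * (th[1] + 2.0) ** 2
theta_hat = sqh_maximize(l, [0.0, 0.0])
print(theta_hat)
```

Because a rejected sweep only raises the penalty and shrinks the step, the retry loop always terminates: for large $\epsilon$ the sweep leaves $\theta$ unchanged and the test is trivially passed.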

Convergence is underpinned by two key theorems:

  • Monotonicity: $l(\theta^{k+1}) - l(\theta^k) \geq \epsilon_k \|\theta^{k+1} - \theta^k\|_2^2 \geq 0$, ensuring a non-decreasing log-likelihood.
  • Stationarity: Under mild conditions, $\|\theta^{k+1} - \theta^k\|_2 \to 0$ and $l(\theta^{k+1}) - l(\theta^k) \to 0$ as $k \to \infty$; thus any cluster point of $(\theta^k)$ is a stationary point of $l(\theta)$.
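The monotonicity bound follows in one line from the sweep structure: since each one-dimensional step does not decrease the augmented Hamiltonian, a full sweep satisfies $\mathcal{H}_{\epsilon_k}(\theta^{k+1}; \theta^k) \geq \mathcal{H}_{\epsilon_k}(\theta^k; \theta^k)$, and the standard argument reads:

```latex
\begin{aligned}
l(\theta^{k+1}) - \epsilon_k \|\theta^{k+1} - \theta^k\|_2^2
  &= \mathcal{H}_{\epsilon_k}(\theta^{k+1}; \theta^k)
   \;\geq\; \mathcal{H}_{\epsilon_k}(\theta^k; \theta^k) = l(\theta^k), \\
\Longrightarrow\quad
l(\theta^{k+1}) - l(\theta^k) &\geq \epsilon_k \|\theta^{k+1} - \theta^k\|_2^2 \;\geq\; 0.
\end{aligned}
```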

Each iteration requires only $d$ one-dimensional optimizations plus vector norm calculations. The univariate subproblems can be solved with standard search methods such as golden-section or Newton–Raphson whenever univariate smoothness is maintained.
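As one concrete choice for those univariate searches, a golden-section maximizer can be used; the routine below is the standard textbook algorithm, shown as a self-contained sketch rather than code from the paper:

```python
# Golden-section search for the maximum of a unimodal function on [a, b].
import math

def golden_max(f, a, b, tol=1e-8):
    """Return an approximate maximizer of a unimodal f on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0       # 1/phi ~ 0.618
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                              # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                                    # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

x_star = golden_max(lambda v: -(v - 0.3) ** 2, -5.0, 5.0)
print(x_star)                                    # close to 0.3
```

Each iteration reuses one of the two interior evaluations, so the interval shrinks by a factor of $1/\varphi \approx 0.618$ per function call.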

3. Application to Box–Cox Transformation Cure Model

The SQH method was deployed for parameter estimation in the Box–Cox transformation cure model, where the population survival function $S_p(y \mid x)$ incorporates a Box–Cox index parameter $\alpha$, covariate effects through $\phi(\alpha, x)$, and a parametric lifetime distribution $F(y)$ (e.g., Weibull). The observed-data log-likelihood is

$$l(\theta) = \sum_{i=1}^n \left[ \delta_i \log f_p(y_i \mid x_i) + (1-\delta_i) \log S_p(y_i \mid x_i) \right],$$

with $\theta = (\beta, \gamma, \alpha)$ spanning the regression, distribution, and transformation parameters.

Initial values are constructed by moment-based devices: inverting Kaplan–Meier estimates at covariate extremes to set $(\beta_0, \beta_1, \ldots)$, matching sample moments to the Weibull parameters, and running a grid search for $\alpha^0$.

The coordinate-wise updates are performed as specified by the generic SQH scheme. No gradient or Hessian evaluations are required; only pointwise log-likelihood computations.
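A pointwise evaluation of this log-likelihood can be sketched as follows. The sketch assumes the common Box–Cox cure-model form $S_p(y \mid x) = [1 - \alpha\,\phi\,F(y)]^{1/\alpha}$ with a logistic link for $\phi$ and a shape/scale Weibull parametrization; these concrete choices are illustrative assumptions, not necessarily the exact specification in the paper:

```python
# Hedged sketch: observed-data log-likelihood of a Box-Cox transformation
# cure model with a Weibull lifetime, evaluated pointwise (no gradients).
# The logistic link for phi and the Weibull parametrization are assumptions.
import math

def weibull_F(y, shape, scale):
    return 1.0 - math.exp(-((y / scale) ** shape))

def weibull_f(y, shape, scale):
    return (shape / scale) * (y / scale) ** (shape - 1.0) \
        * math.exp(-((y / scale) ** shape))

def log_lik(theta, data):
    """theta = (b0, b1, shape, scale, alpha); data = [(y, x, delta), ...]."""
    b0, b1, shape, scale, alpha = theta
    total = 0.0
    for y, x, delta in data:
        phi = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))   # assumed link, 0 < phi < 1
        F, f = weibull_F(y, shape, scale), weibull_f(y, shape, scale)
        base = 1.0 - alpha * phi * F                   # Box-Cox transformation term
        Sp = base ** (1.0 / alpha)                     # population survival
        fp = phi * f * base ** (1.0 / alpha - 1.0)     # population density
        total += delta * math.log(fp) + (1 - delta) * math.log(Sp)
    return total

# Tiny synthetic sample: (time, covariate, event indicator).
data = [(1.2, 0.0, 1), (3.4, 1.0, 0), (0.7, 1.0, 1), (5.0, 0.0, 0)]
val = log_lik((0.0, 0.5, 1.5, 2.0, 0.5), data)
print(val)                                             # finite log-likelihood value
```

SQH only ever calls such a routine at trial points, which is exactly why no gradient or Hessian code is needed for this model.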

4. Comparative Analysis with Nonlinear Conjugate Gradient

Extensive Monte Carlo studies compared the SQH method to the nonlinear conjugate gradient (NCG) approach, itself documented to outperform EM-based estimators in this model context. Key outcomes include:

  • Bias and Root Mean Square Error (RMSE): For the parameters $(\beta_0, \beta_1, \gamma_1, \gamma_2, \alpha)$, SQH systematically yields substantially lower bias and RMSE than NCG, with improvements observed even when $\alpha$ is at the domain boundary. Performance gains amplify as the sample size increases.
    • Example: With $n=150$, $(p_{01},p_{00})=(0.40,0.20)$, and $\alpha=0.50$, the bias and RMSE for $\hat{\alpha}$ are $0.053$ and $0.061$ (SQH) versus $0.109$ and $0.158$ (NCG).
  • Computational Efficiency: Aggregate CPU runtime for 500 replications at $n=200$ is $3.31$ seconds (SQH) versus $30.72$ seconds (NCG). Across all tested settings, SQH is one to two orders of magnitude faster.

These advantages are attributed to SQH’s coordinate-wise, gradient-free updates and the avoidance of Hessian or conjugate direction calculations.

5. Real Data Illustration and Inference

SQH was applied to the Kirkwood et al. (1996) cutaneous-melanoma dataset, which after filtering yields $n=417$ individuals (56% censored) and a four-level nodal covariate. The estimation process, using initial guesses generated via nonparametric and moment-matching approaches, yields the maximum likelihood estimates $\hat\beta_0 = -1.228$, $\hat\beta_1 = 0.386$, $\hat\gamma_1 = 0.561$, $\hat\gamma_2 = 0.376$, $\hat\alpha = 0.051$, with bootstrap standard errors $(0.115, 0.024, 0.022, 0.024, 0.008)$, all smaller than their NCG-based counterparts.

Estimated cure fractions across nodal groups are

$$\{\hat p_0(1), \hat p_0(2), \hat p_0(3), \hat p_0(4)\} = \{0.653,\,0.536,\,0.402,\,0.266\},$$

aligning with clinical expectations. Model fit diagnostics, including randomized-quantile residual analysis and Kolmogorov–Smirnov normality testing ($p=0.933$), support model adequacy.

6. Theoretical Considerations and Practical Implications

The SQH method synthesizes elements of trust-region algorithms and Hamiltonian (Pontryagin-style) augmentation, resulting in an efficient, robust approach for coordinate-wise maximization of high-dimensional smooth objectives. Its computational cost per iteration is minimal ($O(d)$ storage), with the adaptive penalty stabilizing each step and preventing divergent iterates.

A plausible implication is that the gradient-free nature of SQH, together with its separable updates and adaptive penalty mechanism, may render it particularly effective for models where complex or non-analytic gradient/Hessian expressions would preclude conventional quasi-Newton or NCG algorithms.

Empirical results in both simulated and practical datasets demonstrate that SQH achieves higher statistical accuracy and computational savings relative to established NCG methods. Its adoption enables rapid, precise estimation in cure rate and related models, expanding the methodological toolkit for practitioners in survival analysis and parametric modeling (Bui et al., 2 May 2025).
