
MatchBench: Feature Matcher Benchmark

Updated 29 January 2026
  • The paper introduces MatchBench, a benchmark that comprehensively evaluates feature matchers based on matching ability, correspondence sufficiency, and efficiency.
  • It reorganizes popular datasets like TUM, KITTI, and Strecha, covering both short-baseline and wide-baseline scenarios to mimic real-world conditions.
  • It employs a rigorous two-view pose estimation pipeline with metrics such as rotational and translational error to objectively assess matcher performance.

MatchBench is a benchmark designed to provide the first uniform, comprehensive evaluation of feature matchers in computer vision. Unlike previous benchmarks that focused solely on individual aspects such as feature detectors or descriptors, MatchBench directly assesses feature matchers—algorithms that output correspondences between image pairs—which are foundational for high-level applications including Structure-from-Motion (SfM) and Visual SLAM. The benchmark evaluates matchers along three primary axes: matching ability (geometric correctness), correspondence sufficiency (number of inlier correspondences), and efficiency (processing runtime). It encompasses diverse scenario types, supporting both short-baseline (SLAM/video) and wide-baseline (SfM) image pairs (Bian et al., 2018).

1. Dataset Organization and Scene Coverage

MatchBench repurposes and reorganizes sequences from established public datasets to ensure comprehensive coverage of various scene types:

  • TUM RGB-D (indoor, office settings)
    • 01-office: indoor, textured, short baseline.
    • 02-teddy: indoor, non-planar, short baseline.
    • 03-large-cabinet: indoor, weak-texture, short baseline.
  • KITTI Odometry (urban outdoor)
    • 04-kitti: street view, high resolution, short baseline.
  • Strecha SfM (urban buildings)
    • 05-castle: outdoor, wide baseline.
  • Subsampled wide-baseline TUM sequences
    • 06-office-wide, 07-teddy-wide, 08-large-cabinet-wide: increased viewpoint changes, up to 5 seconds apart.

Each sequence is characterized by the number of images, resolution, total image pairs, and scene attributes (e.g., planarity, texture richness). Short-baseline portions mimic video/odometry scenarios; wide-baseline portions reflect challenging SfM use cases.

2. Evaluation Metrics and Pose Estimation Pipeline

MatchBench employs a two-view pose estimation framework to objectively assess matching quality. The evaluation proceeds as follows:

  • Essential Matrix Estimation: Given correspondences $C$ and camera intrinsics $K$, the essential matrix is estimated either directly with the five-point algorithm or via the fundamental matrix:

$$E \gets \mathrm{five\_point}(C, K) \quad \text{or} \quad \begin{cases} F \gets \mathrm{eight\_point}(C) \\ E = K^\top F K \end{cases}$$

  • Pose Decomposition: The relative pose $T = (R, t)$ is extracted from $E$ via SVD-based decomposition.
  • Pose Error Calculation:

    • Rotational error:

      $$e_r = \angle\left( R_{gt}^\top R \right)$$

    • Translational error:

      $$e_t = \arccos\left( \frac{ t_{gt}^\top t }{ \|t_{gt}\| \, \|t\| } \right)$$

    • Combined error:

      $$e = \max(e_r, e_t)$$

    An image pair counts as a "correct match" if $e < \tau$, where $\tau$ varies from 1° to 10°.

  • Matching Ability (Success Ratio & AUC of the SP Curve):

$$\mathrm{SR}(\tau) = \frac{ |\{(i,j) : e_{ij} < \tau\}| }{ \text{total pairs} }$$

$$\mathrm{AUC} = \frac{1}{|\mathcal{T}|} \sum_{\tau \in \mathcal{T}} \mathrm{SR}(\tau), \quad \mathcal{T} = \{1°, \ldots, 10°\}$$

  • Correspondence Sufficiency (AP Bar):

$$\mathrm{AP}(\tau) = \frac{1}{|\mathcal{M}_\tau|} \sum_{(i,j) \in \mathcal{M}_\tau} |C_{ij}|$$

where $\mathcal{M}_\tau$ is the set of correctly matched pairs at threshold $\tau$ and $|C_{ij}|$ is the inlier count of pair $(i,j)$; $\tau = 5°$ is used for practical reporting.

  • Efficiency: Mean runtime (CPU and/or GPU) including detection, matching, and geometric verification (e.g., RANSAC).
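The pose-error metrics and the SP-curve summary above can be sketched in a few lines of NumPy (a minimal illustration with synthetic error values, not MatchBench's reference implementation):

```python
import numpy as np

def rotation_error_deg(R_gt, R):
    # Angle of the residual rotation R_gt^T R, in degrees.
    cos_theta = (np.trace(R_gt.T @ R) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def translation_error_deg(t_gt, t):
    # Angle between ground-truth and estimated translation directions.
    cos_theta = (t_gt @ t) / (np.linalg.norm(t_gt) * np.linalg.norm(t))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def success_ratio(errors, tau):
    # SR(tau): fraction of pairs whose combined error e = max(e_r, e_t) is below tau.
    return float(np.mean(np.asarray(errors) < tau))

def sp_auc(errors, taus=range(1, 11)):
    # AUC: mean success ratio over thresholds of 1..10 degrees.
    return float(np.mean([success_ratio(errors, t) for t in taus]))

# Synthetic combined errors (degrees) for five image pairs:
errors = [0.5, 2.0, 4.5, 8.0, 30.0]
print(success_ratio(errors, 5.0))  # 0.6: three of five pairs below 5 degrees
print(sp_auc(errors))              # 0.52
```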

3. Experimental Protocol and Workflow

The evaluation protocol mirrors realistic application settings. For short baselines, each video is split into segments (TUM every $k = 15$ frames, KITTI every $k = 5$) and frames $2 \ldots k$ of each segment are matched against its first frame. Wide-baseline protocols match all image pairs in the Strecha sequence and TUM frames sampled roughly 5 seconds apart.
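The short-baseline pairing scheme can be sketched as follows (a minimal illustration; frame indices are 0-based here, whereas the text counts from 1):

```python
def short_baseline_pairs(num_frames, k):
    """Split a video into segments of k frames; within each segment,
    match frames 2..k against frame 1 (0-based: 1..k-1 against 0)."""
    pairs = []
    for start in range(0, num_frames - k + 1, k):
        for offset in range(1, k):
            pairs.append((start, start + offset))
    return pairs

# A 10-frame video with k=5 yields 2 segments of 4 pairs each:
print(short_baseline_pairs(10, 5))
```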

Keypoint and Descriptor Extraction: Each matcher uses its canonical detector-descriptor pipeline (e.g., SIFT, SURF, ORB, BRISK, KAZE, AKAZE, DLCO, FREAK, BinBoost, LATCH, DAISY, ASIFT for CODE/RepMatch).

Nearest-Neighbor Matching: For floating-point descriptors, FLANN with Euclidean distance is used; for binary descriptors, brute-force matching with Hamming distance. Ambiguous matches are filtered using the ratio test (threshold 0.8).
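The ratio test described above can be sketched with brute-force NumPy distances (a simplified stand-in for FLANN; the synthetic descriptors are illustrative, while the 0.8 threshold follows the text):

```python
import numpy as np

def ratio_test_match(desc_a, desc_b, ratio=0.8):
    """Keep a match only if the nearest neighbor in desc_b is clearly
    closer than the second-nearest (distance ratio below `ratio`)."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        nn = np.argsort(dists)[:2]
        if dists[nn[0]] < ratio * dists[nn[1]]:
            matches.append((i, int(nn[0])))
    return matches

rng = np.random.default_rng(0)
desc_b = rng.normal(size=(50, 128))                             # 50 random descriptors
desc_a = desc_b[:5] + rng.normal(scale=0.01, size=(5, 128))     # near-duplicates of the first 5
matches = ratio_test_match(desc_a, desc_b)
print(matches)  # each query matches its near-duplicate: [(0, 0), ..., (4, 4)]
```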

Correspondence Selection: Sparse matchers (e.g., SIFT, DAISY) use OpenCV’s five-point algorithm with RANSAC for geometric verification. Rich matchers (CODE, RepMatch, GMS) employ their own, typically more sophisticated, pose estimators.

4. Evaluated Algorithms

MatchBench covers 16 distinct feature matching systems, summarized in the following table:

| Name | Key Detector/Descriptor | Notable Property / Addition |
|------|-------------------------|-----------------------------|
| SIFT | DoG + 128D gradient | Ratio test + FLANN + RANSAC |
| SURF | Fast Hessian + 64D descriptor | |
| ORB | FAST+Harris + BRIEF | Binary; high speed |
| BRISK | AGAST + binary descriptor | Memory efficient |
| KAZE | Nonlinear scale space + M-SURF | Robust to varying image structure |
| AKAZE | Fast KAZE approx. | Compact, fast |
| DLCO | CNN-learned local feature | Deep learning-based descriptor |
| FREAK | Retina-inspired binary | Biomimetic pattern |
| BinBoost | Boosted binary code | Learning-based binary |
| LATCH | Patch-triplet descriptor | Learning-based |
| DAISY | Dense gradient-based | Suited for dense matching |
| KVLD | Virtual Line + semi-local check | Extra geometric/photometric verification |
| GAIM | Affine simulation + SURF | Simulates view changes |
| CODE | ASIFT + global optimization | Rich matches, high cost |
| RepMatch | Geometry-aware extension of CODE | Suited for repetitive structures |
| GMS | ORB + grid-based filtering | Fast, effective in real time |
5. Quantitative Results and Analysis

Matching Ability: On short-baseline tasks, GMS outperforms all other methods (SP AUC ≈0.51/0.61/0.25/0.96 on Seqs 01–04), followed by DLCO and KAZE. For wide baselines, RepMatch leads (AUC ≈0.54/0.77/0.43/0.47), followed by CODE and GMS. Sparse, classical descriptors (SIFT, SURF, ORB) underperform in wide-baseline settings, particularly in low-texture or geometrically complex scenes.

Correspondence Sufficiency: Rich matchers (e.g., CODE, RepMatch) produce over 1,000 inliers per pair; GMS achieves roughly 100–300; classical sparse matchers typically supply fewer than 200.

Efficiency: ORB and GMS demonstrate high efficiency (ORB: ≈48 ms/pair, GMS: ≈46 ms/pair on CPU/GPU). High-performing rich matchers are costly (RepMatch: ≈10,780 ms/pair for selection; CODE: ≈1,365 ms on GPU + 3,080 ms selection).

Scene Dependence: All matchers achieve high AUC (>0.87) on high-resolution, well-textured street scenes (Seq 04). Scene complexity and texture scarcity (e.g., indoor, non-planar) introduce larger performance disparities and highlight the strengths of global or learning-based methods.

Methodological Trade-offs:

  • KVLD and GAIM introduce geometric/photometric checks—offering modest benefits with considerable computational expense.
  • CODE and RepMatch perform global optimization for high robustness at the cost of speed.
  • GMS implements grid-based motion statistics, providing a favorable balance between speed and robustness.
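The grid-based idea behind GMS can be illustrated with a toy voting sketch (a hypothetical simplification, not the published implementation: real GMS scores cell pairs across rotated and scaled grids and uses a statistically derived threshold):

```python
from collections import defaultdict

def grid_filter(matches, grid=4, img_size=(640, 480), min_support=3):
    """Toy grid voting: a match survives only if enough other matches
    map the same source cell to the same target cell."""
    def cell(pt):
        return (int(pt[0] * grid / img_size[0]), int(pt[1] * grid / img_size[1]))
    votes = defaultdict(int)
    for p, q in matches:
        votes[(cell(p), cell(q))] += 1
    return [(p, q) for p, q in matches if votes[(cell(p), cell(q))] >= min_support]

# Five mutually consistent matches plus one isolated outlier:
good = [((10 + i, 10 + i), (200 + i, 150 + i)) for i in range(5)]
bad = [((500, 400), (10, 10))]
result = grid_filter(good + bad)
print(result)  # only the five consistent matches survive
```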

6. Practical Guidelines

  • Real-time SLAM/Visual Odometry: ORB combined with GMS is recommended for short-baseline use cases, delivering robust matching at ≈45 ms/pair and sufficient inliers.
  • Offline Wide-baseline SfM: For maximal matching ability and correspondence sufficiency, RepMatch or CODE is optimal if runtime is not constrained; GMS provides efficient and adequate performance when <1 s/pair is required.
  • Memory/Compute-limited Platforms: Binary features such as ORB, BRISK, or AKAZE used with the ratio test yield efficient correspondences; GMS can be added for improved inlier selection.
  • Low-texture/Non-planar Scenes: Global optimization methods (RepMatch, CODE) or deep/local enhancements (DLCO, KAZE) are advantageous.

7. Open Challenges and Future Directions

Current benchmarking via pose-based verification does not address dense, per-pixel correspondence evaluation; thus, establishing high-precision, dense ground truth remains an open issue. Evaluation under severe illumination and appearance changes, such as day-night or weather variation, is lacking. There is a need for extension toward larger-scale, multi-camera datasets (e.g., Internet photo collections) with reliable structural ground truth. Reducing the computational bottleneck in RANSAC-style geometric verification, particularly via GPU acceleration, and developing end-to-end learned systems that unify detection, description, and matching, are proposed as key avenues for closing the gap between accuracy and speed in feature matching (Bian et al., 2018).
