ExoMiner++ 2.0: Advanced Exoplanet Vetting

Updated 28 January 2026
  • ExoMiner++ 2.0 is an advanced deep learning framework that automates the classification and vetting of exoplanet transit candidates from TESS 2-min and FFI data.
  • It employs a multi-branch convolutional architecture to process diverse diagnostic inputs and integrates large-scale, multi-source training for robust discrimination between genuine planetary signals and false positives.
  • The framework generates high-precision, ranked catalogs that streamline follow-up target selection and support comprehensive statistical analyses of exoplanet populations.

ExoMiner++ 2.0 is an enhanced deep learning framework for automated classification and vetting of exoplanet transit candidates, specifically applied to TESS (Transiting Exoplanet Survey Satellite) 2-minute cadence and Full-Frame Image (FFI) data. Building on the foundational ExoMiner and ExoMiner++ architectures, version 2.0 introduces expanded input diagnostics, refined domain adaptation mechanisms, and large-scale multi-source training, yielding robust discrimination between planetary signals, astrophysical false positives, and instrumental artifacts across multiple observation cadences. It produces public catalogs of scored and ranked transiting candidate events, optimized for follow-up target selection and population analyses (Valizadegan et al., 13 Feb 2025, Martinho et al., 21 Jan 2026).

1. Model Architecture and Diagnostic Inputs

ExoMiner++ 2.0 implements a multi-branch convolutional neural network that ingests diverse diagnostic representations produced by the TESS Data Validation (DV) pipeline. The architecture includes parallel processing pathways (“branches”) for specialized input types:

  • Transit-view flux branch: Processes five distinct flux views (primary, secondary, odd, even, and unfolded variability), formatted as $X_{\rm flux}\in\mathbb{R}^{5\times 2\times 31}$ (number of views × [median, σ] × bins).
  • Difference-image branch: Operates on $5\times 55\times 55$ pixel cutouts per TCE, encoding per-pixel difference-image flux, a neighbors' magnitude-ratio map, and a per-pixel SNR image. These highlight spatial photocenter shifts indicative of blends or contaminating binaries.
  • Periodogram branch: Ingests binned amplitude-frequency representations of the PDCSAP flux and a model periodogram, $X\in\mathbb{R}^{2\times 64}$.
  • Full-orbit and trend branches: Each processes detrended, phase-folded or unfolded light curves with and without stellar variability, $X\in\mathbb{R}^{2\times 301}$ or $X\in\mathbb{R}^{1\times 301}$.
  • Centroid-motion branch: $X\in\mathbb{R}^{2\times 31}$; assesses motion-induced false positives.
  • Momentum-dump flag branch: Incorporates spacecraft attitude-control event diagnostics.
  • Scalar branches: Input vectors of DV statistics (e.g., MES, SES, robust statistics, $\chi^2$) and stellar parameters (e.g., $T_{\mathrm{eff}}$, $\log g$, [Fe/H], $R_{\star}$), totaling ≈40 scalars.

Each convolutional branch employs repeated blocks: strided or non-strided 1D/2D convolutions, batch normalization, parametric ReLU, and residual skip connections. Outputs are reduced by global average pooling and concatenated. The merged latent encoding passes through multiple fully connected (FC) layers (with LayerNorm and dropout), culminating in a single sigmoid unit producing a class probability.
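To make the branch-and-merge layout concrete, the following PyTorch sketch wires a reduced set of the branches above (flux views, periodogram, centroid motion, scalars) into a shared sigmoid head. The input shapes follow the text; the layer widths, kernel sizes, and the restriction to four branches are illustrative assumptions, not the published architecture.

```python
# Minimal multi-branch classifier sketch in the spirit of ExoMiner++ 2.0.
# Branch input shapes follow the text; depths/widths are illustrative.
import torch
import torch.nn as nn

class ConvBranch1D(nn.Module):
    """1D conv block (conv -> batch norm -> PReLU) + global average pooling."""
    def __init__(self, in_ch: int, out_ch: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm1d(out_ch),
            nn.PReLU(),
        )

    def forward(self, x):
        return self.net(x).mean(dim=-1)  # global average pool over bins

class ExoMinerSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Flux views X in R^{5x2x31}: fold (views x stats) = 10 into channels.
        self.flux = ConvBranch1D(in_ch=10)
        # Periodogram X in R^{2x64}, centroid motion X in R^{2x31}.
        self.pgram = ConvBranch1D(in_ch=2)
        self.centroid = ConvBranch1D(in_ch=2)
        # ~40 DV/stellar scalars pass through a small FC branch.
        self.scalars = nn.Sequential(nn.Linear(40, 16), nn.PReLU())
        # Concatenated latents -> FC head with LayerNorm + dropout -> sigmoid.
        self.head = nn.Sequential(
            nn.Linear(16 * 4, 32), nn.LayerNorm(32), nn.PReLU(),
            nn.Dropout(0.0215),  # dropout rate quoted in the text
            nn.Linear(32, 1), nn.Sigmoid(),
        )

    def forward(self, flux, pgram, centroid, scalars):
        z = torch.cat([
            self.flux(flux.flatten(1, 2)),  # (N,5,2,31) -> (N,10,31)
            self.pgram(pgram),
            self.centroid(centroid),
            self.scalars(scalars),
        ], dim=1)
        return self.head(z)  # class probability in (0, 1)

model = ExoMinerSketch().eval()
with torch.no_grad():
    prob = model(torch.randn(4, 5, 2, 31), torch.randn(4, 2, 64),
                 torch.randn(4, 2, 31), torch.randn(4, 40))
```

Global average pooling makes each branch's latent size independent of its bin count, which is what allows branches with different input lengths to be concatenated into one fixed-width encoding.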

Data preprocessing aligns all branch inputs to standardized shapes and statistics (mean subtraction, division by the training-set standard deviation), including neighbor-star encoding (a local $11\times11$ neighborhood with min–max normalized magnitude ratios, clipped to $[0,5]$) (Valizadegan et al., 13 Feb 2025, Martinho et al., 21 Jan 2026).
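The two normalization steps just described can be sketched in NumPy as follows; the function names and the synthetic inputs are illustrative, not taken from the pipeline source.

```python
# Sketch of the preprocessing described above: z-scoring by training-set
# statistics, and min-max scaling of neighbor magnitude ratios clipped to [0,5].
import numpy as np

def standardize(x, train_mean, train_std):
    """Subtract the training-set mean and divide by its standard deviation."""
    return (x - train_mean) / train_std

def encode_neighbor_ratios(mag_ratio_map):
    """Clip an 11x11 neighbor magnitude-ratio map to [0, 5], then min-max scale."""
    clipped = np.clip(mag_ratio_map, 0.0, 5.0)
    lo, hi = clipped.min(), clipped.max()
    return (clipped - lo) / (hi - lo) if hi > lo else np.zeros_like(clipped)

rng = np.random.default_rng(0)
flux = rng.normal(loc=3.0, scale=2.0, size=(5, 2, 31))    # synthetic flux views
z = standardize(flux, train_mean=3.0, train_std=2.0)      # stats from training set
ratios = encode_neighbor_ratios(rng.uniform(-1, 8, size=(11, 11)))
```

The key point is that the mean and standard deviation come from the training set only, so the same fixed statistics are applied at inference time.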

2. Multi-source Training and Domain Adaptation

ExoMiner++ 2.0 is trained on a combined dataset spanning multiple sources and observation cadences:

  • Kepler (KOI) and TESS 2-min TCEs with highly reliable planet/false positive labels (CP, KP, EB, FP, NTP), extracted from ExoFOP, Prša EB catalog, and TESS-ExoClass.
  • TESS FFI TCEs (222,054 events as of the study): labels leveraged where available, with remaining events marked as unlabeled (UNK).
  • Examples per subclass in the combined training sample are tabulated in each publication (see Table 2 in (Martinho et al., 21 Jan 2026)).

The training protocol uses cross-validation stratified by target to avoid contamination between training and test folds, with greedy assignment to balance the number of planets across folds. Each fold consists of 10 independently initialized models (ensemble). Binary cross-entropy is the loss function:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\bigl[y_i\ln\hat{y}_i + (1-y_i)\ln(1-\hat{y}_i)\bigr]$$

Kepler and TESS examples are weighted equally; no further domain-adversarial reweighting is required. Optimization employs Adam with a learning rate of $4.18\times10^{-5}$, dropout of $0.0215$ in all FC layers, and early stopping on validation PR AUC. The model's input/output adaptations, e.g., merging 20-point unfolded flux for 30-min cadence and upscaling difference-image cutouts from $11\times11$ to $55\times55$ pixels, enable generalization to FFI data and various TESS cadences (Valizadegan et al., 13 Feb 2025, Martinho et al., 21 Jan 2026).
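A small numeric sketch of the binary cross-entropy loss above, with optional per-example weights standing in for the equal Kepler/TESS weighting (the labels and scores are synthetic):

```python
# Binary cross-entropy as defined in the text, with optional example weights.
import numpy as np

def bce(y_true, y_pred, weights=None, eps=1e-12):
    """Mean of -[y ln p + (1-y) ln(1-p)], clipping p away from 0 and 1."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    terms = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    if weights is None:
        return terms.mean()
    return np.average(terms, weights=weights)

y = np.array([1, 0, 1, 0])
p = np.array([0.9, 0.1, 0.8, 0.2])
loss = bce(y, p)
```

Clipping the predictions away from exactly 0 and 1 avoids infinite loss terms from the logarithms, a standard numerical safeguard.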

3. Performance and Evaluation Metrics

Performance is evaluated on held-out cross-validation folds via standard classification and ranking metrics:

$$\begin{align*}
\text{Accuracy} &= \frac{TP+TN}{TP+FP+TN+FN} \\
\text{Precision} &= \frac{TP}{TP+FP} \\
\text{Recall} &= \frac{TP}{TP+FN} \\
F_1 &= 2\,\frac{\text{Precision}\times\text{Recall}}{\text{Precision}+\text{Recall}}
\end{align*}$$

Additional metrics include PR AUC (area under the precision-recall curve), ROC AUC, precision at 95% recall, and recall at 95% precision.
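The threshold-based metrics above can be computed directly from a confusion matrix; the sketch below does so at a 0.5 score threshold on synthetic labels and scores.

```python
# Confusion-matrix metrics (precision, recall, F1, accuracy) at a threshold.
import numpy as np

def metrics(y_true, scores, threshold=0.5):
    pred = (scores >= threshold).astype(int)
    tp = int(((pred == 1) & (y_true == 1)).sum())
    fp = int(((pred == 1) & (y_true == 0)).sum())
    tn = int(((pred == 0) & (y_true == 0)).sum())
    fn = int(((pred == 0) & (y_true == 1)).sum())
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, f1, accuracy

y = np.array([1, 1, 1, 0, 0, 0, 1, 0])   # synthetic labels
s = np.array([0.9, 0.8, 0.4, 0.2, 0.6, 0.1, 0.7, 0.3])  # synthetic scores
prec, rec, f1, acc = metrics(y, s)
```

PR AUC and ROC AUC, by contrast, are threshold-free: they integrate over all possible score thresholds rather than fixing one.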

Key results on TESS 2-min (from (Valizadegan et al., 13 Feb 2025), Table 7):

| Metric | Value |
|---|---|
| Precision | 0.933 |
| Recall | 0.951 |
| PR AUC | 0.976 |
| ROC AUC | 0.998 |
| Accuracy | 0.992 |

Subclass recalls (2-min, at $0.5$ threshold):

| Subclass | Recall |
|---|---|
| KP | 0.957 |
| CP | 0.944 |
| BD | 0.562 |
| EB | 0.997 |
| FP | 0.896 |
| NTP | 0.999 |

On combined multi-source (CV, from (Martinho et al., 21 Jan 2026), Table 3/4):

| Model | Precision | Recall | PR AUC | ROC AUC | Accuracy |
|---|---|---|---|---|---|
| ExoMiner++ | 0.913 ± 0.027 | 0.912 ± 0.026 | 0.962 ± 0.012 | 0.996 ± 0.002 | 0.990 ± 0.002 |
| ExoMiner++ 2.0 | 0.922 ± 0.019 | 0.865 ± 0.019 | 0.958 ± 0.006 | 0.995 ± 0.002 | 0.988 ± 0.002 |

ExoMiner++ 2.0 achieves similar or improved subclass recalls for KP, EB, and FP classes, with more conservative scoring in crowded fields—affecting overall recall but aiding false positive suppression. On FFI-only sets, ExoMiner++ 2.0 produces precision of 0.917 and recall of 0.854 at threshold 0.5, with PR AUC 0.952 and accuracy 0.952 (Martinho et al., 21 Jan 2026).

4. Vetting Catalogs and Data Products

The ExoMiner++ 2.0 framework has been used to compile large-scale vetting catalogs for both 2-min and FFI TESS TCEs:

  • Among 204,729 SPOC 2-min TCEs, 147,567 unlabeled events were scored by an ensemble of 10 models. At score > 0.5, 7,330 were classified as Planet Candidates (PC) and 140,237 as False Positives (FP).
  • These PCs correspond to 1,868 existing TOIs, 69 prior CTOIs, and 50 newly proposed CTOIs (≥ 3 transits, unanimous PC classification across models) (Valizadegan et al., 13 Feb 2025).
  • For TESS FFI data, the 5-fold ensemble assigned scores to ≈151,600 unlabeled TCEs, with 2,831 candidates above 0.5, 926 above 0.9, and 447 above 0.95 (Martinho et al., 21 Jan 2026).

Catalogs include: TIC ID, TCE ephemeris, DV links, MES/SNR, TESS magnitude, RUWE, TOI cross-matching, and ensemble score statistics. This uniform, automated vetting reduces the manual burden on follow-up teams and enables systematic statistical studies.
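The ensemble scoring and thresholding behind these catalogs can be sketched as follows; the scores are synthetic, and the unanimity check mirrors the criterion quoted above for newly proposed CTOIs (unanimous PC classification across models).

```python
# Ensemble scoring sketch: mean score per TCE, threshold cut, unanimity check.
import numpy as np

rng = np.random.default_rng(1)
n_models, n_tces = 10, 1000
scores = rng.uniform(size=(n_models, n_tces))  # one row per ensemble model

mean_score = scores.mean(axis=0)               # ensemble score per TCE
is_pc = mean_score > 0.5                       # planet candidate at score > 0.5
unanimous = (scores > 0.5).all(axis=0)         # every model individually > 0.5
n_pc = int(is_pc.sum())
```

Unanimity is strictly stronger than the mean-score cut: if all 10 models score above 0.5, the mean necessarily does too.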

5. Scientific Impact and Applications

ExoMiner++ 2.0 sharply reduces the size and contamination of the plausible candidate list. For example, in the 2-min catalog, the set of planet candidates was reduced from 2,506 (ExoFOP) to 1,797 via more aggressive FP downgrading, removing approximately 30–40% of lower-reliability events. The ranking quality is high: precision is 0.99 for the top 1,000 PCs and ≈ 0.977 for the top 3,000, and all of the top 600 PCs are confirmed planets. This enables focused allocation of limited follow-up resources (Valizadegan et al., 13 Feb 2025).
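The top-k precision figures quoted above correspond to the standard precision-at-k ranking metric, sketched here on synthetic labels and scores:

```python
# Precision@k: fraction of true positives among the k highest-scored candidates.
import numpy as np

def precision_at_k(y_true, scores, k):
    order = np.argsort(scores)[::-1]   # indices sorted by descending score
    return float(y_true[order[:k]].mean())

y = np.array([1, 0, 1, 1, 0, 1])                  # synthetic labels
s = np.array([0.95, 0.9, 0.85, 0.8, 0.7, 0.6])    # synthetic scores
p3 = precision_at_k(y, s, 3)
```

Unlike threshold-based precision, this metric evaluates the ranked list directly, which is what matters when allocating a fixed budget of follow-up observations to the top candidates.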

Applying the model to FFI data (30-min cadence), ExoMiner++ 2.0 shows robust generalization, allowing for the discovery and prioritization of exoplanet candidates among millions of targets—previously infeasible with manual vetting. The model's more conservative scoring in contaminated (crowded or blended) fields addresses key failure modes of previous automated vetting. Catalogs derived from these applications form the basis for ongoing and future TESS, K2/K2+, and population studies (Martinho et al., 21 Jan 2026).

6. Limitations and Future Directions

Primary limitations arise from severe stellar blending (CROWDSAP $\lesssim 0.8$), low-SNR events, and high-density stellar fields, where astrometric uncertainty degrades difference-image reliability.

A plausible implication is that ExoMiner++ 2.0's generalizable representation learning and robust multi-source training regime set a precedent for future automated vetting pipelines in space-based photometric surveys, supporting both rapid candidate discovery and comprehensive statistical exoplanet population analyses.
