
ThermoPol Dataset: Materials & Imaging

Updated 3 February 2026
  • ThermoPol Dataset is a dual benchmark resource featuring a thermoelectric materials database and an LWIR shape-from-polarization imaging dataset.
  • ThermoPol v1.0 uses an LLM-driven extraction pipeline to compile over 7,000 compounds with standardized measurements and detailed provenance.
  • ThermoPol16 provides calibrated, real-world polarimetric data for passive 3D shape reconstruction, achieving low angular errors with both model- and learning-based approaches.

The ThermoPol Dataset refers to two distinct benchmark resources foundational to recent advances in thermoelectric materials informatics and long-wave infrared (LWIR) shape-from-polarization (SfP) research. The first, ThermoPol v1.0, is a large-scale, LLM-curated structured database of thermoelectric materials and their physical properties (Itani et al., 2024). The second, ThermoPol16, is the first real-world polarimetric dataset for LWIR-based SfP, enabling reliable passive shape reconstruction for optically challenging objects (Kitazawa et al., 23 Jun 2025). Both datasets exemplify recent trends in leveraging automation (via LLMs or physics-based simulation), rigorous calibration, and open access for accelerating research in complex materials and imaging domains.

1. ThermoPol v1.0: Thermoelectric Materials Database

ThermoPol v1.0 was constructed as a comprehensive database of 7,123 unique thermoelectric compounds, encompassing bulk materials, thin films, and nanostructures. Each entry includes the chemical formula, phase/morphology, crystallographic details (crystal system, lattice constants, space group), electronic and thermal transport properties, and a source DOI. Salient properties are standardized per the following schema:

Field | Type | Unit
chemical_formula | String | n/a
compound_type | String | n/a
seebeck_coefficient_S | Float | μV K⁻¹
S_measurement_temperature | Float | K
electrical_conductivity_sigma | Float | S m⁻¹
thermal_conductivity_kappa | Float | W m⁻¹ K⁻¹
power_factor_PF | Float | μW m⁻¹ K⁻²
figure_of_merit_ZT | Float | dimensionless
... | ... | ...

All fields are standardized: missing values are represented as NULL or NaN, and units have been converted to consensus conventions (e.g., Seebeck in μV K⁻¹). Downstream users are encouraged to apply their own imputation, as no automatic statistical completion has been performed in v1.0. An experimental flag indicates whether property values are experimentally measured or theoretically predicted.

The construction pipeline, termed GPTArticleExtractor, involved four stages: (1) filtering ∼20,000 DOIs from Elsevier journals by keyword; (2) full-text XML download and conversion; (3) automated parsing of plain text and tables into structured JSON records using GPT-4-based prompts; (4) unit standardization and data deduplication. The resulting records support programmatic access with compositional and performance-based filtering, for instance retrieval of all entries with ZT > 1.0.
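The kind of compositional and performance-based filtering described above can be sketched with pandas. This is a minimal illustration, not the dataset's official access code: the column names follow the schema table, but the rows are synthetic placeholders (not values from ThermoPol v1.0) and the helper `filter_records` is hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic placeholder rows; these are NOT values from ThermoPol v1.0.
records = pd.DataFrame(
    {
        "chemical_formula": ["compound_A", "compound_B", "compound_C",
                             "compound_D", "compound_E"],
        "compound_type": ["bulk", "thin film", "bulk", "nanostructure", "bulk"],
        "figure_of_merit_ZT": [1.4, 0.6, np.nan, 1.1, 1.2],
    }
)


def filter_records(df, compound_type=None, min_zt=None):
    """Hypothetical helper: filter by compound type and a strict ZT lower bound."""
    mask = pd.Series(True, index=df.index)
    if compound_type is not None:
        mask &= df["compound_type"] == compound_type
    if min_zt is not None:
        # Comparisons against NaN are False, so rows with missing ZT drop out.
        mask &= df["figure_of_merit_ZT"] > min_zt
    return df.loc[mask].sort_values("figure_of_merit_ZT", ascending=False)


selected = filter_records(records, compound_type="bulk", min_zt=1.0)
print(selected[["chemical_formula", "figure_of_merit_ZT"]])
```

Because missing values are left as NaN rather than imputed, a threshold filter of this form silently excludes records without a reported ZT; users who want to keep them need to handle that case explicitly.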

Key derived quantities are defined as:

  • Seebeck coefficient: $S = \Delta V / \Delta T$
  • Power factor: $\mathrm{PF} = S^{2}\sigma$
  • Thermoelectric figure of merit: $ZT = \frac{S^{2}\sigma T}{\kappa}$
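As a worked illustration of these definitions, the sketch below evaluates PF and ZT from values expressed in the schema's units. The function names and the input numbers are assumptions chosen only to make the unit conversions explicit; they are not taken from the database.

```python
def power_factor_uW(seebeck_uV_per_K, sigma_S_per_m):
    """Power factor PF = S^2 * sigma, returned in uW m^-1 K^-2."""
    s_si = seebeck_uV_per_K * 1e-6       # uV/K -> V/K
    pf_si = s_si ** 2 * sigma_S_per_m    # W m^-1 K^-2
    return pf_si * 1e6                   # W -> uW


def figure_of_merit(seebeck_uV_per_K, sigma_S_per_m, kappa_W_per_mK, temperature_K):
    """Dimensionless ZT = S^2 * sigma * T / kappa, with S converted to V/K."""
    s_si = seebeck_uV_per_K * 1e-6
    return s_si ** 2 * sigma_S_per_m * temperature_K / kappa_W_per_mK


# Illustrative inputs only (not database entries).
pf = power_factor_uW(200.0, 1.0e5)
zt = figure_of_merit(200.0, 1.0e5, 1.5, 300.0)
print(f"PF = {pf:.1f} uW m^-1 K^-2, ZT = {zt:.2f}")
```

With these inputs, S = 200 μV K⁻¹ and σ = 10⁵ S m⁻¹ give PF = 4000 μW m⁻¹ K⁻², and dividing by κ = 1.5 W m⁻¹ K⁻¹ at T = 300 K gives ZT = 0.8.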

Licensing is under Creative Commons Attribution 4.0, providing immediate open access for materials informatics and machine learning work (Itani et al., 2024).

2. ThermoPol16: LWIR Shape-from-Polarization Benchmark

ThermoPol16 was designed to address the challenge of transparent-object shape reconstruction under visible light by shifting to the 8–14 μm LWIR spectrum, wherein dielectrics behave as blackbody emitters and reflectors. This enables passive inference of surface normals from polarimetric measurements. The dataset facilitates both direct, model-based normal estimation and data-driven methods reliant on synthetic-to-real bridging (Kitazawa et al., 23 Jun 2025).

The acquisition system consists of a FLIR Boson uncooled microbolometer (640×512 px, 12 bit, 60 fps, 18° FOV), a Thorlabs WP50H-B wire-grid polarizer on a motorized rotation mount, and reference blackbody targets. The system is calibrated using blackbody pairs at varying temperatures to characterize its absolute and polarimetric response.

Sixteen objects, spanning visible-transparent (glass, acrylic), translucent (plastics), and opaque (ceramics, phenolic, stone, wood) classes, are imaged at polarizer angles of 0°, 45°, 90°, and 135° under controlled temperature differentials. For each scene, 8 raw images (4 of the object, 4 of the reference) and full ground-truth 3D geometry (Revopoint MINI2 scanner, MeshLab ICP alignment) are provided, enabling rigorous benchmarking. The reported mean angular error for surface normal recovery is 13.5° for the model-based approach and 10.3° for the learning-based approach.
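Four intensity images at 0°, 45°, 90°, and 135° are sufficient to estimate the linear Stokes parameters per pixel. The sketch below uses the textbook relations for an ideal linear polarizer and an ideal sensor; it deliberately ignores the sensor polarization response and offset that the dataset's calibration accounts for, and the function names are hypothetical.

```python
import numpy as np


def stokes_from_four_angles(i0, i45, i90, i135):
    """Linear Stokes parameters from ideal-polarizer images at 0/45/90/135 degrees."""
    i0, i45, i90, i135 = (
        np.asarray(a, dtype=np.float64) for a in (i0, i45, i90, i135)
    )
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90                       # horizontal vs. vertical
    s2 = i45 - i135                     # +45 vs. -45 degrees
    return s0, s1, s2


def dolp_aolp(s0, s1, s2, eps=1e-12):
    """Degree and angle (radians) of linear polarization."""
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, eps)
    aolp = 0.5 * np.arctan2(s2, s1)
    return dolp, aolp


# Synthetic single-pixel example: I(theta) = 0.5 * (S0 + S1 cos 2t + S2 sin 2t).
true_s = (2.0, 0.6, 0.8)
angles = np.deg2rad([0.0, 45.0, 90.0, 135.0])
frames = [
    0.5 * (true_s[0] + true_s[1] * np.cos(2 * t) + true_s[2] * np.sin(2 * t))
    for t in angles
]
s0, s1, s2 = stokes_from_four_angles(*frames)
dolp, aolp = dolp_aolp(s0, s1, s2)
print(float(s0), float(s1), float(s2), float(dolp))
```

The same functions accept full image arrays, since every operation is element-wise.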

3. Data Models, Calibration, and Processing

ThermoPol v1.0 applies a workflow that employs LLMs for automated literature mining and data structuring. Following metadata stripping and conversion, JSON records are post-processed for unit consistency and robust error handling (e.g., outlier flagging, deduplication). No in-situ or ex-situ uncertainty analysis is incorporated in v1.0, though future releases plan to implement machine learning-based imputation and uncertainty quantification.
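The post-processing stage (unit consistency and deduplication) might look roughly like the sketch below. It is an assumption-laden illustration rather than the GPTArticleExtractor code: the conversion table, the `doi` key name, and the choice of deduplication key are all hypothetical.

```python
# Hypothetical conversion factors to the consensus Seebeck unit (uV/K).
SEEBECK_TO_UV_PER_K = {"uV/K": 1.0, "mV/K": 1e3, "V/K": 1e6}


def standardize_seebeck(value, unit):
    """Convert a Seebeck value to uV/K; return None for missing or unknown input."""
    if value is None or unit not in SEEBECK_TO_UV_PER_K:
        return None
    return value * SEEBECK_TO_UV_PER_K[unit]


def deduplicate(records):
    """Keep the first record per (formula, DOI, measurement temperature) key."""
    seen = set()
    unique = []
    for rec in records:
        key = (
            rec["chemical_formula"],
            rec["doi"],  # hypothetical field name for the source DOI
            rec["S_measurement_temperature"],
        )
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Placeholder records; the DOI strings are dummies, not real references.
raw = [
    {"chemical_formula": "compound_A", "doi": "doi-1", "S_measurement_temperature": 300.0},
    {"chemical_formula": "compound_A", "doi": "doi-1", "S_measurement_temperature": 300.0},
    {"chemical_formula": "compound_A", "doi": "doi-1", "S_measurement_temperature": 500.0},
]
cleaned = deduplicate(raw)
print(len(cleaned), standardize_seebeck(0.2, "mV/K"))
```

Keying on formula, source, and temperature keeps distinct measurements of the same compound at different temperatures, which a formula-only key would collapse.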

ThermoPol16 employs a rigorous calibration protocol. The pixel response is modeled by:

$$I(x, \theta) = \mathbf{c}^\top \mathbf{M}\, M(\theta)\, \mathbf{s}(x) + I_{\mathrm{off}}(x)$$

where s(x)=[I,Q,U]\mathbf{s}(x)=[I, Q, U]^\top is the Stokes vector, MM (sensor Mueller matrix, parameterized by kk), M(θ)M(\theta) (ideal linear polarizer at angle θ\theta), c\mathbf{c} (system gain), and IoffI_{\mathrm{off}} (offset). Blackbody-captured calibration solves for c\mathbf{c} and MM; the offset is subtracted per angle. Unique to this benchmark is the modeling of dual emission/reflection in the Stokes domain, correcting for sensor polarization dependence and stray-light systematics.

Ground-truth normals are computed via surface mesh scanning and pixel alignment, supporting error metrics in standard angular terms.
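The angular error metric is commonly computed as the per-pixel angle between predicted and ground-truth unit normals, averaged over valid pixels. The sketch below shows one such implementation; the function name and the optional mask argument are assumptions, and the benchmark's exact masking protocol is not specified here.

```python
import numpy as np


def mean_angular_error_deg(pred, gt, mask=None):
    """Mean angle in degrees between two normal maps of shape (..., 3)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Normalize so the dot product equals the cosine of the angle.
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=-1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=-1), -1.0, 1.0)
    err = np.degrees(np.arccos(cos))
    if mask is not None:
        err = err[np.asarray(mask, dtype=bool)]
    return float(err.mean())


# Two-pixel toy example: one exact match, one normal off by 90 degrees.
pred = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
gt = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
mae = mean_angular_error_deg(pred, gt)
print(mae)
```

Clipping the cosine to [-1, 1] guards against `arccos` returning NaN when rounding pushes a dot product slightly outside that range.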

4. Data Access, Formats, and Usage

ThermoPol v1.0 is distributed in CSV, JSON, and SQLite formats. Each record, represented either as a flat table row or structured JSON, contains all relevant compositional, property, and provenance data. Users may employ standard data science libraries (e.g., pandas) for querying and filtering, including high-throughput ML workflows for thermoelectric optimization. The dataset is licensed under CC-BY-4.0 and available at http://nemad.org/datasets/thermopol_v1.0.
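For the SQLite distribution, queries can be issued with Python's standard `sqlite3` module. The sketch below builds a small in-memory stand-in so that it runs without the real file; the table name `materials` is an assumption, the rows are synthetic placeholders, and only the column names come from the schema above.

```python
import sqlite3

# In-memory stand-in for the distributed database file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE materials ("
    "chemical_formula TEXT, compound_type TEXT, "
    "figure_of_merit_ZT REAL, thermal_conductivity_kappa REAL)"
)
conn.executemany(
    "INSERT INTO materials VALUES (?, ?, ?, ?)",
    [
        ("compound_A", "bulk", 1.4, 1.2),        # synthetic placeholder rows
        ("compound_B", "thin film", None, 0.9),  # missing ZT stored as NULL
        ("compound_C", "bulk", 0.7, None),
    ],
)

# NULL never satisfies a comparison, so records without ZT are excluded.
rows = conn.execute(
    "SELECT chemical_formula, figure_of_merit_ZT FROM materials "
    "WHERE figure_of_merit_ZT > ? ORDER BY figure_of_merit_ZT DESC",
    (1.0,),
).fetchall()

# Count how many records lack a reported ZT.
missing = conn.execute(
    "SELECT COUNT(*) FROM materials WHERE figure_of_merit_ZT IS NULL"
).fetchone()[0]
conn.close()

print(rows, missing)
```

Checking the NULL count alongside any threshold query gives a quick sense of how much of the database a filter cannot see.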

ThermoPol16 is organized per-object as directories containing raw polarimetric TIFFs (scene/blackbody, per angle), calibration files (gain vector, Mueller matrix, offset), ground-truth meshes and normal maps, and metadata (object/material descriptors, temperature profiles, refractive index, timestamps). Optionally, full frame-rate image bursts are available for multi-frame denoising studies. The structure enables scripted access for both model-driven and learning-based pipeline evaluation.

5. Applications, Limitations, and Future Directions

Applications:

  • ThermoPol v1.0 is immediately suited for data-driven discovery and optimization of thermoelectric materials, supporting property prediction, surrogate model building, and exploratory analysis constrained by materials chemistry and structure (Itani et al., 2024).
  • ThermoPol16 enables research on passive 3D scanning of visually elusive objects (transparent, rough, multi-albedo) relevant in manufacturing, robotics, heritage preservation, and multimodal shape recovery, through both direct and ML-based approaches (Kitazawa et al., 23 Jun 2025).

Limitations:

  • ThermoPol v1.0 currently leaves missing data as NULL with no imputation; measurement uncertainties are sparsely annotated.
  • ThermoPol16 requires a temperature differential (object-environment) for meaningful polarization contrast (DoLP ≠ 0). The capture protocol, involving polarizer rotation and blackbody insertion, imposes ≈ 7 seconds per scene. Multi-view and more diverse emissivity/refractive index scenarios require future additions.

A plausible implication is that both datasets could catalyze methodological advances in uncertainty-aware property prediction, robust polarimetric modeling, and domain-bridging for ML systems in their respective domains. Future releases of ThermoPol v1.0 are expected to integrate automated imputation and uncertainty quantification with ensemble models, while extensions of ThermoPol16 to multi-view capture and spatially-varying radiometric complexity would further enhance its value.

6. Comparative Significance and Context

ThermoPol v1.0 and ThermoPol16 demonstrate new paradigms in dataset creation for physical sciences. The former leverages LLM-driven automation to systematically address longstanding curation bottlenecks in materials informatics, offering reproducible, granular, and traceable materials property records (Itani et al., 2024). The latter inaugurates a benchmark for real-world, high-fidelity LWIR polarimetric imaging, which was previously hindered by inadequate physical modeling and limited access to ground-truth-aligned polarimetric data (Kitazawa et al., 23 Jun 2025). Both have set new standards for reproducibility, openness, and scientific extensibility in their respective application areas.
