- The paper introduces ClimateQA, a tool that frames the TCFD-recommended disclosure questions as a QA task, using RoBERTa to automate climate-related information extraction from corporate reports.
- It employs a mixed dataset of hand-labeled and scraped reports to fine-tune RoBERTa models, achieving up to 85.5% F1 score with notable efficiency gains.
- ClimateQA is deployed on Azure, providing sustainability analysts with a web interface to rapidly process and download key report segments.
This paper presents ClimateQA, an NLP tool that automates the analysis of financial and sustainability reports for climate-related information (arXiv:2011.08073). The primary goal is to help sustainability analysts efficiently identify relevant disclosures scattered across lengthy documents, reducing manual effort.
Problem:
Climate change poses significant financial risks, prompting companies to disclose climate-related information in Environmental, Social, and Governance (ESG) reports. However, these reports are often hundreds of pages long, lack standardized structure, and use varied terminology, making manual analysis time-consuming and inefficient. Current methods like keyword searches are often inadequate.
Approach:
The authors framed the task as a question-answering (QA) problem. They utilized the 14 questions recommended by the Task Force on Climate-related Financial Disclosures (TCFD) as prompts. The ClimateQA model takes a TCFD question and a sentence from a report as input and determines if the sentence answers the question.
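This framing makes each (TCFD question, report sentence) pair one binary-classification example. A minimal sketch of the input format, assuming RoBERTa-style sentence-pair concatenation (the question text, sentence, and label below are invented for illustration, not taken from the paper's data):

```python
from dataclasses import dataclass

@dataclass
class QAPair:
    """One input example: does `sentence` answer `question`?"""
    question: str   # one of the 14 TCFD recommended questions
    sentence: str   # a single sentence extracted from a report
    label: int      # 1 = sentence answers the question, 0 = it does not

def to_model_input(pair: QAPair) -> str:
    # RoBERTa encodes sentence pairs by concatenating the two segments
    # with separator tokens before tokenization.
    return f"<s> {pair.question} </s></s> {pair.sentence} </s>"

example = QAPair(
    question="Does the organization describe climate-related risks?",
    sentence="We identify flood risk as a material threat to our coastal assets.",
    label=1,
)
print(to_model_input(example))
```

At inference time the tool runs every sentence of a report against every question, so a single report yields many such pairs.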
Methodology:
- Data Collection:
- Unlabeled: 2,249 financial and sustainability reports were scraped from public sources such as EDGAR and the Global Reporting Initiative database, with raw text extracted using the Tika package. The paper mentions pre-training word embeddings on this corpus to capture financial jargon, although the final model uses standard RoBERTa-Base weights.
- Labeled: A small set of reports previously hand-labeled by sustainability analysts using the TCFD questions was obtained.
- Dataset Creation: Positive examples were created by pairing TCFD questions with their corresponding labeled answer sentences. Negative examples were generated by pairing questions with sentences that did not answer them. This resulted in a highly imbalanced dataset, which was split into training, validation, and test sets based on company names to prevent data leakage. Stratified sampling was used to manage the imbalance, resulting in training/validation/test splits with specific numbers of positive and negative examples (e.g., 1500 positive / 15k negative for training).
- Model Selection & Training:
- The RoBERTa (Robustly Optimized BERT Pretraining Approach) architecture was chosen.
- Both RoBERTa-Base (125M parameters) and RoBERTa-Large (355M parameters) were evaluated.
- Models were fine-tuned on the labeled QA dataset.
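The company-level split described in the dataset-creation step can be sketched as follows, assuming each example carries a company identifier (the company names, counts, and split fractions here are invented for illustration; the paper's actual splits differ):

```python
import random

def split_by_company(examples, train_frac=0.8, val_frac=0.1, seed=13):
    """Split rows so that all rows from one company land in exactly one
    split, preventing leakage of company-specific boilerplate language."""
    companies = sorted({ex["company"] for ex in examples})
    rng = random.Random(seed)
    rng.shuffle(companies)
    n_train = int(len(companies) * train_frac)
    n_val = int(len(companies) * val_frac)
    train_set = set(companies[:n_train])
    val_set = set(companies[n_train:n_train + n_val])
    splits = {"train": [], "val": [], "test": []}
    for ex in examples:
        if ex["company"] in train_set:
            splits["train"].append(ex)
        elif ex["company"] in val_set:
            splits["val"].append(ex)
        else:
            splits["test"].append(ex)
    return splits

rows = [{"company": f"co{i % 10}", "label": i % 2} for i in range(100)]
splits = split_by_company(rows)
# No company appears in more than one split:
seen = [{r["company"] for r in s} for s in splits.values()]
assert seen[0].isdisjoint(seen[1]) and seen[0].isdisjoint(seen[2])
```

Splitting on company rather than on individual examples is what makes the test set a genuine measure of generalization to unseen companies, which the results section probes directly.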
Results:
- Model Performance: RoBERTa-Large achieved slightly higher F1 scores (Test F1: 85.5%) than RoBERTa-Base (Test F1: 82.0%). However, RoBERTa-Base was significantly faster to train (5 hours vs. 12 hours on a 12GB GPU) and required less memory. Due to the minor performance difference and significant efficiency gains, RoBERTa-Base was selected for the final tool.
- Generalization: A performance drop was observed between validation and test sets (average -9.7% F1 for RoBERTa-Base), indicating challenges in generalizing to unseen companies.
- Sector Variation: Performance varied by industry sector. The Energy sector showed the best results (Test F1: 89.8%), possibly due to more standardized reporting or boilerplate language in the training data for that sector. Materials & Buildings showed the largest drop between validation and test (-24.2%).
- Question Variation: Performance also varied significantly depending on the TCFD question. Questions about generic concepts like time frames (Question 4) performed poorly, while highly specific questions about risk management integration (Question 10) showed poor generalization. Questions about GHG emissions (Question 12) had high F1 scores despite being answered infrequently.
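For reference, the F1 scores above combine precision and recall on the positive class; on a dataset as imbalanced as this one (roughly 1:10 positive to negative), F1 is far more informative than accuracy. A small stdlib-only sketch with invented predictions:

```python
def f1_score(y_true, y_pred):
    """Binary F1: harmonic mean of precision and recall on the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# A classifier that predicts "not an answer" everywhere scores 0.0 F1,
# even though it is ~91% accurate on a 1:10 imbalanced set.
y_true = [1] * 10 + [0] * 100
assert f1_score(y_true, [0] * 110) == 0.0
```

This also explains why infrequently answered questions (like Question 12 on GHG emissions) can still show high F1: the metric rewards finding the few positives, not agreeing with the majority class.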
Practical Implementation: The ClimateQA Tool
The research resulted in a deployed tool aimed at end-users (sustainability analysts):
- Deployment: Hosted on Microsoft Azure.
- User Interface: A web application allows users to upload PDF reports.
- Processing Pipeline:
- Text extraction from PDF (using Tika).
- Text parsing and sentence splitting.
- Inference using the fine-tuned RoBERTa-Base model to identify sentences answering TCFD questions.
- Results (identified sentences paired with questions) are stored in Blob Storage as a TSV file for user download.
User -> Web App -> Upload PDF -> Azure Blob Storage
                                        |
                                        V
                            Azure ML Pipeline Trigger
                                        |
+---------------------------+---------------------------+------------------------+
| 1. PDF Text Extraction    | 2. Text Parsing/Splitting | 3. ClimateQA Inference |
| (Tika)                    | (Sentences -> TSV)        | (RoBERTa-Base)         |
+---------------------------+---------------------------+------------------------+
                                        |
                                        V
Results (TSV) -> Azure Blob Storage -> User Download
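The three pipeline stages can be sketched end to end. The sentence splitter and the classifier below are illustrative placeholders only: the deployed tool uses Tika for extraction and the fine-tuned RoBERTa-Base model for inference, and the question texts here are paraphrases, not the paper's exact TCFD wording.

```python
import csv
import io
import re

TCFD_QUESTIONS = [  # illustrative subset standing in for the 14 TCFD questions
    "Does the organization describe climate-related risks and opportunities?",
    "Does the organization disclose Scope 1 and Scope 2 GHG emissions?",
]

def split_sentences(text: str) -> list[str]:
    # Naive regex splitter standing in for the tool's text-parsing stage.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def answers_question(question: str, sentence: str) -> bool:
    # Placeholder for RoBERTa-Base inference: a crude keyword heuristic.
    keywords = {"risk", "emissions", "ghg", "scope"}
    return any(k in sentence.lower() for k in keywords)

def run_pipeline(report_text: str) -> str:
    """Return the TSV of (question, sentence) results the tool stores in Blob Storage."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t")
    writer.writerow(["question", "sentence"])
    for sentence in split_sentences(report_text):
        for question in TCFD_QUESTIONS:
            if answers_question(question, sentence):
                writer.writerow([question, sentence])
    return buf.getvalue()

tsv = run_pipeline("We report Scope 1 GHG emissions annually. Our offices are modern.")
print(tsv)
```

Running every sentence against every question is what makes inference cost matter in deployment, and is part of why the faster RoBERTa-Base was chosen over RoBERTa-Large for the production tool.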
Future Work:
The authors plan to improve PDF text extraction, particularly for tables, potentially exploring commercial tools. They also aim to better integrate domain-specific financial LLMs and enhance the user interface with interactive visualization of results within the original documents.
In summary, the paper details the development and deployment of ClimateQA, an NLP-based tool using RoBERTa fine-tuned for question answering, to automate the extraction of climate-related information from corporate sustainability reports based on TCFD guidelines, addressing a practical need for analysts in the field.