FinMTEB: Finance Massive Text Embedding Benchmark

Published 16 Feb 2025 in cs.CL and cs.IR | (2502.10990v3)

Abstract: Embedding models play a crucial role in representing and retrieving information across various NLP applications. Recent advances in LLMs have further enhanced the performance of embedding models. While these models are often benchmarked on general-purpose datasets, real-world applications demand domain-specific evaluation. In this work, we introduce the Finance Massive Text Embedding Benchmark (FinMTEB), a specialized counterpart to MTEB designed for the financial domain. FinMTEB comprises 64 financial domain-specific embedding datasets across 7 tasks that cover diverse textual types in both Chinese and English, such as financial news articles, corporate annual reports, ESG reports, regulatory filings, and earnings call transcripts. We also develop a finance-adapted model, Fin-E5, using a persona-based data synthesis method to cover diverse financial embedding tasks for training. Through extensive evaluation of 15 embedding models, including Fin-E5, we show three key findings: (1) performance on general-purpose benchmarks shows limited correlation with financial domain tasks; (2) domain-adapted models consistently outperform their general-purpose counterparts; and (3) surprisingly, a simple Bag-of-Words (BoW) approach outperforms sophisticated dense embeddings in financial Semantic Textual Similarity (STS) tasks, underscoring current limitations in dense embedding techniques. Our work establishes a robust evaluation framework for financial NLP applications and provides crucial insights for developing domain-specific embedding models.

Summary

The paper introduces FinMTEB, a benchmark with 64 datasets across seven financial NLP tasks, filling a gap in domain-specific evaluation.
It presents Fin-E5, a finance-adapted embedding model fine-tuned on persona-based synthetic data.
The study benchmarks 15 models and finds that a simple Bag-of-Words baseline outperforms dense embeddings on financial STS tasks, highlighting the need for domain-specific evaluation.

Introduction

The "FinMTEB: Finance Massive Text Embedding Benchmark" paper addresses the lack of benchmarks for text embedding models in specialized domains, specifically finance. General-purpose datasets have driven much of the progress in NLP, but they transfer poorly to financial text, which has its own semantics and complex relationships. The authors introduce FinMTEB, a benchmark of 64 datasets in English and Chinese covering seven task types: classification, clustering, retrieval, pair classification, reranking, summarization, and semantic textual similarity (STS).

Figure 1: An overview of tasks and datasets used in FinMTEB. All the dataset descriptions and examples are provided in the Appendix.

Fin-E5 Model Development

The authors propose Fin-E5, a finance-adapted embedding model that achieves state-of-the-art results on FinMTEB. It is fine-tuned from e5-Mistral-7B-Instruct on a persona-based synthetic dataset built to cover a range of financial embedding tasks. The design rests on the premise that domain-adapted models outperform general-purpose ones on specialized tasks. Like other recent LLM-based embedding models, Fin-E5 is fine-tuned for its target context, here finance, and shows promise in tasks such as semantic similarity and text retrieval.

Benchmarks and Model Evaluation

The paper evaluates 15 embedding models, including Fin-E5, on FinMTEB and reports three key findings:

Domain-Specific Advantage: Domain-adapted models, including Fin-E5, consistently outperform their general-purpose counterparts.
Benchmark Predictivity: Performance on general-purpose benchmarks correlates only weakly with performance on financial tasks, so domain-specific evaluation is necessary.
Dense vs Sparse Embeddings: A simple Bag-of-Words (BoW) approach unexpectedly outperforms dense embeddings on financial STS tasks, which points to limitations of current dense embedding methods on nuanced financial semantics.

Figure 2: Semantic similarity across all the datasets in the FinMTEB benchmark.
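
To make the STS finding concrete, the following is a minimal sketch of how a Bag-of-Words baseline can be scored on an STS task. It assumes the common STS protocol of predicting a cosine similarity for each sentence pair and then computing the Spearman rank correlation against gold similarity scores. The tokenizer, the helper names (`bow_vector`, `cosine`, `ranks`, `spearman`), and the three sentence pairs with their gold scores are illustrative assumptions, not data or code from the paper.

```python
import math
import re
from collections import Counter


def bow_vector(text):
    """Lowercase the text and count its alphanumeric tokens (toy tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sentence pairs with made-up gold similarity scores (0 to 5).
toy_pairs = [
    ("The company raised its dividend.", "The company raised its dividend.", 5.0),
    ("The company raised its dividend.", "The firm increased its payout.", 4.0),
    ("The company raised its dividend.", "Oil prices fell sharply.", 0.0),
]

predicted = [cosine(bow_vector(s1), bow_vector(s2)) for s1, s2, _ in toy_pairs]
gold = [score for _, _, score in toy_pairs]
print("BoW similarities:", predicted)
print("Spearman vs gold:", spearman(predicted, gold))
```

The second pair shows where BoW is weakest: a paraphrase with little word overlap gets a low similarity even though its gold score is high. The same `spearman` helper could also be applied to two lists of per-model scores, one from a general-purpose benchmark and one from FinMTEB, to check how well one ranking predicts the other.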

Implementation and Training

Fin-E5 is trained on data produced by a persona-based generation approach. Each training example is a triplet: a query, a positive document, and negative examples. The pipeline builds its dataset from InvestLM's QA resources and synthesizes task-relevant context with LLMs such as Qwen2.5-72B-Instruct. Training is contrastive: an InfoNCE loss teaches the model to score the positive passage above semantically similar distractor documents.
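
The contrastive objective can be illustrated with a minimal, dependency-free sketch of the standard InfoNCE loss for a single (query, positive, negatives) triplet. The use of cosine similarity, the temperature value of 0.05, and the function names are assumptions made for illustration; the paper's exact formulation and hyperparameters may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, negatives, temperature=0.05):
    """InfoNCE loss for one query.

    loss = -log( exp(s(q, d+) / t) / sum_d exp(s(q, d) / t) ),
    where d ranges over the positive and all negatives.
    The temperature default is an assumed value, not taken from the paper.
    """
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


# Toy 2-d embeddings (hypothetical): the positive is aligned with the query.
query = [1.0, 0.0]
aligned_doc = [1.0, 0.0]
distractor = [0.0, 1.0]

print("loss, positive aligned:", info_nce(query, aligned_doc, [distractor]))
print("loss, positive swapped:", info_nce(query, distractor, [aligned_doc]))
```

The loss is small when the positive document is the closest candidate to the query and grows when a negative is closer, which is the pressure that pushes the model to separate relevant passages from look-alike distractors. In practice this would be computed over batches with an autodiff framework rather than in plain Python.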

Experimental Results

The evaluation shows Fin-E5 leading among domain-adapted text embedding models across the seven financial task types. It improves classification and retrieval performance in particular, although general-purpose dense embedding models such as NV-Embed v2 remain competitive. The finance-specific evaluation also makes the gain from domain adaptation visible, especially on tasks that demand precise semantic alignment and complex numerical reasoning.

Implications and Future Directions

FinMTEB provides a common protocol for evaluating embedding models in the financial domain and offers practical guidance for building financial NLP systems. The paper advocates open-sourcing FinMTEB and Fin-E5 so that the research community can build on them. Future work may extend FinMTEB beyond English and Chinese to additional languages, broadening the benchmark’s applicability to global financial markets.

Conclusion

The paper introduces FinMTEB as a comprehensive benchmark suite for financial text embeddings and demonstrates the value of domain adaptation for embedding models. Fin-E5 achieves state-of-the-art results on FinMTEB, supporting the case for specialized text representations when extracting information from financial text.
