DISC-FIN-SFT: Expert Financial LLM Tuning

Updated 26 January 2026
  • DISC-FIN-SFT is a specialized dataset designed for financial-domain LLM adaptation, featuring 246K curated Chinese instruction-response pairs across consulting, NLP, computing, and retrieval tasks.
  • It employs multi-source data aggregation, translation, rigorous annotation, and negative sampling to ensure high-quality data and reproducible expert-level performance.
  • The dataset supports MEFF-based fine-tuning with LoRA adapters, leading to significant accuracy improvements in financial consulting, NLP, and computing benchmarks.

DISC-FIN-SFT refers both to the specific instruction-tuning dataset that forms the backbone of DISC-FinLLM, a Chinese financial LLM, and, more broadly, to the engineered approach of adapting LLMs to the financial domain through divided, expert-driven instruction components. The dataset consists of large-scale, curated, and annotated prompt–response pairs that train LLMs in discrete financial reasoning domains: consulting, NLP, computing, and retrieval-augmented generation. Its architecture and impact are defined by dataset scale, content design across categories, data provenance, annotation and quality-assurance practices, released examples, licensing and reproducibility, and documented effects on LLM performance across benchmarks (Chen et al., 2023).

1. Corpus Scope, Scale, and Design

DISC-FIN-SFT comprises approximately 246,000 Chinese-language instruction–response pairs, corresponding to nearly 10 million tokens. Each example belongs to one of four sub-datasets: (1) consulting, (2) financial-NLP tasks, (3) financial computing, and (4) retrieval-augmented generation. The sub-dataset statistics are summarized below:

Sub-dataset              Examples
Consulting                 63,000
Financial-NLP Tasks       110,000
Financial Computing        57,000
Retrieval-Augmented        20,000
Total                     246,000

All prompts and outputs are in Chinese. Some English datasets (FiQA, etc.) were translated and adapted via ChatGPT-assisted pipelines (Chen et al., 2023).

2. Instruction Categories and Content

DISC-FIN-SFT is engineered to instill four orthogonal skill sets in financial LLMs:

  1. Financial Consulting: Includes 63k examples of both single- and multi-turn QA, emulating financial advisor dialogue. Sources include translated/rephrased FiQA pairs, finance term definitions, and dialogue expansions using posts auto-extracted and paraphrased from JingGuan financial forums.
  2. Financial-NLP Tasks: Encompasses 110k instructions for tasks such as sentiment analysis, named entity/event extraction, classification, reading comprehension, and headline/keyword generation. Data originates from >10 public Chinese financial NLP datasets and a machine-generated QA dataset from over 87k documents. Tasks probe model capabilities in both zero-shot and few-shot template settings.
  3. Financial Computing: Contains 57k prompts for arithmetic, statistical, and symbolic computations relevant to financial scenarios. Each is annotated with explicit API call syntax—a variant of the Toolformer approach. Tasks are seeded from exam problems and augmented via algorithmic and chain-of-thought GPT prompting.
  4. Retrieval-Augmented Generation: Provides 20k prompts requiring synthesis over external financial references. The pipeline comprises question creation, document retrieval from an internal knowledge base, and answer construction that mimics retrieval-augmented language modeling. Negative sampling (inserting irrelevant documents) is used to confer robustness.
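The negative-sampling step in the retrieval subset can be sketched as follows; the function and field names are illustrative, not taken from the released pipeline:

```python
import random

def build_rag_example(question, answer, relevant_docs, corpus, n_negatives=1, seed=0):
    """Assemble a retrieval-augmented training example with negative sampling.

    Distractor documents drawn from the wider corpus are mixed in with the
    relevant references, so the model must learn to select its evidence.
    (Illustrative sketch; field names are not from the released dataset.)
    """
    rng = random.Random(seed)
    pool = [d for d in corpus if d not in relevant_docs]
    negatives = rng.sample(pool, min(n_negatives, len(pool)))
    references = relevant_docs + negatives
    rng.shuffle(references)  # hide which reference is the relevant one
    return {"question": question, "references": references, "answer": answer}

example = build_rag_example(
    "What drives bank net interest margins?",
    "Margins widen when lending rates rise faster than deposit rates.",
    relevant_docs=["Doc A: rate-cycle analysis"],
    corpus=["Doc A: rate-cycle analysis", "Doc B: ESG report", "Doc C: IPO news"],
)
```

The shuffle matters: if relevant documents always appeared first, the model could exploit position rather than content.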

3. Data Sources, Preprocessing, and Annotation

Raw inputs are collected from multiple avenues:

  • Public corpora (FiQA, FPB, FNSC, FR-NER, OpenKG, CCKS, SmoothNLP, Minds14, Finance-alpaca-KG, C3).
  • Crawled content (JingGuan Q&A, East Money news and reports).
  • Hand-crafted finance and math exam seeds.
  • Large-scale machine-aided generation (GPT-driven translation, QA regeneration, and chain-of-thought expansion).

Preprocessing tasks include translation, prompt-and-response template creation (>20 per dataset), paragraph splitting, and annotation for explicit tool-commands. All data is deduplicated at the example level (Chen et al., 2023).
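The template-instantiation and example-level deduplication steps might look like this in outline; the templates shown are invented stand-ins for the >20 per-dataset templates reported:

```python
import hashlib

# Invented stand-in templates (the paper reports >20 per dataset).
TEMPLATES = [
    "判断以下金融新闻的情感倾向：{text}",  # "Determine the sentiment of this financial news: {text}"
    "Classify the sentiment of this financial headline: {text}",
]

def instantiate(template, text):
    """Fill a prompt template with a concrete input."""
    return template.format(text=text)

def dedupe(examples):
    """Example-level deduplication: drop exact duplicate prompt-response pairs."""
    seen, kept = set(), []
    for ex in examples:
        key = hashlib.sha256((ex["prompt"] + "\x1f" + ex["response"]).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(ex)
    return kept

examples = [
    {"prompt": instantiate(TEMPLATES[1], "Bank profits surge"), "response": "positive"},
    {"prompt": instantiate(TEMPLATES[1], "Bank profits surge"), "response": "positive"},
]
assert len(dedupe(examples)) == 1
```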

Annotation best practices enforce:

  • Adoption of a “Chinese financial expert” persona.
  • Culturally appropriate output phrasing.
  • Both zero-shot and few-shot demonstrations in classification/generation.
  • Multi-turn dialogue generation via iterative self-chat.
  • Human review of prompt templates and outputs.
  • Negative sampling in retrieval tasks to enforce evidence selection.
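The iterative self-chat practice for multi-turn dialogue can be sketched with two stub roles standing in for prompted LLM calls (e.g. a "Chinese financial expert" persona and a simulated user); plain functions are used here so the sketch runs without any API:

```python
def self_chat(seed_question, respond, follow_up, turns=2):
    """Grow a multi-turn dialogue by alternating an answerer and a questioner.

    `respond` and `follow_up` stand in for two prompted LLM calls; the loop
    feeds the growing dialogue back into each so every turn is conditioned
    on the full history.
    """
    dialogue = [("user", seed_question)]
    for i in range(turns):
        dialogue.append(("assistant", respond(dialogue)))
        if i < turns - 1:  # the dialogue ends on an assistant turn
            dialogue.append(("user", follow_up(dialogue)))
    return dialogue

chat = self_chat(
    "What is an interest rate?",
    respond=lambda d: f"Answer to: {d[-1][1]}",
    follow_up=lambda d: "How do central banks set it?",
)
```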

4. Representative Instruction–Response Examples

Sampling from each sub-dataset illustrates typical prompt–response structures:

  • Consulting: Single-turn QA prompts such as “用户: 利率是什么?” (“User: What is an interest rate?”), paired with expert-level, definition-driven responses.
  • NLP Task: Sentiment analysis on Chinese financial news snippets, with outputs such as “正面” (positive) or “负面” (negative).
  • Computing: Explicit tool calls embedded within answers, e.g., calculator usage: “Mike答对题目数 = 10 × 60% = [Calculator(10*60/100)→6] 6题。” (“Number of questions Mike answered correctly = 10 × 60% = 6.”)
  • Retrieval-Augmented Generation: Evidence-grounded industry or policy analyses that list sub-sector investment rationales while explicitly referencing the provided knowledge snippets (Chen et al., 2023).
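The embedded tool-call convention in the computing subset can be exercised with a small parser; this is an illustrative reimplementation of the marker format, not the DISC-FinLLM runtime:

```python
import re

# Matches Toolformer-style markers of the form [Calculator(expr)→result].
CALL = re.compile(r"\[Calculator\(([^)]*)\)→([^\]]*)\]")

def execute_calls(text):
    """Replace each embedded calculator marker with its evaluated expression."""
    def run(match):
        expr = match.group(1)
        # Whitelist arithmetic characters before evaluating.
        if not re.fullmatch(r"[0-9+\-*/. ()]+", expr):
            raise ValueError(f"unsupported expression: {expr!r}")
        return str(eval(expr))  # restricted to arithmetic characters above
    return CALL.sub(run, text)

answer = "Mike answered 10 × 60% = [Calculator(10*60/100)→6] 6 questions correctly."
print(execute_calls(answer))
```

At inference time the real system would execute the call and splice the result back into the generation stream; here the annotated result in the training data lets the model learn where to emit the call.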

5. Integration with Model Architecture and Fine-Tuning Approach

DISC-FIN-SFT is natively integrated into the Multiple Experts Fine-tuning Framework (MEFF) underlying DISC-FinLLM. MEFF divides skill learning among LoRA adapter modules: each sub-dataset fine-tunes its own adapter atop the Baichuan-13B base LLM. At inference, the runtime system loads and activates only the adapter relevant to the consulting, NLP, computing, or retrieval task at hand.

This modular decomposition supports:

  • Lightweight deployment by task (§ Figure 1 in (Chen et al., 2023))
  • Explicit control over model routing and skill composition
  • Disjoint, minimal-per-task adaptation parameters
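A minimal sketch of the MEFF routing idea, with invented adapter names (the real system fine-tunes LoRA adapters on Baichuan-13B):

```python
# One LoRA adapter checkpoint per skill, loaded on demand over a shared base
# model. Names are illustrative, not from the released repository.
ADAPTERS = {
    "consulting": "lora-consulting",
    "nlp": "lora-financial-nlp",
    "computing": "lora-financial-computing",
    "retrieval": "lora-retrieval",
}

def route(task: str) -> str:
    """Select the adapter checkpoint for a task; unknown tasks fall back to
    the plain base model."""
    return ADAPTERS.get(task, "base-model")

assert route("computing") == "lora-financial-computing"
assert route("poetry") == "base-model"
```

Because the adapters are disjoint, a deployment that only needs, say, financial computing can ship the base model plus a single small adapter.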

6. Evaluation Results and Benchmark Performance

Quantitative evaluation demonstrates consistent model gains across financial benchmarks:

Benchmark / Task                 Base Accuracy   DISC-FIN-SFT (Expert)   Δ (Absolute Gain)
FinCUGE (NLP, 6 tasks)           31.0%           40.0%                   +9.0 pp
FIN-Eval (Consulting QA)         49.4%           51.6%                   +2.2 pp
Financial Computing (manual)     0.12            0.35                    +0.23
Retrieval QA (GPT-3.5 judged)    —               —                       +0.15 (per metric)

Across all tasks, LoRA adapters trained on the corresponding DISC-FIN-SFT subset deliver significant and generalizable accuracy improvements, while maintaining the base model's Chinese language modeling capability (Chen et al., 2023).
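The absolute gains above reduce to simple differences; a quick sanity check with an illustrative helper (not from the paper):

```python
def abs_gain(base, tuned):
    """Absolute gain: tuned score minus base score, rounded for reporting."""
    return round(tuned - base, 2)

assert abs_gain(31.0, 40.0) == 9.0   # FinCUGE, in percentage points
assert abs_gain(49.4, 51.6) == 2.2   # FIN-Eval, in percentage points
assert abs_gain(0.12, 0.35) == 0.23  # Financial Computing, absolute score
```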

7. Licensing, Reproducibility, and Availability

DISC-FinLLM, including the full DISC-FIN-SFT dataset and MEFF configuration, is released at https://github.com/FudanDISC/DISC-FinLLM under an open-source (MIT/Apache-style) license; consult the repository for definitive licensing terms. The dataset and adapters enable reproducible, expert-driven adaptation of large Chinese LLMs to multi-skill financial tasks.


As established, DISC-FIN-SFT represents a corpus-driven, expert-fine-tuned methodology for engineering financial-domain LLMs, with documentation of both data construction and downstream performance impacts (Chen et al., 2023). All statistics, practices, and claims are as detailed in the original publication.
