DeepSeek-Coder-7B-Instruct-v1.5 Overview

Updated 8 January 2026
  • DeepSeek-Coder-7B-Instruct-v1.5 is a 7-billion-parameter decoder-only transformer model designed for advanced code generation, tabular verification, and structured reasoning.
  • It builds on a two-stage instruction-tuning recipe, and follow-on work augments it with the Inverse-Instruct data-augmentation pipeline and pseudo-feedback preference optimization, achieving competitive pass@1 scores.
  • The model supports efficient local deployment with INT8 quantization, low latency inference, and interpretable execution loops as demonstrated in the RePanda pipeline.

DeepSeek-Coder-7B-Instruct-v1.5 is a 7-billion-parameter, decoder-only transformer LLM in the DeepSeek-Coder series, targeting open-source code intelligence with strong instruction-following and cross-domain reasoning. Its architecture, training regimen, and use in structured reasoning pipelines such as RePanda have established it as a competitive, interpretable LLM for code generation, tabular verification, and question answering, positioned at the intersection of code intelligence and data-centric NLP.

1. Model Architecture and Pretraining

DeepSeek-Coder-7B-Instruct-v1.5 is implemented as a 32-layer, decoder-only transformer with a hidden size of 4096, a feed-forward dimension of 11,008 (SwiGLU activation), and 32-head self-attention (head dimension 128). Rotary position embeddings (RoPE) are applied in pre-layer-norm style. The model employs a 32k BPE vocabulary and, at full (FP32) precision, occupies ~27.6GB, though inference typically leverages mixed precision or INT8 quantization (Guo et al., 2024).
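The stated hyperparameters can be sanity-checked with a back-of-envelope parameter count. This sketch assumes an untied LM head and a 32,000-token vocabulary, and ignores layer norms and biases, so it slightly undercounts:

```python
# Rough parameter count from the stated architecture hyperparameters.
# Assumptions: untied LM head, vocab of exactly 32,000; norms/biases ignored.
hidden, layers, ffn, vocab = 4096, 32, 11008, 32_000

attn = 4 * hidden * hidden       # Q, K, V, O projections per layer
swiglu = 3 * hidden * ffn        # gate, up, down projections per layer
embeddings = 2 * vocab * hidden  # input embedding + untied output head

params = layers * (attn + swiglu) + embeddings
fp32_gb = params * 4 / 1e9       # 4 bytes per parameter at FP32

print(f"{params / 1e9:.2f}B params, ~{fp32_gb:.1f} GB at FP32")
```

The result lands near 6.7B parameters and ~27GB at FP32, consistent with the ~27.6GB figure above once norms and other small tensors are included.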

Pretraining uses 2 trillion tokens derived from 87 programming languages, code-centric English, and Chinese corpora. Repository-level deduplication (not file-level) ensures continuity and broad context, complemented by aggressive filtering (long line, low-alpha, oversized HTML/JSON exclusion), dependency-based file concatenation (topological sort), and n-gram decontamination against evaluation sets (HumanEval, MBPP, GSM8K, MATH).

The learning task blends standard next-token prediction with fill-in-the-middle (FIM, 50% in PSM mode). Explicitly, next-token prediction minimizes

$$\mathcal{L}_{\mathrm{NTP}} = -\sum_{t=1}^{T} \log p(x_t \mid x_{<t})$$

and FIM targets masked spans: $\mathcal{L}_{\mathrm{span}} = -\sum_{i \in S} \log p(x_i \mid x_{\setminus S})$, where $S$ is the "hole" to be recovered (Guo et al., 2024).
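In PSM mode, a training example is serialized as prefix, then suffix, then the middle "hole" to be predicted. A minimal sketch of that serialization, with placeholder sentinel names (the actual special tokens depend on the tokenizer):

```python
# Illustrative FIM serialization in PSM (prefix-suffix-middle) order.
# Sentinel token names here are placeholders, not the model's actual tokens.
def to_psm(code: str, hole_start: int, hole_end: int) -> str:
    prefix = code[:hole_start]
    middle = code[hole_start:hole_end]   # the span S the model must recover
    suffix = code[hole_end:]
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>{middle}"

# Mask the body of a tiny function; the model sees prefix + suffix first.
sample = to_psm("def add(a, b):\n    return a + b\n", 19, 31)
```

At training time the loss is applied to the tokens after the middle sentinel, which is exactly the masked-span objective above.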

In the base DeepSeek-Coder models, a RoPE interpolation phase extends contextual length, reliably supporting 16,000 tokens; the v1.5 variant itself trains with a 4K window.

2. Instruction Tuning and Inverse-Instruct Pipeline

Instruction tuning is accomplished in two stages. First, the model is continued from DeepSeek-LLM-7B on a 2T-token mix of code and general language (no FIM, single 4K window). Second, supervised fine-tuning is performed on ≈2B tokens of Alpaca-style, human-crafted instruction–response pairs, using cross-entropy loss, a cosine LR schedule (peak $\eta = 10^{-5}$), and a batch size of ~4M tokens. Dialogue turns are segmented by <|EOT|> delimiters (Guo et al., 2024).
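A minimal sketch of that dialogue serialization: the exact prompt template is an assumption here, but turns are delimited by the <|EOT|> token as stated above.

```python
# Hedged sketch of Alpaca-style dialogue formatting for SFT.
# The "### Instruction / ### Response" headers are an assumed template;
# the <|EOT|> turn delimiter follows the description in the text.
def format_dialogue(turns):
    parts = []
    for instruction, response in turns:
        parts.append(
            f"### Instruction:\n{instruction}\n### Response:\n{response}\n<|EOT|>"
        )
    return "\n".join(parts)
```

Each (instruction, response) pair becomes one supervised segment, and the cross-entropy loss is computed over the response tokens.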

A salient enhancement is the Inverse-Instruct pipeline (Wu et al., 2024):

  • Data Augmentation: Starting from an instruction–code dataset (evol-codealpaca-v1, 110K pairs), all code blocks are extracted, cleaned, and summarized by the base model into 10 diverse instructions per code snippet.
  • Self-Evaluation: Candidate instruction-code pairs are scored by forcing the model to judge code correctness for the instruction (YES/NO), using first-token logits to produce an LM-Score.
  • Selection and Retraining: Only the maximal-score instruction for each code snippet is retained and merged with the original dataset for further fine-tuning. This pipeline proceeds as: initial 2-epoch fine-tuning on the base set, generation of augmented instructions, 1 epoch on the synthetic set, then 2 further epochs on the original set. The final model, referred to as InverseCoder-DS, achieves 79.9% (76.8% docstring-removed) pass@1 on HumanEval+ (compared to 74.4% pre-inverse) (Wu et al., 2024).
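The self-evaluation step above reduces to a two-way softmax over the model's first-token logits for the YES and NO judgment tokens. A minimal sketch of that scoring rule (the model call that produces the logits is omitted):

```python
import math

# LM-Score sketch: the model is prompted to judge whether a candidate
# instruction matches its code snippet, and the normalized probability of
# "YES" vs "NO" at the first generated token is the score.
def lm_score(yes_logit: float, no_logit: float) -> float:
    # Softmax restricted to the two judgment tokens (max-subtracted for stability).
    m = max(yes_logit, no_logit)
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)
```

For each code snippet, the instruction with the maximal score is the one retained for the merged fine-tuning set.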

3. Structured Tabular Reasoning and RePanda Application

Within the RePanda pipeline (Chegini et al., 14 Mar 2025), DeepSeek-Coder-7B-Instruct-v1.5 is fine-tuned on PanTabFact and PanWiki, two execution-based tabular datasets derived from TabFact and WikiTableQuestions, respectively:

  • PanTabFact: 88,299 (from 92,283, after error correction) table–claim–pandas-query triples. Queries are auto-generated and corrected for logic and syntax, ensuring 95.68% validity.
  • Fine-Tuning: AdamW, $\eta_0 = 2\times10^{-4}$, cosine decay, batch size 4 per GPU, 4 epochs, minimizing the negative log-likelihood of query tokens: $\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log P_\theta(q_t \mid q_{<t}, \text{input})$.
  • Results: 84.09% accuracy on the TabFact test set; zero-shot direct prompting (50.76%) and zero-shot pandas generation (51.82%) trail by more than 30 points. Error-corrected queries enable interpretable, executable verification.
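The execution-based verification step can be sketched as follows: the model emits a pandas expression over the table, and an executor evaluates it to a Boolean verdict. The table and query here are illustrative, not drawn from PanTabFact:

```python
import pandas as pd

# Minimal sketch of RePanda-style execution-based claim verification.
# The model would generate `claim_query`; here it is hand-written for clarity.
df = pd.DataFrame({"team": ["A", "B"], "wins": [10, 7]})

# Claim: "team A has more wins than team B"
claim_query = (
    "df.loc[df['team'] == 'A', 'wins'].iloc[0]"
    " > df.loc[df['team'] == 'B', 'wins'].iloc[0]"
)
verdict = bool(eval(claim_query, {"df": df}))  # True
```

Because the verdict comes from executing a concrete query rather than from a classifier head, the reasoning chain is inspectable: a human can read the query and rerun it against the table.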

On WikiFact (an OOD conversion of WikiTableQuestions): RePanda achieves 84.72% accuracy without further tuning, exceeding direct classification (74.1%) and matching the much larger DeepSeek-Chat (671B, 85.39%), demonstrating effective distillation of structured reasoning from a foundation model 100× larger (Chegini et al., 14 Mar 2025). This robustness is attributed to execution-driven fine-tuning rather than direct supervision.

For tabular QA, PanWiki (1,200 QA pairs) enables 75.1% exact-match on WikiTableQuestions; despite small sample size, this is directly competitive with state-of-the-art systems (Chegini et al., 14 Mar 2025).

4. Practical Deployment, Inference, and Error Correction

DeepSeek-Coder-7B-Instruct-v1.5 supports efficient, local deployment in production and research frameworks:

  • Memory/Latency: ~7GB VRAM for parameters (plus tokenization/execution overhead), 1–2 s/query on a single RTX 3090, and ~400–600 tokens/sec throughput on an A100 with quantization. INT4/INT8 quantization is viable for edge deployment.
  • Execution Loop: Model returns pandas query in JSON; execution engine processes output, returns Boolean or answer.
  • Inference Error Handling: On query failure, the model is re-queried up to 4 times with the error message, resulting in valid code ~98% of the time (Chegini et al., 14 Mar 2025).
  • Throughput: HAI-LLM pipeline yields ~20–30 req/s at 1024 tokens on 1–2 A100s (Guo et al., 2024).
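The error-handling loop above can be sketched as follows. `generate` and `run_query` are stand-ins for the model call and the pandas executor (both assumptions of this sketch); the retry budget of 4 matches the description:

```python
# Sketch of the inference error-correction loop: on query failure, the error
# message is appended to the prompt and the model is re-queried, up to 4 times.
def verify_with_retries(generate, run_query, claim, table, max_retries=4):
    prompt = f"Table: {table}\nClaim: {claim}\nReturn a pandas query."
    for _ in range(1 + max_retries):
        query = generate(prompt)
        try:
            return run_query(query)  # Boolean verdict (or answer, for QA)
        except Exception as err:
            # Feed the failure back so the model can repair its own query.
            prompt += f"\nPrevious query failed with: {err}\nFix it."
    return None  # give up after exhausting the retry budget
```

Keeping the correction loop outside the model is what makes it modular: the executor only ever sees plain pandas code, and every repair attempt is logged as a new prompt.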

These operational characteristics allow for tool-augmented, interpretable tabular pipelines, as exemplified by RePanda's structured execution backbone.

5. Preference Optimization and Pseudo-Feedback Fine-Tuning

Preference optimization techniques, especially Direct Preference Optimization (DPO) with pseudo-feedback, have been applied to DeepSeek-Coder-7B-Instruct-v1.5 for further gains on code reasoning benchmarks (Jiao et al., 2024). Pseudo feedback is generated as follows:

  • Frontier LLM Labeling: GPT-4o generates 11 programs per prompt for 5,000 competition-level APPS problems; these are filtered through gold test suites and used both as SFT data and as generators of pseudo test cases.
  • Multi-Test-Case Self-Consistency: LLMs synthesize input suites; candidate solutions' output distributions are majority-voted for pseudo-gold; solutions are rewarded by consistency.
  • Formal Objective: Preference pairs $(y_w, y_l)$ are derived from the consistency-based reward $r(y)$, with DPO loss: $$\mathcal{L}_{\mathrm{DPO}}(\theta) = -\mathbb{E}_{x,(y_w, y_l)}\left[\log \sigma\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]$$
  • Results: On LiveCodeBench, baseline pass@1 = 21.1. SFT on GPT-4o data yields 22.9; PFPO with self-consistency reaches 24.6, a gain of 3.5 points absolute over the baseline. Ablation studies confirm test-suite width (number of pseudo cases per problem) as a crucial factor (Jiao et al., 2024).
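The multi-test-case self-consistency step can be sketched as a majority vote: for each synthesized test input, the most common output across candidate programs is taken as pseudo-gold, and each candidate is rewarded by its agreement rate. A minimal sketch under that reading:

```python
from collections import Counter

# Self-consistency reward sketch: majority output per pseudo test input
# serves as pseudo-gold; each candidate's reward is its agreement fraction.
def consistency_rewards(candidate_outputs):
    # candidate_outputs[i][j] = output of candidate i on pseudo test input j
    n_tests = len(candidate_outputs[0])
    pseudo_gold = [
        Counter(cand[j] for cand in candidate_outputs).most_common(1)[0][0]
        for j in range(n_tests)
    ]
    return [
        sum(out == gold for out, gold in zip(cand, pseudo_gold)) / n_tests
        for cand in candidate_outputs
    ]

# Two candidates agree on both tests; a third deviates on the second test.
rewards = consistency_rewards([[1, 2], [1, 2], [1, 3]])  # → [1.0, 1.0, 0.5]
```

High- and low-reward candidates then form the $(y_w, y_l)$ preference pairs fed into the DPO objective above.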

A plausible implication is that reward signal granularity from pseudo test-cases, rather than gold correctness per se, governs improvement magnitude.

6. Strengths, Limitations, and Recommendations

Strengths

  • Demonstrates robust, instruction-driven code intelligence, matching or exceeding predecessor benchmarks in NL and code reasoning (Guo et al., 2024).
  • Open-source, with permissive Apache 2.0 licensing for commercial/R&D use (Guo et al., 2024).
  • Distillation via execution-based fine-tuning yields interpretability and OOD transfer, rivaling closed/foundation models of vastly greater scale (Chegini et al., 14 Mar 2025).
  • Pseudo-feedback preference optimization (PFPO) allows further self-improvement absent new human labels (Jiao et al., 2024).
  • Efficient deployment, INT8-quantizable and high-throughput on commodity and data-center hardware.

Limitations

  • 4K context in the v1.5 instruct variant restricts cross-file and long-context code completion relative to prior (16K) versions (Guo et al., 2024).
  • Tabular pipelines focus on single-table reasoning; no support for joins, relational tasks, or hierarchical/multi-table fact-checking.
  • All generated logic is constrained by the expressivity and limitations of the pandas API; advanced NL aggregation or extrinsic statistical tests must be implemented outside the model.
  • Error-correction is currently modular and not integrated with the model’s training process.
  • Performance on high-difficulty, few-shot, or compositional tasks remains below the leading proprietary LLMs.

Recommendations

  • For code-centric long context and cross-repo use, the 16K/FIM variant is preferred; for interleaved code/NL/math, v1.5 (4K) is optimal (Guo et al., 2024).
  • Deploy RePanda-augmented models for interpretable tabular fact verification or QA in settings valuing execution transparency (Chegini et al., 14 Mar 2025).
  • Consider PFPO for continued self-improvement: it is computationally tractable and scalable with only synthetic inputs (Jiao et al., 2024).

7. Future Directions

Potential future work identified across the DeepSeek-Coder-7B-Instruct-v1.5 pipeline includes:

  • Extension to SQL and multi-table relational fact verification, explicitly supporting database-style joins (Chegini et al., 14 Mar 2025).
  • Integrating weak and contrastive supervision to reduce dependence on high-quality LLM or gold-generated annotation.
  • Joint model–executor learning (end-to-end REPL-style feedback) to seamlessly couple generation and execution correction.
  • Evaluating cross-lingual tabular understanding via instruction-tuned multilingual datasets.
  • For preference optimization, increasing diversity and difficulty of synthetic test cases to avoid reward model saturation; process-level aggregation and selection may further enhance performance (Jiao et al., 2024).

DeepSeek-Coder-7B-Instruct-v1.5 stands as a highly influential node in the open-source LLM ecosystem, providing a versatile foundation for research in code intelligence, interpretable tabular reasoning, and methodologically transparent instruction-tuning.
