HyGenar: An LLM-Driven Hybrid Genetic Algorithm for Few-Shot Grammar Generation

Published 22 May 2025 in cs.AI and cs.PL | arXiv:2505.16978v2

Abstract: Grammar plays a critical role in natural language processing and text/code generation by enabling the definition of syntax, the creation of parsers, and guiding structured outputs. Although LLMs demonstrate impressive capabilities across domains, their ability to infer and generate grammars has not yet been thoroughly explored. In this paper, we aim to study and improve the ability of LLMs for few-shot grammar generation, where grammars are inferred from sets of a small number of positive and negative examples and generated in Backus-Naur Form. To explore this, we introduced a novel dataset comprising 540 structured grammar generation challenges, devised 6 metrics, and evaluated 8 various LLMs against it. Our findings reveal that existing LLMs perform sub-optimally in grammar generation. To address this, we propose an LLM-driven hybrid genetic algorithm, namely HyGenar, to optimize grammar generation. HyGenar achieves substantial improvements in both the syntactic and semantic correctness of generated grammars across LLMs.

Summary

The paper presents a novel approach to optimizing grammar generation with LLMs, focusing on few-shot grammar generation in Backus-Naur Form (BNF). Each task requires inferring a grammar from a minimal set of examples: three positive and three negative strings. The work probes how well LLMs can achieve syntactic and semantic correctness in grammar generation, with implications for broader natural language processing (NLP) and software engineering applications.
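For concreteness, a task of this form might pair a target grammar (an illustrative example, not drawn from the paper's dataset) such as:

```bnf
<s> ::= "a" <s> "b" | "ab"
```

with positive examples like "ab", "aabb", "aaabbb" and negative examples like "ba", "abb", "aab"; the model must produce a BNF grammar consistent with all six strings.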

Objectives and Dataset

The primary aim is to assess and enhance the ability of LLMs to generate grammars from limited data. The authors constructed a dataset of 540 grammar generation challenges, each with exactly three positive and three negative examples. This dataset serves as a benchmark for evaluating the performance of eight LLMs in few-shot scenarios.

Methodology

The authors introduce HyGenar, a hybrid algorithm combining the capabilities of LLMs and genetic algorithms. HyGenar adapts traditional genetic algorithm operators, such as crossover and mutation, with LLM-driven initializations and mutations. The methodology includes:

  1. Fitness Evaluation: Scores each candidate grammar on syntactic and semantic correctness, guiding selection within the genetic algorithm.
  2. Selection and Crossover: Selects higher-fitness candidates and recombines them, as in standard genetic algorithms; the initial population of candidate grammars is generated by the LLM.
  3. Mutation: Employs both LLM-driven heuristics and local grammar transformations to iteratively improve candidate quality.
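The loop described above can be sketched as follows. This is a hedged illustration, not the paper's implementation: names like `llm_propose` and `llm_mutate` are hypothetical stand-ins for LLM calls, and a "grammar" is toy-modeled as a frozenset of accepted strings so the sketch is runnable, whereas the real algorithm evolves BNF grammars checked by a parser.

```python
import random

def fitness(grammar, positives, negatives):
    """Fraction of examples classified correctly (accept positives,
    reject negatives), standing in for the paper's fitness evaluation."""
    accepts = lambda s: s in grammar  # placeholder for a real BNF parse check
    hits = sum(accepts(s) for s in positives)
    hits += sum(not accepts(s) for s in negatives)
    return hits / (len(positives) + len(negatives))

def crossover(a, b):
    """Toy crossover: child keeps a random half of the parents' pool."""
    pool = sorted(a | b)
    return frozenset(random.sample(pool, max(1, len(pool) // 2)))

def hygenar_sketch(llm_propose, llm_mutate, positives, negatives,
                   pop_size=8, generations=10):
    score = lambda g: fitness(g, positives, negatives)
    population = [llm_propose() for _ in range(pop_size)]  # LLM-driven init
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        if score(population[0]) == 1.0:  # early exit on a perfect grammar
            return population[0]
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            children.append(llm_mutate(crossover(a, b)))  # LLM-guided mutation
        population = parents + children
    return max(population, key=score)
```

The design mirrors the three steps above: fitness drives selection, crossover recombines surviving candidates, and mutation is delegated to an LLM-driven operator.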

Evaluation Metrics

To comprehensively evaluate grammar generation quality, six metrics were designed:
- Syntax Correctness (SX): Measures how well generated grammars adhere to valid BNF syntax.
- Semantic Correctness (SE): Assesses whether grammars correctly accept positive examples and reject negative examples.
- Diff, OF, OG, and TU: Complementary metrics quantifying over-fitting, over-generalization, and the utility of grammatical structures when parsing positive examples.
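As a hedged illustration of the SE-style check (the paper's exact scoring may differ), a grammar can be abstracted as a membership predicate over strings:

```python
import re

def semantically_correct(accepts, positives, negatives):
    """SE-style check: the grammar must accept every positive example
    and reject every negative one. `accepts` abstracts a BNF parser."""
    return (all(accepts(s) for s in positives)
            and not any(accepts(s) for s in negatives))

# Toy acceptor standing in for a parser for <s> ::= "a" <s> "b" | "ab",
# i.e. the language a^n b^n.
anbn = lambda s: (re.fullmatch(r"a+b+", s) is not None
                  and s.count("a") == s.count("b"))

semantically_correct(anbn, ["ab", "aabb"], ["ba", "abb"])  # → True
```

A grammar that accepts the positives but also some negatives would fail this check, which is precisely what the over-generalization metric is meant to surface.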

Findings

The study found that existing LLMs perform sub-optimally in few-shot grammar generation. However, HyGenar significantly improves both syntactic and semantic correctness across evaluated models. Most notably, syntax correctness increased by an average of 13.88% and semantic correctness by 16.5% compared to baseline approaches. These improvements were achieved without inducing significant over-fitting, as shown by the consistent Diff and OF metrics in the results.

Implications and Future Directions

Practically, enhancing LLMs' grammar generation capabilities has substantial implications for NLP systems that automate complex parsing tasks. Theoretically, this hybrid approach indicates a promising direction for integrating heuristic algorithms with machine learning models, potentially applicable to various problem domains within AI. Future work may focus on expanding the framework to support a broader range of formal grammars and on increasing robustness on datasets with larger example sets.

This paper contributes to understanding LLM potential in syntax-directed generation tasks and introduces a novel hybrid approach with significant practical relevance in automating grammar inference tasks.
