Creative Beam Search: LLM-as-a-Judge For Improving Response Generation

Published 30 Apr 2024 in cs.AI, cs.CL, cs.HC, and cs.LG | (2405.00099v4)

Abstract: LLMs are revolutionizing several areas, including artificial creativity. However, the process of generation in machines profoundly diverges from that observed in humans. In particular, machine generation is characterized by a lack of intentionality and an underlying creative process. We propose a method called Creative Beam Search that uses Diverse Beam Search and LLM-as-a-Judge to perform response generation and response validation. The results of a qualitative experiment show how our approach can provide better output than standard sampling techniques. We also show that the response validation step is a necessary complement to the response generation step.

Summary

The paper introduces Creative Beam Search (CBS), which integrates Diverse Beam Search with an LLM-as-a-Judge mechanism to enhance the creativity of generated responses.
CBS employs a multi-step process that mimics human creativity by using self-evaluation to select outputs based on subjective preference rather than mere probability.
In a qualitative user study, the CBS output was preferred in 45% of cases when compared against standard sampling, which points to its potential for computational creativity.

Introduction

The paper, "Creative Beam Search: LLM-as-a-Judge For Improving Response Generation," presents a novel methodology designed to bridge the gap between LLMs and human-like creativity. Traditional generative models often fall short in capturing elements of human creativity due to their inherent lack of intentionality and absence of a systematic creative process. The authors propose a method named Creative Beam Search (CBS) that leverages Diverse Beam Search (DBS) alongside LLM-as-a-Judge to enhance both response generation and validation phases. Through qualitative experiments, CBS is demonstrated to produce responses that are subjectively judged to be more creative than those generated by conventional sampling techniques.

Creative Beam Search Methodology

The Creative Beam Search method is inspired by the componential model of creativity, which involves steps such as task presentation, preparation, response generation, and response validation. CBS incorporates these steps by:

Using Diverse Beam Search to simulate the response generation phase, promoting diversity among generated solutions.
Implementing an LLM-as-a-Judge mechanism to conduct a self-evaluation and select the final output based on preference rather than mere probability maximization.

Response Generation: CBS employs Diverse Beam Search, which partitions the beam budget into groups and penalizes a group for repeating the choices of earlier groups. This counteracts the tendency of standard beam search to converge on a narrow set of near-identical candidates, so the candidate pool is more varied.
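As a rough illustration of how a group-wise diversity penalty can steer a later group away from earlier choices, here is a minimal sketch of a Hamming-style penalty applied at a single decoding step. The function name, the toy vocabulary, and the scores are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
from collections import Counter
from typing import Dict, Iterable


def apply_hamming_penalty(
    log_probs: Dict[str, float],
    tokens_from_earlier_groups: Iterable[str],
    diversity_strength: float,
) -> Dict[str, float]:
    """Lower the score of each token by how often earlier groups already chose it.

    log_probs: token -> log-probability for the current group at this step.
    tokens_from_earlier_groups: tokens picked at the same step by earlier groups.
    diversity_strength: scaling factor of the penalty (hypothetical name).
    """
    counts = Counter(tokens_from_earlier_groups)
    return {
        token: score - diversity_strength * counts[token]
        for token, score in log_probs.items()
    }


# Toy example: an earlier group already emitted "moon" at this step.
group_scores = {"moon": -0.2, "lantern": -1.5}
penalized = apply_hamming_penalty(group_scores, ["moon"], diversity_strength=10.0)
best_token = max(penalized, key=penalized.get)
```

With a large penalty, the current group prefers "lantern" even though "moon" has the higher raw log-probability.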

Response Validation: CBS uses LLM-as-a-Judge for self-assessment: the same model is asked which candidate it prefers. Because a judge model can favor candidates according to where they appear in the prompt, the candidates are presented in several different orders (balanced position calibration). The candidate that collects the most preferences across these orderings is selected as the final output.

Figure 1: The Creative Beam Search method. Given a user prompt (step 0), DBS samples K candidate solutions from a pre-trained LLM (step 1). Then, K evaluative prompts are composed by altering the order of the candidates and are passed to the model as inputs (step 2). The candidate with the most preferences is finally outputted.
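The voting step in Figure 1 can be sketched as follows. This is a minimal sketch under stated assumptions: the K orderings are produced by rotating the candidate list, `judge` is a hypothetical callable standing in for the LLM call that returns the position of its preferred candidate, and ties are broken in favor of the earlier candidate. None of these details is confirmed by the summary.

```python
from typing import Callable, List, Sequence, Tuple


def select_by_votes(
    candidates: Sequence[str],
    judge: Callable[[List[str]], int],
) -> Tuple[str, List[int]]:
    """Query the judge once per rotation of the candidates and tally the votes."""
    k = len(candidates)
    votes = [0] * k
    for shift in range(k):
        # Rotation: each candidate appears in every position exactly once.
        order = [(shift + j) % k for j in range(k)]
        picked_position = judge([candidates[i] for i in order])
        votes[order[picked_position]] += 1
    # Most votes wins; ties go to the lower original index (assumption).
    winner = max(range(k), key=lambda i: (votes[i], -i))
    return candidates[winner], votes


def longest_text_judge(texts: List[str]) -> int:
    """Stand-in judge that prefers the longest candidate."""
    return max(range(len(texts)), key=lambda p: len(texts[p]))


def first_position_judge(texts: List[str]) -> int:
    """Stand-in judge with pure positional bias: always picks the first slot."""
    return 0


candidates = ["a", "bbb", "cc"]
content_winner, content_votes = select_by_votes(candidates, longest_text_judge)
biased_winner, biased_votes = select_by_votes(candidates, first_position_judge)
```

A judge that responds to content gives all votes to one candidate, whereas a purely position-biased judge spreads its votes evenly across the rotations, so its bias cannot single out a winner on its own.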

Experiments and Results

The experiments assessed CBS with the 7B-parameter, RLHF-tuned variant of Llama 2. Graduate students supplied prompts and, for each prompt, chose the more creative of two outputs: one produced by CBS and one produced by standard sampling.

Setup: Generation used a beam budget of 8 and a diversity scaling factor of 10, with the top K candidates selected for evaluation. Candidates were generated with Diverse Beam Search and then self-evaluated via LLM-as-a-Judge to pick the most creative one.
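The reported settings could be collected in a small configuration object, sketched below. The beam budget and diversity factor come from the setup above; the number of groups is not given in this summary, so the value used here is a placeholder assumption, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiverseBeamConfig:
    beam_budget: int = 8              # from the setup described above
    diversity_strength: float = 10.0  # from the setup described above
    num_groups: int = 4               # ASSUMPTION: not stated in this summary

    def beams_per_group(self) -> int:
        """Split the beam budget evenly across groups, rejecting uneven splits."""
        if self.num_groups <= 0 or self.beam_budget % self.num_groups != 0:
            raise ValueError("beam_budget must be divisible by num_groups")
        return self.beam_budget // self.num_groups


config = DiverseBeamConfig()
per_group = config.beams_per_group()
```

In Hugging Face Transformers releases that expose group beam search through `generate`, these fields would roughly correspond to `num_beams`, `diversity_penalty`, and `num_beam_groups`; whether the authors used that interface is not stated here.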

Findings: The CBS output was preferred in 45% of cases when compared against standard sampling. In some comparisons the two outputs were too similar for participants to decide between them, but the results still indicate that self-evaluation helps select more creative responses.

Figure 2: The interface presented to the end-users during our experiment. After inserting a prompt with a creative request, two options are shown in a random order: the CBS output and the standard sampling output. The user is then asked to indicate which is the most creative in their opinion (or if the two options are too similar to decide).

The study also found that self-evaluation changed which candidate was selected: the CBS output did not always coincide with the candidate that DBS alone would have returned, and user preferences differed between the two situations.

Figure 3: Percentage of end-users' preferences comparing when CBS output is equal to DBS output and when it is not.

Discussion

While CBS represents a step towards aligning generative models with creative processes, significant challenges persist. Because Diverse Beam Search relies on Hamming diversity, which works at the level of individual tokens, the resulting sequences may still be overly similar. The LLM-as-a-Judge paradigm, despite its advantages, does not emulate a genuinely intentional evaluation process, since LLMs lack consciousness.
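To see why token-level diversity can leave candidates that read almost the same, consider a simplified sequence-level Hamming distance over whitespace tokens. This is only an illustration: the helper name and the example sentences are hypothetical, and DBS applies its penalty step by step during decoding rather than on finished strings.

```python
def token_hamming_distance(first: str, second: str) -> int:
    """Count positions where the two whitespace-tokenized texts differ.

    Extra tokens in the longer text each count as one mismatch.
    """
    a, b = first.split(), second.split()
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches + abs(len(a) - len(b))


distance = token_hamming_distance(
    "the quiet river runs to the sea",
    "a quiet river runs to the sea",
)
```

The two sentences differ in a single token, so they count as distinct under a token-level measure even though a reader would judge them nearly identical.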

Future exploration could focus on extending this framework to more sophisticated LLMs or incorporating broader sets of diverse candidates for evaluation. Furthermore, aligning CBS with models specifically fine-tuned for creativity could provide deeper insights into potential gains in the field of computational creativity.

Conclusion

Creative Beam Search offers a promising approach towards incorporating creativity-oriented mechanisms in LLM response generation, as evidenced by qualitative preferences amongst users. Although challenges remain, the potential for CBS to enhance creative collaboration with AI systems suggests fruitful avenues for future research in computational creativity and generative modeling.