Controllable Abstraction in Summary Generation for Large Language Models via Prompt Engineering
Abstract: This study presents a controllable abstractive summary generation method for LLMs based on prompt engineering. To address the quality and controllability limitations of traditional summarization methods, we design a multi-stage prompt generation framework. This framework generates summaries with varying levels of abstraction by performing semantic analysis, topic modeling, and noise control on the input text. The experiments use the CNN/Daily Mail dataset and provide a detailed analysis of different prompt lengths, data noise levels, and text types. The results show that prompt length has a significant impact on summary quality: both very short and very long prompts reduce it. Data noise also degrades the summary generation process; as the noise level increases, the ROUGE-L score gradually decreases. Furthermore, text type affects the model's ability to generate summaries: the model performs best on news texts and worst on academic articles. This research provides new insights into improving summary generation with LLMs, particularly into how controlling prompt strategies and optimizing text preprocessing can enhance summary accuracy and controllability.
Explain it Like I'm 14
What is this paper about?
This paper is about making AI-written summaries easier to control. The authors show how to tell an LLM exactly what kind of summary you want—very short and high-level, or more detailed—by carefully designing the instructions you give it (these instructions are called "prompts"). They also study what affects summary quality: how long the prompt is, how noisy the input text is, and what kind of text (news, blogs, academic writing) the AI is summarizing.
What questions did the researchers try to answer?
They focus on a few simple questions:
- How can we guide an AI to write summaries with the level of detail we want (very brief vs. more detailed)?
- What’s the best way to design prompts so the AI stays accurate and clear?
- Does the length of the prompt help or hurt?
- How much do messy or noisy inputs damage the summary?
- Do some types of writing (like news or academic texts) get better summaries than others?
How did they do it?
Think of the method as a smart “prompt factory” that prepares the best instructions for the AI before it writes a summary.
Here is the basic process, in plain steps:
- Read and map the text: The system scans the input article to find important people, places, events, and how they connect. You can imagine this as drawing a “mind map” (they call it a semantic graph) of the key ideas.
- Pick the main topics: It figures out what the text is mostly about (topic modeling).
- Build a prompt that fits the goal: If the user wants a short, high-level summary, the prompt nudges the AI to be brief and abstract. If the user wants more details, the prompt pushes the AI to include specifics and context.
- Test and improve the prompt: The system tries a prompt, checks how good the summary is, and then tweaks the prompt to do better next time. This is like a “trial and reward” cycle (a simple way to explain reinforcement learning).
- Learn across tasks: The system can train on different types of summarizing tasks at once (multi-task learning), which helps it get better at adjusting summary style for different situations.
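The "build a prompt that fits the goal" step above can be sketched in code. This is a minimal illustration of the idea, not the paper's actual implementation: the function name, style phrasings, and word budget are all assumptions made for the example.

```python
# Hypothetical sketch of the "prompt factory" idea: assemble a summarization
# prompt from extracted topics and a target abstraction level.
# All names and wording choices here are illustrative, not from the paper.

def build_prompt(topics, abstraction="high", max_words=50):
    """Assemble a summarization prompt for an LLM.

    topics      -- key topics extracted upstream (e.g. via topic modeling)
    abstraction -- "high" for a brief, abstract summary; "low" for detail
    max_words   -- requested summary length budget
    """
    style = ("a brief, high-level summary focusing on the main idea"
             if abstraction == "high"
             else "a detailed summary including specifics and context")
    topic_hint = ", ".join(topics)
    return (f"Write {style} (at most {max_words} words) of the text below. "
            f"Key topics to cover: {topic_hint}.")

prompt = build_prompt(["prompt engineering", "summarization"], abstraction="high")
```

In the paper's framework, the "test and improve" loop would then score the summary this prompt produces and adjust the prompt (e.g. its length or level of detail) on the next iteration.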
To measure how good the summaries are, they compare them to human-written summaries using:
- ROUGE-L and ROUGE-N: Do the AI summaries cover the same important parts as the human ones?
- BLEU: How much do the words and phrases match?
- TER (Translation Edit Rate): How many edits would you need to make the AI summary match the human one? Lower is better.
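To make ROUGE-L concrete, here is a minimal sketch of how it can be computed: find the longest common subsequence (LCS) of words shared by the candidate and reference summaries, then turn it into a score. Note that this version reports a balanced F1; the official ROUGE-L metric uses a beta weighting that favors recall.

```python
# Minimal ROUGE-L sketch: score a candidate summary against a reference
# using the longest common subsequence (LCS) over word tokens.

def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference):
    """ROUGE-L as an F1 score (official ROUGE-L weights recall more heavily)."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l("the cat sat on the mat", "the cat is on the mat")` shares the five-word subsequence "the cat on the mat", giving a score of 5/6.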
They ran experiments on the CNN/Daily Mail dataset, a large collection of news articles with human-written summaries.
What did they find, and why does it matter?
The key results are:
- Their method beats other systems: It achieved the best ROUGE and BLEU scores and the lowest TER among the compared methods. In simple terms, their summaries matched human summaries more closely and needed fewer fixes.
- Prompt length really matters: Very short prompts don’t give enough guidance, and very long prompts overload the model. The “sweet spot” was around 30–40 tokens (words or word-pieces). This balance gives the AI enough direction without confusing it.
- Noise hurts performance: When the input article contains errors, extra junk, or mixed-up text, the summary quality drops steadily. Cleaner input leads to better summaries.
- Text type changes difficulty: The model did best on news articles (which are structured and focused), did okay on blogs (more casual and varied), and struggled more with academic articles (long, technical, and complex).
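The noise experiment above can be illustrated with a small corruption routine: replace a fraction of the input characters, then re-run summarization and track how ROUGE-L falls as the noise level rises. The corruption scheme here (random character substitutions at a fixed rate) is an assumption for illustration, not the paper's exact procedure.

```python
# Illustrative sketch of a noise-injection experiment: corrupt a fraction
# of the input characters. One would then summarize each noisy version and
# compare ROUGE-L scores against the clean baseline.
import random
import string

def add_noise(text, noise_level, seed=0):
    """Replace roughly `noise_level` of the characters with random letters."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    chars = list(text)
    n_corrupt = int(len(chars) * noise_level)
    for idx in rng.sample(range(len(chars)), n_corrupt):
        chars[idx] = rng.choice(string.ascii_lowercase)
    return "".join(chars)

clean = "The quick brown fox jumps over the lazy dog."
noisy_versions = {level: add_noise(clean, level) for level in (0.0, 0.1, 0.3)}
```

The paper's finding was that summary quality drops steadily as this noise level increases, which is why cleaning the input before summarization pays off.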
Why this matters:
- It shows a practical way to “dial in” the kind of summary you need using prompts, instead of retraining the whole model.
- It highlights simple rules of thumb—like keep prompts a reasonable length and clean your input text—that can noticeably improve results.
What is the bigger impact?
This research helps move AI summarizers from “generic writers” to “custom assistants” you can steer. That’s useful in many places:
- News apps can show quick, high-level summaries or more detailed ones depending on user preference.
- In law and finance, where accuracy and style matter, prompts can be tuned to keep only the most important parts while staying correct.
- In healthcare and education, different audiences (doctors, patients, students, teachers) can get summaries at the right level.
The paper also points to next steps: build systems that automatically choose the best prompt length for each text, clean up noisy inputs, and design prompts that handle tough writing (like academic papers) more reliably. With these improvements, AI summaries can become more trustworthy, flexible, and helpful in everyday life.