Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories

Published 23 Jun 2025 in cs.SE and cs.AI | (2506.18824v2)

Abstract: LLM-based agents are increasingly employed to automate complex software engineering tasks, such as program repair and issue resolution. These agents operate by autonomously generating natural language thoughts, invoking external tools, and iteratively refining their solutions. Despite their widespread adoption, the internal decision-making processes of these agents remain largely unexplored, limiting our understanding of their operational dynamics and failure modes. In this paper, we present a large-scale empirical study of the thought-action-result trajectories of three state-of-the-art LLM-based agents: RepairAgent, AutoCodeRover, and OpenHands. We unify their interaction logs into a common format, capturing 120 trajectories and 2,822 LLM interactions focused on program repair and issue resolution. Our study combines quantitative analyses of structural properties, action patterns, and token usage with qualitative assessments of reasoning coherence and feedback integration. We identify key trajectory characteristics, such as iteration counts and token consumption, recurring action sequences, and the semantic coherence of thoughts, actions, and their results. Our findings reveal behavioral motifs and anti-patterns that distinguish successful from failed executions, providing actionable insights for improving agent design, including prompting strategies, failure diagnosis, and anti-pattern detection. We release our dataset and annotation framework to support further research on transparent and robust autonomous software engineering agents.

Summary

The paper introduces a novel methodology for parsing and statistically analyzing agent trajectories using metrics like token consumption and iteration count.
The paper identifies that successful trajectories feature balanced exploration and validation, while failures exhibit prolonged iterations and redundant actions.
The paper concludes that high coherence between an agent’s thought and action steps is critical for improving design and automating complex software tasks.

Understanding Software Engineering Agents

This essay provides a detailed summary of the paper titled "Understanding Software Engineering Agents: A Study of Thought-Action-Result Trajectories" (2506.18824). The paper examines the internal processes of LLM-based agents and their application in software engineering tasks, particularly focusing on the trajectories these agents create in automating tasks like program repair and issue resolution.

Introduction

The paper investigates LLM-based agents that autonomously perform complex software engineering tasks by iterating through thought-action-result cycles. Despite their effectiveness, the decision-making processes of these agents are not well understood, which limits efforts to improve them and make them reliable. The authors conducted a large-scale empirical study of the trajectories of three state-of-the-art agents: RepairAgent, AutoCodeRover, and OpenHands. By analyzing 2,822 LLM interactions across 120 trajectories, the paper identifies key features that differentiate successful from unsuccessful executions and provides actionable insights for optimizing agent design.

Methodology

The authors present a novel methodology for analyzing agent trajectories, consisting of the following steps:

Trajectory Parsing and Representation: Trajectories are parsed into a standardized format, enabling consistent analysis across different agents.
Statistical Analysis: The authors compute metrics such as trajectory length, token consumption, and success rates to understand agent behavior patterns.
Categorizing Actions: Actions are categorized into eight distinct types (for example Explore, Locate, and Search) to facilitate sequence pattern mining.
Sequential Action Patterns: The authors mine frequent action sequences to identify agent decision-making patterns and operational motifs.
Semantic Relationships: Through open coding, semantic relationships between trajectory components are analyzed to assess coherence and consistency.
Figure 1: Input, Output, and Total tokens over successful trajectories and unsuccessful ones. The y-axis differs for each agent.
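
The parsing and statistical-analysis steps above can be illustrated with a minimal sketch. It is not the authors' released format: the `Step` and `Trajectory` classes, their field names, and the `summarize` helper are all hypothetical, chosen only to show how a unified thought-action-result record makes iteration counts and token totals comparable across agents and outcomes.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Tuple


@dataclass
class Step:
    """One thought-action-result cycle (field names are assumptions)."""
    thought: str
    action: str          # high-level action category, e.g. "Explore"
    result: str
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class Trajectory:
    """A full run of one agent on one task (hypothetical unified format)."""
    agent: str
    success: bool
    steps: List[Step] = field(default_factory=list)

    @property
    def iterations(self) -> int:
        return len(self.steps)

    @property
    def total_tokens(self) -> int:
        return sum(s.input_tokens + s.output_tokens for s in self.steps)


def summarize(trajectories: List[Trajectory]) -> Dict[Tuple[str, bool], dict]:
    """Mean iteration count and token usage, grouped by (agent, success)."""
    groups: Dict[Tuple[str, bool], List[Trajectory]] = {}
    for t in trajectories:
        groups.setdefault((t.agent, t.success), []).append(t)
    return {
        key: {
            "count": len(ts),
            "mean_iterations": mean(t.iterations for t in ts),
            "mean_total_tokens": mean(t.total_tokens for t in ts),
        }
        for key, ts in groups.items()
    }
```

Grouping by `(agent, success)` rather than by success alone mirrors the note in the Figure 1 caption that the scale differs for each agent, so pooled averages across agents would be hard to interpret.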

Results and Discussion

Trajectory-Level Properties

The findings indicate that unsuccessful trajectories often have higher iteration counts and consume more tokens, particularly in test-driven agents like RepairAgent. Successful AutoCodeRover trajectories are shorter, reflecting streamlined processes. The token-usage comparison in Figure 1 suggests that failed trajectories involve sustained complexity, while success may depend on input context.

Actions and Patterns of Action Sequences

The study groups agent actions into eight high-level categories and analyzes the sequences in which they occur. Successful trajectories tend to balance exploration with validation, whereas unsuccessful ones fall into repetitive cycles. The anti-patterns noted include excessive repetition without progress, premature task termination, and failure to test fixes.
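
As a rough illustration of sequence mining and anti-pattern flagging, the sketch below counts contiguous action n-grams and applies two simple heuristics. It is an assumption-laden stand-in, not the paper's mining algorithm: the function names, the `max_repeat` threshold, and the `"Generate fix"` and `"Run tests"` labels are hypothetical placeholders for whatever category names a dataset uses.

```python
from collections import Counter
from itertools import groupby
from typing import Dict, List, Sequence, Tuple


def frequent_ngrams(
    sequences: Sequence[Sequence[str]], n: int = 2, min_support: int = 2
) -> Dict[Tuple[str, ...], int]:
    """Contiguous action n-grams that occur in at least min_support sequences."""
    support: Counter = Counter()
    for seq in sequences:
        # A set, so each n-gram counts once per trajectory.
        grams = {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}
        support.update(grams)
    return {g: c for g, c in support.items() if c >= min_support}


def longest_repeat(seq: Sequence[str]) -> int:
    """Length of the longest run of one action repeated back to back."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def flag_anti_patterns(
    seq: List[str],
    max_repeat: int = 3,               # hypothetical threshold
    fix_action: str = "Generate fix",  # hypothetical label
    test_action: str = "Run tests",    # hypothetical label
) -> List[str]:
    """Heuristic flags for repetition and for a final fix that is never tested."""
    flags = []
    if longest_repeat(seq) > max_repeat:
        flags.append("excessive_repetition")
    if fix_action in seq:
        last_fix = len(seq) - 1 - seq[::-1].index(fix_action)
        if test_action not in seq[last_fix + 1:]:
            flags.append("untested_fix")
    return flags
```

Running `frequent_ngrams` separately on successful and failed trajectories, then comparing the two result sets, is one simple way to surface sequences characteristic of each outcome.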

Semantic Relationships

The paper highlights the importance of coherence between agent thoughts and actions. Successful agents demonstrate high thought-action alignment, while failures often involve misalignment and redundancy. Thought-thought relationships reveal the significance of follow-up and refinement strategies to avoid contradiction and wasted effort. Effective agents ensure results guide reasoning and subsequent actions.

Limitations and Future Research

The study notes potential subjectivity in annotation and the limited generalizability across agents and tasks. Expanding the dataset, refining classification techniques, and developing automated failure detection strategies could enhance future research in LLM agents.

Conclusion

The paper offers a thorough examination of LLM agents operating in software engineering contexts, identifying crucial behaviors and anti-patterns to guide improvements. By balancing exploration, explanation, and validation steps, agents can improve efficiency and reliability, serving as valuable tools in automating complex tasks. Further research should explore broader applications and automatic trajectory optimization techniques.