Research

Retrospective Harness Optimization： Improving LLM Agents via Self-Preference over Trajectory Rollouts

RHO：利用过往轨迹优化LLM智能体工具链的自监督方法

Jun 4, 2026arXiv.orgSignal 70

Original source

Evolving Agents in the Dark: Retrospective Harness Optimization via Self-Preference

arXiv.org

AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is difficult to acquire in practical deployment settings. To address this problem, we introduce Retrospective Harness Optimization (RHO), a self-supervised method that optimizes the agent harness using only past trajectories. Specifically, RHO selects a diverse coreset of challenging tasks from past trajectories and re-solves them in parallel. The agent analyzes these rollouts using self-validation and self-consistency, then generates candidate harness updates and selects the most effective one by its own pairwise self-preference. We evaluate RHO across three diverse domains, spanning software engineering, technical work, and knowledge work. Notably, a single optimization round improves the pass rate on SWE-Bench Pro from 59% to 78% without any external grading. Furthermore, our analysis demonstrates that RHO effectively targets prior failure modes. As a result, the optimized harness alters the agent's behavior patterns and sustains higher accuracy during long-horizon sessions.

Open source

Why this matters

Recommended because

This is worth tracking because it is a concrete research signal, not just a passing headline. The source preview points to a research result, method, evaluation, dataset, or safety finding. For builders and operators, "Retrospective Harness Optimization： Improving LLM Agents via Self-Preference over Trajectory Rollouts" can be used as a checkpoint for technical due diligence, roadmap bets, agent design, and evaluation strategy. I keep this thread indexed so future searches around AI research papers, technical methods, and applied AI systems can land on a source-linked page instead of disappearing into a fast-moving feed from arXiv.org.

Builder readout

What to take from this signal

Context

"Retrospective Harness Optimization： Improving LLM Agents via Self-Preference over Trajectory Rollouts" is archived here as a source-linked AI signal from arXiv.org. The useful part is the connection between Retrospective, Harness, Optimization, Improving, LLM and technical due diligence, roadmap bets, agent design, and evaluation strategy, which makes the item more actionable than a normal feed headline. The source context says: AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is difficult to acquire in practical deployment settings. To address this problem, we introduce Retrospective Harness Optimization (RHO), a self-supervised method that optimizes the agent harness using only past trajectories. Specifically, RHO selects a diverse coreset of challenging tasks from past trajectories and re-solves them in parallel. The agent analyzes these rollouts using self-validation and self-consistency, then generates candidate harness updates and selects the most effective one by its own pairwise self-preference. We evaluate RHO across three diverse domains, spanning software engineering, technical work, and knowledge work. Notably, a single optimization round improves the pass rate on SWE-Bench Pro from 59% to 78% without any external grading. Furthermore, our analysis demonstrates that RHO effectively targets prior failure modes. As a result, the optimized harness alters the agent's behavior patterns and sustains higher accuracy during long-horizon sessions.

Builder takeaway

For an AI builder, the main takeaway is to watch how this signal changes practical decisions around technical feasibility, evaluation design, safety limits, and product primitives. It can inform what to test next, which product surface to compare, and whether the underlying workflow is ready for real users.

Source context

arXiv.org remains the authoritative source for the original claim. This page adds a stable archive URL, a short builder interpretation, and related search language so the item can be found later when the original feed has moved on.

Search angles

Retrospective Harness Optimization： Improving LLM Agents via Self-Preference over Trajectory Rollouts Research context
arXiv.org AI research
Retrospective, Harness, Optimization, Improving, LLM builder takeaway
AI research papers, technical methods, and applied AI systems

This page keeps a source preview and a stable archive URL for search discovery. The original source remains authoritative.