Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents
arXiv:2604.16335v1 Announce Type: new Abstract: Despite recent progress in Large Language Model (LLM) Agents for Software Engineering (SWE) tasks, end-to-end fine-tuning typically relies on verifiable terminal rewards such as whether all unit tests pass. While these binary signals reflect whether the final solution is correct, they provide little guidance for shaping intermediate behaviors during multi-step interactions, thereby limiting improvements in the overall quality of the resolution process. To address this, we introduce a rubric-based Generative Reward Model (GRM) that provides richer learning signals. The GRM is equipped with human-designed rubrics that specify criteria for encouraging or discouraging specific behavioral patterns, and we leverage this feedback to collect high-quality training data via trajectory filtering. When used for Reinforced Fine-Tuning (RFT) on SWE tasks, our approach outperforms terminal-score-only rejection sampling: it more effectively suppresses undesirable patterns while promoting beneficial ones, as confirmed by case analyses, and it ultimately improves final test accuracy.
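The filtering idea in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the rubric tags, penalty values, and function names below are all hypothetical, and the real GRM is a generative model rather than a lookup table.

```python
# Toy sketch of rubric-based trajectory filtering (hypothetical rubrics;
# the paper's actual GRM scores trajectories generatively).

RUBRICS = {
    "blind_retry":   -1.0,  # re-running failing tests without new edits
    "untested_edit": -0.5,  # patching files without running any test
    "focused_repro": +1.0,  # reproducing the bug before patching
}

def grm_score(steps):
    """Stand-in for the GRM: sum rubric rewards over the behavioral
    tags observed along one agent trajectory."""
    return sum(RUBRICS.get(tag, 0.0) for tag in steps)

def filter_trajectories(trajectories, threshold=0.0):
    """Keep trajectories that pass the terminal check (all unit tests)
    AND clear the rubric-based score threshold -- stricter than
    terminal-score-only rejection sampling."""
    return [
        t for t in trajectories
        if t["tests_pass"] and grm_score(t["steps"]) >= threshold
    ]

if __name__ == "__main__":
    trajs = [
        {"tests_pass": True,  "steps": ["focused_repro"]},
        {"tests_pass": True,  "steps": ["blind_retry", "blind_retry"]},
        {"tests_pass": False, "steps": ["focused_repro"]},
    ]
    print(len(filter_trajectories(trajs)))  # 1: only the first survives
```

The second trajectory passes its tests but is dropped for its behavioral pattern, which is the extra signal a terminal-only reward cannot provide.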
Source: arXiv ML (cs.LG)
Top Stories
OpenAI helps Hyatt advance AI among colleagues
Hyatt deploys ChatGPT Enterprise across its global workforce, using GPT-5.4 and Codex to improve productivity, operations, and guest experiences.
Anthropic takes $5B from Amazon and pledges $100B in cloud spending in return
Amazon has made another circular AI deal: It's investing another $5 billion in Anthropic. Anthropic has agreed to spend $100 billion on AWS in return.
OpenAI ad partner now selling ChatGPT ad placements based on “prompt relevance”
Article URL: https://www.adweek.com/media/exclusive-leaked-deck-reveals-stackadapts-playbook-for-chatgpt-ads/ — HN discussion: https://news.ycombinator.com/item?id=47840980 (259 points, 130 comments)
SetFlow: Generating Structured Sets of Representations for Multiple Instance Learning
arXiv:2604.16362v1 Announce Type: new Abstract: Data scarcity and weak supervision continue to limit the performance of machine learning models in many real-world applications, such as mammography, where Multiple Instance Learning (MIL) often offers the best formulation. While recent foundation models provide strong semantic representations out of the box, effective augmentation of such representations of MIL data remains limited, as existing methods operate at the instance level and fail to capture intra-bag dependencies. In this work, we introduce SetFlow, a generative architecture that models entire MIL bags (i.e., sets) directly in the representation space. Our approach leverages the flow matching paradigm combined with a Set Transformer-inspired design, enabling it to handle permutation-invariant inputs while capturing interactions between instances within each bag. The model is conditioned on both class labels and input scale, allowing it to generate coherent and semantically consistent sets of representations. We evaluate SetFlow on a large-scale mammography benchmark...
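The key structural constraint the abstract describes is that a bag-level model must be insensitive to instance order. A minimal sketch of that property, using a mean-pooled shared encoder as a stand-in (SetFlow's actual Set-Transformer-inspired, flow-matching architecture is far richer; all names here are illustrative):

```python
# Toy demonstration of permutation invariance for a MIL bag encoder.
import numpy as np

def encode_bag(bag, w):
    """Embed each instance with a shared linear map + tanh, then
    mean-pool across instances, so the bag representation does not
    depend on the order in which instances appear."""
    return np.tanh(bag @ w).mean(axis=0)

rng = np.random.default_rng(0)
bag = rng.normal(size=(5, 8))    # one MIL bag: 5 instances, 8-dim features
w = rng.normal(size=(8, 4))      # shared instance-level projection

z1 = encode_bag(bag, w)
z2 = encode_bag(bag[::-1], w)    # same instances, reversed order
assert np.allclose(z1, z2)       # identical bag embedding either way
```

Mean pooling discards interactions between instances, which is exactly the limitation the abstract attributes to instance-level methods; attention-based set encoders keep the invariance while modeling intra-bag dependencies.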
How to Ground a Korean AI Agent in Real Demographics with Synthetic Personas
Read more at Hugging Face Blog
Quick Bytes
NSA spies are reportedly using Anthropic’s Mythos, despite Pentagon feud — TechCrunch AI
CEO and CFO suddenly depart AI nuclear power upstart Fermi — TechCrunch AI
OpenAI’s existential questions — TechCrunch AI