NovaPulse AI Weekly

Your weekly dose of AI & Tech insights
2026-04-21

Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents

arXiv:2604.16335v1. Despite recent progress in Large Language Model (LLM) Agents for Software Engineering (SWE) tasks, end-to-end fine-tuning typically relies on verifiable terminal rewards such as whether all unit tests pass. While these binary signals reflect whether the final solution is correct, they provide little guidance for shaping intermediate behaviors during multi-step interactions, thereby limiting improvements in the overall quality of the resolution process. To address this, we introduce a rubric-based Generative Reward Model (GRM) that provides richer learning signals. The GRM is equipped with human-designed rubrics that indicate criteria for encouraging or discouraging specific behavioral patterns, and we leverage this feedback for high-quality training data collection via trajectory filtration. When used for Reinforced Fine-Tuning (RFT) on SWE tasks, our approach outperforms terminal-score-only rejection sampling: it more effectively suppresses undesirable patterns while promoting beneficial ones, as confirmed by case analyses, and it ultimately improves final test accuracy.

Source: arXiv ML (cs.LG)
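
For readers who want a feel for what "trajectory filtration" with rubrics could look like, here is a minimal sketch. It is not the paper's implementation: the Trajectory and RubricItem types, the two rubric entries, the score threshold, and the choice to require passing tests in addition to the rubric score are all assumptions made for illustration. In the paper a generative reward model judges each rubric; here a plain Python predicate stands in for that judgment.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    steps: List[str]      # agent actions, in order (hypothetical format)
    tests_passed: bool    # the verifiable terminal reward


@dataclass
class RubricItem:
    description: str
    weight: float                           # positive encourages, negative discourages
    matches: Callable[[Trajectory], bool]   # stand-in for a GRM judgment


def rubric_score(traj: Trajectory, rubric: List[RubricItem]) -> float:
    """Sum the weights of every rubric item the trajectory triggers."""
    return sum(item.weight for item in rubric if item.matches(traj))


def filter_trajectories(trajs, rubric, min_score):
    """Keep trajectories that pass the tests AND reach the rubric threshold."""
    return [
        t for t in trajs
        if t.tests_passed and rubric_score(t, rubric) >= min_score
    ]


# Two illustrative rubric entries (assumptions, not taken from the paper).
RUBRIC = [
    RubricItem(
        "reproduces the bug before editing", 1.0,
        lambda t: any(s.startswith("run_repro") for s in t.steps),
    ),
    RubricItem(
        "repeats the same action back to back", -1.0,
        lambda t: any(a == b for a, b in zip(t.steps, t.steps[1:])),
    ),
]

candidates = [
    Trajectory(["run_repro", "edit", "run_tests"], True),
    Trajectory(["edit", "edit", "run_tests"], True),
    Trajectory(["run_repro", "edit", "run_tests"], False),
]

kept = filter_trajectories(candidates, RUBRIC, min_score=1.0)
print(len(kept))
```

Terminal-score-only rejection sampling would keep the first two candidates, since both pass the tests. The rubric threshold additionally drops the second one, which reached a passing state through a discouraged pattern.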

Top Stories

OpenAI helps Hyatt advance AI among colleagues

Hyatt deploys ChatGPT Enterprise across its global workforce, using GPT-5.4 and Codex to improve productivity, operations, and guest experiences.

Read more at OpenAI Blog

Anthropic takes $5B from Amazon and pledges $100B in cloud spending in return

Amazon has made another circular AI deal: It's investing another $5 billion in Anthropic. Anthropic has agreed to spend $100 billion on AWS in return.

Read more at TechCrunch AI

OpenAI ad partner now selling ChatGPT ad placements based on “prompt relevance”

Article: https://www.adweek.com/media/exclusive-leaked-deck-reveals-stackadapts-playbook-for-chatgpt-ads/
Discussion: https://news.ycombinator.com/item?id=47840980 (259 points, 130 comments)

Read more at Hacker News Best

SetFlow: Generating Structured Sets of Representations for Multiple Instance Learning

arXiv:2604.16362v1. Data scarcity and weak supervision continue to limit the performance of machine learning models in many real-world applications, such as mammography, where Multiple Instance Learning (MIL) often offers the best formulation. While recent foundation models provide strong semantic representations out of the box, effective augmentation of such representations of MIL data remains limited, as existing methods operate at the instance level and fail to capture intra-bag dependencies. In this work, we introduce SetFlow, a generative architecture that models entire MIL bags (i.e., sets) directly in the representation space. Our approach leverages the flow matching paradigm combined with a Set Transformer-inspired design, enabling it to handle permutation-invariant inputs while capturing interactions between instances within each bag. The model is conditioned on both class labels and input scale, allowing it to generate coherent and semantically consistent sets of representations. We evaluate SetFlow on a large-scale mammography benchmark...

Read more at arXiv ML (cs.LG)
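
As a rough illustration of the two ingredients the abstract names, the sketch below pairs a flow matching training target with a model that treats a bag as an unordered set. Everything here is an assumption for illustration: the straight-line path from noise to data is one common flow matching choice, not necessarily the paper's, and toy_velocity_model is a hand-written mean-pooling function rather than a Set Transformer. Conditioning on class labels and input scale is omitted.

```python
def interpolate(x0, x1, t):
    """Point at time t in [0, 1] on the straight path from noise x0 to data x1."""
    return [
        [(1 - t) * a + t * b for a, b in zip(r0, r1)]
        for r0, r1 in zip(x0, x1)
    ]


def target_velocity(x0, x1):
    """Velocity of the straight path: constant and equal to x1 - x0."""
    return [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(x0, x1)]


def toy_velocity_model(x_t, t):
    """Toy set model: each instance is pulled toward the bag mean.

    Because the mean ignores instance order, permuting the bag permutes
    the output rows in the same way. The time t is unused in this toy.
    """
    n, d = len(x_t), len(x_t[0])
    mean = [sum(row[j] for row in x_t) / n for j in range(d)]
    return [[mean[j] - row[j] for j in range(d)] for row in x_t]


def flow_matching_loss(model, x0, x1, t):
    """Mean squared error between predicted and target velocity at time t."""
    pred = model(interpolate(x0, x1, t), t)
    tgt = target_velocity(x0, x1)
    sq = [(p - q) ** 2 for pr, tr in zip(pred, tgt) for p, q in zip(pr, tr)]
    return sum(sq) / len(sq)


# One bag of three 2-dimensional instance representations (made-up values).
noise_bag = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
data_bag = [[0.0, 2.0], [4.0, 6.0], [2.0, 1.0]]

loss = flow_matching_loss(toy_velocity_model, noise_bag, data_bag, t=0.5)
print(loss)
```

In a trained system the velocity model would be a learned network and the loss would be averaged over many bags and random times; the toy function only shows the set-level interface such a network has to respect.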

How to Ground a Korean AI Agent in Real Demographics with Synthetic Personas

Read more at Hugging Face Blog

Quick Bytes

NSA spies are reportedly using Anthropic’s Mythos, despite Pentagon feud (TechCrunch AI)

CEO and CFO suddenly depart AI nuclear power upstart Fermi (TechCrunch AI)

OpenAI’s existential questions (TechCrunch AI)
