TDAD: Test-Driven Agentic Development – Reducing Regressions in AI Coding Agents

March 19, 2026 · 1 min read

AI coding agents promise to automate software fixes, but they frequently introduce regressions—breaking tests that previously passed. Benchmarks like SWE-bench emphasize resolution rates, overlooking this critical failure mode. Without targeted mitigation, agents undermine codebase stability, forcing developers to spend more time verifying changes than benefiting from automation.

The paper introduces TDAD (Test-Driven Agentic Development), an open-source tool and benchmark that uses abstract-syntax-tree (AST) based code-test graphs with weighted impact analysis. For a proposed code change, TDAD constructs a graph linking the modified code to potentially affected tests and ranks those tests by impact score. The agent then verifies the highest-ranked tests selectively using a GraphRAG workflow.
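To make the idea of a code-test graph concrete, here is a minimal sketch using Python's standard `ast` module. It is not TDAD's implementation: the toy sources, the helper names `call_graph` and `impact_scores`, and the depth-decay weighting are all assumptions chosen for illustration.

```python
import ast
from collections import defaultdict

# Toy source and test modules (hypothetical, for illustration only).
SOURCE = '''
def parse(x):
    return int(x)

def add(a, b):
    return parse(a) + parse(b)

def greet(name):
    return "hi " + name
'''

TESTS = '''
def test_add():
    assert add("1", "2") == 3

def test_parse():
    assert parse("4") == 4

def test_greet():
    assert greet("bob") == "hi bob"
'''

def call_graph(code):
    """Map each top-level function to the names it calls directly."""
    graph = {}
    for node in ast.parse(code).body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name] = {
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            }
    return graph

def impact_scores(changed, code_graph, test_graph, decay=0.5):
    """Rank tests by how directly they reach changed functions.

    Assumed weighting: a direct call into changed code scores 1.0,
    and each extra hop through the call graph multiplies by `decay`.
    """
    scores = defaultdict(float)
    for test, callees in test_graph.items():
        frontier, seen, weight = set(callees), set(), 1.0
        while frontier:
            scores[test] += weight * len(frontier & changed)
            seen |= frontier
            frontier = {c for f in frontier for c in code_graph.get(f, ())} - seen
            weight *= decay
    # Drop unaffected tests; sort by score (descending), then by name.
    return sorted(
        ((t, s) for t, s in scores.items() if s > 0),
        key=lambda pair: (-pair[1], pair[0]),
    )

# Suppose the agent's diff touched only `parse`.
ranked = impact_scores({"parse"}, call_graph(SOURCE), call_graph(TESTS))
print(ranked)
```

In this toy case `test_parse` ranks first because it calls the changed function directly, `test_add` ranks second because it reaches it through `add`, and `test_greet` is dropped as unaffected. A real graph would also need to handle imports, methods, and dynamic dispatch, which this sketch ignores.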

Evaluated on SWE-bench Verified:

Baseline regressions: 6.08% at test level.
TDAD: 1.82% (70% reduction).
Resolution rate: 24% → 32%.
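The 70% figure is a relative reduction, not a percentage-point difference. A quick check using only the numbers reported above:

```python
# Reported test-level regression rates, in percent.
baseline, tdad = 6.08, 1.82

# Relative reduction in regressions.
relative_reduction = (baseline - tdad) / baseline
print(f"regression reduction: {relative_reduction:.1%}")

# Resolution rate change in percentage points.
resolution_gain_points = 32 - 24
print(f"resolution gain: {resolution_gain_points} points")
```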

Smaller models (Qwen3-Coder 30B) benefit most from contextual test guidance. Procedural TDD prompts, by contrast, actually increased regressions to 9.94%. An auto-improvement loop pushed resolution to 60% on a subset with zero regressions.

For builders integrating agents into CI/CD pipelines, TDAD addresses a core reliability gap. Instead of running the full test suite after every agent edit (slow and wasteful), run targeted verification on the high-impact tests. This accelerates iteration while keeping the tests most likely to break under check. Graph-based prioritization scales to large repos, and the open-source implementation (https://github.com/pepealonso95/TDAD) integrates via simple Python APIs.

Builder takeaway:
After agent-generated PRs, pipe diffs into TDAD's graph analyzer to select <20% of tests for regression checks, which saves roughly 80% of test compute if tests cost about the same to run. Experiment with GraphRAG for your repo's test-code coupling.
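As a rough sketch of the selection step in a CI job, assuming you already have a list of tests ranked by impact score: the function `select_tests`, the 20% budget parameter, and the test IDs below are hypothetical and are not part of TDAD's API. The only external convention relied on is that pytest accepts test node IDs as positional arguments.

```python
def select_tests(ranked, total_tests, budget_pct=20):
    """Keep the highest-impact tests, capped at budget_pct of the full suite.

    `ranked` is a list of (test_id, score) pairs sorted by descending score.
    At least one test is always kept so the check is never empty.
    """
    limit = max(1, total_tests * budget_pct // 100)
    return [test_id for test_id, _ in ranked[:limit]]

# Hypothetical ranking produced by an impact analysis over a 10-test suite.
ranked = [
    ("tests/test_parser.py::test_parse", 1.0),
    ("tests/test_math.py::test_add", 0.5),
    ("tests/test_cli.py::test_main", 0.25),
]

selected = select_tests(ranked, total_tests=10)

# Command a CI step could run, e.g. via subprocess.run(command, check=True).
command = ["pytest", *selected]
print(" ".join(command))
```

Capping by a share of the full suite rather than of the ranked list keeps the budget stable when a diff touches many tests. A full-suite run on a slower cadence (nightly, for example) is a sensible backstop for anything the ranking misses.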

Source: Pepe Alonso, arXiv cs.SE, March 2026
