AgentRel Benchmark

Run: 2026-04-16T19:38:17.705378+00:00 · Judge: google/gemini-2.5-flash · Strategy: promptfoo

All sources combined

Questions: 1240
Control Avg: 2.22
With Skill Avg: 2.69
Δ: +0.47
✅ Pass: 38% (+10pp vs. control)
🟡 Partial: 33%
❌ Fail: 30%

Score by Category (chart)

Top Skills by Impact (Δ)

Skill                                       Control  With Skill  Δ Impact  Pass%  Pass↑  Questions
monad/network-config                        0.75     4.30        +3.55     75%    +75pp  20
mantle/mantle-network-primer                2.44     4.28        +1.84     78%    +50pp  18
dev-tooling/ethers-vs-viem                  2.88     4.00        +1.12     75%    +37pp  8
mantle/mantle-risk-evaluator 🟡             1.00     2.88        +1.88     38%    +32pp  16
mantle/mantle-address-registry-navigator 🟡 1.06     2.75        +1.69     38%    +32pp  16
mantle/mantle-smart-contract-deployer 🟡    0.92     3.00        +2.08     25%    +25pp  12
protocols/uniswap-v3-integration 🟡         0.88     2.13        +1.25     25%    +25pp  8
base/l2-dev 🟡                              2.45     3.50        +1.05     55%    +25pp  40
ethereum/defi-math 🟡                       2.33     3.08        +0.75     58%    +25pp  12
dev-tooling/hardhat-vs-foundry 🟡           3.25     3.50        +0.25     50%    +25pp  8

🌐 By Ecosystem (Overall)

Questions: 236
Control Avg: 3.20
With Skill Avg: 3.43
Δ: +0.23
✅ Pass: 53% (+6pp vs. control)
🟡 Partial: 32%
❌ Fail: 15%

Score by Category (chart)

Top Skills by Impact (Δ)

Skill                                       Control  With Skill  Δ Impact  Pass%  Pass↑  Questions
dev-tooling/ethers-vs-viem                  2.88     4.00        +1.12     75%    +37pp  8
protocols/uniswap-v3-integration 🟡         0.88     2.13        +1.25     25%    +25pp  8
ethereum/defi-math 🟡                       2.33     3.08        +0.75     58%    +25pp  12
dev-tooling/hardhat-vs-foundry 🟡           3.25     3.50        +0.25     50%    +25pp  8
standards/erc-account-standards 🟡          2.80     3.70        +0.90     50%    +20pp  10
defi/amm-lending-patterns                   4.17     4.50        +0.33     83%    +16pp  12
standards/erc-signature-standards           4.40     4.70        +0.30     90%    +10pp  10
ethereum/ethskills-concepts 🟡              3.31     3.62        +0.31     58%    +8pp   26
security/oracle-price-manipulation 🟡       1.67     1.83        +0.16     25%    +8pp   12
standards/sdk-migration-guide 🟡            2.75     3.00        +0.25     45%    +5pp   20
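
How the headline figures relate to each other can be sketched in a few lines of Python. This is an illustration only, not the benchmark's own code: it assumes Δ is the With Skill average minus the Control average (which matches every row reported here) and that "+Npp vs ctrl" is a percentage-point difference in pass rate, so the control pass rate can be backed out by subtraction. The `Summary` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Headline numbers from one summary panel (hypothetical container)."""
    control_avg: float
    with_skill_avg: float
    pass_pct: int        # pass rate with the skill, in percent
    pass_uplift_pp: int  # reported "+Npp vs ctrl", in percentage points

    @property
    def delta(self) -> float:
        # Assumption: delta = with-skill average minus control average.
        return round(self.with_skill_avg - self.control_avg, 2)

    @property
    def implied_control_pass_pct(self) -> int:
        # Assumption: the uplift is a plain percentage-point difference.
        return self.pass_pct - self.pass_uplift_pp


# Values copied from the two summary panels in this report.
combined = Summary(control_avg=2.22, with_skill_avg=2.69, pass_pct=38, pass_uplift_pp=10)
ecosystem = Summary(control_avg=3.20, with_skill_avg=3.43, pass_pct=53, pass_uplift_pp=6)

for name, s in (("combined", combined), ("ecosystem", ecosystem)):
    print(f"{name}: delta={s.delta:+.2f}, implied control pass={s.implied_control_pass_pct}%")
```

Under these assumptions the computed deltas reproduce the reported +0.47 and +0.23, and the implied control pass rates come out at 28% and 47%. The displayed percentages are rounded, so the implied rates may be off by about a point.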