AgentRel Benchmark
Run: 2026-04-16T19:38:17.705378+00:00 · Judge: google/gemini-2.5-flash · Strategy: promptfoo
All sources combined
| Metric | Value |
|---|---|
| Questions | 1240 |
| Control Avg | 2.22 |
| With Skill Avg | 2.69 |
| Δ Delta | +0.47 |
| ✅ Pass | 38% (+10pp vs ctrl) |
| 🟡 Partial | 33% |
| ❌ Fail | 30% |
Score by Category
Top Skills by Impact (Δ)
| Skill | Control | With Skill | Δ Impact | Pass% | Pass↑ | Questions |
|---|---|---|---|---|---|---|
| monad/network-config ✅ | 0.75 | 4.30 | +3.55 | 75% | +75pp | 20 |
| mantle/mantle-network-primer ✅ | 2.44 | 4.28 | +1.84 | 78% | +50pp | 18 |
| dev-tooling/ethers-vs-viem ✅ | 2.88 | 4.00 | +1.12 | 75% | +37pp | 8 |
| mantle/mantle-risk-evaluator 🟡 | 1.00 | 2.88 | +1.88 | 38% | +32pp | 16 |
| mantle/mantle-address-registry-navigator 🟡 | 1.06 | 2.75 | +1.69 | 38% | +32pp | 16 |
| mantle/mantle-smart-contract-deployer 🟡 | 0.92 | 3.00 | +2.08 | 25% | +25pp | 12 |
| protocols/uniswap-v3-integration 🟡 | 0.88 | 2.13 | +1.25 | 25% | +25pp | 8 |
| base/l2-dev 🟡 | 2.45 | 3.50 | +1.05 | 55% | +25pp | 40 |
| ethereum/defi-math 🟡 | 2.33 | 3.08 | +0.75 | 58% | +25pp | 12 |
| dev-tooling/hardhat-vs-foundry 🟡 | 3.25 | 3.50 | +0.25 | 50% | +25pp | 8 |
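The Δ Impact column appears to be the With Skill average minus the Control average. The report does not state this definition, so treat it as an assumption; the sketch below only checks that the rows in the table above are consistent with it. The names `ROWS`, `delta`, and `mismatches` are hypothetical helpers, not part of the benchmark tooling.

```python
# Rows copied from the table above: (skill, control_avg, with_skill_avg, reported_delta).
ROWS = [
    ("monad/network-config", 0.75, 4.30, 3.55),
    ("mantle/mantle-network-primer", 2.44, 4.28, 1.84),
    ("dev-tooling/ethers-vs-viem", 2.88, 4.00, 1.12),
    ("mantle/mantle-risk-evaluator", 1.00, 2.88, 1.88),
    ("mantle/mantle-address-registry-navigator", 1.06, 2.75, 1.69),
    ("mantle/mantle-smart-contract-deployer", 0.92, 3.00, 2.08),
    ("protocols/uniswap-v3-integration", 0.88, 2.13, 1.25),
    ("base/l2-dev", 2.45, 3.50, 1.05),
    ("ethereum/defi-math", 2.33, 3.08, 0.75),
    ("dev-tooling/hardhat-vs-foundry", 3.25, 3.50, 0.25),
]


def delta(control: float, with_skill: float) -> float:
    """Assumed definition: With Skill average minus Control average."""
    return round(with_skill - control, 2)


def mismatches(rows, tolerance: float = 0.005):
    """Return the skills whose reported delta differs from the assumed definition."""
    return [
        skill
        for skill, control, with_skill, reported in rows
        if abs(delta(control, with_skill) - reported) >= tolerance
    ]


if __name__ == "__main__":
    # An empty list means every row matches the assumed definition.
    print(mismatches(ROWS))
```

The same subtraction applied to the overall averages (2.69 and 2.22) gives the headline +0.47.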