Evaluation
Empirical evaluation on 53 DeFi contracts. RQ1: Performance, RQ2: Accuracy, RQ3: Coverage, RQ4: GAEV effectiveness.
Dataset (Table 5)
| Property | Value |
|---|---|
| Total contracts | 53 |
| Lines of Solidity | 12,000+ |
| Known vulnerabilities | 18 DVD + real exploits |
| Ground truth | Claude + white-hat expert |
RQ1: Performance (Table 6)
| Config | Time | Speedup | Notes |
|---|---|---|---|
| Sequential | 162.0 min | 1.0× | Baseline |
| Parallel (28) | 23.1 min | 7.0× | Ray orchestration |
| + LLM agents | 26.4 min | 6.1× | +3.3 min for 6 agents |
| + GAEV | 29.8 min | 5.4× | +3.4 min for exploit gen |
| Slither alone | 1.2 min | n/a | Fastest (60% findings) |
| Mythril alone | 12.3 min | n/a | 9% findings |
| Drain modules | 8.5 min | n/a | 9.7% findings |
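The Speedup column is consistent with dividing the sequential baseline by each configuration's wall-clock time, and the Notes column with the difference between successive configurations. A minimal Python sketch that recomputes both from the times in Table 6; the names `speedup`, `CONFIG_TIMES_MIN`, and the overhead variables are illustrative and not part of the tool:

```python
# Wall-clock times in minutes, copied from Table 6.
SEQUENTIAL_MIN = 162.0
CONFIG_TIMES_MIN = {
    "Parallel (28)": 23.1,
    "+ LLM agents": 26.4,
    "+ GAEV": 29.8,
}


def speedup(baseline_min: float, config_min: float) -> float:
    """Speedup relative to the sequential baseline, rounded to one decimal."""
    return round(baseline_min / config_min, 1)


speedups = {name: speedup(SEQUENTIAL_MIN, t) for name, t in CONFIG_TIMES_MIN.items()}

# Incremental cost of each stage added on top of the previous configuration.
llm_overhead_min = round(
    CONFIG_TIMES_MIN["+ LLM agents"] - CONFIG_TIMES_MIN["Parallel (28)"], 1
)
gaev_overhead_min = round(
    CONFIG_TIMES_MIN["+ GAEV"] - CONFIG_TIMES_MIN["+ LLM agents"], 1
)

for name, value in speedups.items():
    print(f"{name}: {value}x")
print(f"LLM agents overhead: {llm_overhead_min} min")
print(f"GAEV overhead: {gaev_overhead_min} min")
```

Adding the LLM agents and GAEV stages costs 6.7 minutes in total, which is why the full pipeline still keeps a 5.4× speedup over the sequential run.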
RQ2: Accuracy (Table 7)
| System | Precision | Recall | F1 | FP Rate |
|---|---|---|---|---|
| Slither only | 0.38 | 0.72 | 0.50 | 62% |
| Mythril only | 0.45 | 0.61 | 0.52 | 55% |
| All tools (union) | 0.43 | 0.76 | 0.55 | 57% |
| Zentinel-audit v4.3 | 0.89 | 0.83 | 0.86 | 26% |
| Improvement | +107% | +9% | +56% | -54% |
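F1 is the harmonic mean of precision and recall. The Improvement row matches the relative change of Zentinel-audit v4.3 against the "All tools (union)" row; that choice of baseline is inferred from the numbers rather than stated in the table. A short Python sketch under that assumption, with illustrative function names:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def relative_change_pct(baseline: float, new: float) -> float:
    """Relative change of `new` against `baseline`, in percent."""
    return (new - baseline) / baseline * 100


# (precision, recall) pairs copied from Table 7.
systems = {
    "Slither only": (0.38, 0.72),
    "Mythril only": (0.45, 0.61),
    "All tools (union)": (0.43, 0.76),
    "Zentinel-audit v4.3": (0.89, 0.83),
}

f1 = {name: round(f1_score(p, r), 2) for name, (p, r) in systems.items()}

# Assumed baseline for the Improvement row: the union of all tools.
union_p, union_r = systems["All tools (union)"]
zen_p, zen_r = systems["Zentinel-audit v4.3"]

precision_gain = round(relative_change_pct(union_p, zen_p))
recall_gain = round(relative_change_pct(union_r, zen_r))
f1_gain = round(
    relative_change_pct(f1["All tools (union)"], f1["Zentinel-audit v4.3"])
)
# FP rates (in percent) are taken directly from the table's FP Rate column.
fp_rate_change = round(relative_change_pct(57, 26))

print(f1)
print(precision_gain, recall_gain, f1_gain, fp_rate_change)
```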
RQ3: Findings by Category (Table 10, 669 total)
| Category | Crit | High | Med | Low | Total |
|---|---|---|---|---|---|
| Reentrancy | 3 | 8 | 12 | 5 | 28 |
| Flash Loan | 5 | 12 | 8 | 2 | 27 |
| Access Control | 4 | 18 | 35 | 22 | 79 |
| Price Oracle | 4 | 15 | 18 | 8 | 45 |
| Integer Issues | 1 | 8 | 42 | 65 | 116 |
| Logic Errors | 2 | 22 | 48 | 35 | 107 |
| Gas/DoS | 0 | 5 | 45 | 78 | 128 |
| ZK/Bridge/L2 | 2 | 8 | 15 | 10 | 35 |
| Other | 0 | 8 | 42 | 54 | 104 |
| Total | 21 | 104 | 265 | 279 | 669 |
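The row and column totals in Table 10 can be recomputed from the per-severity counts. The Python sketch below does that and also derives the share of findings rated Critical or High; the data layout and variable names are illustrative choices, not the tool's output format:

```python
SEVERITIES = ("Crit", "High", "Med", "Low")

# Per-category counts in (Crit, High, Med, Low) order, copied from Table 10.
findings = {
    "Reentrancy": (3, 8, 12, 5),
    "Flash Loan": (5, 12, 8, 2),
    "Access Control": (4, 18, 35, 22),
    "Price Oracle": (4, 15, 18, 8),
    "Integer Issues": (1, 8, 42, 65),
    "Logic Errors": (2, 22, 48, 35),
    "Gas/DoS": (0, 5, 45, 78),
    "ZK/Bridge/L2": (2, 8, 15, 10),
    "Other": (0, 8, 42, 54),
}

# Sum across severities for each category (the Total column).
row_totals = {category: sum(counts) for category, counts in findings.items()}

# Sum across categories for each severity (the Total row).
column_totals = dict(
    zip(SEVERITIES, (sum(column) for column in zip(*findings.values())))
)

grand_total = sum(row_totals.values())

# Share of all findings that are Critical or High severity, in percent.
crit_high_share_pct = round(
    100 * (column_totals["Crit"] + column_totals["High"]) / grand_total, 1
)

print(row_totals)
print(column_totals, grand_total, crit_high_share_pct)
```

Critical and High findings together account for 125 of the 669 findings, so most of the reported volume sits in the Medium and Low columns.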
Limitations (§11)
Internal
Single benchmark (DVD v4)
LLM non-determinism (low temp mitigates)
Ground truth: Claude + author; no independent rater
External
Solidity focus only
GPT-4o dependency
Tool versions: only specific versions tested
Practical
LLM cost: $0.02–0.05/contract
API dependency required
40% exploits need manual adjustment