by suyoumo · Codex Skill · ★ 688
ClawProBench: a transparent, live-first benchmark harness for evaluating model capability inside the OpenClaw runtime. It offers 102 active scenarios, 162 catalog scenarios, deterministic grading, and OpenClaw-native coverage. ClawProBench focuses on real OpenClaw execution with deterministic grading, structured reports, and benchmark-profile selection. A default profile defines the ranking path, and broader active coverage remains available through additional selections. The current worktree inventory reports both the active and the total catalog scenario counts, including scenarios that are still incubating.
| Field | Value |
| --- | --- |
| Stars | 688 |
| Forks | 50 |
| Language | Python |
| Category | Codex Skill |
| License | Apache-2.0 |
| Quality Score | 53.296/100 |
| Last Updated | 2026-05-19 |
| Created | 2025-03-02 |
| Platforms | python |
| Est. Tokens | ~209k |
ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability. It is categorized as a Codex Skill with 688 GitHub stars.
ClawProBench is primarily written in Python. It covers topics such as agents, benchmarking, and evaluation.
You can find installation instructions and usage details in the ClawProBench GitHub repository at github.com/suyoumo/ClawProBench. The project has 688 stars and 50 forks, indicating an active community.
ClawProBench is released under the Apache-2.0 license, making it free to use and modify according to the license terms.