by suyoumo · Codex Skill · ★ 340
OpenClawProBench

Transparent live-first benchmark harness for evaluating model capability inside the OpenClaw runtime: 102 active scenarios, 162 catalog scenarios, deterministic grading, and OpenClaw-native coverage.

OpenClawProBench focuses on real OpenClaw execution with deterministic grading, structured reports, and benchmark-profile selection. The default ranking path is a dedicated profile; broader active coverage remains available through several alternatives. The current worktree inventory reports 102 active scenarios and 162 total catalog scenarios, the latter including incubating ones.
| Field | Value |
| --- | --- |
| Stars | 340 |
| Forks | 26 |
| Language | Python |
| Category | Codex Skill |
| License | Apache-2.0 |
| Quality Score | 53.296/100 |
| Last Updated | 2026-04-11 |
| Created | 2025-03-02 |
| Platforms | python |
| Est. Tokens | ~104k |
OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability. It is categorized as a Codex Skill with 340 GitHub stars.
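To make "deterministic grading" and "repeated-trial reliability" concrete, here is a minimal Python sketch of the general idea. It is not taken from the OpenClawProBench codebase: the names `grade`, `pass_rate`, and `flaky_agent` are hypothetical, and the exact-match check is only an assumed stand-in for whatever checks the real harness applies.

```python
from typing import Callable


def grade(output: str, expected: str) -> bool:
    """Deterministic check: the same output always gets the same verdict.

    Hypothetical grader; exact match after trimming whitespace is an assumption.
    """
    return output.strip() == expected.strip()


def pass_rate(run_trial: Callable[[int], str], expected: str, trials: int) -> float:
    """Run one scenario `trials` times and return the fraction of passing trials."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    passed = sum(grade(run_trial(i), expected) for i in range(trials))
    return passed / trials


def flaky_agent(trial_index: int) -> str:
    """Stand-in for a live agent run: answers wrongly on every third trial."""
    return "wrong" if trial_index % 3 == 2 else "42\n"


# Six trials, two of which (indices 2 and 5) fail the deterministic check.
rate = pass_rate(flaky_agent, "42", trials=6)
print(f"pass rate over 6 trials: {rate:.2f}")
```

Because the grader is a pure function of the output, any variation in the pass rate across trials comes from the agent rather than from the scoring, which is what makes a repeated-trial figure meaningful.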
OpenClawProBench is primarily written in Python. It covers topics such as agents, benchmarks, and evaluation.
Installation instructions and usage details are in the OpenClawProBench GitHub repository at github.com/suyoumo/OpenClawProBench. The project has 340 stars and 26 forks.
OpenClawProBench is released under the Apache-2.0 license, making it free to use and modify according to the license terms.