OpenClawProBench

About OpenClawProBench

OpenClawProBench is a transparent, live-first benchmark harness for evaluating model capability inside the OpenClaw runtime. It focuses on real OpenClaw execution with deterministic grading, structured reports, benchmark-profile selection, and OpenClaw-native coverage. Ranking uses a default profile, and broader active coverage remains available through several alternatives. The current worktree inventory reports 102 active scenarios out of 162 total catalog scenarios, some of which are incubating.
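As a rough illustration of deterministic grading, and of how repeated trials of one scenario might be summarized, here is a minimal Python sketch. It is not OpenClawProBench's code or API: `TrialResult`, `grade_trial`, `summarize`, and the exact-match scoring rule are all hypothetical names and assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialResult:
    """Outcome of one trial of a scenario (hypothetical structure)."""
    scenario_id: str
    output: str


def grade_trial(result: TrialResult, expected: str) -> bool:
    """Deterministic check: the same output always gets the same verdict."""
    # Assumption: a trial passes on an exact match, ignoring outer whitespace.
    return result.output.strip() == expected.strip()


def summarize(verdicts: list[bool]) -> dict[str, float]:
    """Aggregate repeated trials into a pass rate and an all-passed flag."""
    if not verdicts:
        raise ValueError("at least one trial is required")
    passed = sum(verdicts)
    return {
        "pass_rate": passed / len(verdicts),
        "all_passed": float(passed == len(verdicts)),
    }


# Three made-up trials of the same scenario.
trials = [
    TrialResult("demo", "42\n"),
    TrialResult("demo", "42"),
    TrialResult("demo", "41"),
]
verdicts = [grade_trial(t, "42") for t in trials]
summary = summarize(verdicts)
print(summary)
```

Because the grader involves no randomness or model judgment, rerunning it on the same recorded outputs gives the same verdicts. Any variation across trials therefore comes from the model being evaluated, not from the scoring.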

Topics: agent, benchmark, evaluation, harness, leaderboard, llm, openclaw

Quick Facts

Stars	340
Forks	26
Language	Python
Category	Codex Skill
License	Apache-2.0
Quality Score	53.296/100
Last Updated	2026-04-11
Created	2025-03-02
Platforms	python
Est. Tokens	~104k

Compatible Skills

These tools work well together with OpenClawProBench for enhanced workflows:

claw-eval — semantic(0.49)+complementary+rare_topics+same_lang+similar_pop+shared_platform (67%)
tau2-bench — semantic(0.24)+complementary+rare_topics+same_lang+similar_pop+shared_platform (63%)
MCPBench — semantic(0.31)+complementary+rare_topics+same_lang+similar_pop+shared_platform (60%)
ollama-benchmark — semantic(0.28)+complementary+rare_topics+same_lang+similar_pop+shared_platform (59%)
WildClawBench — semantic(0.38)+complementary+same_lang+similar_pop+shared_platform (58%)

More Codex Skill Tools

Explore other popular codex skill tools:

openclaw ⭐ 374.0k
hermes-agent ⭐ 162.8k
ui-ux-pro-max-skill ⭐ 57.6k
graphify ⭐ 51.5k
open-design ⭐ 49.7k
awesome-openclaw-skills ⭐ 49.2k
cherry-studio ⭐ 46.1k
siyuan ⭐ 44.1k
nanobot ⭐ 43.0k
system_prompts_leaks ⭐ 40.5k


Popular Python Agent Tools

AutoGPT ⭐ 184.5k · Agent Tool
hermes-agent ⭐ 162.8k · Codex Skill
langflow ⭐ 148.7k · Agent Tool
open-webui ⭐ 138.2k · MCP Server
skills ⭐ 137.5k · Claude Skill

Frequently Asked Questions

What is OpenClawProBench?

OpenClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability. It is categorized as a Codex Skill and has 340 GitHub stars.

What programming language is OpenClawProBench written in?

OpenClawProBench is primarily written in Python. It covers topics such as agent, benchmark, and evaluation.

How do I install or use OpenClawProBench?

You can find installation instructions and usage details in the OpenClawProBench GitHub repository at github.com/suyoumo/OpenClawProBench. The project has 340 stars and 26 forks, indicating an active community.

What license does OpenClawProBench use?

OpenClawProBench is released under the Apache-2.0 license, making it free to use and modify according to the license terms.
