ClawProBench

by suyoumo · Codex Skill · ★ 688

About ClawProBench

ClawProBench is a transparent, live-first benchmark harness for evaluating model capability inside the OpenClaw runtime. It offers 102 active scenarios, 162 catalog scenarios, deterministic grading, and OpenClaw-native coverage. It focuses on real OpenClaw execution with deterministic grading, structured reports, and benchmark-profile selection. The default ranking path is a single benchmark profile; broader active coverage remains available through four alternative selections. The current worktree inventory reports 102 active scenarios and 162 total catalog scenarios, some of them incubating.

agent · benchmark · evaluation · harness · leaderboard · llm · openclaw

Quick Facts

Stars: 688
Forks: 50
Language: Python
Category: Codex Skill
License: Apache-2.0
Quality Score: 53.296/100
Last Updated: 2026-05-19
Created: 2025-03-02
Platforms: python
Est. Tokens: ~209k

Frequently Asked Questions

What is ClawProBench?

ClawProBench is a live-first benchmark harness for evaluating LLM agents in the OpenClaw runtime with deterministic grading and repeated-trial reliability. It is categorized as a Codex Skill and has 688 GitHub stars.

What programming language is ClawProBench written in?

ClawProBench is primarily written in Python. Its topics include agent, benchmark, evaluation, harness, leaderboard, llm, and openclaw.

How do I install or use ClawProBench?

Installation instructions and usage details are in the ClawProBench GitHub repository at github.com/suyoumo/ClawProBench. The project has 688 stars and 50 forks.

What license does ClawProBench use?

ClawProBench is released under the Apache-2.0 license, making it free to use and modify according to the license terms.
