by SeraphimSerapis · Agent Tool · ★ 67
Tool-calling quality benchmark for LLM serving stacks. 65+ deterministic scenarios testing multi-turn orchestration, safety boundaries, and structured output. Supports vLLM, LiteLLM, and llama.cpp.
| Stars | 67 |
| Forks | 7 |
| Language | Python |
| Category | Agent Tool |
| License | MIT |
| Quality Score | 36.2/100 |
| Last Updated | 2026-05-20 |
| Created | 2026-04-17 |
| Platforms | python |
| Est. Tokens | ~99k |
These tools work well together with tool-eval-bench for enhanced workflows:
Explore other popular agent tool tools:
tool-eval-bench is Tool-calling quality benchmark for LLM serving stacks. 65+ deterministic scenarios testing multi-turn orchestration, safety boundaries, and structured output. Supports vLLM, LiteLLM, and llama.cpp.. It is categorized as a Agent Tool with 67 GitHub stars.
tool-eval-bench is primarily written in Python.
You can find installation instructions and usage details in the tool-eval-bench GitHub repository at github.com/SeraphimSerapis/tool-eval-bench. The project has 67 stars and 7 forks, indicating an active community.
tool-eval-bench is released under the MIT license, making it free to use and modify according to the license terms.