tool-eval-bench

by SeraphimSerapis · Agent Tool · ★ 67

About tool-eval-bench

Tool-calling quality benchmark for LLM serving stacks. 65+ deterministic scenarios testing multi-turn orchestration, safety boundaries, and structured output. Supports vLLM, LiteLLM, and llama.cpp.
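As a rough illustration of what a deterministic tool-calling check can look like, here is a minimal sketch in Python. It is not taken from tool-eval-bench: the `Scenario` class, the `score_tool_call` function, and the `get_weather` tool are hypothetical names made up for this example. The only assumption about the serving stack is that it returns the OpenAI-style chat completion shape, which vLLM, LiteLLM, and llama.cpp can all expose.

```python
import json
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical scenario: a prompt plus the single tool call we expect."""
    prompt: str
    expected_tool: str
    expected_args: dict


def score_tool_call(scenario: Scenario, response: dict) -> bool:
    """Return True only if the response holds exactly the expected tool call.

    Assumes an OpenAI-style response, where tool calls live under
    choices[0].message.tool_calls and arguments arrive as a JSON string.
    """
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    if len(calls) != 1:
        return False  # no call, or extra calls, fails the scenario
    function = calls[0]["function"]
    try:
        args = json.loads(function["arguments"])
    except (json.JSONDecodeError, TypeError):
        return False  # malformed structured output counts as a failure
    return function["name"] == scenario.expected_tool and args == scenario.expected_args


# Canned example data (hypothetical), so the check runs without a live server.
SCENARIO = Scenario(
    prompt="What is the weather in Berlin?",
    expected_tool="get_weather",
    expected_args={"city": "Berlin"},
)

GOOD_RESPONSE = {
    "choices": [{"message": {"tool_calls": [
        {"function": {"name": "get_weather", "arguments": '{"city": "Berlin"}'}}
    ]}}]
}

BAD_RESPONSE = {"choices": [{"message": {"content": "It is sunny.", "tool_calls": None}}]}

if __name__ == "__main__":
    print(score_tool_call(SCENARIO, GOOD_RESPONSE))
    print(score_tool_call(SCENARIO, BAD_RESPONSE))
```

Because the expected call is fixed in advance and compared exactly, the same model output always yields the same pass or fail result, which is what makes a scenario of this kind deterministic.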

Quick Facts

Stars: 67
Forks: 7
Language: Python
Category: Agent Tool
License: MIT
Quality Score: 36.2/100
Last Updated: 2026-05-20
Created: 2026-04-17
Platforms: Python
Est. Tokens: ~99k

Compatible Skills

These tools work well together with tool-eval-bench for enhanced workflows:

  • llm-use: 51% match (semantic similarity 0.18; complementary, same language, similar popularity, shared platform)
  • ContextPilot: 51% match (semantic similarity 0.18; complementary, same language, similar popularity, shared platform)

Frequently Asked Questions

What is tool-eval-bench?

tool-eval-bench is a tool-calling quality benchmark for LLM serving stacks. It provides 65+ deterministic scenarios that test multi-turn orchestration, safety boundaries, and structured output, and it supports vLLM, LiteLLM, and llama.cpp. It is categorized as an Agent Tool and has 67 GitHub stars.

What programming language is tool-eval-bench written in?

tool-eval-bench is primarily written in Python.

How do I install or use tool-eval-bench?

Installation instructions and usage details are in the tool-eval-bench GitHub repository at github.com/SeraphimSerapis/tool-eval-bench. The project has 67 stars and 7 forks.

What license does tool-eval-bench use?

tool-eval-bench is released under the MIT license, making it free to use and modify according to the license terms.
