by Agent-Mem-Tools · Agent Tool · ★ 29
agent-eval-ts

Evaluation framework for TypeScript AI agents: define suites, run batch evaluations, and report accuracy, latency, cost, and more.

What is this?

agent-eval-ts helps you measure and compare AI agent behavior: exact output checks, semantic similarity (bag-of-words cosine by default, or your own embeddings), JSON Schema validation, tool-call sequences, latency, token usage, and cost logging. It runs locally and produces JSON, Markdown, HTML, and JUnit reports. It also supports an optional LLM-as-judge (OpenAI-compatible), caching, multi-model comparison, and regression detection against a saved baseline.
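To make the "bag-of-words cosine" default more concrete, here is a minimal sketch of how such a similarity score can be computed. It is not taken from the agent-eval-ts source: the function names `bagOfWords` and `cosineSimilarity` and the tokenization rule (lowercase, ASCII letters and digits only) are assumptions chosen for illustration, and the library's own tokenizer and API may differ.

```typescript
// Illustrative sketch only: these helper names are hypothetical and are
// NOT the agent-eval-ts API.

/** Count word occurrences after lowercasing (assumed tokenization rule). */
function bagOfWords(text: string): Record<string, number> {
  // Object.create(null) avoids collisions with keys such as "constructor".
  const counts: Record<string, number> = Object.create(null);
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const token of tokens) {
    counts[token] = (counts[token] ?? 0) + 1;
  }
  return counts;
}

/** Euclidean length of a word-count vector. */
function norm(v: Record<string, number>): number {
  let sumSquares = 0;
  for (const word of Object.keys(v)) {
    sumSquares += v[word] * v[word];
  }
  return Math.sqrt(sumSquares);
}

/** Cosine similarity of the word-count vectors of two strings, in [0, 1]. */
function cosineSimilarity(a: string, b: string): number {
  const va = bagOfWords(a);
  const vb = bagOfWords(b);
  let dot = 0;
  for (const word of Object.keys(va)) {
    dot += va[word] * (vb[word] ?? 0);
  }
  const denom = norm(va) * norm(vb);
  // Treat empty input as "no similarity" instead of dividing by zero.
  return denom === 0 ? 0 : dot / denom;
}
```

Because only word counts are compared, word order is ignored: `cosineSimilarity("Paris is the capital of France", "The capital of France is Paris")` comes out at approximately 1, while two strings with no words in common score 0. That limitation is the usual reason to swap in embeddings when paraphrases need to match.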
| Stars | 29 |
| Forks | 192 |
| Language | TypeScript |
| Category | Agent Tool |
| License | MIT |
| Quality Score | 63.9/100 |
| Last Updated | 2026-05-21 |
| Created | 2026-04-10 |
| Platforms | docker, node |
| Est. Tokens | ~7k |
agent-eval-ts is an agent evaluation and benchmarking tool for TypeScript, covering test suites, LLM metrics, caching, an OpenAI-compatible judge, JUnit/HTML/MD reports, Docker, and GitHub Actions. It is categorized as an Agent Tool with 29 GitHub stars.
agent-eval-ts is primarily written in TypeScript. It covers topics such as ai-agents, benchmarking, evaluation.
You can find installation instructions and usage details in the agent-eval-ts GitHub repository at github.com/Agent-Mem-Tools/agent-eval-ts. The project has 29 stars and 192 forks.
agent-eval-ts is released under the MIT license, making it free to use and modify according to the license terms.