by microsoft · Agent Tool · ★ 34
A11y LLM Evaluation Harness and Dataset
This is a research project to evaluate how well various LLMs generate accessible HTML content.
Problem: LLMs currently generate code with accessibility bugs, which creates blockers for people with disabilities and leads to costly rework and fixes downstream.
Goal: Create a public test suite that can be used to benchmark how well various LLMs generate accessible HTML code. Eventually, it could also help train models to generate more accessible code by default.
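To make the benchmarking idea concrete, here is a minimal sketch of one automated check that such a test suite might run on model output. It is not the harness used by a11y-llm-eval. The functions `check_html` and `pass_rate` are hypothetical, and the sketch tests only a single rule (every `<img>` needs an `alt` attribute). A real evaluation would cover many more accessibility rules.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Hypothetical rule: flag <img> elements that have no alt attribute.

    An empty alt="" is allowed, because it marks an image as decorative.
    """

    def __init__(self) -> None:
        super().__init__()
        self.violations: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # <img ... /> tags also reach this method.
        if tag == "img":
            names = {name for name, _ in attrs}
            if "alt" not in names:
                src = dict(attrs).get("src")
                self.violations.append(f"<img src={src!r}> has no alt attribute")


def check_html(html: str) -> tuple[bool, list[str]]:
    """Return (passed, violations) for one generated HTML sample."""
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return (not checker.violations, checker.violations)


def pass_rate(samples: list[str]) -> float:
    """Fraction of samples with no violations (0.0 for an empty list)."""
    if not samples:
        return 0.0
    passed = sum(1 for sample in samples if check_html(sample)[0])
    return passed / len(samples)


if __name__ == "__main__":
    outputs = [
        '<img src="logo.png" alt="Company logo">',
        '<img src="chart.png">',
    ]
    print(pass_rate(outputs))  # 0.5: the second sample lacks alt text
```

In a per-model benchmark you would collect each model's outputs for the same prompts and compare their pass rates. Scoring each sample as a simple pass or fail keeps results comparable across models, but it hides how many violations each sample contains.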
| Attribute | Value |
|---|---|
| Stars | 34 |
| Forks | 5 |
| Language | Python |
| Category | Agent Tool |
| License | MIT |
| Quality Score | 50.948/100 |
| Last Updated | 2026-05-07 |
| Created | 2025-09-24 |
| Platforms | python |
| Est. Tokens | ~70k |
a11y-llm-eval is an evaluation tool that benchmarks how well LLMs generate accessible HTML. It is categorized as an Agent Tool and has 34 GitHub stars.
a11y-llm-eval is primarily written in Python.
You can find installation instructions and usage details in the a11y-llm-eval GitHub repository at github.com/microsoft/a11y-llm-eval.
a11y-llm-eval is released under the MIT license, making it free to use and modify according to the license terms.