vllm-mlx

About vllm-mlx

vLLM-MLX vLLM-like inference for Apple Silicon - GPU-accelerated Text, Image, Video & Audio on Mac Overview vllm-mlx brings native Apple Silicon GPU acceleration to vLLM by integrating: MLX: Apple's ML framework with unified memory and Metal kernels mlx-lm: Optimized LLM inference with KV cache and quantization mlx-vlm: Vision-language models for multimodal inference mlx-audio: Speech-to-Text and Text-to-Speech with native voices mlx-embeddings: Text embeddings for semantic search and RAG Features Multimodal - Text, Image, Video & Audio in one platform Native GPU acceleration on Apple Silicon...

anthropic apple-silicon audio-processing claude-code computer-vision image-understanding inference llm machine-learning macos

Quick Facts

Stars	1,230
Forks	173
Language	Python
Category	MCP Server
License	Apache-2.0
Quality Score	52.57/100
Open Issues	59
Last Updated	2026-05-21
Created	2025-12-06
Platforms	claude-code, mcp, python
Est. Tokens	~810k

Compatible Skills

These tools work well together with vllm-mlx for enhanced workflows:

mlx-omni-server — semantic(0.33)+complementary+shared_fw(openai)+rare_topics+same_lang+similar_pop+shared_platform (78%)
Toolio — semantic(0.29)+complementary+rare_topics+same_lang+similar_pop+shared_platform (64%)
claude-stt — semantic(0.24)+complementary+rare_topics+same_lang+similar_pop+shared_platform (62%)
mlx-llm — semantic(0.36)+complementary+rare_topics+same_lang+similar_pop+shared_platform (62%)
PyVision — semantic(0.23)+complementary+rare_topics+same_lang+similar_pop+shared_platform (62%)

More MCP Server Tools

Explore other popular mcp server tools:

n8n ⭐ 189.3k
ECC ⭐ 187.4k
everything-claude-code ⭐ 186.6k
dify ⭐ 142.3k
open-webui ⭐ 138.2k
gemini-cli ⭐ 104.5k
awesome-mcp-servers ⭐ 86.1k
servers ⭐ 86.1k
ragflow ⭐ 79.0k
lobehub ⭐ 77.5k

View all MCP Server tools →

Popular Python Agent Tools

AutoGPT ⭐ 184.5k · Agent Tool
hermes-agent ⭐ 162.8k · Codex Skill
langflow ⭐ 148.7k · Agent Tool
open-webui ⭐ 138.2k · MCP Server
skills ⭐ 137.5k · Claude Skill

Frequently Asked Questions

What is vllm-mlx?

vllm-mlx is OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX bac. It is categorized as a MCP Server with 1.2k GitHub stars.

What programming language is vllm-mlx written in?

vllm-mlx is primarily written in Python. It covers topics such as anthropic, apple-silicon, audio-processing.

How do I install or use vllm-mlx?

You can find installation instructions and usage details in the vllm-mlx GitHub repository at github.com/waybarrios/vllm-mlx. The project has 1.2k stars and 173 forks, indicating an active community.

What license does vllm-mlx use?

vllm-mlx is released under the Apache-2.0 license, making it free to use and modify according to the license terms.

View on GitHub → Browse MCP Server tools