by waybarrios · MCP Server · ★ 1.2k
vLLM-MLX vLLM-like inference for Apple Silicon - GPU-accelerated Text, Image, Video & Audio on Mac Overview vllm-mlx brings native Apple Silicon GPU acceleration to vLLM by integrating: MLX: Apple's ML framework with unified memory and Metal kernels mlx-lm: Optimized LLM inference with KV cache and quantization mlx-vlm: Vision-language models for multimodal inference mlx-audio: Speech-to-Text and Text-to-Speech with native voices mlx-embeddings: Text embeddings for semantic search and RAG Features Multimodal - Text, Image, Video & Audio in one platform Native GPU acceleration on Apple Silicon...
| Stars | 1,230 |
| Forks | 173 |
| Language | Python |
| Category | MCP Server |
| License | Apache-2.0 |
| Quality Score | 52.57/100 |
| Open Issues | 59 |
| Last Updated | 2026-05-21 |
| Created | 2025-12-06 |
| Platforms | claude-code, mcp, python |
| Est. Tokens | ~810k |
These tools work well together with vllm-mlx for enhanced workflows:
Explore other popular mcp server tools:
vllm-mlx is OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX bac. It is categorized as a MCP Server with 1.2k GitHub stars.
vllm-mlx is primarily written in Python. It covers topics such as anthropic, apple-silicon, audio-processing.
You can find installation instructions and usage details in the vllm-mlx GitHub repository at github.com/waybarrios/vllm-mlx. The project has 1.2k stars and 173 forks, indicating an active community.
vllm-mlx is released under the Apache-2.0 license, making it free to use and modify according to the license terms.