vllm-mlx

by waybarrios · MCP Server · ★ 1.2k

About vllm-mlx

vLLM-MLX vLLM-like inference for Apple Silicon - GPU-accelerated Text, Image, Video & Audio on Mac Overview vllm-mlx brings native Apple Silicon GPU acceleration to vLLM by integrating: MLX: Apple's ML framework with unified memory and Metal kernels mlx-lm: Optimized LLM inference with KV cache and quantization mlx-vlm: Vision-language models for multimodal inference mlx-audio: Speech-to-Text and Text-to-Speech with native voices mlx-embeddings: Text embeddings for semantic search and RAG Features Multimodal - Text, Image, Video & Audio in one platform Native GPU acceleration on Apple Silicon...

anthropicapple-siliconaudio-processingclaude-codecomputer-visionimage-understandinginferencellmmachine-learningmacos

Quick Facts

Stars1,230
Forks173
LanguagePython
CategoryMCP Server
LicenseApache-2.0
Quality Score52.57/100
Open Issues59
Last Updated2026-05-21
Created2025-12-06
Platformsclaude-code, mcp, python
Est. Tokens~810k

Compatible Skills

These tools work well together with vllm-mlx for enhanced workflows:

  • mlx-omni-server — semantic(0.33)+complementary+shared_fw(openai)+rare_topics+same_lang+similar_pop+shared_platform (78%)
  • Toolio — semantic(0.29)+complementary+rare_topics+same_lang+similar_pop+shared_platform (64%)
  • claude-stt — semantic(0.24)+complementary+rare_topics+same_lang+similar_pop+shared_platform (62%)
  • mlx-llm — semantic(0.36)+complementary+rare_topics+same_lang+similar_pop+shared_platform (62%)
  • PyVision — semantic(0.23)+complementary+rare_topics+same_lang+similar_pop+shared_platform (62%)

More MCP Server Tools

Explore other popular mcp server tools:

View all MCP Server tools →

Popular Python Agent Tools

Frequently Asked Questions

What is vllm-mlx?

vllm-mlx is OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX bac. It is categorized as a MCP Server with 1.2k GitHub stars.

What programming language is vllm-mlx written in?

vllm-mlx is primarily written in Python. It covers topics such as anthropic, apple-silicon, audio-processing.

How do I install or use vllm-mlx?

You can find installation instructions and usage details in the vllm-mlx GitHub repository at github.com/waybarrios/vllm-mlx. The project has 1.2k stars and 173 forks, indicating an active community.

What license does vllm-mlx use?

vllm-mlx is released under the Apache-2.0 license, making it free to use and modify according to the license terms.

View on GitHub → Browse MCP Server tools