Today's approaches to serving models are static, resource-blind, and unaware of AI workload characteristics, making it hard to observe and scale LLM inference.

Introduction to llm-d

An open-source, Kubernetes-native framework for distributed LLM inference
