Overview
The Vector Inference Platform is a new service provided by the AI Engineering team, rolling out in early 2026. The platform will host large, state-of-the-art language models that anyone in the Vector community can use freely and easily.
Unlike previous efforts to provide inference services on Vector's compute environment, this platform will be a production-grade, always-available service. Users will not be expected to bring up their own models via Slurm jobs or worry about time limits; the models will remain persistently online.
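As an illustration of what "persistently online" means for users, a model could be queried directly from Python instead of being launched as a Slurm job. This is a minimal sketch only: the endpoint URL, authentication, and model name are hypothetical, and an OpenAI-compatible API is an assumption based on vLLM being the serving engine, not a confirmed detail of the platform.

```python
# Hedged sketch of querying an always-on hosted model.
# The base_url, api_key, and model name below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference-platform.vectorinstitute.ai/v1",  # hypothetical URL
    api_key="YOUR_TOKEN",  # real authentication may differ
)

response = client.chat.completions.create(
    model="example-model",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize what the Vector Inference Platform does."}],
)
print(response.choices[0].message.content)
```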
The source code and advanced technical documentation for this project are available on the GitHub page: https://github.com/VectorInstitute/inference-platform

A new name for this project is pending. Some ideas for names are below:
- Quill (Query Inference for LLMs)
- Quip (Query Inference Platform)
- Inference Platform
- Split (Serving Platform for Inference on Text)
- Inference Engine
- Inference Service

The current tech stack for this project:
- Kubernetes: On-premises cluster
- Flux: Application deployment
- KubeRay: LLM serving
- vLLM: LLM engine
- Prometheus: Metric monitoring
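For context on the serving layer, the sketch below shows vLLM's offline Python API generating text from a model. This is only illustrative: on the platform itself, vLLM runs behind KubeRay as a long-lived service rather than being loaded in-process like this, and the model name is a placeholder, not one of the models that will be hosted.

```python
# Illustrative sketch of the vLLM engine listed in the tech stack.
# The model name is a placeholder; the production deployment serves
# models through KubeRay instead of this in-process usage.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small placeholder model
params = SamplingParams(temperature=0.7, max_tokens=64)

outputs = llm.generate(["What is an inference platform?"], params)
print(outputs[0].outputs[0].text)
```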