
Overview

The Vector Inference Platform is a new service provided by the AI Engineering team, rolling out in early 2026. The platform serves large, state-of-the-art language models (LLMs) that anyone in the Vector community can use freely and easily.

Unlike previous efforts to provide inference services on Vector's compute environment, this new platform will be a production-grade, always-available service for Vector researchers and staff. Users will not be expected to bring up their own models via Slurm jobs or to worry about time limits; the models will remain persistently online.
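
As a rough illustration of what "persistently online" means for users, the sketch below queries a hosted model through an OpenAI-compatible endpoint (vLLM, the engine listed in the tech stack below, exposes such an API). The base URL, API key handling, and model name are placeholders, not the platform's actual values:

    from openai import OpenAI

    # Placeholder endpoint and model name; the real values will come from the
    # platform's documentation once it is published.
    client = OpenAI(
        base_url="https://<platform-host>/v1",  # hypothetical OpenAI-compatible endpoint
        api_key="placeholder",                  # auth scheme not yet announced
    )

    response = client.chat.completions.create(
        model="example-model",  # hypothetical hosted model identifier
        messages=[{"role": "user", "content": "Summarize what an inference platform does."}],
    )
    print(response.choices[0].message.content)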

The source code and advanced technical documentation are available on the GitHub page: https://github.com/VectorInstitute/inference-platform

A new name for this project is pending. Some ideas for names below:

  • Quill (Query Inference for LLMs)
  • Quip (Query Inference Platform)
  • Inference Platform
  • Split (Serving Platform for Inference on Text)
  • Inference Engine
  • Inference Service

The current tech stack for this project:

  • Kubernetes: On-premises cluster
  • Flux: Application deployment
  • KubeRay: LLM serving
  • vLLM: LLM engine
  • Prometheus: Metric monitoring
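
For a sense of what the engine layer does underneath KubeRay, here is a minimal standalone vLLM sketch; it is illustrative only, is not the platform's deployment code, and the model name is a placeholder:

    from vllm import LLM, SamplingParams

    # Standalone vLLM usage for illustration; on the platform, the engine runs
    # inside KubeRay-managed workers rather than a local script.
    llm = LLM(model="example-org/example-model")  # placeholder model identifier
    params = SamplingParams(temperature=0.7, max_tokens=64)

    outputs = llm.generate(["What does an inference platform do?"], params)
    print(outputs[0].outputs[0].text)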