
Overview

The Vector Inference Platform is a new service provided by the AI Engineering team, rolling out in early 2026. The platform serves large, state-of-the-art language models (LLMs) that anyone in the Vector community can use freely and easily.

Unlike previous efforts to provide inference services on Vector's compute environment, this new platform will be a production-grade, always-available service for Vector researchers and staff. Users will not be expected to bring up their own models via Slurm jobs or to worry about time limits; the models will remain persistently online.
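
As a rough illustration of what "persistently online" means for users, the sketch below queries a hosted model through an OpenAI-compatible endpoint (vLLM, the engine listed in the tech stack below, exposes such an API). The base URL, API key handling, and model name are placeholders, not the platform's actual values:

    from openai import OpenAI

    # Placeholder endpoint and model name; the real values will come from the
    # platform's documentation once it is published.
    client = OpenAI(
        base_url="https://<platform-host>/v1",  # hypothetical OpenAI-compatible endpoint
        api_key="placeholder",                  # auth scheme not yet announced
    )

    response = client.chat.completions.create(
        model="example-model",  # hypothetical hosted model identifier
        messages=[{"role": "user", "content": "Summarize what an inference platform does."}],
    )
    print(response.choices[0].message.content)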

The source code and advanced technical documentation are available on the GitHub page: https://github.com/VectorInstitute/inference-platform

A new name for this project is pending. Some ideas for names below:

  • Quill (Query Inference for LLMs)
  • Quip (Query Inference Platform)
  • Inference Platform
  • Split (Serving Platform for Inference on Text)
  • Inference Engine
  • Inference Service

The current tech stack for this project:

  • Kubernetes: On-premises cluster
  • Flux: Application deployment
  • KubeRay: LLM serving
  • vLLM: LLM engine
  • Prometheus: Metric monitoring
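
For a sense of what the engine layer does underneath KubeRay, here is a minimal standalone vLLM sketch; it is illustrative only, is not the platform's deployment code, and the model name is a placeholder:

    from vllm import LLM, SamplingParams

    # Standalone vLLM usage for illustration; on the platform, the engine runs
    # inside KubeRay-managed workers rather than a local script.
    llm = LLM(model="example-org/example-model")  # placeholder model identifier
    params = SamplingParams(temperature=0.7, max_tokens=64)

    outputs = llm.generate(["What does an inference platform do?"], params)
    print(outputs[0].outputs[0].text)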