Overview

The Vector Inference Platform is a new service from the AI Engineering team, rolling out in early 2026. It will host large, state-of-the-art language models that anyone in the Vector community can use freely and easily.

Unlike previous efforts to provide inference services on Vector's compute environment, this platform will be a production-grade, always-available service. Users will not need to bring up their own models via Slurm jobs or worry about time limits; the models will remain persistently online.
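Because the models stay online, querying one should reduce to a single API call. Below is a minimal sketch assuming the platform exposes an OpenAI-compatible HTTP endpoint, a common convention for hosted inference services; the base URL, credential handling, and model name are placeholders, not confirmed details of this platform:

```python
# A minimal sketch, assuming an OpenAI-compatible endpoint.
# The base URL, API key, and model name are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://inference.example.vectorinstitute.ai/v1",  # hypothetical endpoint
    api_key="placeholder",  # placeholder; actual auth may be handled differently
)

# No Slurm job or model startup step: the model is already running,
# so we connect and send a request directly.
response = client.chat.completions.create(
    model="example-model",  # placeholder; use a model the platform actually serves
    messages=[{"role": "user", "content": "Hello from the Vector community!"}],
)
print(response.choices[0].message.content)
```

The persistent-service design is what makes this pattern possible: there is no queue wait or cold start between the user and the model, only a network round trip.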

The source code and advanced technical documentation for this project are available on the GitHub page: https://github.com/VectorInstitute/inference-platform.

As of February 2026, three models are already available:

The AI Engineering team will make more models available as the service matures and we gain access to better hardware. We welcome feedback and new model requests in our Slack channel, #vector-inference-platform.