Overview
The Vector Inference Platform is a new service from the AI Engineering team, rolling out in early 2026. The platform will host large, state-of-the-art language models that anyone in the Vector community can use freely and easily.
Unlike previous efforts to provide inference services on Vector's compute environment, this platform will be a production-grade, always-available service. Users will not need to bring up their own models via Slurm jobs or worry about time limits; the models will remain persistently online.
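Because the models stay online, interacting with one reduces to sending a request to the service. As a minimal sketch, assuming the platform exposes an OpenAI-compatible chat endpoint (the base URL, API key handling, and model name below are hypothetical placeholders, not confirmed details of the platform):

```python
from openai import OpenAI

# Hypothetical endpoint and credentials; substitute the values
# provided by the platform once it is available.
client = OpenAI(
    base_url="https://inference.example.vectorinstitute.ai/v1",
    api_key="YOUR_API_KEY",
)

# Query a persistently hosted model -- no Slurm job or
# model startup required on the user's side.
response = client.chat.completions.create(
    model="example-llm",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize what an inference platform does."}],
)

print(response.choices[0].message.content)
```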
The source code and advanced technical documentation for this project are available on the project's GitHub page: https://github.com/VectorInstitute/inference-platform