Overview
The Vector Inference Platform serves LLMs over an HTTP service for Vector researchers and staff. The GitHub source can be found at VectorInstitute/inference-platform
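As a quick illustration of how a researcher might call the service: vLLM (the engine used here, see the stack below) exposes an OpenAI-compatible HTTP API, so the standard `openai` client works. The base URL and model name below are placeholders, not the platform's actual values.

```python
# Minimal sketch of querying the platform's HTTP endpoint.
# vLLM serves an OpenAI-compatible API, so the standard `openai` client works.
# The base_url and model name are illustrative placeholders; substitute the
# platform's actual endpoint and a model it actually serves.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder; vLLM's default local port
    api_key="EMPTY",  # vLLM accepts any key unless auth is configured
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Hello from the Vector cluster!"}],
)
print(response.choices[0].message.content)
```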
A new name for this project is pending. Candidate names under consideration:
- Quill (Query Inference for LLMs)
- Quip (Query Inference Platform)
- Inference Platform
- Split (Serving Platform for Inference on Text)
- Inference Engine
- Inference Service
The current tech stack for this project:
- Kubernetes: On-premises cluster
- Flux: Application deployment
- KubeRay: LLM serving
- vLLM: LLM engine (see the sketch after this list)
- Prometheus: Metrics monitoring
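To make the engine layer concrete, the sketch below exercises vLLM's offline Python API in isolation. On the platform itself the engine runs inside KubeRay-managed serving pods rather than a local script, and the model name is a placeholder.

```python
# Minimal sketch of the vLLM engine layer on its own.
# On the platform this runs inside KubeRay-managed pods; here it is invoked
# locally via vLLM's offline API. The model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
params = SamplingParams(temperature=0.7, max_tokens=64)

# Batch generation: vLLM schedules the prompts through its paged-attention engine.
outputs = llm.generate(["What does the Vector Inference Platform do?"], params)
for out in outputs:
    print(out.outputs[0].text)
```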