Overview

The Vector Inference Platform serves LLMs through an HTTP service for Vector researchers and staff. The GitHub source can be found at VectorInstitute/inference-platform.
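
Since the platform uses vLLM as its engine (see the tech stack below), a client request would likely go through vLLM's OpenAI-compatible HTTP API. The following is a minimal sketch; the base URL and model name are placeholder assumptions, not the platform's actual configuration:

```python
import requests

# Hypothetical base URL; the real address depends on the cluster's ingress.
BASE_URL = "http://inference.example.vectorinstitute.ai/v1"

# vLLM's OpenAI-compatible server accepts plain JSON POSTs to
# /chat/completions, so no special client library is required.
response = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "Meta-Llama-3.1-8B-Instruct",  # assumed model name
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```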

A new name for this project is pending. Some ideas for names are below:

  • Quill (Query Inference for LLMs)
  • Quip (Query Inference Platform)
  • Inference Platform
  • Split (Serving Platform for Inference on Text)
  • Inference Engine
  • Inference Service

The current tech stack for this project:

  • Kubernetes: On-premises cluster
  • Flux: Application deployment
  • KubeRay: LLM serving
  • vLLM: LLM engine
  • Prometheus: Metric monitoring (see the metrics query sketch below)
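
vLLM exports engine metrics under the `vllm:` prefix, which Prometheus can scrape and then serve through its standard HTTP query API. The sketch below shows an instant query for running request counts; the Prometheus URL is a placeholder assumption about how monitoring is exposed:

```python
import requests

# Hypothetical Prometheus address; depends on how monitoring is exposed.
PROM_URL = "http://prometheus.example.vectorinstitute.ai"

# Instant query: how many requests is each vLLM engine running right now?
resp = requests.get(
    f"{PROM_URL}/api/v1/query",
    params={"query": "vllm:num_requests_running"},
    timeout=30,
)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    model = result["metric"].get("model_name", "unknown")
    print(model, result["value"][1])
```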