Overview
The Vector Inference Platform serves LLMs over an HTTP service for Vector researchers and staff. The GitHub source can be found at VectorInstitute/inference-platform
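As a quick illustration of how a researcher might call the service: vLLM (the engine used here, see the stack below) exposes an OpenAI-compatible HTTP API, so the standard `openai` client works. The base URL and model name below are placeholders, not the platform's actual values.

```python
# Minimal sketch of querying the platform's HTTP endpoint.
# vLLM serves an OpenAI-compatible API, so the standard `openai` client works.
# The base_url and model name are illustrative placeholders; substitute the
# platform's actual endpoint and a model it actually serves.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder; vLLM's default local port
    api_key="EMPTY",  # vLLM accepts any key unless auth is configured
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Hello from the Vector cluster!"}],
)
print(response.choices[0].message.content)
```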
A new name for this project is pending. Candidate names under consideration:
- Quill (Query Inference for LLMs)
- Quip (Query Inference Platform)
- Inference Platform
- Split (Serving Platform for Inference on Text)
- Inference Engine
- Inference Service
The current tech stack for this project:
- Kubernetes: On-premises cluster
- Flux: Application deployment
- KubeRay: LLM serving
- vLLM: LLM engine (see the sketch after this list)
- Prometheus: Metrics monitoring
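To make the engine layer concrete, the sketch below exercises vLLM's offline Python API in isolation. On the platform itself the engine runs inside KubeRay-managed serving pods rather than a local script, and the model name is a placeholder.

```python
# Minimal sketch of the vLLM engine layer on its own.
# On the platform this runs inside KubeRay-managed pods; here it is invoked
# locally via vLLM's offline API. The model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
params = SamplingParams(temperature=0.7, max_tokens=64)

# Batch generation: vLLM schedules the prompts through its paged-attention engine.
outputs = llm.generate(["What does the Vector Inference Platform do?"], params)
for out in outputs:
    print(out.outputs[0].text)
```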