Overview
The Vector Inference Platform serves LLMs through an HTTP service for Vector researchers and staff.
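As a rough illustration of how a client might call the HTTP service, here is a minimal sketch assuming the platform exposes vLLM's OpenAI-compatible REST API; the base URL and model name are placeholders, not the project's real endpoints:

```python
import requests

# Hypothetical endpoint; the real host/port depend on the cluster deployment.
BASE_URL = "http://vector-inference.example.com/v1"

# vLLM exposes an OpenAI-compatible chat completions API; the model name is a placeholder.
response = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Summarize vector databases in one sentence."}],
        "max_tokens": 128,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```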
A new name for this project is pending; some candidate names are listed below:
- Quill (Query Inference for LLMs)
- Quip (Query Inference Platform)
- Inference Platform
- Split (Serving Platform for Inference on Text)
- Inference Engine
- Inference Service
The current tech stack for this project:
- Kubernetes: On-premises cluster
- Flux: Application deployment
- KubeRay: LLM serving
- vLLM: LLM engine
- Prometheus: Metrics monitoring
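To make the relationship between these components concrete, here is a minimal sketch (not the project's actual deployment code) of a Ray Serve deployment wrapping the vLLM engine; KubeRay's role is to provision and manage the underlying Ray cluster on Kubernetes. The model name, replica count, and resource settings below are placeholder assumptions:

```python
from ray import serve
from vllm import LLM, SamplingParams


# Hypothetical deployment; model name, replica count, and GPU request are placeholders.
@serve.deployment(num_replicas=1, ray_actor_options={"num_gpus": 1})
class LLMServer:
    def __init__(self) -> None:
        # The vLLM engine loads model weights once per replica.
        self.llm = LLM(model="facebook/opt-125m")

    async def __call__(self, request) -> dict:
        # Ray Serve passes a Starlette Request; parse the JSON body.
        body = await request.json()
        params = SamplingParams(max_tokens=body.get("max_tokens", 128))
        # LLM.generate is blocking; a production setup would use vLLM's async engine.
        outputs = self.llm.generate([body["prompt"]], params)
        return {"text": outputs[0].outputs[0].text}


app = LLMServer.bind()
# serve.run(app) deploys onto the Ray cluster that KubeRay manages on Kubernetes.
```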