Hardware Requirements
Atolio runs entirely inside your infrastructure. There is no Atolio-hosted SaaS layer. This page covers hardware sizing for on-premise, airgapped, and bring-your-own-Kubernetes (BYOK) deployments, where you supply the cluster and node hardware.
Atolio also supports managed deployments on AWS, Azure, and GCP, where Atolio’s Terraform provisions sized node groups for you. For per-cloud instance types, quotas, and networking detail, see the deployment guides for AWS, Azure, and GCP.
Overview
A deployment requires:
- General compute nodes for application services and the search database.
- At least one GPU node for the embedding service. Production requires GPU-backed embedding; CPU-only is for proof-of-concept or staging.
- Persistent block storage for Vespa content (index) nodes.
- S3-compatible object storage (MinIO, Ceph RGW, or any S3-compatible store) for assets such as user avatars and shared logs.
- An ingress gateway for HTTPS. Inbound connectivity is required for connectors that receive webhook updates.
Query-time LLM inference is deployed independently of the Atolio cluster. It can run entirely on-premise for fully sovereign deployments, on an OpenAI-compatible endpoint you operate, or via an external managed provider. Sizing the inference layer is outside the scope of this page. Only embedding requires an in-cluster GPU.
The major workloads:
| Workload | Role |
|---|---|
| Vespa | The core index. Scales horizontally for failover and replication. |
| GPU Embedder | Calculates dense vector embeddings at index and query time. |
| Core Services | Ingest pipeline, API, web UI, control plane. |
| Connectors | One microservice per source system; fetches documents, metadata, users, and permissions. |
| Apache Tika | Extracts text from PDFs, Office documents, and similar files. |
Three things drive sizing: active connector count, indexed corpus size, and concurrent users. The figures below are baselines. Atolio’s deployment team will check them against your expected scale before you provision.
Pod-level Resource Requirements
This is the canonical sizing view. Use it to size worker nodes in your Kubernetes or OpenShift cluster.
| Service | Quantity | CPU Cores | Min Memory | Disk | GPU |
|---|---|---|---|---|---|
| Vespa Content | 2+ | 8 | 64 GB | 500+ GB stateful PVC / 30 GB stateless | No |
| Vespa Admin | 3 | 2 | 6 GB | 30 GB stateful PVC / 60 GB stateless | No |
| Vespa Container | 2+ | 8 | 16 GB | 30 GB stateless | No |
| Tika | 1 | 16 | 32 GB | 20 GB stateless | No |
| Embedder | 1 | 4 | 16 GB | 20 GB stateless | Yes |
| Connectors × N | 1+ | 4 | 4 GB | 20 GB stateless | No |
| System Services | 3 | 4 | 16 GB | 20 GB stateless | No |
The two Vespa content pods must run on physically separate machines, as must the three Vespa admin pods; admin pods may share their nodes with other services. The remaining services can share nodes wherever capacity exists.
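The physical-separation requirement can be expressed as a pod anti-affinity rule. A minimal sketch, assuming the content pods carry an `app: vespa-content` label (the label key is hypothetical; check the actual labels the Atolio charts apply):

```yaml
# Hypothetical pod spec fragment: force Vespa content pods onto distinct nodes.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: vespa-content   # assumed label; verify against the chart
        topologyKey: kubernetes.io/hostname
```

Using `requiredDuringScheduling` (rather than `preferred`) makes the scheduler refuse to co-locate the pods, which matches the hard separation requirement above.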
Cluster Requirements
Atolio runs on any conformant Kubernetes 1.33+ cluster, including airgapped environments. OpenShift (including ROSA) is fully supported and tested; vanilla Kubernetes, Rancher/RKE2, and EKS-Anywhere style installs work as well.
BYOK fits when data-residency or compliance rules require infrastructure inside your perimeter, when the cluster is airgapped, when you already operate a hardened Kubernetes platform, or when the LLM, embedder, index, and connectors must all run on customer-owned hardware.
What Atolio provides
- Helm charts: `atolio-db` (Vespa) and `atolio-svc` (Lumen application services).
- Container images suitable for an internal or airgapped registry.
- Deployment runbooks and direct support during install.
What you provide
- A Kubernetes or OpenShift cluster sized per the Pod-level Resource Requirements, with the role split below.
- A CSI block storage class for Vespa PVCs.
- An S3-compatible object store (MinIO, Ceph RGW, or any S3-compatible store).
- A reachable container registry plus credentials. For airgapped installs, your internal mirror.
- An ingress controller and TLS certificate (or cert-manager).
- An OIDC provider for SSO. Atolio supports Okta, Microsoft Entra ID, Google, and Keycloak.
Node role split
| Role | Min nodes | Per-node target | Notes |
|---|---|---|---|
| Vespa Content | 2 | 8 vCPU / 64 GB RAM / 500 GB SSD | Physically separate machines. Stateful; PVCs must survive node loss. |
| Vespa Admin | 3 | 2 vCPU / 8 GB RAM | Physically separate machines. Hosts the Vespa configserver (~30 GB on the OS disk). |
| Vespa Container | 1+ | 8 vCPU / 16 GB RAM | Stateless query/feed dispatch. Scale horizontally with query load. |
| Lumen Services | 2+ | 4 vCPU / 16 GB RAM | Stateless. Hosts core services, Tika, and connector pods. |
| Embedding Services | 1+ | 4 vCPU / 16+ GB RAM + 1× GPU (16 GB+) | x86_64. NVIDIA T4, A4000, A4500, A10, A10G, or L4 with 16–24 GB VRAM. |
x86_64 is the typical on-prem architecture. ARM64 images are also published and supported.
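On BYOK clusters, GPU nodes are commonly tainted so that only the embedder schedules onto them. A sketch under stated assumptions (the taint key and node label are illustrative; the `nvidia.com/gpu.present` label is typically set by the NVIDIA GPU operator):

```yaml
# Hypothetical embedder pod scheduling fragment. Assumes GPU nodes are tainted with
#   kubectl taint nodes <node> nvidia.com/gpu=present:NoSchedule
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
nodeSelector:
  nvidia.com/gpu.present: "true"   # label applied by the NVIDIA GPU operator
```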
Airgapped deployments
In airgapped mode, all components run inside your perimeter, with no outbound traffic at runtime. Atolio’s deployment team can supply an offline image bundle and matching runbook on request.
Storage
| Volume | Default Size | Notes |
|---|---|---|
| Vespa content data disk (per node) | 500 GB | Indexed documents and embedding vectors. Raise for large corpora or higher-dimensional models. |
| Vespa content / container node OS disk | 30 GB | OS and local Vespa logs. |
| Vespa admin node OS disk | 60 GB | Accommodates the Vespa configserver (up to ~30 GB). |
| Lumen services / embedding node OS disk | 20 GB | Standard OS disk. |
| Object storage bucket | n/a | Two buckets per deployment (shared and private). Any S3-compatible store. No fixed quota. |
Encryption-at-rest is provided by your CSI driver / storage class. Set a `Retain` reclaim policy on the Vespa content and admin StorageClasses and PVCs so their data survives chart redeploys.
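As a sketch, a StorageClass with a `Retain` reclaim policy might look like the following; the provisioner and class name are placeholders for whatever CSI driver your cluster runs:

```yaml
# Example StorageClass for Vespa PVCs; provisioner is a placeholder,
# substitute your CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vespa-retain
provisioner: ebs.csi.aws.com      # substitute your CSI driver
reclaimPolicy: Retain             # PV data survives chart redeploys
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```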
GPU Requirements
The embedder is the only in-cluster service that requires a GPU. It generates dense vector embeddings for indexed content; running on GPU keeps ingest throughput up during large connector backfills.
- GPU: NVIDIA T4 with 16 GB VRAM is the baseline. Higher-end accelerators (A4000, A4500, A10, A10G, L4) with 16–24 GB VRAM are also supported.
- Driver: install the NVIDIA GPU operator (or the OpenShift NVIDIA GPU operator) so GPU nodes expose the `nvidia.com/gpu` resource.
- Query-time inference: hosted independently (on-premise or external). See Overview.
For proof-of-concept or staging clusters where CPU-only embedding is acceptable, set `embedder.useGPU=false` in the Lumen Helm values and omit the GPU node pool.
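A hedged sketch of the relevant Lumen values: only `embedder.useGPU` is documented above, so the surrounding keys and structure are assumptions to illustrate where the toggle and GPU resource request would sit:

```yaml
# Illustrative Lumen values fragment; structure around embedder.useGPU is assumed.
embedder:
  useGPU: true            # set to false for CPU-only PoC/staging clusters
  resources:
    requests:
      cpu: "4"
      memory: 16Gi
    limits:
      nvidia.com/gpu: 1   # resource exposed by the NVIDIA GPU operator
      memory: 16Gi
```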
Networking
Atolio runs in a single VPC/VNet/VLAN with public and private subnets across multiple failure domains.
- CIDR: any non-overlapping `/20` or larger range works.
- Subnets: at least `/22` (1,024 IPs) per private subnet to avoid IPv4 exhaustion as pod counts grow.
- Failure domains: at least three, for HA across racks, hypervisors, or zones.
- Ingress: any ingress controller works (NGINX, HAProxy, OpenShift Router, F5 BIG-IP). Inbound HTTPS is required for webhook-driven connectors.
- Egress: outbound HTTPS to source-system APIs, the LLM endpoint, and the container registry. Airgapped deployments host the LLM and registry internally.
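If you enforce egress with NetworkPolicies, the outbound requirement above reduces to allowing HTTPS (plus cluster DNS). A minimal sketch; the namespace name is hypothetical:

```yaml
# Hypothetical NetworkPolicy: permit HTTPS egress plus in-cluster DNS.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-https-egress
  namespace: atolio          # assumed namespace
spec:
  podSelector: {}            # applies to all pods in the namespace
  policyTypes: [Egress]
  egress:
    - ports:
        - protocol: TCP
          port: 443          # source-system APIs, LLM endpoint, registry
    - ports:
        - protocol: UDP
          port: 53           # cluster DNS
        - protocol: TCP
          port: 53
```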
Scaling Guidance
Sizing increases with:
- Document count and size: more or larger documents (e.g. PDFs, design files) → more Vespa content nodes or higher per-node memory.
- Embedding dimensionality: higher-dimensional models pressure Vespa content memory. Plan on 128 GB+ per content node for high-dimensional models.
- Concurrent users / query rate: scale Vespa container nodes and Lumen services horizontally.
- Connector throughput: heavy connectors (Slack, GitHub at scale, large Drive estates) demand more ingest CPU and more GPU embedding throughput.
- HA posture: for stricter SLAs, run 3+ Vespa content nodes and 3+ Lumen services pods.
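To reason about the dimensionality point above, a rough back-of-envelope helps: raw vector memory is roughly document count × dimensions × bytes per value. This sketch assumes float32 vectors and one vector per document, and ignores index overhead and replication:

```python
def embedding_memory_gib(doc_count: int, dims: int, bytes_per_value: int = 4) -> float:
    """Approximate raw vector memory in GiB, ignoring index structures and replicas."""
    return doc_count * dims * bytes_per_value / 2**30

# 10M documents at 768 dims vs 1536 dims:
print(round(embedding_memory_gib(10_000_000, 768), 1))   # ~28.6 GiB
print(round(embedding_memory_gib(10_000_000, 1536), 1))  # ~57.2 GiB
```

Doubling embedding dimensionality doubles vector memory, which is why high-dimensional models push content nodes toward the 128 GB+ tier.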
For larger deployments, Atolio’s deployment team can run a pre-deployment content scan against an expected connector to estimate document inventory and recommend node count, memory tier, and disk size before provisioning.
Customizing the Defaults
Every node count and disk size is exposed as a Helm value in the Vespa and Lumen charts. Commonly tuned settings cover:
- Lumen services replica count and resource requests.
- Vespa content node count and content disk size.
- Vespa container node count.
- Embedding services replica count, GPU toggle, and resource requests.
- OpenShift-specific knobs: security context constraints, image pull secrets, storage class overrides.
If you need a configuration outside the documented defaults (for example, an internal procurement standard, a specific Kubernetes distribution, or an unusual storage driver), share the target hardware and platform with your Atolio support team so they can confirm compatibility: architecture, base image, GPU drivers, and CSI driver.