Hardware Requirements
Atolio runs entirely inside your infrastructure. There is no Atolio-hosted SaaS layer. This page covers hardware sizing for on-premise, airgapped, and bring-your-own-Kubernetes (BYOK) deployments, where you supply the cluster and node hardware.
Atolio also supports managed deployments on AWS, Azure, and GCP, where Atolio’s Terraform provisions sized node groups for you. For per-cloud instance types, quotas, and networking detail, see the deployment guides for AWS, Azure, and GCP.
Overview
A deployment requires:
- General compute nodes for application services and the search database.
- At least one GPU node for the embedding service. Production requires GPU-backed embedding; CPU-only is for proof-of-concept or staging.
- Persistent block storage for Vespa content (index) nodes.
- S3-compatible object storage (MinIO, Ceph RGW, or any S3-compatible store) for assets such as user avatars and shared logs.
- An ingress gateway for HTTPS. Inbound connectivity is required for connectors that receive webhook updates.
Query-time LLM inference is deployed independently of the Atolio cluster. It can run entirely on-premise for fully sovereign deployments, on an OpenAI-compatible endpoint you operate, or via an external managed provider. Sizing the inference layer is outside the scope of this page. Only embedding requires an in-cluster GPU.
The major workloads:
| Workload | Role |
|---|---|
| Vespa | The core index. Scales horizontally for failover and replication. |
| GPU Embedder | Calculates dense vector embeddings at index and query time. |
| Core Services | Ingest pipeline, API, web UI, control plane. |
| Connectors | One microservice per source system; fetches documents, metadata, users, and permissions. |
| Apache Tika | Extracts text from PDFs, Office documents, and similar files. |
Three things drive sizing: active connector count, indexed corpus size, and concurrent users. The figures below are baselines. Atolio’s deployment team will check them against your expected scale before you provision.
Pod-level Resource Requirements
This is the canonical sizing view. Use it to size worker nodes in your Kubernetes or OpenShift cluster.
| Service | Quantity | CPU Cores | Min Memory | Disk | GPU |
|---|---|---|---|---|---|
| Vespa Content | 2+ | 8 | 64 GB | 500+ GB stateful PVC / 30 GB stateless | No |
| Vespa Admin | 3 | 2 | 6 GB | 30 GB stateful PVC / 60 GB stateless | No |
| Vespa Container | 2+ | 8 | 16 GB | 30 GB stateless | No |
| Tika | 1 | 16 | 32 GB | 20 GB stateless | No |
| Embedder | 1 | 4 | 16 GB | 20 GB stateless | Yes |
| Connectors × N | 1+ | 4 | 4 GB | 20 GB stateless | No |
| System Services | 3 | 4 | 16 GB | 20 GB stateless | No |
The two Vespa content pods must run on physically separate machines, as must the three Vespa admin pods; admin pods may share their nodes with other services. The remaining services can share nodes wherever capacity exists.
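The physical-separation requirement can be expressed as a pod anti-affinity rule. A minimal sketch, assuming the content pods carry an `app: vespa-content` label (the label key is hypothetical; check the actual labels the Atolio charts apply):

```yaml
# Hypothetical pod spec fragment: force Vespa content pods onto distinct nodes.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: vespa-content   # assumed label; verify against the chart
        topologyKey: kubernetes.io/hostname
```

Using `requiredDuringScheduling` (rather than `preferred`) makes the scheduler refuse to co-locate the pods, which matches the hard separation requirement above.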
Cluster Requirements
Atolio runs on any conformant Kubernetes 1.33+ cluster, including airgapped environments. OpenShift (including ROSA) is fully supported and tested; vanilla Kubernetes, Rancher/RKE2, and EKS-Anywhere style installs work as well.
BYOK fits when data-residency or compliance rules require infrastructure inside your perimeter, when the cluster is airgapped, when you already operate a hardened Kubernetes platform, or when the LLM, embedder, index, and connectors must all run on customer-owned hardware.
What Atolio provides
- Helm charts: `atolio-db` (Vespa) and `atolio-svc` (Lumen application services).
- Container images suitable for an internal or airgapped registry.
- Deployment runbooks and direct support during install.
What you provide
- A Kubernetes or OpenShift cluster sized per the Pod-level Resource Requirements, with the role split below.
- A CSI block storage class for Vespa PVCs.
- An S3-compatible object store (MinIO, Ceph RGW, or any S3-compatible store).
- A reachable container registry plus credentials. For airgapped installs, your internal mirror.
- An ingress controller and TLS certificate (or cert-manager).
- An OIDC provider for SSO. Atolio supports Okta, Microsoft Entra ID, Google, and Keycloak.
Node role split
| Role | Min nodes | Per-node target | Notes |
|---|---|---|---|
| Vespa Content | 2 | 8 vCPU / 64 GB RAM / 500 GB SSD | Physically separate machines. Stateful; PVCs must survive node loss. |
| Vespa Admin | 3 | 2 vCPU / 8 GB RAM | Physically separate machines. Hosts the Vespa configserver (~30 GB on the OS disk). |
| Vespa Container | 1+ | 8 vCPU / 16 GB RAM | Stateless query/feed dispatch. Scale horizontally with query load. |
| Lumen Services | 2+ | 4 vCPU / 16 GB RAM | Stateless. Hosts core services, Tika, and connector pods. |
| Embedding Services | 1+ | 4 vCPU / 16+ GB RAM + 1× GPU (16 GB+) | x86_64. NVIDIA T4, A4000, A4500, A10, A10G, or L4 with 16–24 GB VRAM. |
x86_64 is the typical on-prem architecture. ARM64 images are also published and supported.
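On BYOK clusters, GPU nodes are commonly tainted so that only the embedder schedules onto them. A sketch under stated assumptions (the taint key and node label are illustrative; the `nvidia.com/gpu.present` label is typically set by the NVIDIA GPU operator):

```yaml
# Hypothetical embedder pod scheduling fragment. Assumes GPU nodes are tainted with
#   kubectl taint nodes <node> nvidia.com/gpu=present:NoSchedule
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
nodeSelector:
  nvidia.com/gpu.present: "true"   # label applied by the NVIDIA GPU operator
```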
Airgapped deployments
In airgapped mode, all components run inside your perimeter, with no outbound traffic at runtime. Atolio’s deployment team can supply an offline image bundle and matching runbook on request.
Storage
| Volume | Default Size | Notes |
|---|---|---|
| Vespa content data disk (per node) | 500 GB | Indexed documents and embedding vectors. Raise for large corpora or higher-dimensional models. |
| Vespa content / container node OS disk | 30 GB | OS and local Vespa logs. |
| Vespa admin node OS disk | 60 GB | Accommodates the Vespa configserver (up to ~30 GB). |
| Lumen services / embedding node OS disk | 20 GB | Standard OS disk. |
| Object storage bucket | n/a | Two buckets per deployment (shared and private). Any S3-compatible store. No fixed quota. |
Encryption-at-rest is provided by your CSI driver / storage class. Set a `Retain` reclaim policy on the Vespa content and admin StorageClasses and PVCs so their data survives chart redeploys.
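As a sketch, a StorageClass with a `Retain` reclaim policy might look like the following; the provisioner and class name are placeholders for whatever CSI driver your cluster runs:

```yaml
# Example StorageClass for Vespa PVCs; provisioner is a placeholder,
# substitute your CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vespa-retain
provisioner: ebs.csi.aws.com      # substitute your CSI driver
reclaimPolicy: Retain             # PV data survives chart redeploys
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```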
GPU Requirements
The embedder is the only in-cluster service that requires a GPU. It generates dense vector embeddings for indexed content; running on GPU keeps ingest throughput up during large connector backfills.
- GPU: NVIDIA T4 with 16 GB VRAM is the baseline. Higher-end accelerators (A4000, A4500, A10, A10G, L4) with 16–24 GB VRAM are also supported.
- Driver: install the NVIDIA GPU operator (or the OpenShift NVIDIA GPU operator) so GPU nodes expose the `nvidia.com/gpu` resource.
- Query-time inference: hosted independently (on-premise or external). See Overview.
For proof-of-concept or staging clusters where CPU-only embedding is acceptable, set `embedder.useGPU=false` in the Lumen Helm values and omit the GPU node pool.
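A hedged sketch of the relevant Lumen values: only `embedder.useGPU` is documented above, so the surrounding keys and structure are assumptions to illustrate where the toggle and GPU resource request would sit:

```yaml
# Illustrative Lumen values fragment; structure around embedder.useGPU is assumed.
embedder:
  useGPU: true            # set to false for CPU-only PoC/staging clusters
  resources:
    requests:
      cpu: "4"
      memory: 16Gi
    limits:
      nvidia.com/gpu: 1   # resource exposed by the NVIDIA GPU operator
      memory: 16Gi
```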
Networking
Atolio runs in a single VPC/VNet/VLAN with public and private subnets across multiple failure domains.
- CIDR: any non-overlapping `/20` or larger range works.
- Subnets: at least `/22` (1,024 IPs) per private subnet to avoid IPv4 exhaustion as pod counts grow.
- Failure domains: at least three, for HA across racks, hypervisors, or zones.
- Ingress: any ingress controller works (NGINX, HAProxy, OpenShift Router, F5 BIG-IP). Inbound HTTPS is required for webhook-driven connectors.
- Egress: outbound HTTPS to source-system APIs, the LLM endpoint, and the container registry. Airgapped deployments host the LLM and registry internally.
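If you enforce egress with NetworkPolicies, the outbound requirement above reduces to allowing HTTPS (plus cluster DNS). A minimal sketch; the namespace name is hypothetical:

```yaml
# Hypothetical NetworkPolicy: permit HTTPS egress plus in-cluster DNS.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-https-egress
  namespace: atolio          # assumed namespace
spec:
  podSelector: {}            # applies to all pods in the namespace
  policyTypes: [Egress]
  egress:
    - ports:
        - protocol: TCP
          port: 443          # source-system APIs, LLM endpoint, registry
    - ports:
        - protocol: UDP
          port: 53           # cluster DNS
        - protocol: TCP
          port: 53
```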
Scaling Guidance
Sizing increases with:
- Document count and size: more or larger documents (e.g. PDFs, design files) → more Vespa content nodes or higher per-node memory.
- Embedding dimensionality: higher-dimensional models pressure Vespa content memory. Plan on 128 GB+ per content node for high-dimensional models.
- Concurrent users / query rate: scale Vespa container nodes and Lumen services horizontally.
- Connector throughput: heavy connectors (Slack, GitHub at scale, large Drive estates) demand more ingest CPU and more GPU embedding throughput.
- HA posture: for stricter SLAs, run 3+ Vespa content nodes and 3+ Lumen services pods.
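To reason about the dimensionality point above, a rough back-of-envelope helps: raw vector memory is roughly document count × dimensions × bytes per value. This sketch assumes float32 vectors and one vector per document, and ignores index overhead and replication:

```python
def embedding_memory_gib(doc_count: int, dims: int, bytes_per_value: int = 4) -> float:
    """Approximate raw vector memory in GiB, ignoring index structures and replicas."""
    return doc_count * dims * bytes_per_value / 2**30

# 10M documents at 768 dims vs 1536 dims:
print(round(embedding_memory_gib(10_000_000, 768), 1))   # ~28.6 GiB
print(round(embedding_memory_gib(10_000_000, 1536), 1))  # ~57.2 GiB
```

Doubling embedding dimensionality doubles vector memory, which is why high-dimensional models push content nodes toward the 128 GB+ tier.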
For larger deployments, Atolio’s deployment team can run a pre-deployment content scan against an expected connector to estimate document inventory and recommend node count, memory tier, and disk size before provisioning.
Customizing the Defaults
Every node count and disk size is exposed as a Helm value in the Vespa and Lumen charts. Commonly tuned settings cover:
- Lumen services replica count and resource requests.
- Vespa content node count and content disk size.
- Vespa container node count.
- Embedding services replica count, GPU toggle, and resource requests.
- OpenShift-specific knobs: security context constraints, image pull secrets, storage class overrides.
If you need a configuration outside the documented defaults (for example, an internal procurement standard, a specific Kubernetes distribution, or an unusual storage driver), share the target hardware and platform with your Atolio support team so they can confirm compatibility: architecture, base image, GPU drivers, and CSI driver.